Bug 107317 - Fix HtmlFont::HtmlFilter to not lose tabs
Summary: Fix HtmlFont::HtmlFilter to not lose tabs
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-21 03:46 UTC by ulatekh
Modified: 2018-08-20 21:58 UTC
CC List: 0 users

See Also:


Attachments
Patch to fix bug (1.31 KB, patch)
2018-07-21 03:46 UTC, ulatekh

Description ulatekh 2018-07-21 03:46:43 UTC
Created attachment 140749
Patch to fix bug

I'm about to use pdftohtml to extract information from PDFs and organize the results into a database, so I had a chance to dig through the code.

I've had a long-standing problem with qpdfview (which uses poppler) sometimes copying text out of PDFs incorrectly: the text is copied, but all of the spaces are missing. After reproducing it with one such PDF, I tracked the problem down to the PDF using tabs where it probably should have used spaces. The patch changes HtmlFont::HtmlFilter() to convert incoming tabs to spaces instead of removing that whitespace entirely.
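The report describes the change only in prose. Below is a minimal, hypothetical sketch of the idea in C++. It is not the attached patch or poppler's actual HtmlFont::HtmlFilter(). It assumes, as the report implies, that the filter walks the text one Unicode code point at a time and discards control characters, which is how tabs (U+0009) were being lost. The names `htmlFilterSketch`, `appendUtf8` and the `Unicode` alias are illustrative only; the real function works with poppler's own types and output encoding.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for poppler's Unicode type: one 32-bit code point.
using Unicode = std::uint32_t;

// Append the UTF-8 encoding of one code point (assumed valid, <= U+10FFFF).
static void appendUtf8(std::string &out, Unicode c) {
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical HTML text filter: escapes markup characters, turns tabs
// into spaces, and still drops the remaining C0 control characters.
std::string htmlFilterSketch(const std::vector<Unicode> &text) {
    std::string out;
    for (Unicode c : text) {
        if (c == '\t') {
            c = ' '; // keep the word break instead of discarding it
        }
        if (c <= 31) {
            continue; // other control characters are skipped as before
        }
        switch (c) {
        case '"': out += "&#34;"; break;
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default: appendUtf8(out, c); break;
        }
    }
    return out;
}
```

The ordering is the essential part of the sketch: the tab is mapped to a space before the general control-character test, so reversing the two checks would discard it again.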

There are probably other places in the code where the same fix could be applied, e.g. the code path used when copying text in qpdfview.
Comment 1 GitLab Migration User 2018-08-20 21:58:09 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/138.

