Bug 56293 - Incorrect positioning of text in PDFTOHTML
Summary: Incorrect positioning of text in PDFTOHTML
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: pdftohtml
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2012-10-22 20:17 UTC by no1ce
Modified: 2018-08-21 10:42 UTC
CC List: 1 user

See Also:


Attachments
PDF file exhibiting this behavior (265.83 KB, text/plain)
2012-10-22 20:17 UTC, no1ce

Description no1ce 2012-10-22 20:17:34 UTC
Created attachment 68923
PDF file exhibiting this behavior

pdftohtml computes incorrect text positions for certain PDF documents. The attached document is one where this happens.

In more detail:
A rendered image of the first page is 1024x1408 pixels. The text "Brief article", highlighted in the screenshot below, should be positioned 19% from the top of the page:
http://imageshack.us/a/img526/6343/textshiftedpdf1.png

With pdftohtml -xml, Poppler outputs this text with the following data:
<text top="409" left="447" width="80" height="15" font="0">Brief article</text>

The dimensions of this page according to Poppler, taken from the same XML file:
<page number="1" position="absolute" top="0" left="0" height="1488" width="1063">

According to Poppler, then, the text is positioned at
409/1488 = 0.27 = 27% from the top, which is clearly wrong.
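The comparison above can be reproduced with a short script. This is a minimal sketch, assuming the two XML fragments quoted in this report are the only input; the `<page>` element is written self-closed here (an assumption for illustration) so that it parses on its own. The helper name `relative_top` is hypothetical and not part of Poppler.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the pdftohtml -xml output quoted above.
# The <page> element is self-closed only so it parses standalone.
PAGE_XML = '<page number="1" position="absolute" top="0" left="0" height="1488" width="1063"/>'
TEXT_XML = '<text top="409" left="447" width="80" height="15" font="0">Brief article</text>'

# Fraction from the top measured by the reporter on the 1024x1408 rendering.
EXPECTED = 0.19


def relative_top(page_xml: str, text_xml: str) -> float:
    """Return the text's top edge as a fraction of the page height."""
    page = ET.fromstring(page_xml)
    text = ET.fromstring(text_xml)
    return int(text.get("top")) / int(page.get("height"))


if __name__ == "__main__":
    page_height = int(ET.fromstring(PAGE_XML).get("height"))
    rel = relative_top(PAGE_XML, TEXT_XML)
    # Position implied by Poppler's output (409 / 1488).
    print(f"reported by pdftohtml: {rel:.1%}")
    # Where the "top" attribute would land if the text sat at 19%.
    print(f"expected top attribute: about {EXPECTED * page_height:.0f}")
```

Running it prints a reported position of 27.5% and an expected `top` of about 283, which restates the roughly 8-percentage-point discrepancy described in this report.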

No other warnings or errors were reported when converting this document.
Comment 1 Thomas Freitag 2012-10-27 10:52:15 UTC
I'm not an expert on text positions, but if you're right and the positions are wrong, I wonder why so many tools based on Poppler, e.g. Okular, are able to highlight the text when searching.
Comment 2 GitLab Migration User 2018-08-21 10:42:58 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/342.

