Bug 54504 - possibly incorrect calculation of indexes of PopplerTextAttributes
Summary: possibly incorrect calculation of indexes of PopplerTextAttributes
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: glib frontend
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2012-09-04 18:27 UTC by hkotry
Modified: 2012-11-24 12:43 UTC



Attachments
sample pdf for testing (30.63 KB, application/pdf)
2012-09-04 18:27 UTC, hkotry
check if words end with spaces (3.39 KB, patch)
2012-11-23 12:19 UTC, Jason Crain

Description hkotry 2012-09-04 18:27:49 UTC
Created attachment 66623
sample pdf for testing

The indexes in a PopplerTextAttributes struct don't seem to map correctly to the respective characters in the string returned by poppler_page_get_text().

A trivial test program that reproduces the unexpected behavior can be found here: https://gist.github.com/3624542; a test file is attached.
Comment 1 Jason Crain 2012-11-23 12:19:25 UTC
Created attachment 70475
check if words end with spaces

poppler_page_get_text_layout and poppler_page_get_text_attributes assume that each word ends with a space or newline, which causes their indexes to become mismatched with the text whenever a word is not followed by one.  This patch adds a check to TextWord::getSpaceAfter.

This fixes the problem for this file, but there are still other situations where the indexes can become mismatched because of the way TextSelectionDumper::getText aligns tables.  I don't know how to fix that unless you want to modify poppler_page_get_text to return something simpler, instead of calling TextPage::getSelectionText to get the physical layout.
Comment 2 Carlos Garcia Campos 2012-11-24 12:43:48 UTC
Sorry for the delay in reviewing this. The patch looks great, and I've just pushed it to git master (with some minor coding-style changes). Thanks!

