Summary: | Wrong text extracted from attached example: 2012 extracted as 2512 | ||
---|---|---|---|
Product: | poppler | Reporter: | Alon Levy <alevy> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED INVALID | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Attachments: | wrong text extraction example: first page bottom left, copy year 2012 -> 2512 |
The text is wrong in the document itself. Open it with Adobe Reader and you will see that it is also not 2012 there.

Isn't it possible that Acrobat is wrong as well? I've confirmed what you said (I hadn't tested Adobe until now). Anyway, thanks for looking into this. Alon

It could be, but I doubt it. Creating files whose text is correctly extractable is not a given, and some PDF creators don't take enough care to make it work.
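To illustrate the point above that a file can render correctly and still extract incorrectly, here is a toy model in Python. It is only a sketch under a stated assumption: that the mismatch comes from a font whose character-to-Unicode mapping disagrees with the glyphs it draws. Nothing in this report confirms that this is what happened in the attached file, and the glyph codes and both tables are hypothetical.

```python
# Toy model, not real PDF parsing: all codes and tables here are hypothetical.
# A viewer draws each character code using the embedded font's glyph outline,
# while text extraction looks the same code up in a separate Unicode mapping.

GLYPH_SHAPE = {1: "2", 2: "0", 3: "1"}  # what the embedded font draws
TO_UNICODE = {1: "2", 2: "5", 3: "1"}   # what the (faulty) mapping reports


def render(codes):
    """Return what a reader sees on the page."""
    return "".join(GLYPH_SHAPE[c] for c in codes)


def extract(codes):
    """Return what copy and paste or a text extractor produces."""
    return "".join(TO_UNICODE[c] for c in codes)


year_codes = [1, 2, 3, 1]
print(render(year_codes))   # what is shown on screen
print(extract(year_codes))  # what ends up on the clipboard
```

In this model every viewer that trusts the mapping would extract the same wrong digits, which is consistent with Adobe Reader and poppler agreeing with each other.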
Created attachment 64556
wrong text extraction example: first page bottom left, copy year 2012 -> 2512

See the first page of the attached PDF. While it is in Hebrew (which is surely related to the bug), you don't need to understand Hebrew: the Hebrew text is actually fine, and the problem is with anything numeric.

The bottom left of the first page has this text (typed while looking at the correctly *rendered* document in Evince):

2012 בפברואר 15

However, copying the text with the cursor and pasting it produces the following text:

15 בפברואר 2512

Clearly the text is the same, but the digits of the year differ.
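The comparison above can be made mechanical. The following is a minimal sketch in Python that compares the two strings as bags of whitespace-separated tokens, so that the reordering of right-to-left text does not count as a difference. The helper name `token_diff` is made up for this example and is not part of poppler or any other tool.

```python
from collections import Counter


def token_diff(rendered, extracted):
    """Return the tokens found only in `rendered` and only in `extracted`.

    Tokens are compared as multisets, so word order is ignored.
    """
    r = Counter(rendered.split())
    e = Counter(extracted.split())
    return sorted((r - e).elements()), sorted((e - r).elements())


rendered = "2012 בפברואר 15"   # typed from the rendered page
extracted = "15 בפברואר 2512"  # result of copy and paste

only_rendered, only_extracted = token_diff(rendered, extracted)
print(only_rendered, only_extracted)
```

Ignoring order is a deliberate choice here: the Hebrew word and the day match in both strings, and only the year token differs.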