Bug 52406 - Wrong text extracted from attached example: 2012 extracted as 2512
Status: RESOLVED INVALID
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other, OS: All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2012-07-23 16:55 UTC by Alon Levy
Modified: 2012-07-23 19:47 UTC


Attachments
wrong text extraction example: first page bottom left, copy year 2012 -> 2512 (428.40 KB, application/pdf)
2012-07-23 16:55 UTC, Alon Levy

Description Alon Levy 2012-07-23 16:55:47 UTC
Created attachment 64556
wrong text extraction example: first page bottom left, copy year 2012 -> 2512

See the first page of the attached PDF. It is in Hebrew (which is surely related to the bug), but you don't need to understand Hebrew: the Hebrew text is actually fine, and the problem is with anything numeric.

The bottom left of the first page has this text (typed in by reading the correctly *rendered* document in Evince):

2012
בפברואר (Hebrew for "in February")
15

However, copying the text with the cursor and pasting produces the following text:
15 בפברואר 2512

Clearly the text is the same; only the digits of the year are different (2012 became 2512).
Comment 1 Albert Astals Cid 2012-07-23 17:19:53 UTC
The text is wrong in the document itself. Open it with Adobe Reader and you will see that it is not 2012 there either.
Comment 2 Alon Levy 2012-07-23 18:41:51 UTC
Isn't it possible that Acrobat is wrong as well? I've confirmed what you said (I hadn't tested Adobe until now). Anyway, thanks for looking into this.

Alon
Comment 3 Albert Astals Cid 2012-07-23 19:47:31 UTC
It could, but I doubt it. Creating files whose text can be extracted correctly is not a given, and some PDF creators don't take enough care to make it work.
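One plausible mechanism behind the point in Comment 3 can be shown with a toy model. In a PDF, the glyph a viewer draws for a character code comes from the font program, while the Unicode text that copy/paste produces comes from a separate mapping the producer writes (in real files usually a ToUnicode CMap). If that mapping is wrong, the page renders correctly but extracts incorrectly. The sketch below is hypothetical: the names `GLYPHS`, `TO_UNICODE`, `render` and `extract`, and the character codes, are invented for illustration and were not read from the attached file.

```python
# Toy model of a PDF font (hypothetical, for illustration only).
# Each character code in the content stream selects a glyph to draw and,
# independently, a Unicode value used for text extraction.

# Glyph actually drawn for each code: what the viewer shows on screen.
GLYPHS = {1: "2", 2: "0", 3: "1"}

# Unicode value the producer recorded for each code: what copy/paste returns.
# Code 2 is deliberately mis-mapped to "5" to mimic the reported symptom.
TO_UNICODE = {1: "2", 2: "5", 3: "1"}


def render(codes):
    """Return the text a reader sees when the glyphs are drawn."""
    return "".join(GLYPHS[c] for c in codes)


def extract(codes):
    """Return the text an extractor builds from the Unicode mapping."""
    return "".join(TO_UNICODE[c] for c in codes)


year = [1, 2, 3, 1]  # codes for the digits 2, 0, 1, 2
print(render(year))   # 2012
print(extract(year))  # 2512
```

Running it prints `2012` and then `2512`. Because rendering and extraction consult two independent tables, a viewer can display the right year while every extractor that trusts the embedded mapping returns the wrong one, which is consistent with both poppler and Adobe Reader failing to extract 2012 here.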

