Bug 48012 - cannot extract text
Summary: cannot extract text
Status: RESOLVED DUPLICATE of bug 78145
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-28 13:38 UTC by Jakub Wilk
Modified: 2015-03-04 08:53 UTC (History)

Attachments
the test case (74.21 KB, application/pdf)
2012-03-28 13:38 UTC, Jakub Wilk
pdftotext output (4.96 KB, text/plain)
2012-03-28 13:38 UTC, Jakub Wilk

Description Jakub Wilk 2012-03-28 13:38:00 UTC
Created attachment 59174
the test case

pdftotext correctly extracts the Cyrillic part of the attached PDF; however, it outputs garbage instead of the Latin part.

I can search through the Latin text in Adobe Reader, so the PDF itself is OK (or at least not hopelessly bad).

$ pdftotext -v
pdftotext version 0.18.4
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
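
For anyone triaging similar reports, here is a rough sketch of one way to quantify the symptom. It is not part of the original report. It assumes the pdftotext output has already been read into a string, and the function name `script_counts` and the "suspect" bucket are hypothetical choices made for illustration. It counts characters by Unicode script name, so a page whose Latin text came out as garbage would be expected to show few "latin" characters compared with "suspect" or "other" ones.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count non-whitespace characters by a coarse script bucket."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue  # skips spaces, newlines, and form feeds between pages
        category = unicodedata.category(ch)
        name = unicodedata.name(ch, "")
        # Control, unassigned, private-use, and surrogate code points, plus
        # U+FFFD, are treated here as likely signs of a failed mapping.
        if category in {"Cc", "Cn", "Co", "Cs"} or ch == "\ufffd":
            counts["suspect"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        elif name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        else:
            counts["other"] += 1  # digits, punctuation, other scripts
    return counts
```

The buckets are deliberately coarse: garbage that happens to consist of valid Latin letters would not be flagged, so the counts are only a first hint, not a verdict on the extraction.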
Comment 1 Jakub Wilk 2012-03-28 13:38:31 UTC
Created attachment 59175
pdftotext output
Comment 2 Jason Crain 2015-03-04 08:53:48 UTC

*** This bug has been marked as a duplicate of bug 78145 ***

