Bug 48012 - cannot extract text

Summary: cannot extract text

Status:	RESOLVED DUPLICATE of bug 78145

Alias:	None

Product:	poppler
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	poppler-bugs
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-03-28 13:38 UTC by Jakub Wilk
Modified:	2015-03-04 08:53 UTC
CC List:	0 users

See Also:

Attachments
the test case (74.21 KB, application/pdf) 2012-03-28 13:38 UTC, Jakub Wilk
pdftotext output (4.96 KB, text/plain) 2012-03-28 13:38 UTC, Jakub Wilk

Description Jakub Wilk 2012-03-28 13:38:00 UTC

Created attachment 59174
the test case

pdftotext correctly extracts the Cyrillic part of the attached PDF; however, it outputs garbage instead of the Latin part.

I can search through the Latin text in Adobe Reader, so the PDF itself is OK (or at least not hopelessly bad).

$ pdftotext -v
pdftotext version 0.18.4
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

Comment 1 Jakub Wilk 2012-03-28 13:38:31 UTC

Created attachment 59175
pdftotext output

Comment 2 Jason Crain 2015-03-04 08:53:48 UTC


*** This bug has been marked as a duplicate of bug 78145 ***
