Hi,

pdftotext fails to extract text from a specific PDF (see attachment). The exit status is 0 and no warnings or errors are reported. The output file contains only 99 page break characters (0x0c).

I am sure the PDF contains text, because when I save the document as text using Acrobat Reader, plenty of text is extracted and saved. I can also view the document on Linux (CentOS 5.5 and Fedora Core 14) using Evince without problems.

I tried the following versions; all gave the same result:

poppler 0.5.4, CentOS 5.5, x86_64
poppler 0.14.5, Fedora fc14, x86_64
poppler git, Fedora fc14, x86_64

Best regards,
Ulrich
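For anyone trying to reproduce this, the symptom (output consisting of nothing but form feeds) can be checked mechanically. The following is only a minimal sketch: it assumes the pdftotext output has already been read into a bytes object, and the helper name `is_only_page_breaks` is made up for illustration.

```python
FORM_FEED = 0x0C  # the page break character pdftotext writes between pages


def is_only_page_breaks(output: bytes) -> bool:
    """Return True if the output is non-empty and holds nothing but form feeds."""
    return len(output) > 0 and all(byte == FORM_FEED for byte in output)


def count_page_breaks(output: bytes) -> int:
    """Count the form feed characters in the output."""
    return output.count(bytes([FORM_FEED]))
```

For the file in this report, `count_page_breaks` would be expected to return 99 and `is_only_page_breaks` to return True.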
The PDF upload failed, so please use this URL: http://share.obvsg.at/Bug-35468.pdf
Created attachment 57451: support Identity-H ToUnicode

The problem is that all the ToUnicode maps are /Identity-H. The attached patch should fix this.
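To illustrate the idea only: the sketch below shows one plausible reading of an identity ToUnicode mapping, in which each 2-byte big-endian character code is taken to be the Unicode code point of the same value. This is an assumption made for illustration; it is not the attached patch and not poppler's code, and the function name `decode_identity_h` is hypothetical.

```python
def decode_identity_h(raw: bytes) -> str:
    """Decode a string of 2-byte character codes under an identity mapping.

    Assumption: every code is 2 bytes, big-endian, and maps to the Unicode
    code point with the same numeric value.
    """
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes (2 bytes per code)")

    chars = []
    for offset in range(0, len(raw), 2):
        # Combine the two bytes into one 16-bit character code.
        code = int.from_bytes(raw[offset:offset + 2], "big")
        # Identity mapping: the code is used directly as the code point.
        chars.append(chr(code))
    return "".join(chars)
```

Under this assumption, the bytes `00 48 00 69` decode to "Hi". A text extractor that does not recognise /Identity-H as a ToUnicode entry has no such mapping to apply, which would be consistent with the empty output reported above.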
Fixed with a different commit that is similar in idea: http://cgit.freedesktop.org/poppler/poppler/commit/?id=30446bdd7e202eed88d131e04477c76861fd145c