pdftotext fails to extract text from a specific PDF (see attachment).
The exit status is 0 and no warnings or errors are reported;
the output file contains only 99 page-break (form feed, 0x0c) characters.
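To check this symptom on an output file, a minimal Python sketch like the one below can summarize what pdftotext produced. It assumes pdftotext's default behaviour of writing one form feed (0x0c) at the end of each page. Under that assumption, 99 form feeds with no other non-whitespace characters means 99 pages were processed but no text was recovered. The function name is hypothetical.

```python
def summarize_pdftotext_output(text: str) -> tuple[int, bool]:
    """Return (page_breaks, has_text) for a pdftotext output string.

    Hypothetical helper. Assumes pdftotext's default of one form feed
    (0x0c) per page. has_text is False when the output contains nothing
    but form feeds and other whitespace (str.isspace() treats 0x0c as
    whitespace).
    """
    page_breaks = text.count("\x0c")
    has_text = any(not ch.isspace() for ch in text)
    return page_breaks, has_text
```

For example, reading the output with `open("out.txt", encoding="utf-8", errors="replace").read()` (where `out.txt` is a placeholder name) and passing the result to this function would return `(99, False)` for the file described above.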
I am sure the PDF contains text: when I save the document as text
using Acrobat Reader, plenty of text is extracted and saved.
I can also view the document on Linux (CentOS 5.5 and Fedora 14)
using Evince without problems.
I tried the following versions; all gave the same result:
poppler 0.5.4 (CentOS 5.5, x86_64)
poppler 0.14.5 (Fedora 14, x86_64)
poppler git (Fedora 14, x86_64)
The PDF upload failed, so please use this URL:
Created attachment 57451
Support Identity-H ToUnicode
The problem is that all the ToUnicode maps are /Identity-H. The attached patch should fix this.
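Strictly, the PDF specification expects a font's ToUnicode entry to be a CMap stream, not a name such as /Identity-H. A tolerant reader can treat the name as an identity mapping instead. The Python sketch below illustrates that interpretation and assumes 2-byte big-endian codes, with each code value taken directly as the Unicode code point. It is a hypothetical helper, not poppler's implementation.

```python
def decode_identity_h(data: bytes) -> str:
    """Decode a string from a font whose ToUnicode is the name /Identity-H.

    Hypothetical helper. Assumes each character code is 2 bytes,
    big-endian, and maps to the Unicode code point with the same value
    (e.g. code 0x0041 -> "A").
    """
    if len(data) % 2:
        raise ValueError("Identity-H uses 2-byte codes; got an odd byte count")
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "big"))
        for i in range(0, len(data), 2)
    )
```

Without some such fallback, a reader that only accepts a CMap stream here finds no usable Unicode mapping for these fonts. That is consistent with pdftotext emitting page breaks but no text.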
Fixed with a different commit, similar in idea: http://cgit.freedesktop.org/poppler/poppler/commit/?id=30446bdd7e202eed88d131e04477c76861fd145c