Created attachment 134881: Sample file

When I run pdftotext on the attached sample file, I get no usable text. Looking at the file in a hex editor, I can see that the text is stored as UTF-16BE *without* a BOM. The file displays fine in xpdf.

Tested with versions 0.48.0 (Debian Stable) and 0.57.0 (Debian Testing).
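To illustrate what "UTF-16BE without BOM" means at the byte level, here is a minimal Python sketch. The sample bytes and the helper names (`has_utf16_bom`, `decode_utf16be`) are made up for illustration; they are not taken from the attached PDF or from poppler, and this is not a claim about where the extraction actually fails.

```python
# Hypothetical sample: the word "Text" encoded as UTF-16BE with no BOM.
# In a hex editor this shows up as 00 54 00 65 00 78 00 74.
sample = "Text".encode("utf-16-be")


def has_utf16_bom(raw: bytes) -> bool:
    """Return True if the data starts with a UTF-16 byte order mark."""
    return raw[:2] in (b"\xfe\xff", b"\xff\xfe")


def decode_utf16be(raw: bytes) -> str:
    """Decode big-endian UTF-16, dropping a leading BE BOM if one is present."""
    if raw[:2] == b"\xfe\xff":
        raw = raw[2:]
    return raw.decode("utf-16-be")


print(has_utf16_bom(sample))       # no BOM, so the byte order must be known in advance
print(decode_utf16be(sample))      # decoding with the right byte order recovers the text
print(sample.decode("utf-16-le"))  # the wrong byte order yields unrelated characters
```

Without a BOM nothing in the bytes themselves says which byte order applies, so a reader has to assume one; assuming the wrong one gives garbage rather than an error.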
Additional note:

$ java -jar pdfbox-app-2.0.7.jar ExtractText 2004.pdf

extracts the text but issues some warnings:

Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font JRLFSC+Segoe UI,Bold-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font EUPBOV+Arial Unicode MS-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font VRSAOT+Arial Unicode MS,Bold-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font FAMOVB+Segoe UI-Identity-H
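For anyone triaging this, a small Python sketch that pulls the affected font names out of the PDFBox warnings may help. The warning lines are copied from the output above; the function name `affected_fonts` and the regular expression are my own, and the pattern assumes each name begins with the usual six-letter subset tag followed by "+".

```python
import re

# Warning lines copied from the PDFBox output (German locale: WARNUNG = WARNING).
LOG = """\
WARNUNG: Invalid ToUnicode CMap in font JRLFSC+Segoe UI,Bold-Identity-H
WARNUNG: Invalid ToUnicode CMap in font EUPBOV+Arial Unicode MS-Identity-H
WARNUNG: Invalid ToUnicode CMap in font VRSAOT+Arial Unicode MS,Bold-Identity-H
WARNUNG: Invalid ToUnicode CMap in font FAMOVB+Segoe UI-Identity-H
"""

# Optional subset tag (six uppercase letters and "+"), then the name as printed.
PATTERN = re.compile(r"Invalid ToUnicode CMap in font (?:[A-Z]{6}\+)?(.+)$")


def affected_fonts(log: str) -> list[str]:
    """Return the font names from the warnings, with the subset tag removed."""
    fonts = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            fonts.append(match.group(1))
    return fonts


for name in affected_fonts(LOG):
    print(name)
```

All four embedded fonts are reported, so the problem does not look specific to one typeface.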
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and is closed to further activity here. You can subscribe to and participate in the new issue at: https://gitlab.freedesktop.org/poppler/poppler/issues/332