Created attachment 20170
Zhang Peng's PDF; it contains a Chinese font

With the attached PDF (supplied by Zhang Peng for another purpose)
  http://lists.freedesktop.org/archives/poppler/2008-November/004216.html
pdftohtml fails to set the <title> tags correctly, resulting in
invalid UTF-8 bytes <FE><FF>.
Within the "Document Outline" section, both entries start this way,
with the first being followed by more garbage.

This can be seen at the URL stated for this bug report:
  http://www.maths.mq.edu.au/~ross/poppler/ZhangPeng/readme.html
(You may need to set the encoding manually to UTF-8.)

Facts:
-----
The document contains Chinese characters, with the following font info:

  <</Subtype/Type0
    /DescendantFonts 33 0 R
    /BaseFont/AdobeSongStd-Light
    /Encoding/UniGB-UCS2-H
    /Type/Font>>

There is no embedded CMap resource:

  > pdffonts readme.pdf
  name                                 type              emb sub uni object ID
  ------------------------------------ ----------------- --- --- --- ---------
  AdobeSongStd-Light                   CID Type 0        no  no  no      32  0

Observations:
-----------
(see also http://lists.freedesktop.org/archives/poppler/2008-November/004220.html)

pdftotext worked fine for me, with both Poppler v0.8.2 and Poppler v0.10.0.
However, there were problems with readme.pdf when using other software, e.g.:

  Adobe Reader v8.1.0 and v9.0.0 both showed just blank pages;
  Adobe Acrobat Pro v8.1.2 displayed the PDF just fine;
  Preview (Mac OS X, v10.4.11) displayed the PDF just fine;
  pdftohtml translated the PDF to a 2-page HTML, with frames,
    *but* there were some errors.
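
For anyone reading along: the stray <FE><FF> bytes are what one would expect if the title and outline strings are stored as PDF text strings in UTF-16BE (which begin with the byte order mark FE FF) and are copied into the HTML without conversion. The sketch below is only an illustration of that conversion in Python; it is not poppler's code and not necessarily how the fix is implemented. The function names `decode_pdf_text_string` and `html_title` are made up for this example, and the Latin-1 fallback is a simplification of PDFDocEncoding.

```python
import html

# A PDF text string that starts with these two bytes is UTF-16BE.
UTF16_BE_BOM = b"\xfe\xff"


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string (hypothetical helper for illustration)."""
    if raw.startswith(UTF16_BE_BOM):
        # Drop the byte order mark, then decode the rest as UTF-16BE.
        return raw[len(UTF16_BE_BOM):].decode("utf-16-be", errors="replace")
    # Assumption: PDFDocEncoding is approximated by Latin-1 here.
    # The two encodings differ for a number of code points.
    return raw.decode("latin-1")


def html_title(raw: bytes) -> bytes:
    """Build a UTF-8 encoded <title> element from a raw PDF text string."""
    text = decode_pdf_text_string(raw)
    return ("<title>" + html.escape(text) + "</title>").encode("utf-8")


if __name__ == "__main__":
    sample = UTF16_BE_BOM + "readme".encode("utf-16-be")
    print(html_title(sample))
```

Writing `sample` into the page unchanged would put FE FF (followed by NUL-interleaved bytes) inside <title>, which is not valid UTF-8 and would look like the garbage described above.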
Will be fixed in poppler 0.17.2