(Tested with poppler 0.10.6.)

pdfinfo does not properly encode Unicode characters outside the BMP:

$ locale charmap
UTF-8
$ wget -q 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;att=1;bug=525309' -O utf16nonbmp.pdf
$ pdfinfo utf16nonbmp.pdf | iconv -f UTF-8 -t UTF-32 >/dev/null
iconv: illegal input sequence at position 16
Created attachment 57386
pdfinfo - decode surrogate pairs

Patch to fix.
Adrian, the math in your patch was wrong; I've committed a fixed version. Thanks for finding the lead!
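
For readers following along, here is the standard UTF-16 surrogate-pair arithmetic as a minimal Python sketch. It is not the code that was committed to poppler (which is C++); the function names `decode_utf16_units` and `to_utf8` are made up for illustration, and replacing an unpaired surrogate with U+FFFD is an assumption of this sketch, not something stated in this report.

```python
def decode_utf16_units(units):
    """Turn a list of 16-bit UTF-16 code units into Unicode code points."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the pair: 10 bits from each surrogate, offset by 0x10000.
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: assumption of this sketch is to emit U+FFFD.
            code_points.append(0xFFFD)
            i += 1
        else:
            # Ordinary BMP code unit maps to itself.
            code_points.append(u)
            i += 1
    return code_points


def to_utf8(units):
    """Encode UTF-16 code units as UTF-8 bytes via real code points."""
    return "".join(chr(cp) for cp in decode_utf16_units(units)).encode("utf-8")
```

With this, the pair 0xD834 0xDD1E decodes to the single code point U+1D11E, which encodes as a four-byte UTF-8 sequence that iconv accepts. Encoding each surrogate on its own would instead produce bytes that are not valid UTF-8.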