Bug 23075 - pdfinfo can produce invalid UTF-8
Summary: pdfinfo can produce invalid UTF-8
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All Linux (All)
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-01 06:34 UTC by Jakub Wilk
Modified: 2012-02-21 15:04 UTC (History)
CC List: 0 users

See Also:


Attachments
pdfinfo - decode surrogate pairs (1.23 KB, patch)
2012-02-21 04:12 UTC, Adrian Johnson

Description Jakub Wilk 2009-08-01 06:34:38 UTC
(Tested with poppler 0.10.6.)

pdfinfo does not properly encode Unicode characters outside the Basic Multilingual Plane (BMP):

$ locale charmap
UTF-8

$ wget -q 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;att=1;bug=525309' -O utf16nonbmp.pdf

$ pdfinfo utf16nonbmp.pdf | iconv -f UTF-8 -t UTF-32 >/dev/null
iconv: illegal input sequence at position 16
Comment 1 Adrian Johnson 2012-02-21 04:12:50 UTC
Created attachment 57386
pdfinfo - decode surrogate pairs

Patch to fix.
Comment 2 Albert Astals Cid 2012-02-21 15:04:54 UTC
Adrian, the math in your patch was wrong; I've committed a fixed version. Thanks for finding the lead!

