Bug 76971

Summary:	Problem with non-BMP Unicode characters
Product:	poppler	Reporter:	Behdad Esfahbod <freedesktop>
Component:	general	Assignee:	poppler-bugs <poppler-bugs>
Status:	RESOLVED WORKSFORME	QA Contact:
Severity:	normal
Priority:	medium	CC:	freedesktop
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
Attachments:	Sample document

Description Behdad Esfahbod 2014-04-02 23:53:16 UTC

Created attachment 96814
Sample document

Attached PDF is generated by cairo from printing a gedit document with one character: U+1D780.  Here it is in text: "𝞀".  This is an example of what we call a "non-BMP" Unicode character, i.e. one that has a code point > 0xFFFF.  That means it doesn't fit in two bytes, so it doesn't fit in a single UTF-16 code unit and has to be encoded as a surrogate pair.
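
For reference, here is a minimal Python sketch of how a code point above 0xFFFF maps to a UTF-16 surrogate pair.  The function name to_utf16_surrogates is made up for illustration and is not part of poppler or cairo; the arithmetic is the standard UTF-16 encoding rule.

```python
def to_utf16_surrogates(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogate pair.

    Illustrative helper only; the name is hypothetical.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_utf16_surrogates(0x1D780)
print(hex(high), hex(low))  # 0xd835 0xdf80
```

So the single character in the sample document takes two 16-bit units in UTF-16 (D835 DF80) and four bytes in UTF-8 (F0 9D 9E 80).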

Printing the attached PDF from evince to a PDF file fails.  Evince generates the following cairo error:

  cairo context error: input string not valid UTF-8

I think what's happening is that someone somewhere in the poppler chain is not handling UTF-16 surrogate pairs, or is mishandling the text in some other way.
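
To illustrate the kind of failure suspected here, the Python sketch below converts UTF-16 to UTF-8 one 16-bit unit at a time, without combining surrogate pairs.  This is only a guess at the failure mode, not poppler's actual code; naive_utf16_to_utf8 and correct_utf16_to_utf8 are hypothetical names.  The naive output encodes each surrogate on its own, which a strict UTF-8 validator rejects.

```python
def naive_utf16_to_utf8(units):
    """Hypothetical buggy converter: treats each 16-bit unit as a whole code point."""
    out = bytearray()
    for u in units:
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            # Surrogates (0xD800..0xDFFF) end up here and get encoded as-is.
            out += bytes([0xE0 | (u >> 12),
                          0x80 | ((u >> 6) & 0x3F),
                          0x80 | (u & 0x3F)])
    return bytes(out)


def correct_utf16_to_utf8(units):
    """Decode the units as UTF-16 (combining surrogate pairs), then encode as UTF-8."""
    raw = b"".join(u.to_bytes(2, "big") for u in units)
    return raw.decode("utf-16-be").encode("utf-8")


units = [0xD835, 0xDF80]            # U+1D780 as a surrogate pair
bad = naive_utf16_to_utf8(units)    # ED A0 B5 ED BE 80
good = correct_utf16_to_utf8(units) # F0 9D 9E 80

try:
    bad.decode("utf-8")
    bad_is_valid = True
except UnicodeDecodeError:
    bad_is_valid = False

print(bad.hex(), good.hex(), bad_is_valid)
```

If something like the naive path runs anywhere between the PDF text and cairo, the string handed to cairo would not be valid UTF-8, which would match the error message above.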

Comment 1 Behdad Esfahbod 2014-04-03 00:09:29 UTC

Humm.  I'm told by others that this is probably already fixed in the latest version.  I'm testing on Ubuntu 12.04.  Feel free to close if it works for you.

Comment 2 Albert Astals Cid 2014-04-03 22:09:35 UTC

Works for me, please try in something that is not 2 years old next time :-)
