Bug 76971 - Problem with non-BMP Unicode characters
Summary: Problem with non-BMP Unicode characters
Status: RESOLVED WORKSFORME
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-02 23:53 UTC by Behdad Esfahbod
Modified: 2014-04-03 22:09 UTC
1 user

See Also:


Attachments
Sample document (10.00 KB, text/plain)
2014-04-02 23:53 UTC, Behdad Esfahbod

Description Behdad Esfahbod 2014-04-02 23:53:16 UTC
Created attachment 96814
Sample document

The attached PDF was generated by cairo from printing a gedit document containing a single character, U+1D780. Here it is in text: "𝞀". This is an example of what we call a "non-BMP" Unicode character, i.e. one whose code point is above 0xFFFF. It doesn't fit in two bytes, which means it doesn't fit in a single UTF-16 code unit; UTF-16 has to encode it as a surrogate pair.
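
For illustration only (this is not code from poppler, cairo or evince), the standard arithmetic that splits a code point above U+FFFF into a UTF-16 surrogate pair can be sketched in Python. The function name `to_surrogate_pair` is hypothetical; the result is cross-checked against Python's built-in UTF-16 encoder.

```python
def to_surrogate_pair(cp):
    """Split a supplementary-plane code point (> 0xFFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000              # 20 significant bits remain
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogate_pair(0x1D780)
print(f"U+1D780 -> {high:04X} {low:04X}")        # U+1D780 -> D835 DF80
# Cross-check with Python's own encoders
print("\U0001D780".encode("utf-16-be").hex())    # d835df80
print("\U0001D780".encode("utf-8").hex())        # f09d9e80 (four bytes in UTF-8)
```

So U+1D780 occupies two UTF-16 code units (D835 DF80) but must come out as one four-byte sequence in UTF-8.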

Printing the attached PDF from evince to a PDF file fails.  Evince generates the following cairo error:

  cairo context error: input string not valid UTF-8

I think something in the poppler chain is not handling UTF-16 surrogate pairs correctly, or is mishandling the character in some other way.
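
A hypothetical sketch of the kind of mishandling suspected above, assuming the text passes through a UTF-16 to UTF-8 conversion somewhere (the report never identified where, and it was closed WORKSFORME). If each 16-bit unit is encoded on its own instead of first combining the surrogate pair into one code point, the output contains encoded surrogates, which strict UTF-8 validators reject. The function names here are illustrative, not poppler or cairo APIs; Python's strict decoder stands in for the validator.

```python
def is_valid_utf8(data):
    """Strict check: Python's UTF-8 codec rejects encoded surrogates."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf16_units_to_utf8(units):
    """Correct conversion: combine surrogate pairs before encoding."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            cp = u
            i += 1
        out += chr(cp).encode("utf-8")
    return bytes(out)


def naive_utf16_units_to_utf8(units):
    """Buggy variant: treats every code unit as a code point,
    so each surrogate becomes its own (invalid) 3-byte sequence."""
    out = bytearray()
    for u in units:
        # "surrogatepass" makes Python emit the surrogate bytes instead of raising
        out += chr(u).encode("utf-8", "surrogatepass")
    return bytes(out)


units = [0xD835, 0xDF80]  # U+1D780 as UTF-16 code units
good = utf16_units_to_utf8(units)
bad = naive_utf16_units_to_utf8(units)
print(good.hex(), is_valid_utf8(good))  # f09d9e80 True
print(bad.hex(), is_valid_utf8(bad))    # eda0b5edbe80 False
```

BMP-only text converts identically either way, which is why this class of bug typically only shows up with characters such as U+1D780.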
Comment 1 Behdad Esfahbod 2014-04-03 00:09:29 UTC
Hmm. I'm told by others that this is probably already fixed in the latest version. I'm testing on Ubuntu 12.04. Feel free to close this if it works for you.
Comment 2 Albert Astals Cid 2014-04-03 22:09:35 UTC
Works for me, please try in something that is not 2 years old next time :-)

