Summary: Problem with non-BMP Unicode characters
Product: poppler
Component: general
Reporter: Behdad Esfahbod <freedesktop>
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED WORKSFORME
Severity: normal
Priority: medium
CC: freedesktop
Version: unspecified
Hardware: Other
OS: All
Attachments: Sample document
Created attachment 96814: Sample document

The attached PDF was generated by cairo by printing a gedit document containing a single character, U+1D780. Here it is as text: "𝞀". This is an example of what we call a "non-BMP" Unicode character, i.e. one with a code point > 0xFFFF. It doesn't fit in two bytes, which means it doesn't fit in a single UTF-16 code unit and has to be encoded as a surrogate pair.

Printing the attached PDF from evince to a PDF file fails. Evince reports the following cairo error:

    cairo context error: input string not valid UTF-8

I think what's happening is that something in the poppler chain is not handling UTF-16 surrogate pairs, or is mishandling the text in some other way.
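
To illustrate the suspected failure mode, here is a minimal Python sketch. It is not poppler, evince, or cairo code; the function names are hypothetical and the "naive" converter is only an assumption about what a surrogate-unaware conversion step might do. It shows that U+1D780 becomes the UTF-16 surrogate pair D835 DF80, and that encoding each half separately produces bytes that a strict UTF-8 decoder rejects, which would be consistent with the "input string not valid UTF-8" error.

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def naive_utf16_to_utf8(units):
    """Hypothetical buggy converter: treats every 16-bit unit as a
    complete code point, so each surrogate is encoded on its own."""
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def correct_utf16_to_utf8(units):
    """Combine surrogate pairs into one code point before encoding.
    An unpaired surrogate makes the final encode() raise."""
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            cp = u
            i += 1
        chars.append(chr(cp))
    return "".join(chars).encode("utf-8")


def is_valid_utf8(data):
    """Strict check: Python's default UTF-8 decoder rejects surrogates."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


units = to_surrogate_pair(0x1D780)
print([hex(u) for u in units])
print(naive_utf16_to_utf8(units).hex(), is_valid_utf8(naive_utf16_to_utf8(units)))
print(correct_utf16_to_utf8(units).hex(), is_valid_utf8(correct_utf16_to_utf8(units)))
```

If something along the text path behaves like `naive_utf16_to_utf8`, the string handed to cairo would contain the six bytes ED A0 B5 ED BE 80 instead of the four bytes F0 9D 9E 80, and a strict validity check would fail.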