Bug 103023 - Fix handling of UTF16-LE annotations
Summary: Fix handling of UTF16-LE annotations
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-28 13:46 UTC by Christophe Fergeau
Modified: 2018-08-21 11:11 UTC

See Also:


Attachments
Test case (55.76 KB, application/pdf)
2017-09-28 13:46 UTC, Christophe Fergeau
Proposed patches (3.01 KB, patch)
2017-09-28 13:46 UTC, Christophe Fergeau

Description Christophe Fergeau 2017-09-28 13:46:15 UTC
Created attachment 134538
Test case

The 'unicode' annotation in the attached test case does not render properly. I added it through the default mail application on my iPhone running iOS 11.
I tracked that down to _poppler_goo_string_to_utf8(), which assumes UTF-16 strings will always be big endian, while my test file uses a little-endian UTF-16 string.
I've fixed this by adding two new methods to GooString, hasBigEndianBOM() and hasLittleEndianBOM(), but all users of GooString::hasUnicodeMarker() should probably be audited and made to handle both types of UTF-16 strings, unless the PDF spec mandates big-endian strings. Since I'm not at all familiar with the PDF format, I haven't tried to address this yet.
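For readers following along, here is a minimal standalone sketch of the idea described above: look at the byte order mark, then decode the UTF-16 payload to UTF-8 with the matching endianness. It is not the attached patch and does not use poppler's code; the names Utf16Bom, detectUtf16Bom(), appendUtf8() and utf16WithBomToUtf8() are hypothetical, and the choice to return an empty string when no BOM is present is an assumption made only to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical names for illustration; none of this is poppler's API.
enum class Utf16Bom { None, BigEndian, LittleEndian };

// Inspect the first two bytes for a UTF-16 byte order mark.
inline Utf16Bom detectUtf16Bom(const std::string &s)
{
    if (s.size() < 2) {
        return Utf16Bom::None;
    }
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    const unsigned char b1 = static_cast<unsigned char>(s[1]);
    if (b0 == 0xFE && b1 == 0xFF) {
        return Utf16Bom::BigEndian;
    }
    if (b0 == 0xFF && b1 == 0xFE) {
        return Utf16Bom::LittleEndian;
    }
    return Utf16Bom::None;
}

// Append one Unicode code point to out, encoded as UTF-8.
inline void appendUtf8(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decode a BOM-prefixed UTF-16 byte string to UTF-8.
// Assumption: a string without a BOM yields an empty result.
inline std::string utf16WithBomToUtf8(const std::string &s)
{
    const Utf16Bom bom = detectUtf16Bom(s);
    if (bom == Utf16Bom::None) {
        return std::string();
    }
    const bool bigEndian = (bom == Utf16Bom::BigEndian);
    // Read the 16-bit code unit starting at byte offset i.
    auto unitAt = [&](std::size_t i) -> std::uint32_t {
        const std::uint32_t a = static_cast<unsigned char>(s[i]);
        const std::uint32_t b = static_cast<unsigned char>(s[i + 1]);
        return bigEndian ? ((a << 8) | b) : ((b << 8) | a);
    };
    std::string out;
    std::size_t i = 2; // skip the BOM
    while (i + 1 < s.size()) {
        std::uint32_t cp = unitAt(i);
        i += 2;
        // Combine a high surrogate with a following low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()) {
            const std::uint32_t low = unitAt(i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        // Any surrogate left unpaired becomes U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

With this shape, a caller that today only checks for the big-endian marker would need no separate code path for little-endian input; the endianness is decided once from the BOM and used for every code unit.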
Comment 1 Christophe Fergeau 2017-09-28 13:46:54 UTC
Created attachment 134539
Proposed patches
Comment 2 Albert Astals Cid 2017-09-28 15:19:43 UTC
-1: the spec (at least 1.7; I don't have the ISO one at hand right now) specifically mentions UTF-16BE.

Your test case also displays "wrong" here in the old Adobe Reader 9.5.5 for Linux (haven't tried the newer windows versions).

Will try the newer windows versions later and read the new ISO spec but for now it seems you should report a bug to Apple.
Comment 3 Jose Aliste 2017-09-28 16:43:32 UTC
I looked into ISO 32000-1 and it's the same, only UTF-16BE is allowed. I saw a draft of ISO 32000-2 (a.k.a. PDF 2.0) and also only UTF-16BE is allowed.
Comment 4 Christophe Fergeau 2017-09-29 13:22:04 UTC
(In reply to Albert Astals Cid from comment #2)
> Your test case also displays "wrong" here in the old Adobe Reader 9.5.5 for
> Linux (haven't tried the newer windows versions).
> 
> Will try the newer windows versions later and read the new ISO spec but for
> now it seems you should report a bug to Apple.

I'll try to take a closer look at the file content and the spec, but the latest Adobe Reader on Windows (Adobe Acrobat Reader DC 2017.012.20093) displays the annotation correctly (same result as with my patch).
Comment 5 Albert Astals Cid 2017-12-27 23:53:23 UTC
So yes, it seems there are broken generators out there, and Adobe shows their output correctly, so we should probably do the same.

About your patches: I don't like that hasBigEndianBOM is just hasUnicodeMarker under a different name. Don't do that.
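One way to read the objection above, as a sketch only: keep the existing big-endian check under its current name and add a single new predicate for the little-endian case, rather than introducing a second name for the same test. StringSketch is a stand-in type, not poppler's GooString, and hasLittleEndianMarker() is a hypothetical name; only hasUnicodeMarker() is taken from the thread.

```cpp
#include <string>

// Stand-in type for illustration only; this is not poppler's GooString.
struct StringSketch
{
    std::string bytes;

    // Check named in the thread: true when the string starts with the
    // UTF-16BE byte order mark 0xFE 0xFF.
    bool hasUnicodeMarker() const
    {
        return bytes.size() >= 2 && (bytes[0] & 0xff) == 0xfe && (bytes[1] & 0xff) == 0xff;
    }

    // Hypothetical single addition: true when the string starts with the
    // UTF-16LE byte order mark 0xFF 0xFE.
    bool hasLittleEndianMarker() const
    {
        return bytes.size() >= 2 && (bytes[0] & 0xff) == 0xff && (bytes[1] & 0xff) == 0xfe;
    }
};
```

This keeps existing callers of hasUnicodeMarker() unchanged and makes the little-endian tolerance an explicit, separately named opt-in.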
Comment 6 GitLab Migration User 2018-08-21 11:11:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/558.