Bug 91058 - Unicode strings saved as literal strings
Summary: Unicode strings saved as literal strings
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2015-06-22 14:00 UTC by Marek Kasik
Modified: 2018-08-21 11:07 UTC (History)

Attachments
Testing form (3.16 KB, text/plain)
2015-06-22 14:00 UTC, Marek Kasik
Save Unicode strings as hexadecimal strings (1.21 KB, patch)
2015-06-22 14:01 UTC, Marek Kasik

Description Marek Kasik 2015-06-22 14:00:34 UTC
Created attachment 116655
Testing form

When saving the attached PDF form, the string entered into the text field is saved as a literal string rather than as a hexadecimal string, which causes problems when viewing it in acroread.
I believe that such strings should be saved as hexadecimal strings.
I used the string "šč" for testing.
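For illustration only, here is a minimal Python sketch of what the hexadecimal form of the test string could look like. It assumes the text string is written as UTF-16BE with a leading byte order mark (FE FF), which is how PDF marks Unicode text strings. The function names are hypothetical and are not taken from the attached patch or from poppler.

```python
import codecs


def to_pdf_hex_string(text: str) -> str:
    """Encode text as a PDF hexadecimal string (UTF-16BE with a BOM)."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def from_pdf_hex_string(token: str) -> str:
    """Decode a hexadecimal string produced by to_pdf_hex_string."""
    data = bytes.fromhex(token.strip("<>"))
    if not data.startswith(codecs.BOM_UTF16_BE):
        raise ValueError("expected a UTF-16BE byte order mark")
    return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")


# U+0161 (š) and U+010D (č) become 0161 and 010D after the FEFF marker.
print(to_pdf_hex_string("šč"))  # <FEFF0161010D>
```

The round trip through from_pdf_hex_string is only there to show that nothing is lost in this form; a real writer would also have to handle encryption and the surrounding dictionary entry.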
Comment 1 Marek Kasik 2015-06-22 14:01:09 UTC
Created attachment 116656
Save Unicode strings as hexadecimal strings

This patch fixes the mentioned problem for me.
Comment 2 Marek Kasik 2015-06-22 14:22:35 UTC
Looking into this further, it is possible that the problem is the font specified in the PDF (LiberationSans with FirstChar 32 and LastChar 255), which doesn't include the mentioned characters.
I filed this because acroread is able to show the string correctly when it is stored as a hexadecimal string, so maybe the patch is still worth applying?
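A rough way to see why a font limited to codes 32..255 may not cover the test string is sketched below in Python. The encoding actually used by the form's font is not stated in this report, so the sketch assumes Python's cp1252 codec as a stand-in for a WinAnsi-style 8-bit encoding; the helper name and its parameters are hypothetical.

```python
def chars_outside_simple_font(text, encoding="cp1252",
                              first_char=32, last_char=255):
    """Return the characters that have no single-byte code in the given
    range under the assumed 8-bit encoding."""
    missing = []
    for ch in text:
        try:
            code = ch.encode(encoding)
        except UnicodeEncodeError:
            # The encoding has no code at all for this character.
            missing.append(ch)
            continue
        if len(code) != 1 or not first_char <= code[0] <= last_char:
            missing.append(ch)
    return missing


# Under the cp1252 assumption, "š" has an 8-bit code (0x9A) but "č" does not.
uncovered = chars_outside_simple_font("šč")
```

With a different 8-bit encoding the result would differ, which is the point: some character of a general Unicode string will usually fall outside whatever 8-bit table the font declares.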
Comment 3 Albert Astals Cid 2015-07-02 22:24:09 UTC
Yes, this is simply a duplicate of the multiple "can't save non-ASCII text on forms" bugs we have. The gut feeling is that we're simply doing something wrong with the fonts, like setting an ASCII encoding or something; I've never had enough time to look at this.
Comment 4 Marek Kasik 2015-07-23 13:02:46 UTC
The problem is actually in the PDF specification: it doesn't specify how to deal with non-ASCII text in forms. All simple fonts (chapter 5.5) use 8-bit codes, which is not enough, so they cannot be used in general (this includes all 14 base fonts).

One possibility here seems to be to use a CID font.

Or try to go beyond the 8-bit constraint and check whether the font you use has the glyphs for the Unicode characters you use when rendering text.

Btw, this answer summarises the font problems that arise when changing text in a PDF quite well: http://stackoverflow.com/a/15973614.

Btw2, Adobe Reader stores text that includes non-ASCII characters as Unicode hexadecimal strings and ASCII-only text as normal (literal) strings. But it also changes the font in the PDF to a CID font in the non-ASCII case.
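The string-selection behaviour described for Adobe Reader could be sketched roughly as follows in Python. This is an assumption-laden illustration, not Adobe's or poppler's code: the function name is hypothetical, only backslashes and parentheses are escaped in the literal case, and the font switch to a CID font is not modelled at all.

```python
import codecs


def serialize_text_string(text: str) -> str:
    """Write ASCII-only text as a literal string, anything else as a
    UTF-16BE hexadecimal string with a byte order mark."""
    if text.isascii():
        # Escape the backslash first so later escapes are not doubled.
        escaped = (text.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
        return "(" + escaped + ")"
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


print(serialize_text_string("plain (ascii) text"))
print(serialize_text_string("šč"))  # <FEFF0161010D>
```

Choosing the representation per string keeps ASCII values readable in the file, while the hexadecimal branch avoids any ambiguity about how bytes above 127 should be interpreted.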
Comment 5 GitLab Migration User 2018-08-21 11:07:49 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/523.