Bug 103492 - Wrong encoding when filling out a PDF form
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2017-10-27 19:09 UTC by Thomas Dreibholz
Modified: 2018-08-21 11:09 UTC (History)

Attachments
Screenshot: filling in text into a field of the form (252.67 KB, image/png)
2017-10-27 19:29 UTC, Thomas Dreibholz
Screenshot: filling in text into a field of the form (268.28 KB, image/png)
2017-10-27 19:31 UTC, Thomas Dreibholz
Screenshot: result after activating another field of the form (268.55 KB, image/png)
2017-10-27 19:32 UTC, Thomas Dreibholz

Description Thomas Dreibholz 2017-10-27 19:09:19 UTC
Evince and Okular (both based on poppler) use the wrong encoding when filling out a PDF form.

How to reproduce:
- Get the official Chinese Visa Application Form from http://www.china-embassy.org/eng/visas/fd/W020130830801798289342.pdf
- Open it in Evince
- Fill in a name (e.g. "Smith"). The entered text is displayed correctly.
- Click into another field
- The previously entered name is now displayed with the wrong characters (wrong encoding used?). E.g. "Smith" becomes "4NJUI".
- Saving and reloading the PDF (with the entered text) also displays the wrong characters
- Clicking into the name field again displays the correct name ("Smith")

=> It seems that somewhere in Evince (or libpoppler?) the wrong encoding is used for displaying non-active input fields.
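
Note on the garbling pattern: comparing the entered and the displayed string from the steps above shows that every character is shifted by the same amount. A minimal Python sketch of that arithmetic (the helper name `code_point_offsets` is made up for illustration and is not part of Evince or poppler):

```python
def code_point_offsets(entered, displayed):
    """Return the per-character code point difference between two strings."""
    return [ord(a) - ord(b) for a, b in zip(entered, displayed)]

# Strings taken from the reproduction steps in this report.
offsets = code_point_offsets("Smith", "4NJUI")
print(offsets)  # [31, 31, 31, 31, 31]
```

A constant offset would fit the picture that the text itself is stored correctly and only the mapping from character codes to glyphs differs for inactive fields. That is an assumption drawn from this single example, not a confirmed diagnosis.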

Tested Ubuntu versions:
- Ubuntu 16.04
- Ubuntu 17.10
Comment 1 Thomas Dreibholz 2017-10-27 19:29:36 UTC
Created attachment 135133
Screenshot: filling in text into a field of the form
Comment 2 Thomas Dreibholz 2017-10-27 19:31:43 UTC
Created attachment 135134
Screenshot: filling in text into a field of the form
Comment 3 Thomas Dreibholz 2017-10-27 19:32:31 UTC
Created attachment 135135
Screenshot: result after activating another field of the form
Comment 4 Thomas Dreibholz 2017-10-27 19:43:32 UTC
The PDF form contains a number of fonts -- some are embedded, some are not -- with several different encodings. Maybe something goes wrong in libpoppler's font encoding handling for the form's fields?


$ pdffonts W020130830801798289342.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ZJWNJQ+SimSun                        CID TrueType      Identity-H       yes yes yes   1780  0
AASELS+TimesNewRoman,Bold            CID TrueType      Identity-H       yes yes yes   1785  0
JEIVZQ+SimSun                        CID TrueType      Identity-H       yes yes yes   1787  0
HRUUFF+SimSun                        CID TrueType      Identity-H       yes yes yes   1789  0
TimesNewRomanPS-BoldItalicMT         TrueType          WinAnsi          no  no  no    1791  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no    1793  0
Times-Roman                          Type 1            Custom           no  no  no    1689  0
AdobeSongStd-Light                   CID Type 0        UniGB-UTF16-H    no  no  no    1692  0
SimHei                               CID TrueType      UniGB-UTF16-H    no  no  no    1693  0
SimSun                               CID TrueType      UniGB-UTF16-H    no  no  no    1694  0
TimesNewRoman                        TrueType          WinAnsi          no  no  no    1695  0
MicrosoftYaHei                       CID TrueType      UniGB-UTF16-H    no  no  no    1717  0
MicrosoftYaHei,Bold                  CID TrueType      UniGB-UTF16-H    no  no  no    1718  0
NSimSun                              CID TrueType      UniGB-UTF16-H    no  no  no    1719  0
AdobeSongStd-Light                   CID Type 0        UniGB-UTF16-H    no  no  no    1772  0
ZJWNJQ+SimSun                        CID TrueType      Identity-H       yes yes yes    285  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no     289  0
ZJWNJQ+SimSun                        CID TrueType      Identity-H       yes yes yes    372  0
QGJLNI+CambriaMath                   CID TrueType      Identity-H       yes yes yes    377  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no     379  0
AdobeSongStd-Light                   CID Type 0        UniGB-UTF16-H    no  no  no     361  0
TimesNewRomanPS-BoldItalicMT         TrueType          WinAnsi          no  no  no     398  0
KozMinPr6N-Regular                   CID Type 0        UniJIS-UTF16-H   no  no  no     410  0
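
To make the listing above easier to scan, the following Python sketch groups the fonts by encoding and lists the ones that are not embedded. It assumes the fixed-width layout shown above, where a row of dashes marks the column boundaries. The function names `parse_pdffonts` and `summarize` are hypothetical helpers and not part of poppler.

```python
import re
from collections import Counter

KEYS = ["name", "type", "encoding", "emb", "sub", "uni", "object"]

def parse_pdffonts(text):
    """Split fixed-width pdffonts output into one dict per font row."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # The separator row consists only of dashes and spaces.
    sep_idx = next(i for i, ln in enumerate(lines) if set(ln) <= {"-", " "})
    spans = [m.span() for m in re.finditer(r"-+", lines[sep_idx])]
    rows = []
    for ln in lines[sep_idx + 1:]:
        fields = [ln[start:end].strip() for start, end in spans[:-1]]
        # Let the last column run to the end of the line.
        fields.append(ln[spans[-1][0]:].strip())
        rows.append(dict(zip(KEYS, fields)))
    return rows

def summarize(rows):
    """Count fonts per encoding and collect names of non-embedded fonts."""
    by_encoding = Counter(row["encoding"] for row in rows)
    not_embedded = sorted({row["name"] for row in rows if row["emb"] == "no"})
    return by_encoding, not_embedded
```

Usage, assuming the output was saved to a file first (the file name `fonts.txt` is only an example): run `pdffonts W020130830801798289342.pdf > fonts.txt`, then call `summarize(parse_pdffonts(open("fonts.txt").read()))`.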
Comment 5 GitLab Migration User 2018-08-21 11:09:34 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/539.

