Bug 103127

Summary: shows wrong characters
Product: poppler Reporter: Jason Crain <jason>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Attachments: 03-DynamickeModelyAChyby-LJ-2014.PDF - shows wrong characters
rendering of page 1 in pdftocairo
Patch to allow CID identity font

Description Jason Crain 2017-10-06 17:51:44 UTC
Created attachment 134711 [details]
03-DynamickeModelyAChyby-LJ-2014.PDF - shows wrong characters

Forwarding from Antonín Dach at https://bugzilla.gnome.org/788605

------------------------------

My teacher uses some wacky "Amyuni PDF Converter" on Word documents to produce his material as PDF. Evince usually has no trouble displaying these, but I have a sample PDF that is completely messed up in Evince yet displays fine in the Firefox PDF viewer.

One non-embedded font is used as well, and I will attach it here too.

I am using Evince 3.24.1 built for the Manjaro distro.

Maybe it has something to do with the font MTextra.

------------------------------

I can confirm that this file is not working with evince, pdftocairo, or pdftoppm. It does show correctly in Firefox, mupdf, and Chromium on Linux, and Adobe Reader on Windows. It does not appear to have anything to do with the MTExtra font as the submitter speculated, but it's likely related to the PDF not embedding any fonts.
Comment 1 Jason Crain 2017-10-06 17:54:44 UTC
Created attachment 134712 [details]
rendering of page 1 in pdftocairo

Attached is the rendering I get from pdftocairo.
Comment 2 Adrian Johnson 2017-10-06 23:51:48 UTC
I don't know why people keep using buggy PDF printers when Word has a perfectly good "Save as PDF" feature built in that preserves all the hyperlinks.

The PDF fails to render with Adobe Reader on Linux. Since it works in other viewers on Linux, I had a quick look to see if there is anything we can do.

The problem is the fonts are all non-embedded CID TrueType with Identity encoding. The fonts do have ToUnicode maps which I assume is what the other viewers are using to map character codes to a substitute font.

When loading a substitute CID font, GfxCIDFont::getCodeToGIDMap() is called to map the character codes to the substitute font glyphs.

At line 2252 we bail out:

  if (getCollection()->cmp("Adobe-Identity") == 0) return NULL;

I commented out this line and the PDF seems to render fine (as it is not in English, I can't be 100% certain).

I don't know if this is a fix or a hack. I have not spent any time investigating the implications of commenting out this line.
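
To make the discussion above concrete, here is a minimal sketch of the idea being described: resolving character codes for a non-embedded, Identity-encoded CID font by going from code to Unicode (via the PDF's ToUnicode map) and then from Unicode to a glyph index in the substitute font. This is not poppler's code. The type aliases, the `buildCodeToGID` function and the `allowIdentity` flag are all hypothetical names introduced for illustration; only the "Adobe-Identity" collection check mirrors the line quoted in this comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are poppler types.
using CID = std::uint16_t;
using GID = std::uint16_t;
using ToUnicodeMap = std::map<CID, char32_t>;  // parsed from the PDF's ToUnicode CMap
using FontCmap = std::map<char32_t, GID>;      // parsed from the substitute font's cmap

// Build a code -> GID table by chaining CID -> Unicode -> GID.
// Codes with no mapping keep GID 0 (.notdef).
// An empty result means "no map available", like the NULL return in the bug.
std::vector<GID> buildCodeToGID(const std::string &collection,
                                const ToUnicodeMap &toUnicode,
                                const FontCmap &substituteCmap,
                                bool allowIdentity)
{
    // Mirrors the early bail-out discussed above (assumption: simplified).
    if (collection == "Adobe-Identity" && !allowIdentity) {
        return {};
    }
    if (toUnicode.empty()) {
        return {};
    }

    // std::map is ordered, so the last key is the largest CID.
    const CID maxCid = toUnicode.rbegin()->first;
    std::vector<GID> table(static_cast<std::size_t>(maxCid) + 1, 0);

    for (const auto &entry : toUnicode) {
        const auto it = substituteCmap.find(entry.second);
        if (it != substituteCmap.end()) {
            table[entry.first] = it->second;
        }
    }
    return table;
}
```

With `allowIdentity` set to `false` the function returns an empty table for an "Adobe-Identity" font, which corresponds to the current behaviour. Setting it to `true` corresponds to commenting out the check. Comment 5 reports that removing the check regresses another file, so a real fix would presumably need a narrower condition than this flag.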
Comment 3 Albert Astals Cid 2017-10-20 15:13:36 UTC
Adrian, can you attach the patch? (I want to make sure I'm not trying the wrong change.)

Once you do that, I'll try to run a regtest and see if something breaks.
Comment 4 Adrian Johnson 2017-10-21 10:23:35 UTC
Created attachment 134971 [details] [review]
Patch to allow CID identity font
Comment 5 Albert Astals Cid 2017-10-22 21:07:49 UTC
This regresses the rendering of
https://bugs.freedesktop.org/attachment.cgi?id=54089
Comment 6 GitLab Migration User 2018-08-21 10:53:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/410.