Bug 103127 - shows wrong characters
Summary: shows wrong characters
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2017-10-06 17:51 UTC by Jason Crain
Modified: 2018-08-21 10:53 UTC

Attachments
03-DynamickeModelyAChyby-LJ-2014.PDF - shows wrong characters (599.40 KB, application/pdf)
2017-10-06 17:51 UTC, Jason Crain
rendering of page 1 in pdftocairo (233.14 KB, image/png)
2017-10-06 17:54 UTC, Jason Crain
Patch to allow CID identity font (481 bytes, patch)
2017-10-21 10:23 UTC, Adrian Johnson

Description Jason Crain 2017-10-06 17:51:44 UTC
Created attachment 134711
03-DynamickeModelyAChyby-LJ-2014.PDF - shows wrong characters

Forwarding from Antonín Dach at https://bugzilla.gnome.org/788605

------------------------------

My teacher uses some wacky "Amyuni PDF Converter" on Word documents to get his material into PDF. Evince usually has no trouble displaying these, but I have a sample PDF that is completely messed up in Evince yet displays fine in the Firefox PDF viewer.

One non-embedded font is also used, and I will attach it here as well.

I am using Evince 3.24.1, built for the Manjaro distro.

Maybe it has something to do with the font MTextra.

------------------------------

I can confirm that this file is not working with evince, pdftocairo, or pdftoppm. It does show correctly in Firefox, mupdf, and Chromium on Linux, and Adobe Reader on Windows. It does not appear to have anything to do with the MTExtra font as the submitter speculated, but it's likely related to the PDF not embedding any fonts.
Comment 1 Jason Crain 2017-10-06 17:54:44 UTC
Created attachment 134712
rendering of page 1 in pdftocairo

Attached is the rendering I get from pdftocairo.
Comment 2 Adrian Johnson 2017-10-06 23:51:48 UTC
I don't know why people keep using buggy PDF printers when Word has a perfectly good built-in "Save as PDF" feature that preserves all the hyperlinks.

The pdf fails to render with Adobe Reader on Linux. Since it works on other viewers on Linux I had a quick look to see if there is anything we can do.

The problem is that the fonts are all non-embedded CID TrueType fonts with Identity encoding. The fonts do have ToUnicode maps, which I assume is what the other viewers use to map character codes to a substitute font.

When loading a substitute CID font, GfxCIDFont::getCodeToGIDMap() is called to map the character codes to the substitute font's glyphs.

At line 2252 we bail out:

  if (getCollection()->cmp("Adobe-Identity") == 0) return NULL;

I commented out this line and the PDF seems to render fine (as it is not in English, I can't be 100% certain).

I don't know if this is a fix or a hack. I have not spent any time investigating the implications of commenting out this line.
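
As a rough illustration of the idea described above (mapping character codes to a substitute font's glyphs by way of the ToUnicode map), here is a minimal C++ sketch. It is not poppler code: `SubstituteFont`, `glyphForUnicode` and `buildCodeToGidMap` are hypothetical names invented for this example, and whether other viewers actually work this way is only the assumption stated in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a substitute font's Unicode cmap (not a poppler type).
struct SubstituteFont {
    std::unordered_map<char32_t, std::uint16_t> cmap;

    // Returns glyph 0 (.notdef) when the font has no glyph for the code point.
    std::uint16_t glyphForUnicode(char32_t u) const {
        auto it = cmap.find(u);
        return it == cmap.end() ? 0 : it->second;
    }
};

// Builds a code -> GID table by going through Unicode:
// character code -> ToUnicode entry -> substitute font cmap.
// Codes with no ToUnicode entry, or no glyph in the substitute font, map to 0.
std::vector<std::uint16_t> buildCodeToGidMap(
    const std::unordered_map<std::uint16_t, char32_t>& toUnicode,
    const SubstituteFont& font,
    std::size_t codeCount)
{
    assert(codeCount <= 65536);  // this sketch assumes 2-byte character codes
    std::vector<std::uint16_t> codeToGid(codeCount, 0);
    for (const auto& entry : toUnicode) {
        if (entry.first < codeCount) {
            codeToGid[entry.first] = font.glyphForUnicode(entry.second);
        }
    }
    return codeToGid;
}
```

The detour through Unicode matters because a substitute font generally does not share the glyph order of the missing original, so using the character code directly as a glyph index would most likely pick unrelated glyphs.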
Comment 3 Albert Astals Cid 2017-10-20 15:13:36 UTC
Adrian, can you attach the patch? (I want to make sure I'm not trying the wrong change.)

Once you do that, I'll try to run a regtest and see if something breaks.
Comment 4 Adrian Johnson 2017-10-21 10:23:35 UTC
Created attachment 134971
Patch to allow CID identity font
Comment 5 Albert Astals Cid 2017-10-22 21:07:49 UTC
This regresses the rendering of
https://bugs.freedesktop.org/attachment.cgi?id=54089
Comment 6 GitLab Migration User 2018-08-21 10:53:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/410.

