Bug 101624

Summary: Special chars in a MacRoman encoded font are displayed wrong
Product: poppler Reporter: Thomas Freitag <Thomas.Freitag>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED DUPLICATE QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: pdf with this problem
Use unicode cmap if it exists

Description Thomas Freitag 2017-06-28 11:32:38 UTC
Created attachment 132299 [details]
pdf with this problem

The attached PDF contains two embedded fonts. Both should use MacRomanEncoding, and both font files have a cmap for MacRoman, but these cmaps seem to be wrong. Both fonts also have a Unicode cmap, which poppler does not use because its decision tree says:

  //    1a. If the PDF font specified MacRomanEncoding and the
  //        TrueType font has a Macintosh Roman cmap, use it, and
  //        reverse map the char names through MacRomanEncoding to
  //        get char codes.
  //    1b. If the PDF font is not symbolic or the PDF font is not
  //        embedded, and the TrueType font has a Microsoft Unicode
  //        cmap or a non-Microsoft Unicode cmap, use it, and use the
  //        Unicode indexes, not the char codes.
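
The two quoted steps can be read as the following ordering. This is a minimal sketch, not poppler's actual code: `CmapRecord`, `CmapChoice` and `chooseCmapCurrent` are hypothetical names, and the platform/encoding IDs are the standard TrueType values (1/0 for Macintosh Roman, 3/1 for Microsoft Unicode, platform 0 for non-Microsoft Unicode).

```cpp
#include <vector>

// One encoding record from a TrueType 'cmap' table (hypothetical struct).
struct CmapRecord {
    int platform;
    int encoding;
};

enum class CmapChoice { MacRoman, Unicode, None };

static bool isMacRoman(const CmapRecord &c) {
    return c.platform == 1 && c.encoding == 0;
}

static bool isUnicode(const CmapRecord &c) {
    return c.platform == 0 || (c.platform == 3 && c.encoding == 1);
}

// Hypothetical helper: applies steps 1a and 1b in the order the comment gives.
CmapChoice chooseCmapCurrent(const std::vector<CmapRecord> &cmaps,
                             bool pdfUsesMacRoman, bool symbolic, bool embedded) {
    bool hasMac = false, hasUnicode = false;
    for (const CmapRecord &c : cmaps) {
        hasMac = hasMac || isMacRoman(c);
        hasUnicode = hasUnicode || isUnicode(c);
    }
    if (pdfUsesMacRoman && hasMac) return CmapChoice::MacRoman;             // 1a
    if ((!symbolic || !embedded) && hasUnicode) return CmapChoice::Unicode; // 1b
    return CmapChoice::None; // later steps of the tree are omitted here
}
```

For a font that has both cmaps and a PDF font that specifies MacRomanEncoding, step 1a wins and the Unicode cmap is never consulted.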

mupdf, Acrobat and Ghostscript display the PDF correctly; xpdf and poppler do not.

mupdf displays it correctly because it uses the LAST cmap in the font file which fits, and this is always the Unicode one. If I change the code to force mupdf to use the FIRST one, it displays the PDF the same way poppler does.
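
The difference between taking the first and the last fitting cmap can be sketched as below. This is an illustration only, not mupdf's code: `CmapEntry`, `fits`, `firstFitting` and `lastFitting` are hypothetical names, and the definition of "fits" (Macintosh Roman or one of the Unicode cmaps) is an assumption.

```cpp
#include <vector>

// One encoding record from a TrueType 'cmap' table (hypothetical struct).
struct CmapEntry {
    int platform;
    int encoding;
};

// Assumed notion of a cmap that "fits": Macintosh Roman (1/0),
// Microsoft Unicode (3/1), or a platform-0 Unicode cmap.
static bool fits(const CmapEntry &c) {
    return (c.platform == 1 && c.encoding == 0) ||
           (c.platform == 3 && c.encoding == 1) || c.platform == 0;
}

// Index of the first fitting cmap in file order, or -1 if none fits.
int firstFitting(const std::vector<CmapEntry> &cmaps) {
    for (int i = 0; i < static_cast<int>(cmaps.size()); ++i)
        if (fits(cmaps[i])) return i;
    return -1;
}

// Index of the last fitting cmap in file order, or -1 if none fits.
int lastFitting(const std::vector<CmapEntry> &cmaps) {
    for (int i = static_cast<int>(cmaps.size()) - 1; i >= 0; --i)
        if (fits(cmaps[i])) return i;
    return -1;
}
```

For a font whose records are Macintosh Roman (1/0) followed by Microsoft Unicode (3/1), `firstFitting` returns index 0 (the MacRoman cmap) and `lastFitting` returns index 1 (the Unicode cmap).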
Comment 1 Thomas Freitag 2017-06-28 11:36:01 UTC
Created attachment 132300 [details] [review]
Use unicode cmap if it exists

If I change the decision tree to check first whether a Unicode cmap exists and, if so, use it, everything works fine. This patch changes the decision tree accordingly.
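
The reordering described here can be sketched as follows. This is not the attached patch: `FontCmap`, `Picked` and `pickUnicodeFirst` are hypothetical names, and the sketch ignores the symbolic/embedded conditions, which the real patch may well keep.

```cpp
#include <vector>

// One encoding record from a TrueType 'cmap' table (hypothetical struct).
struct FontCmap {
    int platform;
    int encoding;
};

enum class Picked { Unicode, MacRoman, None };

// Hypothetical helper: prefer a Unicode cmap whenever one exists, and
// fall back to the Macintosh Roman cmap only otherwise.
Picked pickUnicodeFirst(const std::vector<FontCmap> &cmaps, bool pdfUsesMacRoman) {
    bool hasMac = false, hasUnicode = false;
    for (const FontCmap &c : cmaps) {
        if (c.platform == 1 && c.encoding == 0) hasMac = true;
        if (c.platform == 0 || (c.platform == 3 && c.encoding == 1)) hasUnicode = true;
    }
    if (hasUnicode) return Picked::Unicode;                 // checked first
    if (pdfUsesMacRoman && hasMac) return Picked::MacRoman; // former step 1a
    return Picked::None;
}
```

With both cmaps present, this picks the Unicode cmap even when the PDF font specifies MacRomanEncoding, so a broken MacRoman cmap no longer matters.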
Comment 2 Thomas Freitag 2017-07-14 12:38:43 UTC
A regression test with my (old) PDF Suite finds only one regression:
- dc-15-muentz-WP.f9.pdf
This PDF is defective but uses MacRomanEncoding; the difference is that with my patch a bullet sign is now shown in the bulleted list, where it was missing before.

So my conclusion: at least the patch doesn't make anything worse, but it solves my problem.
Comment 3 Albert Astals Cid 2017-07-31 15:30:44 UTC
Maybe it makes sense to do what calibre does and use the last one? Do you think that would also fix https://bugs.freedesktop.org/show_bug.cgi?id=101855 ?
Comment 4 Albert Astals Cid 2017-07-31 15:36:36 UTC
and by calibre i mean mupdf, sorry my brain broke
Comment 5 Thomas Freitag 2017-08-01 13:01:40 UTC
*** This bug has been marked as a duplicate of bug 101855 ***
