Bug 101624 - Special chars in a MacRoman encoded font are displayed wrong
Summary: Special chars in a MacRoman encoded font are displayed wrong
Status: RESOLVED DUPLICATE of bug 101855
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2017-06-28 11:32 UTC by Thomas Freitag
Modified: 2017-08-01 13:01 UTC (History)

Attachments
pdf with this problem (522.19 KB, application/pdf)
2017-06-28 11:32 UTC, Thomas Freitag
Use unicode cmap if it exists (2.01 KB, patch)
2017-06-28 11:36 UTC, Thomas Freitag

Description Thomas Freitag 2017-06-28 11:32:38 UTC
Created attachment 132299
pdf with this problem

The attached PDF contains two embedded fonts. Both are supposed to use MacRomanEncoding, and both font files contain a MacRoman cmap, but these cmaps seem to be wrong. Both fonts also contain a Unicode cmap, but poppler's decision tree checks for the MacRoman cmap first:

  //    1a. If the PDF font specified MacRomanEncoding and the
  //        TrueType font has a Macintosh Roman cmap, use it, and
  //        reverse map the char names through MacRomanEncoding to
  //        get char codes.
  //    1b. If the PDF font is not symbolic or the PDF font is not
  //        embedded, and the TrueType font has a Microsoft Unicode
  //        cmap or a non-Microsoft Unicode cmap, use it, and use the
  //        Unicode indexes, not the char codes.

mupdf, Acrobat and Ghostscript display the PDF correctly; xpdf and poppler do not.

mupdf displays it correctly because it uses the LAST cmap in the font file that fits, and this is always the Unicode one. If I force mupdf to use the FIRST one (by changing the code), it displays the PDF the same way poppler does.
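
A minimal sketch of the difference between picking the first and the last fitting cmap. This is not mupdf's or poppler's actual code: `CmapRecord`, `fits`, `firstFitting` and `lastFitting` are hypothetical names, and the notion of "fits" is an assumption. Only the platform/encoding ID pairs (1,0 for Macintosh Roman, 3,1 for Microsoft Unicode, platform 0 for Unicode) come from the TrueType format.

```cpp
#include <vector>

// Hypothetical, simplified view of one cmap subtable record in a TrueType font.
struct CmapRecord {
    int platformID;
    int encodingID;
};

// Assumption: a record "fits" if it is Macintosh Roman (1, 0),
// Microsoft Unicode (3, 1) or any Unicode-platform (0, x) subtable.
bool fits(const CmapRecord& r) {
    return (r.platformID == 1 && r.encodingID == 0) ||
           (r.platformID == 3 && r.encodingID == 1) ||
           r.platformID == 0;
}

// Index of the first fitting record, or -1 if none fits.
int firstFitting(const std::vector<CmapRecord>& cmaps) {
    for (int i = 0; i < static_cast<int>(cmaps.size()); ++i) {
        if (fits(cmaps[i])) return i;
    }
    return -1;
}

// Index of the last fitting record, or -1 if none fits.
int lastFitting(const std::vector<CmapRecord>& cmaps) {
    for (int i = static_cast<int>(cmaps.size()) - 1; i >= 0; --i) {
        if (fits(cmaps[i])) return i;
    }
    return -1;
}
```

For a font whose records are (1,0) followed by (3,1), `firstFitting` selects the Macintosh Roman record and `lastFitting` selects the Microsoft Unicode record, which would match the behaviour difference described above.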
Comment 1 Thomas Freitag 2017-06-28 11:36:01 UTC
Created attachment 132300
Use unicode cmap if it exists

If I change the decision tree to check first whether a Unicode cmap exists and, if so, to use it, everything works fine. This patch changes the decision tree accordingly.
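
The reordered decision could look roughly like the sketch below. It is not the attached patch: `CmapRecord`, `CmapChoice` and `chooseCmap` are hypothetical names, and the sketch deliberately ignores the "not symbolic or not embedded" condition from rule 1b as well as every later rule in the tree.

```cpp
#include <vector>

// Hypothetical, simplified view of one cmap subtable record in a TrueType font.
struct CmapRecord {
    int platformID;
    int encodingID;
};

enum class CmapChoice { Unicode, MacRoman, None };

// Assumed reordering: prefer a Unicode cmap whenever the font has one, and
// fall back to the Macintosh Roman cmap only if the PDF font specifies
// MacRomanEncoding and no Unicode cmap is present.
CmapChoice chooseCmap(const std::vector<CmapRecord>& cmaps, bool pdfUsesMacRoman) {
    bool hasUnicode = false;
    bool hasMacRoman = false;
    for (const CmapRecord& r : cmaps) {
        if ((r.platformID == 3 && r.encodingID == 1) || r.platformID == 0) {
            hasUnicode = true;   // Microsoft Unicode or Unicode-platform cmap
        } else if (r.platformID == 1 && r.encodingID == 0) {
            hasMacRoman = true;  // Macintosh Roman cmap
        }
    }
    if (hasUnicode) return CmapChoice::Unicode;
    if (pdfUsesMacRoman && hasMacRoman) return CmapChoice::MacRoman;
    return CmapChoice::None;
}
```

With both a (1,0) and a (3,1) record present, this sketch picks the Unicode cmap even when the PDF font specifies MacRomanEncoding, which is the case the attached PDF exercises.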
Comment 2 Thomas Freitag 2017-07-14 12:38:43 UTC
A regression test with my (old) PDF Suite finds only one regression:
- dc-15-muentz-WP.f9.pdf
This PDF is defective but uses MacRomanEncoding; the difference is that with my patch a bullet sign is now shown in the bulleted list, where it was missing before.

So my conclusion: the patch at least doesn't make anything worse, and it solves my problem.
Comment 3 Albert Astals Cid 2017-07-31 15:30:44 UTC
Maybe it makes sense to do what calibre does and use the last one? Do you think that would also fix https://bugs.freedesktop.org/show_bug.cgi?id=101855 ?
Comment 4 Albert Astals Cid 2017-07-31 15:36:36 UTC
And by calibre I mean mupdf, sorry, my brain broke.
Comment 5 Thomas Freitag 2017-08-01 13:01:40 UTC

*** This bug has been marked as a duplicate of bug 101855 ***

