101624 – Special chars in a MacRoman encoded font are displayed wrong

Status:	RESOLVED DUPLICATE of bug 101855

Alias:	None

Product:	poppler
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	poppler-bugs

Reported:	2017-06-28 11:32 UTC by Thomas Freitag
Modified:	2017-08-01 13:01 UTC
CC List:	0 users

Attachments
pdf with this problem (522.19 KB, application/pdf) 2017-06-28 11:32 UTC, Thomas Freitag
Use unicode cmap if it exists (2.01 KB, patch) 2017-06-28 11:36 UTC, Thomas Freitag

Description Thomas Freitag 2017-06-28 11:32:38 UTC

Created attachment 132299
pdf with this problem

In the attached PDF there are two embedded fonts. Both should use MacRomanEncoding, and both font files contain a Mac Roman cmap, but these cmaps seem to be wrong. Both fonts also contain a Unicode cmap, but poppler's decision tree checks for the Mac Roman cmap first:

  //    1a. If the PDF font specified MacRomanEncoding and the
  //        TrueType font has a Macintosh Roman cmap, use it, and
  //        reverse map the char names through MacRomanEncoding to
  //        get char codes.
  //    1b. If the PDF font is not symbolic or the PDF font is not
  //        embedded, and the TrueType font has a Microsoft Unicode
  //        cmap or a non-Microsoft Unicode cmap, use it, and use the
  //        Unicode indexes, not the char codes.

mupdf, Acrobat and Ghostscript display the PDF correctly; xpdf and poppler do not.

mupdf displays it correctly because it uses the LAST cmap in the font file that fits, and this is always the Unicode one. If I force mupdf to use the FIRST one (by changing the code), it displays the PDF the same way poppler does.
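The first-versus-last difference can be sketched as follows. This is a minimal illustration, not code taken from mupdf or poppler: `CmapRecord`, `isUsable`, `pickFirst`, `pickLast` and `example` are hypothetical names, and the set of subtables treated as usable is an assumption. The platform and encoding IDs are the standard TrueType values (platform 1, encoding 0 for Macintosh Roman; platform 3, encoding 1 for Microsoft Unicode; platform 0 for Unicode).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry of a TrueType 'cmap' table directory (hypothetical struct).
struct CmapRecord {
    int platformId;  // 0 = Unicode, 1 = Macintosh, 3 = Microsoft
    int encodingId;  // meaning depends on platformId
};

// Assumption: for a MacRoman-encoded PDF font, a Mac Roman subtable and
// any Unicode subtable both count as "fitting".
bool isUsable(const CmapRecord &r) {
    const bool macRoman = r.platformId == 1 && r.encodingId == 0;
    const bool msUnicode = r.platformId == 3 && r.encodingId == 1;
    const bool unicode = r.platformId == 0;
    return macRoman || msUnicode || unicode;
}

// Index of the FIRST fitting subtable, or -1 if none fits.
int pickFirst(const std::vector<CmapRecord> &cmaps) {
    for (std::size_t i = 0; i < cmaps.size(); ++i) {
        if (isUsable(cmaps[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Index of the LAST fitting subtable, or -1 if none fits.
int pickLast(const std::vector<CmapRecord> &cmaps) {
    for (std::size_t i = cmaps.size(); i-- > 0;) {
        if (isUsable(cmaps[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Usage: a font with a Mac Roman subtable followed by a Microsoft Unicode one.
void example() {
    const std::vector<CmapRecord> cmaps = {{1, 0}, {3, 1}};
    assert(pickFirst(cmaps) == 0);  // the Mac Roman subtable
    assert(pickLast(cmaps) == 1);   // the Microsoft Unicode subtable
}
```

With a font laid out like the one in `example`, taking the first match lands on the (here broken) Mac Roman cmap, while taking the last match lands on the Unicode one.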

Comment 1 Thomas Freitag 2017-06-28 11:36:01 UTC

Created attachment 132300
Use unicode cmap if it exists

If I change the decision tree to check first whether a Unicode cmap exists and, if so, use it, everything works fine. This patch changes the decision tree accordingly.
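The effect of swapping the order can be sketched roughly as below. This is not the attached patch and not poppler's actual code: all names (`CmapChoice`, `Available`, `scan`, `chooseQuotedOrder`, `chooseUnicodeFirst`, `exampleOrder`) are hypothetical, and the sketch covers only rules 1a and 1b from the quoted comment, ignoring the symbolic/embedded condition of rule 1b and every later fallback.

```cpp
#include <cassert>
#include <vector>

enum class CmapChoice { None, MacRoman, Unicode };

// One entry of a TrueType 'cmap' table directory (hypothetical struct).
struct CmapRecord {
    int platformId;  // 0 = Unicode, 1 = Macintosh, 3 = Microsoft
    int encodingId;
};

// Which kinds of cmap subtable the font file offers.
struct Available {
    bool macRoman = false;
    bool unicode = false;
};

Available scan(const std::vector<CmapRecord> &cmaps) {
    Available a;
    for (const CmapRecord &r : cmaps) {
        if (r.platformId == 1 && r.encodingId == 0) {
            a.macRoman = true;  // Macintosh Roman
        } else if ((r.platformId == 3 && r.encodingId == 1) || r.platformId == 0) {
            a.unicode = true;  // Microsoft Unicode or non-Microsoft Unicode
        }
    }
    return a;
}

// Order as in the quoted comment: rule 1a (Mac Roman) before rule 1b (Unicode).
CmapChoice chooseQuotedOrder(bool pdfUsesMacRoman, const Available &a) {
    if (pdfUsesMacRoman && a.macRoman) return CmapChoice::MacRoman;
    if (a.unicode) return CmapChoice::Unicode;
    return CmapChoice::None;
}

// Order described in this comment: a Unicode cmap wins whenever it exists.
CmapChoice chooseUnicodeFirst(bool pdfUsesMacRoman, const Available &a) {
    if (a.unicode) return CmapChoice::Unicode;
    if (pdfUsesMacRoman && a.macRoman) return CmapChoice::MacRoman;
    return CmapChoice::None;
}

// Usage: a MacRoman-encoded PDF font whose file has both kinds of cmap.
void exampleOrder() {
    const Available both = scan({{1, 0}, {3, 1}});
    assert(chooseQuotedOrder(true, both) == CmapChoice::MacRoman);
    assert(chooseUnicodeFirst(true, both) == CmapChoice::Unicode);
}
```

For fonts that carry only a Mac Roman cmap, both orders give the same answer in this sketch; they differ only when a Unicode cmap is also present.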

Comment 2 Thomas Freitag 2017-07-14 12:38:43 UTC

A regression test with my (old) PDF suite finds only one difference:
- dc-15-muentz-WP.f9.pdf
This PDF is defective but uses MacRomanEncoding; the difference is that with my patch a bullet sign is now shown in the bulleted list, where it was missing before.

So my conclusion: at least the patch doesn't make anything worse, and it solves my problem.

Comment 3 Albert Astals Cid 2017-07-31 15:30:44 UTC

Maybe it makes sense to do what calibre does and use the last one? Do you think that would also fix https://bugs.freedesktop.org/show_bug.cgi?id=101855 ?

Comment 4 Albert Astals Cid 2017-07-31 15:36:36 UTC

And by calibre I mean mupdf, sorry, my brain broke.

Comment 5 Thomas Freitag 2017-08-01 13:01:40 UTC


*** This bug has been marked as a duplicate of bug 101855 ***
