Created attachment 132299
pdf with this problem

In the attached PDF there are two embedded fonts. Both should use MacRomanEncoding, and both font files have a MacRoman cmap, but these cmaps seem to be wrong. Both fonts also have a Unicode cmap, but poppler's decision tree checks for the MacRoman cmap first:

// 1a. If the PDF font specified MacRomanEncoding and the
//     TrueType font has a Macintosh Roman cmap, use it, and
//     reverse map the char names through MacRomanEncoding to
//     get char codes.
// 1b. If the PDF font is not symbolic or the PDF font is not
//     embedded, and the TrueType font has a Microsoft Unicode
//     cmap or a non-Microsoft Unicode cmap, use it, and use the
//     Unicode indexes, not the char codes.

mupdf, Acrobat and Ghostscript display the PDF correctly; xpdf and poppler do not. mupdf displays it correctly because it uses the LAST cmap in the font file that fits, and this is always the Unicode one. If I force mupdf to use the FIRST one (by changing the code), it displays the PDF like poppler does.
Created attachment 132300
Use unicode cmap if it exists

If I change the decision tree to check first whether a Unicode cmap exists and, if so, use it, everything works fine. This patch changes the decision tree accordingly.
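The reordering described above can be sketched roughly as follows. This is not the attached patch and not poppler's actual code: `CmapRecord`, `Policy` and `pickCmap` are hypothetical names made up for illustration, and the sketch leaves out the "not symbolic or not embedded" condition from step 1b. It only assumes the usual TrueType platform/encoding IDs: (1, 0) for Macintosh Roman, (3, 1) for Microsoft Unicode, and platform 0 for non-Microsoft Unicode.

```cpp
#include <vector>

// Hypothetical, simplified cmap subtable header: platform and encoding ID only.
struct CmapRecord {
    int platform;
    int encoding;
};

// (1, 0) is the Macintosh Roman cmap.
bool isMacRoman(const CmapRecord &c) { return c.platform == 1 && c.encoding == 0; }

// (3, 1) is Microsoft Unicode; platform 0 is non-Microsoft Unicode.
bool isUnicode(const CmapRecord &c) {
    return (c.platform == 3 && c.encoding == 1) || c.platform == 0;
}

// Index of the first record matching pred, or -1 if there is none.
int findFirst(const std::vector<CmapRecord> &cmaps, bool (*pred)(const CmapRecord &)) {
    for (int i = 0; i < static_cast<int>(cmaps.size()); ++i) {
        if (pred(cmaps[i])) {
            return i;
        }
    }
    return -1;
}

enum class Policy {
    MacRomanFirst, // order of the quoted decision tree (1a before 1b)
    UnicodeFirst   // order proposed in this comment
};

// Returns the index of the cmap to use, or -1 if neither kind is present.
int pickCmap(const std::vector<CmapRecord> &cmaps, bool pdfUsesMacRoman, Policy policy) {
    // The MacRoman cmap is only a candidate if the PDF font asks for MacRomanEncoding.
    const int mac = pdfUsesMacRoman ? findFirst(cmaps, isMacRoman) : -1;
    const int uni = findFirst(cmaps, isUnicode);
    if (policy == Policy::UnicodeFirst) {
        return uni >= 0 ? uni : mac;
    }
    return mac >= 0 ? mac : uni;
}
```

For a font carrying both a (1, 0) and a (3, 1) cmap, `MacRomanFirst` picks the MacRoman table and `UnicodeFirst` picks the Unicode one; a font with only a MacRoman cmap still falls back to it under either policy.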
A regression test with my (old) PDF suite finds only one difference: dc-15-muentz-WP.f9.pdf. This PDF is defective but uses MacRomanEncoding, and the difference is that with my patch a bullet sign is now shown in the bulleted list, where it was missing before. So my conclusion: at least the patch doesn't make anything worse, and it solves my problem.
Maybe it makes sense to do what calibre does and use the last one? Do you think that would also fix https://bugs.freedesktop.org/show_bug.cgi?id=101855 ?
and by calibre i mean mupdf, sorry my brain broke
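For comparison, the "use the last one that fits" idea could look roughly like this. It is only a sketch under assumptions: `CmapEntry`, `isUsable`, `firstUsable` and `lastUsable` are hypothetical names, and treating "fits" as "is a MacRoman or Unicode cmap" is a guess, not mupdf's actual rule.

```cpp
#include <vector>

// Hypothetical, simplified cmap subtable header: platform and encoding ID only.
struct CmapEntry {
    int platform;
    int encoding;
};

// Assumed meaning of "fits": Macintosh Roman (1, 0), Microsoft Unicode (3, 1),
// or any platform 0 (non-Microsoft Unicode) cmap.
bool isUsable(const CmapEntry &c) {
    return (c.platform == 1 && c.encoding == 0) ||
           (c.platform == 3 && c.encoding == 1) ||
           c.platform == 0;
}

// Index of the first usable cmap in file order, or -1 if there is none.
int firstUsable(const std::vector<CmapEntry> &cmaps) {
    for (int i = 0; i < static_cast<int>(cmaps.size()); ++i) {
        if (isUsable(cmaps[i])) {
            return i;
        }
    }
    return -1;
}

// Index of the last usable cmap in file order, or -1 if there is none.
int lastUsable(const std::vector<CmapEntry> &cmaps) {
    for (int i = static_cast<int>(cmaps.size()) - 1; i >= 0; --i) {
        if (isUsable(cmaps[i])) {
            return i;
        }
    }
    return -1;
}
```

With a (1, 0) cmap followed by a (3, 1) cmap, the first match is the MacRoman table and the last match is the Unicode one. Note that the result depends on the order of the tables in the file: if the only Unicode cmap is a platform 0 one stored before the MacRoman cmap, the last match is MacRoman again.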
*** This bug has been marked as a duplicate of bug 101855 ***