[self-compiled poppler 0.16.2] Please have a look at this document: http://diwww.epfl.ch/w3lsp/publications/typography/frsa.pdf Ligatures like `fi' or `fl' are mapped incorrectly to `Æ' or `Ø', and accents like the acute over `e' are mapped to `¬'. BTW, I won't even mention the horrible spacing and kerning :-) acroread displays this document just fine.
The document uses a font (Optima) that is neither installed on your system nor embedded in the PDF. There is nothing we can do; either install the font on the system or configure a proper font substitution in fontconfig.
Hmm. Then how do you explain that acroread gets it right? For display, it replaces Optima with Adobe Sans MM on my GNU/Linux box. I can imagine that Adobe maintains a database of common fonts and how they should be substituted... At least this would be a solution to this problem.
Yes, Adobe maintains a database of substitutions. On Linux the tool for that is called fontconfig (I guess you already knew that ;-)). If you configure it correctly, it should work (reopen the bug if it does not). Maintaining such a database of font matches is out of scope for us in poppler.
Sorry to say, but your solution is not acceptable for Joe User. Fiddling with fontconfig is extremely difficult since there are no GUI programs (that I'm aware of) which allow the manipulations needed to resolve the problem. Another complication is that the frequently used Optima fonts are not freely available, and neither is the URWClassico clone, as far as I know.

On the other hand, your solution works, which surprises me:

    <alias binding="same">
      <family>Optima</family>
      <accept>
        <family>URWClassico</family>
      </accept>
    </alias>

I expected the ligature issues to remain, but obviously URWClassico has the right encoding vector. The same is true if I use, say, `Century Schoolbook L' as a replacement. However, it fails if I try a TrueType font like `Liberation Sans'. How shall Joe User know this? For me, this is indeed a very good reason to maintain a database for resolving cmap issues: fontconfig can't do this job for poppler.
This is clearly the responsibility of fontconfig, and poppler can't (in general) solve the problem, because it depends on the arbitrary set of fonts installed on each machine. If there are missing tools for fontconfig, then that is a bug in fontconfig, not in poppler.
Hmm, hmm, I need to be persistent here, since we are probably miscommunicating. Are you sure that this is the responsibility of fontconfig? I doubt it. It seems that poppler directly accesses the encoding vector of a Type 1 font instead of relying on proper cmap access as provided by, say, FreeType. Note that the problem is not a general font selection issue but a matter of how fonts get processed: poppler apparently accesses Type 1 fonts and TTFs differently, otherwise there wouldn't be a problem if I substitute the Type 1 font with a TTF. And THIS is definitely not a fontconfig issue. Am I missing something? BTW, I won't play the game with resetting the bug status from `resolved' to `reopened'. :-)
Let's start with the fact that PDFs with non-standard, non-embedded fonts are by definition non-interoperable. As far as I know this is what we do: we know we have to render glyph 3 of a font called Optima, so we ask fontconfig for it and then go and render glyph 3. If what we got is not Optima, well, it might work or it might not.
My knowledge of PDF internals is small: Do you access fonts in SFNT format by index also, bypassing the cmap?
(In reply to comment #8)
> My knowledge of PDF internals is small: Do you access fonts in SFNT format by
> index also, bypassing the cmap?

I was imprecise: How does the PDF specification mandate glyph access in SFNT fonts?
Not that I really know much either :-D As far as I know we don't bypass the cmap, but we use the cmap specified by the PDF file, not the one in the font.
(In reply to comment #9)
> I was imprecise: How does the PDF specification mandate glyph access in SFNT
> fonts?

You can read all about it in Section 9.6.6 of the PDF Reference [1]. Character encoding is the worst part of the PDF standard, probably because it started out supporting only Type 1 fonts in PDF 1.0 and has been extended in each revision to support additional font types while maintaining backwards compatibility.

The frsa.pdf file uses text strings with 8-bit characters. [A-Za-z0-9] are all encoded with the standard ASCII values. The character used for the fi ligature is encoded as character 174.

The font dictionary for the font used by most of the text (including the fi ligature) specifies a Type 1 font named "Optima". The font dictionary overrides the font's built-in encoding with a custom encoding. In PDF, Type 1 font encodings are keyed by glyph name. The dictionary maps character 174 to glyph name "/AE". Since this font is not embedded in the PDF, this is all the information the PDF viewer has to work with.

[1] http://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/adobe_supplement_iso32000.pdf
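For illustration, here is a rough Python sketch of how a custom encoding keyed by glyph name can override a base encoding. It is not poppler code: the helper name `apply_differences` and the sample table contents are assumptions made up for this example. The array shape (a character code followed by one or more glyph names, each assigned to consecutive codes) follows the /Differences convention of PDF encoding dictionaries.

```python
def apply_differences(base, differences):
    """Return a copy of a 256-entry code -> glyph-name table with a
    /Differences-style array applied on top of it."""
    table = list(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            # An integer restarts the running character code.
            code = item
        else:
            if code is None or not 0 <= code <= 255:
                raise ValueError("glyph name without a valid character code")
            # A glyph name is assigned to the current code, then the code advances.
            table[code] = item
            code += 1
    return table


# Hypothetical base table: only the two entries needed for the example.
base = [None] * 256
base[174] = "fi"
base[175] = "fl"

# Hypothetical override that remaps codes 174 and 175.
custom = apply_differences(base, [174, "AE", "Oslash"])
print(custom[174], custom[175])
```

With a non-embedded font, a table like `custom` is all a viewer has; the glyph names still have to be resolved against whatever substitute font is found.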
Thanks. Would it be possible to make poppler try to substitute a Type 1 font with another Type 1 font, and a TTF with a TTF? As my tests have shown, this can improve the result.
Created attachment 43699
encoding fix

I had another look at the PDF file to see why it works when the substitute font is Type 1 but not when it is TrueType. There are two non-embedded fonts in the PDF named "Optima". One has a modified encoding while the other does not. I was looking at the wrong font. The PDF file does correctly map the character used for the fi ligature to the glyph name "/fi".

The problem is a bug in poppler. When Optima is substituted with a TrueType font, Gfx8BitFont::getCodeToGIDMap() is called to get the mapping from the PDF character codes to the font GIDs. This function assumes that the font in the PDF is always a TrueType font. When the PDF font is a Type 1 font, it needs to go through the glyph names in the enc array to map them to the TrueType GID numbers. It does not do this when no /Encoding is defined in the Type 1 font dictionary.

The attached patch seems to fix this. I have not done much testing with it, nor have I analysed the code enough to be confident that it works for all cases without causing regressions. But so far it has worked on the PDF files I have tested the patch with.
Glad to know that my persistence has helped identify a bug :-) Thanks for working on this!
Will be in poppler 0.16.3