Summary: | FreeText annotation ignores font | ||
---|---|---|---|
Product: | poppler | Reporter: | Phil <phil.ayres> |
Component: | glib frontend | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | haxtibal, m.weghorn, oliver.sander |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Annotation example; Patch to make AnnotFreeText::parseAppearanceString extract the font name; Patch to generate font tags and get font name and create the FreeText Annotations with the Base 14 fonts | ||
Thanks for your report! You're right. The only case where poppler renders the FreeText font correctly is when the annotation was generated by an external PDF tool that embedded an appearance stream (/AP) into the annotation dictionary. That's not the case with your attached file. If there's no /AP, poppler dynamically generates an in-memory appearance in which the font is ignored.

Basically, to select a font in the absence of /AP, these things need to be done:
* If no /AP is there, the default appearance string /DA determines the font. /DA must have at least a Tf operator to select font name and size in points.
* (optional) If we want to fine-tune a font, /DA can have some more operators, e.g. Tc, Tw, Tz, TL, Ts, Tr.
* poppler must dynamically construct an appearance stream that respects /DA.
* The chosen font name must have a corresponding entry in the global catalog's AcroForm resource dictionary, see poppler/Catalog.h, Catalog::getAcroForm.
* If the font is not one of the Base14 fonts, it must be embedded into the document.

Poppler ignores the font in multiple places:
* AnnotFreeText::parseAppearanceString only extracts font size and font color from appearanceString, but not the font name.
* AnnotFreeText::generateFreeTextAppearance dynamically generates an in-memory appearance stream into Annot::appearStreams, where the font is hardcoded as /AnnotDrawFont = Helvetica, Type0, WinAnsiEncoding.
* If you're using the poppler Qt5 frontend to create a FreeText annotation, TextAnnotationPrivate::toAppearanceString constructs a wrong /DA string GooString::format("/Invalid_font {0:d} Tf", font.pointSize()). That "Invalid_font" is passed to the AnnotFreeText constructor and ends up in AnnotFreeText::dict["DA"] and the member AnnotFreeText::appearanceString.
* Page::addAnnotation and AnnotFreeText::AnnotFreeText don't provide a path to add a font entry to the catalog's AcroForm resource dictionary, nor to embed a font if it's not a Base14 font.
* The cpp and glib frontends don't let you create FreeText annotations at all (really? or am I missing something?)

Currently we're running an Okular GSoC project [0] that's probably affected by this bug. Maybe we can contribute to solving it, but we can't promise.

[0] https://summerofcode.withgoogle.com/projects/#6053164340477952

Created attachment 139467
Patch to make AnnotFreeText::parseAppearanceString extract the font name

At least the first issue seems easy enough. Attached is a patch that makes AnnotFreeText::parseAppearanceString also extract the font name. After that I am stuck already. createAnnotDrawFont hard-wires font name, subtype, and encoding. I have the font name now, but where do I get subtype and encoding from?

Thanks Oliver! I've got another patch pending based on yours. It already works somewhat. But there's something to clarify first:

Attachment 103449 seems broken. If that's true we can't use it as a reference.

The annotation's /DA has a Tf operand '/Rufscript'.
    <<
      % ...
      /DA (/Rufscript 18 Tf 0 0 0.5 rg )
      /Subtype /FreeText
    >>

But there's no resource named '/Rufscript' in the font entry of the default resource dictionary. Moreover, there's no default resource dictionary at all:
    3 0 obj % This is "Interactive Form Dictionary", aka AcroForm
    <<
      % There's no /DR (default resource dictionary) in here!
      /Fields [ 5 0 R ]
      /SigFlags 3
    >>
    endobj

Obviously the PDF composer (prawnpdf [0]) thought it would be sufficient to write the name of the font as the Tf operand. But that's not true. The operand must name a font resource entry, not the font itself. The font entry can have an arbitrary name. It's /BaseFont in the font dictionary that decides which font program to use.
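To make the lookup chain concrete, here is a minimal sketch in Python (not poppler code). It assumes the AcroForm dictionary is modelled as plain nested dicts; the helper names `parse_da_font` and `resolve_base_font` are hypothetical and the regular expression is a simplification of real PDF name syntax.

```python
import re
from typing import Optional, Tuple

# Matches "/Tag size Tf" inside a default appearance (/DA) string.
# Simplified: real PDF names and numbers have more syntax than this.
TF_PATTERN = re.compile(r"/(\S+)\s+([0-9]*\.?[0-9]+)\s+Tf")


def parse_da_font(da: str) -> Optional[Tuple[str, float]]:
    """Return (font resource tag, size in points) from a /DA string."""
    match = TF_PATTERN.search(da)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def resolve_base_font(da: str, acroform: dict) -> Optional[str]:
    """Follow the Tf operand through /DR -> /Font -> tag -> /BaseFont.

    `acroform` is a hypothetical stand-in for the interactive form
    dictionary. Returns None when any link of the chain is missing.
    """
    parsed = parse_da_font(da)
    if parsed is None:
        return None
    tag, _size = parsed
    fonts = acroform.get("DR", {}).get("Font", {})
    font_dict = fonts.get(tag)
    if font_dict is None:
        return None
    return font_dict.get("BaseFont")


# AcroForm as found in attachment 103449: no /DR at all.
broken_form = {"Fields": ["5 0 R"], "SigFlags": 3}
print(resolve_base_font("/Rufscript 18 Tf 0 0 0.5 rg ", broken_form))

# A form whose /DR maps the arbitrary tag F1 to a real font.
good_form = {"DR": {"Font": {"F1": {"Subtype": "Type1", "BaseFont": "Helvetica"}}}}
print(resolve_base_font("/F1 12 Tf 0 0 0.5 rg", good_form))
```

With the attachment's AcroForm the lookup yields nothing, because '/Rufscript' is used as if it were a font name rather than a resource tag. With a populated /DR the tag F1 resolves to Helvetica even though the tag itself says nothing about the font.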
Some standard excerpts that make me believe I'm right:
PDF 32000-1:2008 12.5.6.6 Free Text Annotations: "The default appearance string that shall be used in formatting the text (see 12.7.3.3, “Variable Text”)"
PDF 32000-1:2008 12.7.3.3 Variable Text: "The specified font value shall match a resource name in the Font entry of the default resource dictionary"

I checked the output of the LaTeX pdfcomment package [1] and found it misbehaves in a similar way to attachment 103449. Two different composers misbehaving leaves me in doubt whether I'm right about the non-conformance.

Phil says Adobe Reader shows the right font. That's not true for me. When I open attachment 103449 in Adobe Reader 10, it shows a fallback font instead of Rufscript. Can anyone confirm this?

To go on we have to clarify:
- Is it really out-of-spec if we don't find DA's font tag in the default resource font dictionary, or is it just me misunderstanding the standard?
- If it is really out-of-spec, shall we consider some heuristics to search for the best font anyway? E.g. search in resource dictionaries other than the default one (e.g. page resource dictionaries), or use the font tag as /BaseFont.
- Or shall we be strict, use some simple default logic in poppler and tell the folks at [0] and [1] about their bug?

[0] https://github.com/prawnpdf/prawn
[1] https://bitbucket.org/kleberj/pdfcomment/wiki/Home

I can confirm that Adobe Reader 9 (on Ubuntu) continues to show the correct font. In Adobe Reader on Android the font is not shown. Unfortunately I don't have access to a Windows machine at the moment to try Adobe Reader 10.

(In reply to Tobias Deiminger from comment #3)
> Thanks Oliver! I've got another patch pending based on yours. It already
> works somewhat. But there's something to clarify ahead:
>
> Attachment 103449 seems broken. If that's true we can't use it as
> reference.
>
> Annot /DA has a Tf operand '/Rufscript'.
> <<
> % ...
> /DA (/Rufscript 18 Tf 0 0 0.5 rg )
> /Subtype /FreeText
> >>
>
> But there's no resource named '/Rufscript' in the font entry of the default
> resource dictionary. More over, there's no default resource dictionary at
> all:
>
> 3 0 obj % This is "Interactive Form Dictionary", aka AcroForm
> <<
> % There's no /DR (default resource dictionary) in here!
> /Fields [
> 5 0 R
> ]
> /SigFlags 3
> >>
> endobj
>
> Obviously the PDF composer (prawnpdf [0]) thought it would be sufficient to
> write the name of the font as Tf operand. But that's not true. You need to
> name a font resource entry as operand, not the name of a font itself. The
> font entry can have an arbitrary name. It's /BaseFont in font dictionary
> which decides what font program to use.
>
> Some standard excerpts that make me believe I'm right:
> PDF 32000-1:2008 12.5.6.6 Free Text Annotations: "The default appearance
> string that shall be used in formatting the text (see 12.7.3.3, “Variable
> Text”)"
> PDF 32000-1:2008 12.7.3.3 Variable Text: "The specified font value shall
> match a resource name in the Font entry of the default resource dictionary"
>
> I checked the output of LaTex pdfcomment package [1] and found it misbehaves
> in a similar way to attachment 103449. Two different composers
> misbehaving leaves me in doubt if I'm right about the non conformance.
>
> Phil says Adobe Reader shows the right font. That's not true for me. When I
> open attachment 103449 in Adobe Reader 10, they show a fallback
> font instead of Rufscript. Can anyone confirm this?
>
> To go on we have to clarify:
> - Is it really out-of-spec if we don't find DAs font tag in the default
> resource font dictionary, or is it just me misunderstanding the standard?
> - If it is really out-of-spec, shall we consider some heuristics to search
> the best font anyway? E.g. search in other resource dictionaries then the
> default one (e.g. page resource dictionaries), or use font tag as /BaseFont.
> - Or shall we be strict, use some simple default logic in poppler and tell
> folks at [0] and [1] about their bug?

What you're describing seems like what I fixed in https://cgit.freedesktop.org/poppler/poppler/commit/?id=8821c04f36cb737776cd9077a46f1a9f86ca54e7 but I'm not sure whether that patch helps for non-Forms, maybe not, but you could get inspired by it?

(In reply to Albert Astals Cid from comment #5)
> What you're describing seems like what i fixed in
> https://cgit.freedesktop.org/poppler/poppler/commit/?id=8821c04f36cb737776cd9077a46f1a9f86ca54e7
> but not sure if that patch
> helps for non Forms, maybe not, but you could get inspired by it?

Thanks Albert. Falling back to the form DA may indeed be a viable path. But it won't help for attachment 103449, as it has no DA in AcroForm either.

A bug report of yours at https://bugs.scribus.net/view.php?id=5385 actually seems very related. Scribus made the same mistake as attachment 103449, namely giving a font descriptor instead of a font resource as the Tf operand. The Scribus devs have confirmed and fixed it. For attachment 103449 it's similar, just the special case of a FreeText annotation.

We could try to report bugs to prawnpdf and pdfcomment as you did back then (todo: check the latest version of both projects). Additionally we could aim for a workaround in poppler, but that may not be worth the effort, or may even be counterproductive because we would then be supporting buggy PDF writers. What would you say?

Regardless of potential bugs in external projects, it still holds that poppler doesn't support custom fonts for FreeText in general, and we should implement that. I'm taking inspiration from AnnotWidget for how to look up and use resources from the default resource dictionary.

If prawnpdf and pdfcomment really create pdf files that don't follow the specification then I think that we should inform them about it. However, that won't save us from the fact that there are plenty of malformed pdf files out there that won't go away.
We should think about how poppler could cope with these files. If the RufScript font is not where it is supposed to be, can we look for it elsewhere?

(In reply to oliver.sander from comment #7)
> If prawnpdf and pdfcomment really create pdf files that don't follow the
> specification then I think that we should inform them about it.

Yes, if we have the time to spare, we should.

> However,
> that won't save us from the fact that there are plenty of malformed pdf
> files out there that won't go away.

Correct, all the more so if other viewers render them correctly.

> We should think about how poppler could
> cope with these files. If the RufScript font is not where it is supposed to
> be, can we look for it elsewhere?

We can, but at that stage of dealing with a broken file, I would say let's do the minimum needed to make it work. If that involves just going to some default font, we can do that, it's probably easier, and then we can try to iterate/improve on it if needed and time is available :)

So your proposal would be: use the correct font if it exists in the correct location, otherwise do the same as previously? ... That is certainly a good first step, but I am afraid we will have to think about a better solution right away: the example file that comes with this very bug will not be rendered better than before (yes, because it is malformed, but still).

(In reply to oliver.sander from comment #9)
> So your proposal would be: use the correct font if it exists in the correct
> location, otherwise do the same as previously?

I'm not sure what you mean by "as previously"; I mean "use a default font".

(In reply to Phil from comment #0)
> generating the PDF with Ruby Prawn gem, and
> adding the annotations with Ruby Origami PDF gem.

I should have read that more carefully, sorry. So it is origami [0], not prawnpdf, that needs to set up the default appearance and the fonts in the default resources correctly. Or even the origami user, depending on how low-level the API is.
@Phil: From a very brief look at [1] it seems that if Origami users want to set a font for FreeText, they are expected to edit DA as a raw string. Is that so? If so, it would probably also be your own responsibility to invent a font descriptor, create a related font resource and finally add the resource to the font subdictionary of the default resource dictionary in InteractiveForm [2].

[0] https://github.com/gdelugre/origami
[1] https://github.com/gdelugre/origami/blob/98ea557af7aa9e926aac564bf89e6e0ead4a1a5e/lib/origami/annotations.rb#L309
[2] https://github.com/gdelugre/origami/blob/16c6fffd433efd2ef9d7d56795912c8dc9a38cf3/lib/origami/acroform.rb#L114

> I'm not sure what you mean with "as previously", i mean "use a default font".
This is what poppler does right now: It uses a default font. Namely Helvetica, hard-wired in createAnnotDrawFont. That's why I wrote "as previously".

Created attachment 140969
Patch to generate font tags and get font name and create the FreeText Annotations with the Base 14 fonts

This patch comes out of my experiments in Poppler and is a workaround for the Base 14/standard fonts. It gets rid of the "Invalid_font" tags and generates meaningful font tags. Secondly, the font name is generated from the QFont according to the Base 14 font names and is set in the font dictionary inside createAnnotDrawFont. The default is set to "Helvetica". What this patch can do: while Okular is running, you can try different Base-14 fonts for the typewriter annotation, but if you save the PDF doc and then quit, the fonts will be set to "Helvetica".
Imho this experimental patch can be extended and modified to write the font dictionary for the Base 14 fonts into the document and to generate the FreeText appearance based on the DA and DR entries and the font dict. Embedded fonts would follow as a second step.

(In reply to Dileep Sankhla from comment #14)
> Created attachment 140969
> Patch to generate font tags and get font name and create the FreeText
> Annotations with the Base 14 fonts
>
> This patch is generated as per my experiment in Poppler and is a workaround
> for the Base 14/standard fonts. It gets rid of the "Invalid_font" tags and
> generates meaningful font tags. Secondly, the font name as per the Base 14
> font names is exactly generated from the QFont and the font name is set in
> the font dictionary inside createAnnotDrawFont. The default is set to
> "Helvetica". What this patch can do is when the Okular program is in the
> memory, you can try different base-14 fonts for the typewriter annotation
> but if you save the PDF doc and then quit it, the fonts will be set to
> "Helvetica".
> Imho this experimental patch can be extended and modified to write the font
> dictionary for the base 14 fonts in the document and to generate freetext
> appearance based on the DA and DR entries and the font dict. The second
> follows the embedded fonts.

This patch was built on top of the font color patch here: https://bugs.freedesktop.org/attachment.cgi?id=140963

(In reply to Dileep Sankhla from comment #14)
> This patch is generated as per my experiment in Poppler and is a workaround
> for the Base 14/standard fonts.

Thanks Dileep. The patch has several problems, but it's good you sent it; it helps to narrow down a solution.

> It gets rid of the "Invalid_font" tags and
> generates meaningful font tags.

Tags have to be unique in the AcroForm->DR->Font dictionary. Your patch can't ensure this, because it does no lookup in DR. There might already be a "/DejSe" tag, coming from another PDF tool for another font. We must not overwrite that tag. Your patch may even collide with its own tags, because for example QFont("Liberation Mono,12,-1,5,50,0,0,0,0,0,Regular") and QFont("Liberation Mono,12,-1,5,75,1,0,0,0,0,Bold Italic") both become "LiberMo", but they're different fonts.

We often see PDF docs where font tags are counted like this: F1, F2, F3, ... or R1, R2, R3, ... I believe that's a reasonable scheme. There's no need to make the tag sound like the name; just use the next free number to make it unique.

> Secondly, the font name as per the Base 14
> font names is exactly generated from the QFont and the font name is set in
> the font dictionary inside createAnnotDrawFont.

No, your generated names are neither Base14 nor PostScript names, but quite arbitrary.
Your new method createFontTagandName() generates font names and tags from QFonts with results like this:
    QFont("DejaVu Serif,12,-1,5,50,0,0,0,0,0,Book") => Tag="DejSe", Name="DejaVu-Book"
    QFont("FreeSans,12,-1,5,63,0,0,0,0,0,Bold") => Tag="Free", Name="FreeSans-Bold"
    QFont("Liberation Mono,12,-1,5,50,0,0,0,0,0,Regular") => Tag="LiberMo", Name="Liberation"
    QFont("Liberation Mono,12,-1,5,75,1,0,0,0,0,Bold Italic") => Tag="LiberMo", Name="Liberation-BoldItalic"

Those generated names really don't refer to Base14 fonts. In the standard, there's a list of exactly 14 Type1 base fonts:
[Times-Roman, Helvetica, Courier, Symbol, Times-Bold, Helvetica-Bold, Courier-Bold, ZapfDingbats, Times-Italic, Helvetica-Oblique, Courier-Oblique, Times-BoldItalic, Helvetica-BoldOblique, Courier-BoldOblique]
That's it. None of DejaVu-Book, FreeSans-Bold, Liberation, Liberation-BoldItalic is listed here, so they're not Base14 fonts.

The names are also not PostScript names. You can check the PostScript name like this:
    $ fc-scan --format "%{postscriptname}\n" /usr/share/fonts/truetype/dejavu/DejaVuSerif.ttf
    DejaVuSerif
So the PostScript name for QFont("DejaVu Serif,12,-1,5,50,0,0,0,0,0,Book") would have been "DejaVuSerif", not "DejaVu-Book".

We can use non-Base14 fonts, it's just harder to set them up. It's important to use the real PostScript name then, and to provide an encoding and a font descriptor. And, for platform independence, non-Base14 font programs should be embedded into the PDF. Embedding is optional: if a font is not embedded, a reader will substitute an available system font based on the available metrics.

I think we should start by querying fontconfig in poppler to get the font type and PostScript name, and then write them to /Subtype and /BaseFont. In a second step, we should try to get the encoding right and to set up a font descriptor. I guess most readers have heuristics to cope without a font descriptor, but the standard says it's required.
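Two points from this review, the closed list of Base14 names and the "next free number" scheme for unique tags, can be sketched as follows. This is an illustrative Python sketch, not poppler code; `is_base14` and `next_free_tag` are hypothetical names, and the set of existing tags stands in for the keys of the AcroForm /DR /Font dictionary.

```python
from typing import Iterable

# The 14 standard Type1 font names, as listed in the comment above.
BASE14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def is_base14(base_font: str) -> bool:
    """True only for an exact match against the 14 standard names."""
    return base_font in BASE14


def next_free_tag(existing_tags: Iterable[str], prefix: str = "F") -> str:
    """Return the first tag prefix+N (N = 1, 2, ...) not already in use.

    `existing_tags` models the keys already present in /DR /Font, so a
    tag written by another PDF tool is never overwritten.
    """
    taken = set(existing_tags)
    number = 1
    while f"{prefix}{number}" in taken:
        number += 1
    return f"{prefix}{number}"


print(is_base14("Courier-BoldOblique"))   # a real Base14 name
print(is_base14("Liberation-BoldItalic"))  # generated name, not Base14
print(next_free_tag({"F1", "F2", "DejSe"}))
```

Because the tag carries no meaning of its own, two different fonts such as the Regular and Bold Italic cuts of Liberation Mono simply receive two different numbers instead of colliding on "LiberMo".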
If we manage that too, we can finally look into font embedding.

> The default is set to
> "Helvetica". What this patch can do is when the Okular program is in the
> memory, you can try different base-14 fonts for the typewriter annotation
> but if you save the PDF doc and then quit it, the fonts will be set to
> "Helvetica".

I've not looked into that yet. Eventually we should generate a DA that's the same in memory and on disk.

> Imho this experimental patch can be extended and modified to write the font
> dictionary for the base 14 fonts in the document and to generate freetext
> appearance based on the DA and DR entries and the font dict. The second
> follows the embedded fonts.

Yes, both need to be done.

(In reply to Tobias Deiminger from comment #16)
Thank you Tobias for the quick review, but I must say that it was an experimental (or workaround) patch for the Base-14 fonts, and it does not deal with the AcroForm's /DR entry. The font names that I create in the function are only applicable to the Base-14 fonts, if you choose a Base-14 font from KFontRequester in Okular. Then the names will be meaningful PostScript Base-14 font names like Times-BoldItalic, Courier-BoldOblique, etc. The other font names it generates are meaningless here. I know the font tags that I have tried to generate are not very well done, as I didn't focus on the DR entry, which was the hardest part for me, and I'm going to write about that too (and this experiment) in my blog post. But I think in this GSoC we can demonstrate different Base-14 fonts only in Okular, by choosing and applying different standard fonts in the different FreeText annotations. It won't work for the other fonts, and the behavior is undefined.
After Akademy and GSoC, we can work on the remaining goals of the font family implementation, but at least in this GSoC, I'm glad that I experimented with the Base-14 fonts and have also created a gif displaying the output in my status report :)

(In reply to Dileep Sankhla from comment #17)
> The font names that I have created in the function are only applicable for
> the base-14 fonts if you choose a base-14 font from KfontRequester in
> Okular. Then the names will be meaningful postscripts base-14 font names
> like Times-BoldItalic, Courier-BoldOblique, etc. The other font names it
> does generate are meaningless here.

Afaict, Times, Helvetica and Courier are proprietary fonts that are not available on a typical Linux system. So, without further measures, I can't select a Base14 font from KFontRequester. How did you do it?

> It won't work for rest other fonts and the behavior is
> undefined.

Yes, but that could indeed be a good experiment. We can manually figure out the true PostScript names of some system fonts (using fontconfig, or ttfdump), and then hardcode the PostScript name into the resource generated by createAnnotDrawFont. Like
    fontDict->add(copyString("BaseFont"), Object(objName, "DejaVuSerif"));
    fontDict->add(copyString("Subtype"), Object(objName, "TrueType"));
That would be a good starting point to debug into poppler's font lookup code, to extend the fake font resource with an encoding and a font descriptor, and to learn how things get processed.

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/395.
Created attachment 103449
Annotation example

I have been creating PDFs dynamically, with FreeText and signature annotations. When shown in Evince, FreeText annotations generally show as I formatted them, but they ignore a specified TrueType font. Adobe Reader shows the font correctly for the same document.

Note: I originally submitted this bug against Evince, not realizing that Poppler was the PDF frontend for the application. Hopefully I made it to the right place this time.

In the attached example, the yellow block is a FreeText annotation. The font which appears in the first line of the document as "Test this font" is also applied to the FreeText annotation. As you can see, the default Helvetica font is what is actually displayed in the yellow annotation, when it should match the font in the "Test this font" line. Other than that, font size, color and position seem to be applied correctly.

Not a horrible bug, but when adding annotations to go with signatures, as I am, the lack of the TT font makes things look a little clunky.

BTW, I'm running up-to-date Ubuntu, Evince 3.10, and generating the PDF with the Ruby Prawn gem, and adding the annotations with the Ruby Origami PDF gem.

Let me know if I can provide any additional information.