Summary: | "8" shown instead of "x" inside checkbox when converting LibreOffice-generated form to PostScript | ||
---|---|---|---|
Product: | poppler | Reporter: | Michael Weghorn <m.weghorn> |
Component: | utils | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | m.weghorn |
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Attachments: |
Sample form generated by LibreOffice
Form with ticked checkbox and run through "mutool clean"
Result after running "pdftops" on the file
Patch to allow font tags other than "ZaDb" for ZapfDingbats
Patch to allow font tags other than "ZaDb" for ZapfDingbats |
Created attachment 140725 [details]
Form with ticked checkbox and run through "mutool clean"
Created attachment 140726 [details]
Result after running "pdftops" on the file
This is the resulting PostScript file that shows the "8" instead of the "x". I can reproduce with current git master (as of Poppler 0.67, commit 20d89699b35397f23352d0e60a3e19da2ce6b410).
Created attachment 140727 [details] [review]
Patch to allow font tags other than "ZaDb" for ZapfDingbats

As the 'pdftops' output indicates, there's a problem with the ZapfDingbats font. Poppler expects "ZaDb" to be used as the font tag and replaces anything else with it. The LibreOffice-generated form, however, uses "ZaDi" instead.

As far as I understand it so far, either LibreOffice uses an invalid name or Poppler makes invalid assumptions. While the PDF specification does use "ZaDb" in its own example (in section 12.7.4.2.3), I did not find any place where it says which tag has to be used, so it appears to be the latter case (but maybe I have missed something in the PDF spec or elsewhere).

The attached patch makes Poppler accept other tags for the ZapfDingbats font as well.

I'd be happy about feedback on the patch or other clarifications. (I'm far from being a Poppler expert...)

Created attachment 140728 [details] [review]
Patch to allow font tags other than "ZaDb" for ZapfDingbats

(updated patch to use correct email address)

(In reply to Michael Weghorn from comment #3)
> The attached patch makes Poppler accept other tags for the ZapfDingbats font
> as well.
>
> I'd be happy about feedback on the patch or other clarifications. (I'm far
> from being a Poppler expert...)

Hi Michael! I'm also far from being a Poppler expert, but I'd like to confirm your approach.

From a standards perspective, it doesn't matter whether the tag is named /ZaDb or /ZaDi or /whatever. What actually matters is that the tag has a corresponding entry in a resource font subdictionary.

From Poppler's perspective, we sometimes set forceZapfDingbats = true. This happens with the example document, where AnnotAppearanceBuilder::drawText is reached via drawFormField => drawFormFieldButton [case formButtonCheck] => drawText. Whenever forceZapfDingbats == true, the appearance Tf operand must match our fake font resource, which is hardcoded under the name "ZaDb". If the original DA Tf operand was different, we need to replace it with "ZaDb".

Your patch ensures this, if I got it right, and therefore it's a good patch :)

We're having a similar discussion at the moment here [0], and also here [1], because of our current GSoC project. Maybe you could have a look, especially at the UML diagram [2] that shows the relationship of the different font objects, and give your two cents on whether we're on the right track.

Btw., your attached PDF document is actually strange because it has a /DR entry in the Annot dictionary, which is not specified for widget annotation dictionaries. At least not in PDF 1.7 (ISO 32000-1:2008). The /DR entry is meant to be in the global AcroForm dictionary. Has this changed in PDF 2.0?

[0] https://bugs.freedesktop.org/show_bug.cgi?id=81748
[1] https://cgit.kde.org/scratch/dileepsankhla/okular-gsoc2018-typewriter.git/tree/bugs/poppler_81748
[2] https://cgit.kde.org/scratch/dileepsankhla/okular-gsoc2018-typewriter.git/plain/bugs/poppler_81748/font_object_graph.dia

Hi Tobias,

thanks for your reply with all the additional information, and sorry for the delay in responding.

(In reply to Tobias Deiminger from comment #5)
> From Poppler's perspective, we sometimes set forceZapfDingbats = true. This
> happens with the example document, where AnnotAppearanceBuilder::drawText is
> reached via drawFormField => drawFormFieldButton [case formButtonCheck] =>
> drawText. Whenever forceZapfDingbats == true, the appearance Tf operand must
> match our fake font resource, which is hardcoded under the name "ZaDb". If
> the original DA Tf operand was different, we need to replace it with "ZaDb".
>
> Your patch ensures this, if I got it right, and therefore it's a good patch :)

It doesn't really. E.g. for the example document, the font resource is no longer replaced with the fake one. It was before (i.e. without the patch), but the font resource was not found. Now, the original font resource with the "ZaDi" tag is used -- but if I understand you correctly, this might not be desirable if Poppler relies on "ZaDb" being used in other places for the 'forceZapfDingbats' case...

Should I rather have a look at why the "ZaDb" one is not found (as indicated by the pdftops output: "Syntax Error: Unknown font tag 'ZaDb'")?

I'll try to have a closer look at all the points you mentioned sometime soon, but I only have limited time available at the moment, so I can't really say when that will be.

(In reply to Michael Weghorn from comment #6)
> > Your patch ensures this, if I got it right, and therefore it's a good patch :)
>
> It doesn't really. E.g. for the example document, the font resource is no
> longer replaced with the fake one. It was before (i.e. without the patch),
> but the font resource was not found. Now, the original font resource with
> the "ZaDi" tag is used -- but if I understand you correctly, this might not
> be desirable if Poppler relies on "ZaDb" being used in other places for
> the 'forceZapfDingbats' case...

I was on the wrong track too (I mixed up the return value of GooString::cmp). Now I think the original code without the patch is fine already, at least with respect to my assertions above.

I've just learned new things about Poppler. When printing into PDF, Poppler apparently removes widget annotations and replaces them with simple content items. I guess this is required because PostScript doesn't support annotations? Anyway, AnnotAppearanceBuilder is then no longer responsible for displaying the "8" in the printout.

The original simple_form_CHECKBOX_TICKED_CLEANED.pdf contains a widget annotation representing the check button:

3 0 obj
<<
/Type /Annot
/Subtype /Widget
/DR << /Font << /ZaDi 4 0 R >> >>
/DA (0.13725 0.14901 0.15294 rg /ZaDi 0 Tf)
/MK << /CA (8) >>
>>
endobj

Notably, /CA is the string "8" ("the widget annotation's normal caption which shall be displayed when it is not interacting with the user").
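
To make the point about the Tf operand and its font resource concrete, here is a minimal Python sketch (not Poppler code) that extracts the font tag from a /DA string like the one above and checks whether a /DR font dictionary has an entry for it. The function name `font_tag_from_da` and the plain dict standing in for /DR are assumptions made purely for illustration.

```python
import re

# Matches "/<tag> <size> Tf" inside a default-appearance (DA) string.
TF_PATTERN = re.compile(r"/(\S+)\s+([0-9]*\.?[0-9]+)\s+Tf")


def font_tag_from_da(da):
    """Return (tag, size) of the last Tf operator in a DA string, or None."""
    matches = TF_PATTERN.findall(da)
    if not matches:
        return None
    tag, size = matches[-1]
    return tag, float(size)


# Values taken from the widget annotation shown above.
da = "0.13725 0.14901 0.15294 rg /ZaDi 0 Tf"
dr_fonts = {"ZaDi": "4 0 R"}  # hypothetical stand-in for /DR << /Font << ... >> >>

tag, size = font_tag_from_da(da)
print(tag, size, tag in dr_fonts)  # ZaDi 0.0 True
print("ZaDb" in dr_fonts)          # False
```

With the sample form, "ZaDi" resolves against /DR while "ZaDb" does not. That is one possible reading of the "Unknown font tag 'ZaDb'" message, but the sketch does not prove where the message originates.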
Now, when printed, the Widget object is gone. My decompressed printout.pdf instead contains this:

5 0 obj
<< /Contents 6 0 R ... >>
6 0 obj
<< /Length 376 >>
stream
/R8 10.9815 Tf
[(8)-77]TJ
... % shortened
ET
10 0 obj
<< /BaseFont /DZBPUO+F1348788328_100000 /Encoding /WinAnsiEncoding /FirstChar 56 /FontDescriptor 11 0 R /LastChar 56 /Subtype /Type1 /Type /Font /Widths [ 600 ] >>
11 0 obj
<< /Ascent 616 /AvgWidth 600 /CapHeight 616 /CharSet (/eight) /Descent -15 /Flags 65569 /FontBBox [ 0 -15 493 616 ] /FontFile3 12 0 R /FontName /DZBPUO+F1348788328_100000 /ItalicAngle 0 /MaxWidth 600 /MissingWidth 600 /StemV 73 /Type /FontDescriptor >>
endobj
12 0 obj
<< /Subtype /Type1C /Length 396 >>
... % embedded font here

Here, the TJ ("show text") operator writes the string "8". The "8" got copied from /MK /CA of the very original document simple_form.pdf. Tf selects font /R8. R8 maps to font dictionary object 10 0. This is an embedded font that has only one character defined, number 56. 56 is ASCII for "8". So the "8" sign appears on screen and in the printout, and that's exactly what the PDF asks for.

I'm not sure who we should blame then. Maybe the software that originally wrote "8" into /CA? Does this longish post make sense at all?

> Should I rather have a look why the "ZaDb" one is not found (like indicated
> by the pdftops output: "Syntax Error: Unknown font tag 'ZaDb'")?

I believe the Syntax Error is unrelated to the problem. But it would be interesting to know where it originates anyway.

Ah, I see... character 56 (= "8" in ASCII) is a "cross symbol X" in the ZapfDingbats font. So it makes some sense to have [(8)-77]TJ in the printed variant. Sadly, the embedded font became Nimbus Mono PS, which has no cross symbol at 56, so "8" is drawn as a digit.

I could not yet discover the place where the [(8)-77]TJ gets formed. An obvious location to generate the stream is AnnotAppearanceBuilder::drawText, but I debugged it and it produces slightly different content:

q
BT
0.13725 0.14901 0.15294 rg /ZaDb 11.00 Tf 1 0 0 1 2.43 1.55 Tm
(8) Tj
ET
Q

Maybe AnnotAppearanceBuilder::drawText is used, and there is some post-processing that I'm not aware of? Michael, do you know?

Anyway, there seems to be a fundamental problem. All the Annotation classes dynamically generate in-memory appearance streams and may depend on in-memory resources. If we simply take these generated appearance streams and write them into a PDF file for printing, then dependent in-memory resources like the fake font are missing. We would have to write the resource objects to the PDF too, but that's not yet done.

In your patch you prefer an existing ZapfDingbats font over the in-memory fake font, which works then. If we had a document with no ZapfDingbats font and no CA defined, then GooString checkMark("3") would be used (see AnnotAppearanceBuilder::drawFormFieldButton) and we would get the same bug again, wouldn't we?

I have had a closer look at some aspects now.

(In reply to Tobias Deiminger from comment #5)
> We're having a similar discussion atm. here [0], and also here [1], because
> of our current GSoC project. Maybe you have a look, esp. at the UML diagram
> [2] that shows the relationship of the different font objects and give your
> two cents if we're on the right track.

Thanks for mentioning these, there's lots of helpful information. The UML diagram looks good to me. (I just realized that not all members are shown for all types, e.g. the 'CapHeight' member of the 'Font descriptor' is not mentioned, nor are some of the font dictionary members listed in section 9.6.2 of the PDF spec, but that may be intentional.)

> Btw., your attached PDF document is actually strange because it has a /DR
> entry in the Annot dictionary, which is not specified for Widget Annotation
> Dictionaries. At least not in PDF 1.7 (ISO 32000-1:2008). The /DR entry is
> meant to be in the global AcroForm dictionary. Has this changed in PDF 2.0?

I also can't find a specification of the '/DR' entry for the Annot dictionary in the PDF 1.7 spec. I don't know about PDF 2.0, but at a quick glance, the corresponding code in LibreOffice has been there for a long time, so I doubt it's related to any newer PDF standard.

However (as far as I can see), the behaviour is still the same after manually removing the '/DR' entry from the Annot dictionary (object '3 0 obj'). (The AcroForm dictionary also specifies the font in its '/DR' entry.)

(In reply to Tobias Deiminger from comment #8)
> I could not yet discover the place where the [(8)-77]TJ gets formed. An
> obvious location to generate the stream is AnnotAppearanceBuilder::drawText,
> but I debugged it and it produces slightly different content
> q
> BT
> 0.13725 0.14901 0.15294 rg /ZaDb 11.00 Tf 1 0 0 1 2.43 1.55 Tm
> (8) Tj
> ET
> Q
>
> Maybe AnnotAppearanceBuilder::drawText is used, and there is some post
> processing that I'm not aware of? Michael, do you know?

Just to be sure: Are you using the "Print to File (PDF)" option from Okular to print to PDF? (I can reproduce the behaviour when doing so.)

In this case, Okular first generates a PostScript file using Poppler's PSConverter and then runs `ps2pdf` on that file (see method `FilePrinter::doPrintFiles` in `core/fileprinter.cpp`), so the related PDF code should be formed in the conversion done by Ghostscript (`ps2pdf` being a Ghostscript tool). Two conversions are therefore actually involved (PDF -> PS -> PDF).

> Anyway there seems to be a fundamental problem. All the Annotation classes
> dynamically generate in-memory appearance streams and may depend on
> in-memory resources. If we simply take this generated appearance streams and
> write them into a PDF file for printing, then dependent in-memory resources
> like the fake font are missing. We would have to write the resource objects
> to the PDF too, but that's not yet done.
>
> In your patch you prefer an existing zapf dingbats font over the in-memory
> fake font which works then. If we had a document with no zapf dingbat font
> and no CA defined, then GooString checkMark("3") will be used (see
> AnnotAppearanceBuilder::drawFormFieldButton) and we get the same bug again,
> is it?

Yes, I think the problem reappears then. So if I understand correctly, what should be done is to write the objects that are currently only created in memory to the PDF document, and this would solve the problem for both cases (the original document and the case you describe here).

Still, one aspect that I haven't understood yet is why `forceZapfDingbats` is always set to 'true' whenever a checkbox is drawn via `AnnotAppearanceBuilder::drawFormFieldButton [case formButtonCheck]`. Do you know why?

My (maybe naive) expectation without further examination would have been that an explicitly specified font is used if there is one, rather than always forcing ZapfDingbats (using the interactive form dictionary's `DR` entry as specified in section 12.7.2 of the PDF 1.7 spec, table 218).

In that case, I currently see two cases that could be distinguished:

1) If the document supplies proper information and resources for the font, those should be used (e.g. as with the given sample document here).
2) Otherwise ZapfDingbats is used and all required resources are saved in the document as well.

Does this make sense, or did I miss any reason for using ZapfDingbats unconditionally (for example, that one would never want anything other than ZapfDingbats's '8' (check mark) in a checkbox anyway)?

As far as I understand, the visual result for the given sample document would be the same whether 1) or 2) is implemented (since ZapfDingbats is used in both cases), but other documents might behave differently.

Please also let me know in case I failed to reply to any other question or aspect you mentioned.
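
The two cases above can be sketched as follows. This is only an illustration in Python of the proposed decision, not a description of how Poppler currently behaves; `choose_checkbox_font`, `FontChoice` and the dict standing in for the /DR font dictionary are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

FALLBACK_TAG = "ZaDb"  # tag of the in-memory ZapfDingbats font mentioned in this thread


@dataclass
class FontChoice:
    tag: str
    must_write_resource: bool  # True if the font object still has to be saved to the file


def choose_checkbox_font(da_tag: Optional[str], dr_fonts: Mapping[str, str]) -> FontChoice:
    """Pick the font for a checkbox caption following the two proposed cases."""
    # Case 1: the document names a font and supplies a resource for it.
    if da_tag is not None and da_tag in dr_fonts:
        return FontChoice(tag=da_tag, must_write_resource=False)
    # Case 2: fall back to ZapfDingbats and remember to write its resources.
    return FontChoice(tag=FALLBACK_TAG, must_write_resource=True)


print(choose_checkbox_font("ZaDi", {"ZaDi": "4 0 R"}))
print(choose_checkbox_font(None, {}))
```

For the sample document the first branch is taken, so nothing extra has to be written. For a document without any usable font entry, the second branch flags that the fallback font resource must be saved alongside the appearance stream.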
Another interesting thing I realized is that using Poppler's 'pdftocairo' results in a PDF file that has the check mark shown properly (even though the same warning about the unknown font tag is shown); command:

$ pdftocairo -pdf simple_form_CHECKBOX_TICKED_CLEANED.pdf fromCairo.pdf
Syntax Error: Unknown font tag 'ZaDb'

I haven't had a closer look at this so far.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/541.
Created attachment 140724 [details]
Sample form generated by LibreOffice

Converting a LibreOffice-generated PDF form with a ticked checkbox to PostScript leads to an "8" being shown inside the checkbox rather than the expected "x" sign.

Steps to reproduce:

1) Open the attached PDF form "simple_form.pdf" in Okular
2) Tick the checkbox
3) Print (either to a real printer or using "Print to File (PDF)")
4) Look at the output/printout

Result: An "8" is shown inside the checkbox that has been ticked.

Expected result: The same checkmark ("x") as displayed in Okular is shown inside the checkbox on the printout.

This can also be reproduced by directly calling 'pdftops' on a PDF form saved after ticking the checkbox:

$ pdftops simple_form_CHECKBOX_TICKED_CLEANED.pdf
Syntax Error: Unknown font tag 'ZaDb'
Syntax Error: Unknown font tag 'ZaDb'

(In addition to ticking the checkbox, the document has been run through 'mutool clean' to make analysis easier.)
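
As a small illustration of why the same byte can show up as either an "x" or an "8": the caption byte stays 0x38 in both renderings, and only the font that interprets it differs. The Python sketch below is not Poppler code. The two-entry table is an assumption limited to the codes mentioned in this report ("8" for the cross, "3" for Poppler's default check mark), mapped to what I believe are the corresponding Unicode Dingbats characters.

```python
import unicodedata

# Assumed partial mapping from ZapfDingbats character codes to Unicode,
# covering only the two codes discussed in this report.
ZAPF_DINGBATS_TO_UNICODE = {
    0x33: "\u2713",  # "3": check mark
    0x38: "\u2718",  # "8": cross
}


def rendered_glyph_name(byte_value, font_is_zapf_dingbats):
    """Name the glyph a single caption byte selects under the given font."""
    if font_is_zapf_dingbats:
        # Raises KeyError for codes this sketch does not cover.
        return unicodedata.name(ZAPF_DINGBATS_TO_UNICODE[byte_value])
    # A Latin text font simply shows the ASCII character.
    return unicodedata.name(chr(byte_value))


caption = ord("8")  # value of /MK /CA in the sample form
print(rendered_glyph_name(caption, font_is_zapf_dingbats=True))   # HEAVY BALLOT X
print(rendered_glyph_name(caption, font_is_zapf_dingbats=False))  # DIGIT EIGHT
```

So a printout showing "8" suggests that the caption byte survived the conversion but the ZapfDingbats font did not.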