Bug 83108 - Not enough randomness when converting to PDF
Summary: Not enough randomness when converting to PDF
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 4.2.4.2 release
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-26 17:48 UTC by hyper_ch
Modified: 2014-10-29 05:35 UTC (History)

See Also:



Description hyper_ch 2014-08-26 17:48:37 UTC
Problem description: 

LO seems to lack randomness in how it labels embedded glyph subsets when exporting to PDF.

I created a few shell scripts that make PDF management simpler. One of those scripts stamps documents with numbers:

I select a bunch of PDFs in Dolphin and get prompted for a starting number. The shell script then goes through each PDF and splits it into single pages. It then takes an .odt file and replaces the placeholders in it for the document number and the page number. The page number increases on each page, while the document number increases from PDF to PDF.

That .odt is then converted to a PDF page and overlaid on the corresponding single page, creating a new PDF. At the end, all pages of one document are combined again.

This is repeated for all documents, and at the end the documents are merged into one PDF as well.

However, I noticed that in the resulting PDF the document numbering is sometimes wrong, and with each font I tried it always goes wrong at the same page. E.g. when I use Arial, document 14 starts showing up as document 10 in the combined PDF, while in the single PDF it's all fine.

I asked in the #ghostscript channel what the problem could be. Chrisl explained it to me like this:


So, this is a regular problem that comes up with the PDFs exported from Libre/OpenOffice. When you use a font to display, for example, a string like "Page 33", it would be wasteful to embed the entire Arial font in the PDF just for that, so applications "subset" fonts, so they only include the descriptions for those seven glyphs. Space is a glyph, defined for each font. The exact way that the subsets are stored varies, so I'll go with a trivial example.....

Take the first glyph "P" - the letter P is ascii code 80. So it would also be wasteful to include 79 glyph "slots" just to get to index 80 for P. Applications will therefore use a custom encoding so that (massively simplifying things!) the glyph "P" will actually be in glyph slot 1 (slot zero is special). The "a" will be slot 2 etc.....

So, in the next document, the first thing printed in Arial is actually "This page is intentionally left blank" - the application would put the glyph "T" in index 0. So, when you send the two PDFs to Ghostscript's pdfwrite device, it gets the Arial font, with the "P" in index 0 for the first PDF, then it gets Arial for the second document - but it already has an instance of Arial defined, so uses that. It gets a reference to index 0, which is already occupied, and thus does not need populated.
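The clash described above can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions, not Ghostscript's actual logic: `build_subset` and `merge_render` are hypothetical helpers, fonts are cached purely by name, and a slot that is already occupied is never repopulated.

```python
def build_subset(text):
    """Give each distinct character a slot in order of first use (slot 0 is reserved)."""
    slots = {}
    for ch in text:
        if ch not in slots:
            slots[ch] = len(slots) + 1
    return slots  # maps character -> slot number


def merge_render(documents):
    """Merge (font_name, text) pairs, reusing any font already cached under the same name."""
    cache = {}  # font name -> {slot number: glyph}
    rendered = []
    for font_name, text in documents:
        subset = build_subset(text)
        glyphs = cache.setdefault(font_name, {})
        for ch, slot in subset.items():
            glyphs.setdefault(slot, ch)  # an occupied slot keeps its first glyph
        rendered.append("".join(glyphs[subset[ch]] for ch in text))
    return rendered


# Same subset name in both documents: "14" is drawn with the glyphs stored for "10".
print(merge_render([("AAAAAA+Arial", "10"), ("AAAAAA+Arial", "14")]))  # -> ['10', '10']
# Distinct subset names: each document keeps its own glyphs.
print(merge_render([("AAAAAA+Arial", "10"), ("BCDEFG+Arial", "14")]))  # -> ['10', '14']
```

In this model, two subsets that share a name reproduce the "document 14 shows up as document 10" symptom from the report, while distinct names avoid it.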

[...]

Bear with me, I'm getting to why it's a LibreOffice problem :-)

Now, Adobe documents a mechanism to prevent that kind of clash, which is that such font subsets should have a unique, random six letter prefix added to the name. The problem is that LibreOffice always uses the same seed to create these prefixes for each document. So, for example, the prefixes are always of the pattern AAAAAA+Arial, AAAAAB+Arial..... so they are unique within the *current* document but not sufficiently unique to give protection from this kind of clash.
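A minimal Python sketch of what a randomly chosen prefix could look like, for illustration only. This is not LibreOffice's code; `random_subset_tag` and `tagged_font_name` are hypothetical names, and the sketch assumes a prefix of six uppercase letters followed by a plus sign, as in the AAAAAA+Arial pattern quoted above.

```python
import secrets
import string


def random_subset_tag(length=6):
    """Return `length` uppercase letters drawn from the OS random source."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


def tagged_font_name(base_name):
    """Prefix a font name with a fresh random tag, e.g. 'QXKZPB+Arial'."""
    return f"{random_subset_tag()}+{base_name}"


print(tagged_font_name("Arial"))
```

Because the tag is not derived from a fixed seed, two separately exported documents would be very unlikely to produce the same subset name (there are 26^6, roughly 309 million, possible tags).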



So, from what I gather, LO uses too little randomness when creating PDFs, and that can then lead to collisions.

Because of this, Chrisl suggested that I first convert to .gs and then to PDF. Ever since implementing this change ( https://github.com/sjau/pdfForts/commit/5dc4c86d741abe9fff21ff37c7539c0d749eadac#diff-02b2731ca50711cd69dc2155179b6710 ) it works as it should.

So I assume chrisl is right and that's an LO PDF export problem.

Operating System: Ubuntu
Version: 4.2.4.2 release
Comment 1 tommy27 2014-10-04 13:01:53 UTC
try 4.2.6.2 or 4.3.2.2 and tell if issue persists
Comment 2 hyper_ch 2014-10-04 13:19:30 UTC
(In reply to tommy27 from comment #1)
> try 4.2.6.2 or 4.3.2.2 and tell if issue persists

Still same behaviour on

Version: 4.2.6.3
Build ID: 420m0(Build:3)

when I remove that gs patch. The randomness is still lacking.
Comment 3 Robinson Tryon (qubit) 2014-10-26 21:28:57 UTC
(In reply to hyper_ch from comment #0)
> I did create a few shell scripts that make PDF management simpler. In one of
> those scripts, stamp documents with numbers:
> ... 
> However in the resulting PDF I noticed, that the numbering for the documents
> is sometimes wrong. With each font that I tried always at the same page.
> E.g. when I use Arial, then document 14 starts showing up as document 10 in
> the combined PDF, while as single PDF it's all fine.

Hiya,
Are all of the shell scripts you use up in your github repo?

Please list the steps to reproduce the problem here in a comment; that will make it easier for us to confirm the bug.

(Please change the status back to UNCONFIRMED when you're done)

Thanks,
--R
Comment 4 hyper_ch 2014-10-26 21:32:50 UTC
In the GitHub repo there are a lot of different scripts to manipulate PDFs. Just use the stampPDF one.

In Dolphin, select a bunch of PDFs, right-click to open the context menu, and select Actions -> Stamp documents there; the stamped PDF then gets created.

That's all there is to it.
Comment 5 AaronPeterson 2014-10-26 23:45:42 UTC
I'm new to the project and spending a bit of time looking into this. Intriguing. It appears that we should use the glyph numbers that are actually used from the font to generate the hash for the internal font identifier...

I know that SolidWorks has major issues when opening documents with the same name...

If there was a continual counter that never reset, there could be collisions with documents made in different installations...


Basing the identifier on glyph usage would give us beneficial collisions.
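A rough Python sketch of that idea, purely as an illustration: the function name `glyph_based_tag` and the choice of SHA-256 are assumptions, not anything taken from LibreOffice or Ghostscript.

```python
import hashlib


def glyph_based_tag(font_name, glyph_ids):
    """Derive a six-letter tag from the font name and the set of glyphs in use."""
    # Sort and deduplicate so the tag depends on which glyphs are used, not on their order.
    canonical = font_name + ":" + ",".join(str(g) for g in sorted(set(glyph_ids)))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # Map the first six digest bytes onto the letters A-Z.
    return "".join(chr(ord("A") + b % 26) for b in digest[:6])


# The same glyph set yields the same tag, whatever order the glyphs appeared in.
print(glyph_based_tag("Arial", [80, 97, 103]) == glyph_based_tag("Arial", [103, 97, 80]))  # -> True
```

Identical tags would only be harmless if the slot layout inside the subset were also derived from the sorted glyph set; otherwise two subsets with the same glyphs in different slots would still clash.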

It sounds like you (the submitter) are more capable than me at looking up desired behavior, which might include copying behavior from Ghostscript?
Comment 6 AaronPeterson 2014-10-29 05:07:58 UTC
Have you tried setting some of the options for the PDF? There is an option for tagged PDF. (I still haven't found the source code related to the PDF creation.)

I have every reason to believe that this bug is real, but that it is a very low priority, since you have found a workaround and it only shows up when interacting with other programs in ways that other people are not likely to use. Although, as a document wrangler myself, I think fixing it in LibreOffice makes the most sense.
Comment 7 hyper_ch 2014-10-29 05:35:08 UTC
Have you actually tried it out so that it could be set to confirmed?

