Bug 96529

Summary: PDFDoc::saveIncrementalUpdate() saves the trailer dict twice
Product: poppler Reporter: Jakub Alba <jakubalba>
Component: generalAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED NOTABUG QA Contact:
Severity: normal    
Priority: medium CC: jakubalba
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: reproduce.cpp
in.pdf
suggestion

Description Jakub Alba 2016-06-14 19:35:01 UTC
Created attachment 124534 [details]
reproduce.cpp

For some reason PDFDoc::saveIncrementalUpdate() saves the trailer dict twice. If you comment out the lines from "Dict *trailerDict = [...]" to "delete trailerDict;", then it saves the trailer dict once. It appears that it leaves the *old* one (because tail output is without Unicode-caused asterisks).

Steps to reproduce:
1. Compile the attached cpp file. (It requires giving compiler the path to poppler core headers with the -I flag.)
2. Run the compiled program in a directory with an example PDF file called "in.pdf". (I will provide one in the next attachment.)
3. Run "tail -n 20" on the output pdfs. From that it's clear that the problem is somewhere in PDFDoc::saveIncrementalUpdate().
Comment 1 Jakub Alba 2016-06-14 19:35:45 UTC
Created attachment 124535 [details]
in.pdf
Comment 2 Jakub Alba 2016-06-14 22:23:43 UTC
Created attachment 124538 [details] [review]
suggestion

So this is what I came up with so far. I'm not really satisfied with it, because it still leaves the following just before the trailer dict:
"xref
0 1
0000000000 65535 f"

So this patch is more of a suggestion for others who would like to work on this bug than a complete solution.
Comment 3 Jakub Alba 2016-06-16 21:31:01 UTC
My current understanding of PDFDoc::saveIncrementalUpdate() is that it prints the original file in whole first. Then it appends the new versions of updated objects, writes another xref table and another trailer dict. I don't think this is permitted by the PDF reference. In the PDF reference v7 there is a figure (3.2) at page 91 showing that a PDF file is divided into 1. header, 2. body, 3. cross-reference table, 4. trailer. So I don't think it's ok to print anything after the trailer. Correct me if I'm wrong.

If I'm right, I may write another function like PDFDoc::getEndXRefTablePos() (added by my suggested patch) which would find the end of PDF "body". PDFDoc::saveIncrementalUpdate() would then append all the updated objects and every time (no matter if there are any updated objects) would append xref table and trailer dict (not copy it from the original file).

Feel free to correct me about any possible misunderstandings.
Comment 4 Jakub Alba 2016-06-16 22:04:55 UTC
A bit more reading of PDF reference and:
"In an incremental update, any new or changed objects are appended to the file, a
cross-reference section is added, and a new trailer is inserted."
So this seems ok. Sorry for the trouble. But to make it a bit clearer: why is the "0000000000 65535 f" (always) added (line 942 in poppler/PDFDoc.cc)?
Comment 5 Thomas Freitag 2016-06-17 08:45:15 UTC
(In reply to Jakub Kucharski from comment #4)
> A bit more reading of PDF reference and:
> "In an incremental update, any new or changed objects are appended to the
> file, a
> cross-reference section is added, and a new trailer is inserted."
> So this seems ok. Sorry for the trouble. But to make it a bit clearer: why
> is the "0000000000 65535 f" (always) added (line 942 in poppler/PDFDoc.cc)?

You should read the spec carefully, then You will find the following sentence:

The first entry in the table (object number 0) shall always be free and shall have a generation number of 65,535;

And if You look at the code in PDFDoc.cc You will encounter that it is just done and then always if a new XRef() object is created to ensure this.

And it seems as if You haven't understand the PDFWriteMode parameter of saveAs. If it is writeForceRewrite it will completely rewrite the complete PDF and You will find just one trailerDict and xref section, otherwise it uses an incremental update and just appends modified/new objects, of course the new trailer dict and the xref section!
Comment 6 Jakub Alba 2016-06-17 09:02:22 UTC
(In reply to Thomas Freitag from comment #5)
> (In reply to Jakub Kucharski from comment #4)
> > A bit more reading of PDF reference and:
> > "In an incremental update, any new or changed objects are appended to the
> > file, a
> > cross-reference section is added, and a new trailer is inserted."
> > So this seems ok. Sorry for the trouble. But to make it a bit clearer: why
> > is the "0000000000 65535 f" (always) added (line 942 in poppler/PDFDoc.cc)?
> 
> You should read the spec carefully, then You will find the following
> sentence:
> 
> The first entry in the table (object number 0) shall always be free and
> shall have a generation number of 65,535;
> 
> And if You look at the code in PDFDoc.cc You will encounter that it is just
> done and then always if a new XRef() object is created to ensure this.

In this case, shouldn't line 967 in poppler/PDFDoc.cc be "if (uxref->getNumObjects() == 1) {" instead of "if (uxref->getNumObjects() == 0) {"? It seems to me that at this point uxref->getNumObjects() will never be 0 given the prior adding (line 942) of the first entry with generation number 65,535.

> And it seems as if You haven't understand the PDFWriteMode parameter of
> saveAs. If it is writeForceRewrite it will completely rewrite the complete
> PDF and You will find just one trailerDict and xref section, otherwise it
> uses an incremental update and just appends modified/new objects, of course
> the new trailer dict and the xref section!

That I've already understood.

Thanks for the help. And sorry for the trouble.
Comment 7 Thomas Freitag 2016-06-17 09:47:15 UTC
(In reply to Jakub Kucharski from comment #6)
> 
> In this case, shouldn't line 967 in poppler/PDFDoc.cc be "if
> (uxref->getNumObjects() == 1) {" instead of "if (uxref->getNumObjects() ==
> 0) {"? It seems to me that at this point uxref->getNumObjects() will never
> be 0 given the prior adding (line 942) of the first entry with generation
> number 65,535.
> 

You are probably true. I haven't used this function if I haven't changed anything, and I'm not the author of this part of code ;-)
Comment 8 Jakub Alba 2016-06-17 11:46:03 UTC
(In reply to Thomas Freitag from comment #7)
> You are probably true. I haven't used this function if I haven't changed
> anything, and I'm not the author of this part of code ;-)

I'm not accusing you of anything. ;-)
Reported as Bug 96561.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.