Bug 27450

Summary: fails to save PDF form data properly when PDF has object streams
Product: poppler
Reporter: Carlos Garcia Campos <carlosgc>
Component: general
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: ejb, el.cameleon.1
Version: unspecified
Hardware: Other
OS: All
Whiteboard:

Description Carlos Garcia Campos 2010-04-04 06:26:10 UTC
Bug forwarded from Evince: https://bugzilla.gnome.org/show_bug.cgi?id=614740

"I haven't tried reproducing this in a newer version of evince (since my debian
unstable system has gnome 2.28), but the problem is easy to reproduce and test.

evince saves PDF form data by appending to the PDF, which is a perfectly valid
way to do it, but it makes a few mistakes in appending to the file when object
streams are used.  The effect is that the resulting file loads into evince
without the form data, and Adobe Reader can't open the file at all.  This bug
report uses qpdf (http://qpdf.sourceforge.net) to check and manipulate the PDF
file.  qpdf is available in Debian and Ubuntu or can be downloaded from
sourceforge.  It has only pcre and zlib as external dependencies.

The first attached pdf file (form1.pdf) can be downloaded from here:

http://www.soest.hawaii.edu/gg/isotope_biogeochem/Samplerequest.htm

This file contains no object streams even though it is a PDF 1.5 file.  Filling
in the form and saving it works fine.  The resulting file is appended, and the
/Size field of the trailer dictionary is set properly to 1 more than the
highest numbered object.  Everything is fine.  The file is attached as
form1-saved.pdf.

Now consider the same file with object streams.  You can get this with

qpdf --object-streams=generate form1.pdf form2.pdf

This time, there are several problems.  For one thing, the /Size field in the
new trailer dictionary is wrong: it is equal to the highest object number
instead of one above it.  If you run qpdf --check form2-saved.pdf, you get

WARNING: /home/ejb/Documents/form2-saved.pdf: reported number of objects (237)
inconsistent with actual number of objects (238)

When you open the file with evince, you get lots of errors about referencing
invalid or non-existent objects, and the file opens without the form data. 
This happens even if you manually edit the file to change /Size to 238.

The xref table is also pretty messed up.  The generation numbers look to be the
original object stream offset values from the original PDF.  In
form2-saved.pdf, observe lines like

0000053156 00064 n

in the xref table and corresponding objects like 66 64 obj.  If you manually
change all the generation numbers to 0 in both the xref table and the object
headers themselves, the file becomes correct and the saved form data is
accessible.

So whatever is generating the append data needs to be updated to support object
streams and understand the meanings of the fields in the xref stream,
apparently.

My manually repaired file is form2-fixed.pdf.

I will attach the five pdf files momentarily."

I confirm it's reproducible with current git master. The original bug report has the test cases attached.
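
For anyone checking a saved file by hand, the two symptoms described above (a wrong /Size and non-zero generation numbers on objects that came out of object streams) can be sketched in a few lines of Python. This is only an illustration, not poppler's code: the function names are hypothetical, and the field meanings are taken from the PDF specification's description of cross-reference streams, where a type 2 entry stores the object stream's number in its second field and the index inside that stream in its third field, so the object's generation is implicitly 0. That is consistent with the reporter's guess that a stream-related value ended up in the generation column, though the report does not confirm which value it was.

```python
import re

# A classic xref table entry: 10-digit offset, 5-digit generation,
# then 'n' (in use) or 'f' (free).
XREF_ENTRY = re.compile(r"^(\d{10}) (\d{5}) ([nf])\s*$")


def parse_xref_entry(line):
    """Return (offset, generation, in_use) for one classic xref table line."""
    match = XREF_ENTRY.match(line)
    if match is None:
        raise ValueError(f"not an xref entry: {line!r}")
    offset, generation, kind = match.groups()
    return int(offset), int(generation), kind == "n"


def expected_size(object_numbers):
    """/Size must be one greater than the highest object number."""
    return max(object_numbers) + 1


def generation_for_xref_stream_entry(entry_type, field2, field3):
    """Generation number implied by one cross-reference stream entry.

    Type 0 (free) and type 1 (uncompressed): field3 is a generation number.
    Type 2 (object stored in an object stream): field2 is the object
    stream's number, field3 is the index inside it, and the generation
    is implicitly 0.
    """
    if entry_type in (0, 1):
        return field3
    if entry_type == 2:
        return 0
    raise ValueError(f"unknown xref stream entry type: {entry_type}")


# The suspicious line quoted in the report parses as generation 64.
print(parse_xref_entry("0000053156 00064 n"))
# Highest object number 237 means /Size should be 238.
print(expected_size([0, 66, 237]))
# Hypothetical type 2 entry (stream object 12, index 64): generation is 0.
print(generation_for_xref_stream_entry(2, 12, 64))
```

When appending an incremental update, an object that was previously stored in an object stream should therefore be written as "N 0 obj" with a matching generation of 0 in the new xref section.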
Comment 1 Carlos Garcia Campos 2010-04-07 12:15:48 UTC
Fixed in git master. Thanks for reporting.
Comment 2 Vincent 2012-01-09 13:13:32 UTC
Hi,
I still see that problem on Ubuntu 11.10 (Document Viewer 3.2.1 and poppler/cairo 0.16.7).

Could you tell me whether I should open a new bug, or whether the bug is fixed in another release of poppler?
Comment 3 Vincent 2012-01-09 13:18:41 UTC
(In reply to comment #1)
> Fixed in git master. Thanks for reporting.

Just to make my previous comment clearer: which version of poppler has the fix?
