Bug 62985

Summary: endstream detection
Product: poppler Reporter: Thomas Freitag <Thomas.Freitag>
Component: generalAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: endstream detection and scanSpecialFlags correction

Description Thomas Freitag 2013-04-01 09:10:47 UTC
Created attachment 77269 [details] [review]
endstream detection and scanSpecialFlags correction

1. During porting poppler to Java I made a mistake in the "<objnum> 0 obj<length>" pattern detection so that it fails. Therefore it ran into the endstream search, and at least with bug-poppler16579.pdf this doesn't work correctly: the shift(-1) with the used token mechanism in Lexer isn't correct for a binary data stream. If there is i.e. a "(" without corresponding ")" in the binary data, which of course can happen and happens in that pdf, shift(-1) skips the searched endstream and can therefore in worst case reach the end-of-file. Therefore I implemented a shift("endstream") in Java, which I now port back to C++, or in other words "There and Back Again" :-)

You can test it with bug-poppler16579.pdf if You just change temporary 

	  if (longNumber <= INT_MAX && longNumber >= INT_MIN && *end_ptr == '\0') {

in XRef.cc to

	  if (gFalse && longNumber <= INT_MAX && longNumber >= INT_MIN && *end_ptr == '\0') {

2. The small change in XRef.cc was another point I detected during the Java port: if You save a PDF with defect xref offsets, the 

readXRefUntil(-1 /* read all xref sections */, &xrefStreamObjNums)

in XRef::scanSpecialFlags() will destroy the already reconstructed entries table, but this means that any modification which the user did in the meantime get lost. This can be tested i.e. with bug168518.pdf.

The attached patch solves this two issues.
Comment 1 Albert Astals Cid 2013-04-04 22:45:19 UTC
Not sure i understand what to test for point 2, saving or opening?
Comment 2 Thomas Freitag 2013-04-05 06:53:58 UTC
(In reply to comment #1)
> Not sure i understand what to test for point 2, saving or opening?

Saving: Open the PDF in okular and modify it. All changes should be lost if You save it then (if okular doesn't save the PDF automatically after opening because it detects that the PDF is broken).
Comment 3 Albert Astals Cid 2013-04-06 18:44:33 UTC
So i see this as two separate patches, the first one that improves the shifting seems to me like an extra layer of security and would be enough in master while the second fixes saving of some files and should go to 0.22 too.

Agreed?
Comment 4 Thomas Freitag 2013-04-06 18:51:49 UTC
(In reply to comment #3)
> So i see this as two separate patches, the first one that improves the
> shifting seems to me like an extra layer of security and would be enough in
> master while the second fixes saving of some files and should go to 0.22 too.
> 
> Agreed?

Even if I don't think that shift(-1) delivers correct results in a binary stream and so it's not only an improvement, I agree to put it not into 0.22 because we don't have any bug reports which this would fix.
So it's okay for me.
Comment 5 Albert Astals Cid 2013-04-06 21:23:04 UTC
Pushed

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.