Bug 56318 - Unable to read cross-reference streams with 8-byte offsets
Summary: Unable to read cross-reference streams with 8-byte offsets
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-23 10:01 UTC by Vincent
Modified: 2013-01-27 03:14 UTC (History)
0 users

See Also:


Attachments
Sample PDF document illustrating the problem (9.72 KB, text/plain)
2012-10-23 10:01 UTC, Vincent
Details
allow a limit of 8 for 'W' entries (347 bytes, patch)
2012-10-29 11:20 UTC, Thomas Freitag
Details | Splinter Review
Read 8 byte offsets (1.67 KB, patch)
2013-01-26 06:26 UTC, Adrian Johnson
Details | Splinter Review

Description Vincent 2012-10-23 10:01:30 UTC
Created attachment 68945 [details]
Sample PDF document illustrating the problem

Section 3.4.7 of the PDF 1.5 Reference doesn't specify any limit on the size of entries in a cross-reference stream. So the 'W' entry in a cross-reference stream dictionary should be allowed to have a value of e.g. [1 8 2].

When trying to display the attached document, evince throws the following error:
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table

Ghostscript is able, however, to convert the document to PostScript (or back to PDF, for that matter), and Adobe Reader displays it fine.

Poppler 0.20.5 (the latest stable version, downloaded from the Poppler website) contains the following lines of code, starting at line 654 of poppler/XRef.cc:
    if (w[i] < 0 || w[i] > 4) {
      goto err1;
    }
The w[i] > 4 check seems unwarranted; there should be no upper limit (or, at least, a limit of 8).
Comment 1 Thomas Freitag 2012-10-27 09:55:48 UTC
I think you're right, even though you refer to an outdated specification. But the current specification (ISO 32000, PDF version 1.7) doesn't say anything different in chapter 7.5.8.2 either.
Are you able to create a patch?
Comment 2 Thomas Freitag 2012-10-29 11:20:55 UTC
Created attachment 69219 [details] [review]
allow a limit of 8 for 'W' entries

Okay, it seems you're not able or willing, so here is a patch that raises the limit for the 'W' entries to 8.
Comment 3 Vincent 2012-10-29 15:47:47 UTC
(In reply to comment #2)
> Created attachment 69219 [details] [review] [review]
> allow a limit of 8 for 'W' entries
> 
> Okay, seems as if You're not able or willing, so here a patch to limit the
> 'W' entries to 8.

Thanks for your patch. I thought more work would be needed to store the offset in a 64-bit integer, which would be way beyond my non-existent capabilities as a C++ developer. Can a Guint store 8-byte integers?

Lacking a C/C++ dev environment, I can't test this patch.

Vincent
Comment 4 Thomas Freitag 2012-10-31 10:26:51 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Created attachment 69219 [details] [review] [review] [review]
> > allow a limit of 8 for 'W' entries
> > 
> > Okay, seems as if You're not able or willing, so here a patch to limit the
> > 'W' entries to 8.
> 
> Thanks for your patch. I thought more work would be needed to store the
> offset in a 64bit integer, which would be way beyond my non-existent
> capabilities of C++ developer. Can a Guint store 8-byte integers?

It depends on the machine: in a 32 bit environment Guint has only 32 bits, so the upper 32 bits will be dropped while reading the xref stream section. So in a 32 bit environment a PDF larger than 4 GByte wouldn't be read properly :-).
In a 64 bit environment Guint has the necessary 64 bits, but even there: does anyone have a PDF larger than 4 GByte? Then we should implement LARGEFILE_ACCESS :-)

Cheers,
Thomas

> 
> Lacking of a C/C++ dev environment, I can't test this patch.
> 
> Vincent
Comment 5 Albert Astals Cid 2012-11-18 22:21:00 UTC
Thomas, are you sure Guint is 64 bits? That's not true on my Linux 64 bits box; is that a windows thingie?

Anyway, yes, we need support for big files; there's already a bug for that, #44085
Comment 6 Thomas Freitag 2012-11-19 11:00:16 UTC
(In reply to comment #5)
> Thomas are you sure Guint is 64 bits? That's not true on my my Linux 64 bits
> box, is that a windows thingie?

Oh, oh, of course you're right: int has 32 bits in 32 bit AND 64 bit environments (with the compilers and architectures I know). I already mixed that up once (giving int the wrong size) when I had to change 16-bit libraries so that they could run in a 32-bit environment. Long, long ago, but really a déjà vu.

So what to do with this PDF, where the high 32 bits are zero? Should we test for overflow with an error message "Large files not supported" but accept it when the high 32 bits are zero? Wait for somebody to implement large file support? Or just commit the patch, knowing that we'll run into problems with really large files?
Comment 7 Albert Astals Cid 2012-11-19 20:59:10 UTC
* test for overflow with an error message "Large files not supported" but accept it when the high 32 bits are zero?

This seems the easy short-term solution for at least being able to open this file, do you want to give it a try?
Comment 8 Adrian Johnson 2013-01-26 06:26:22 UTC
Created attachment 73668 [details] [review]
Read 8 byte offsets

This patch fixes the problem by reading 8 byte offsets and printing an error if the offset is too large for Goffset.
Comment 9 Albert Astals Cid 2013-01-26 12:26:22 UTC
Makes sense, I hope this situation is not too common though. Do you know which systems wouldn't have a 64-bit Goffset?
Comment 10 Adrian Johnson 2013-01-26 12:45:18 UTC
(In reply to comment #9)
> Makes sense, i hope this situation is not too common though. Do you know
> which systems wouldn't have a 64 bits goffset?

I expect all 64-bit systems would support large files. On 32-bit systems configure checks for large file support unless --disable-largefile is specified.
 
I don't know which systems don't support large files. There may be 32-bit systems (e.g. embedded) that are unlikely to ever see large PDFs, where it makes sense to disable large file support to save a few bytes of memory and cycles of CPU time.
Comment 11 Albert Astals Cid 2013-01-26 12:50:45 UTC
Ok, as I said, it makes sense, so feel free to commit (I guess the important part of this bug was actually fixed by your large file support)
Comment 12 Adrian Johnson 2013-01-27 03:14:32 UTC
Pushed.

