Bug 6500 - page display failure for some JBIG2 PDFs
Status: RESOLVED FIXED
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assigned To: poppler-bugs
Reported: 2006-04-05 08:47 UTC by paul walmsley
Modified: 2006-04-05 11:20 UTC (History)

Attachments
A PDF demonstrating the bug (2.03 MB, application/pdf)
2006-04-05 08:50 UTC, paul walmsley
patch for this bug (1.81 KB, patch)
2006-04-05 08:51 UTC, paul walmsley

Description paul walmsley 2006-04-05 08:47:10 UTC
Some PDF generation software packages produce PDFs with JBIG2 pages that fail to
render with poppler, xpdf, and the Apple PDF Previewer, but that display
correctly with Adobe Acrobat Reader.

The pages which fail cause poppler to generate these messages:

Error (....): Unknown segment type in JBIG2 stream
Error (....): Unexpected EOF in JBIG2 stream

It turns out that some of the JBIG2 images embedded in the PDF have symbol
dictionary segments (segment type 0) with an extraneous NUL byte at the end of
the segment.  This extra byte is not consumed by the symbol dictionary segment
handler, and it prevents poppler from reading the next segment header correctly.
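
To illustrate the misalignment, here is a minimal sketch using a toy segment
layout.  It is not the real JBIG2 header format and not poppler code: the
5-byte header (one type byte plus a 4-byte big-endian length) and the names
ToyHeader, readToyHeader, makeToyStream and secondHeaderSeen are all
assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy segment header, NOT the real JBIG2 layout: one type byte followed by
// a 4-byte big-endian data length.
struct ToyHeader {
    std::uint8_t type;
    std::uint32_t length;
};

// Read a toy header at `pos` and advance `pos` past it.
ToyHeader readToyHeader(const std::vector<std::uint8_t> &buf, std::size_t &pos) {
    assert(pos + 5 <= buf.size());
    ToyHeader h;
    h.type = buf[pos];
    h.length = (std::uint32_t(buf[pos + 1]) << 24) |
               (std::uint32_t(buf[pos + 2]) << 16) |
               (std::uint32_t(buf[pos + 3]) << 8) |
               std::uint32_t(buf[pos + 4]);
    pos += 5;
    return h;
}

// Two toy segments.  The first declares 3 data bytes, but only 2 of them
// are coded data; the third is a stray 0x00 that the length still counts.
std::vector<std::uint8_t> makeToyStream() {
    return {
        0x00, 0x00, 0x00, 0x00, 0x03,  // header: type 0, length 3
        0xAA, 0xBB,                    // bytes the handler consumes
        0x00,                          // stray trailing byte
        0x06, 0x00, 0x00, 0x00, 0x01,  // header: type 6, length 1
        0xCC,
    };
}

// Header of the second segment as seen by a reader whose handler consumed
// `consumed` data bytes of the first segment and skipped nothing afterwards.
ToyHeader secondHeaderSeen(std::size_t consumed) {
    const std::vector<std::uint8_t> buf = makeToyStream();
    std::size_t pos = 0;
    const ToyHeader first = readToyHeader(buf, pos);
    assert(consumed <= first.length);
    pos += consumed;
    return readToyHeader(buf, pos);
}
```

In this toy stream, secondHeaderSeen(3) yields the intended header (type 6,
length 1), while secondHeaderSeen(2) starts one byte early and yields type 0
with a length of 0x06000000, far beyond the end of the data.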

The PDF generator seems to create these types of segments when it's compressing
large amounts of whitespace.  It generates an arithmetic-coded symbol dictionary
segment with SDNUMNEWSYMS set to 0, and then stores an extra NUL byte after the
end of the arithmetic coder data.  Note that the segment's length is "correct" --
it includes the NUL byte -- but poppler, quite reasonably, expects the segment to
end immediately after the arithmetic-coded data is exhausted.  An example of a
PDF with this problem is attached to this bug.

The attached patch works around this problem for all segment types by reading
and discarding any bytes left in the segment after the handler returns to
JBIG2Stream::readSegments().  It also warns the user if a segment handler
reads more bytes than the segment length allows.
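
The idea can be sketched roughly as follows.  This is not the attached patch
and not poppler's actual code: ByteReader, SegmentEnd and finishSegment are
hypothetical names, and the sketch assumes the caller records the stream
position at which the segment data began.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical minimal byte reader standing in for the real stream class.
class ByteReader {
public:
    explicit ByteReader(const std::vector<std::uint8_t> &data)
        : data_(data), pos_(0) {}
    // Returns the next byte, or -1 at end of data.
    int getByte() { return pos_ < data_.size() ? data_[pos_++] : -1; }
    std::size_t pos() const { return pos_; }

private:
    const std::vector<std::uint8_t> &data_;
    std::size_t pos_;
};

enum class SegmentEnd { Exact, SkippedTrailing, Overread };

// Call after a segment handler returns.  `segStart` is the reader position
// where the segment data began; `segLength` is the length from the header.
SegmentEnd finishSegment(ByteReader &in, std::size_t segStart,
                         std::size_t segLength) {
    const std::size_t consumed = in.pos() - segStart;
    if (consumed > segLength) {
        // The handler ran past the declared end: warn, nothing to skip.
        std::fprintf(stderr,
                     "Warning: segment handler read %zu bytes past the "
                     "segment length\n",
                     consumed - segLength);
        return SegmentEnd::Overread;
    }
    if (consumed == segLength) {
        return SegmentEnd::Exact;
    }
    // Discard leftover bytes so the next header read starts aligned.
    for (std::size_t i = consumed; i < segLength; ++i) {
        if (in.getByte() < 0) {
            break;  // end of data reached while skipping
        }
    }
    return SegmentEnd::SkippedTrailing;
}
```

A segment loop would note the reader position before dispatching to a
handler and call finishSegment() with that position once the handler
returns, before reading the next segment header.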

The same issue also exists in the xpdf 3.01 code base, and a similar patch is
being forwarded to its author.

This patch was developed collaboratively with Raj Kumar of the Internet Archive
<rkumar@archive.org>.


- Paul
Comment 1 paul walmsley 2006-04-05 08:50:20 UTC
Created attachment 5199
A PDF demonstrating the bug

also available via 

   http://ia311040.us.archive.org/~rkumar/test1_opt.pdf
Comment 2 paul walmsley 2006-04-05 08:51:19 UTC
Created attachment 5200
patch for this bug
Comment 3 Albert Astals Cid 2006-04-06 04:20:03 UTC
Thanks for the patch :-)

It went into the CVS