Some PDF generation software packages produce PDFs with JBIG2 pages that fail to
render with poppler, xpdf, and the Apple PDF Previewer, but that display
correctly with Adobe Acrobat Reader.
The pages which fail cause poppler to generate these messages:
Error (....): Unknown segment type in JBIG2 stream
Error (....): Unexpected EOF in JBIG2 stream
It turns out that some of the JBIG2 images embedded in the PDF have symbol
dictionary segments (segment type 0) with an extraneous NULL byte at the end of
the segment. This extra byte is not consumed by the symbol dictionary segment
handler, and it prevents poppler from reading the next segment header correctly.
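To illustrate how a single stray byte derails header parsing, here is a minimal sketch. It assumes only the start of the JBIG2 segment header layout (a 4-byte big-endian segment number followed by a flags byte whose low six bits hold the segment type). The names `HeaderStart`, `readHeaderStart`, and `exampleBytes` are hypothetical, and the bytes are made up for illustration; they are not taken from the attached PDF or from poppler's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the first two fields of a JBIG2 segment header.
struct HeaderStart {
    std::uint32_t segNum;  // 4-byte big-endian segment number
    unsigned type;         // low six bits of the flags byte
};

// Reads the first five header bytes starting at `offset`.
// Throws std::out_of_range (via at()) if the buffer is too short.
HeaderStart readHeaderStart(const std::vector<std::uint8_t> &buf, std::size_t offset) {
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        n = (n << 8) | buf.at(offset + i);
    }
    return { n, static_cast<unsigned>(buf.at(offset + 4) & 0x3f) };
}

// Made-up bytes: one stray 0x00 left over from the previous segment,
// then a header for segment number 2 of type 6 (immediate text region).
std::vector<std::uint8_t> exampleBytes() {
    return { 0x00,                     // unconsumed trailing byte
             0x00, 0x00, 0x00, 0x02,   // segment number = 2
             0x06 };                   // flags: type = 6
}
```

Parsing at offset 1 (the real header start) gives segment number 2 and type 6. Parsing at offset 0, which is where a reader lands if the trailing byte is never consumed, gives segment number 0 and type 2. Type 2 is not a defined JBIG2 segment type, which is consistent with the "Unknown segment type" message above.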
The PDF generator seems to create these types of segments when it's compressing
large amounts of whitespace. It generates an arithmetic-coded symbol dictionary
segment with SDNUMNEWSYMS set to 0, and then stores an extra NULL after the end
of the arithmetic coder data. Note that the segment's length is "correct" -- it
includes the NULL byte -- but poppler, quite reasonably, expects the segment to
end immediately after the arithmetic-coded data is exhausted. An example of a
PDF with this problem is attached to this bug.
The attached patch works around this problem for all segment types by reading
through any remaining bytes left in the segment after the handler returns to
JBIG2Stream::readSegments(). It will also warn the user if a segment handler
read more bytes than the segment length.
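The approach can be sketched outside of poppler as follows. This is not the attached patch: `CountingReader`, `SegmentResult`, and `readSegmentBody` are hypothetical names invented for this illustration, and the real change lives in JBIG2Stream::readSegments(). The sketch only shows the idea of measuring how many bytes a handler consumed, discarding the remainder of the declared segment length, and warning on an overrun.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream that tracks how many bytes it has handed out.
class CountingReader {
public:
    explicit CountingReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    // Returns the next byte, or -1 at end of data.
    int getByte() {
        if (pos_ >= data_.size()) return -1;
        return data_[pos_++];
    }

    std::size_t pos() const { return pos_; }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};

enum class SegmentResult { Exact, SkippedTrailing, Overrun };

// Runs a segment handler, then resynchronises at the declared end of the segment.
SegmentResult readSegmentBody(CountingReader &in, std::size_t segLength,
                              const std::function<void(CountingReader &)> &handler) {
    const std::size_t start = in.pos();
    handler(in);
    const std::size_t consumed = in.pos() - start;

    if (consumed > segLength) {
        // The handler read past the segment; we cannot rewind, so only warn.
        std::cerr << "warning: segment handler read " << (consumed - segLength)
                  << " byte(s) past the end of the segment\n";
        return SegmentResult::Overrun;
    }
    if (consumed == segLength) return SegmentResult::Exact;

    // Discard whatever the handler left unread (e.g. a trailing 0x00 byte).
    for (std::size_t i = consumed; i < segLength; ++i) {
        if (in.getByte() < 0) break;  // stream ended early
    }
    assert(in.pos() - start <= segLength);
    return SegmentResult::SkippedTrailing;
}
```

With a 3-byte segment whose handler consumes only 2 bytes, `readSegmentBody` skips the third byte, so the next read starts at the following segment header. Doing the skip in the caller rather than in each handler is what lets one change cover every segment type.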
The same issue also exists in the xpdf 3.01 code base, and a similar patch is
being forwarded to its author.
This patch was developed collaboratively with Raj Kumar of the Internet Archive.
Created attachment 5199
A PDF demonstrating the bug
also available via
Created attachment 5200
patch for this bug
Thanks for the patch :-)
It has been committed to CVS.