Bug 103174 - LZW decompression can fail
Summary: LZW decompression can fail
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: All All
: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-09 15:02 UTC by k.dohmann
Modified: 2017-10-23 21:32 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Test PDF (46.92 KB, application/pdf)
2017-10-17 09:48 UTC, k.dohmann
Details
patch on LZW decompression fixing this issue (1012 bytes, patch)
2017-10-19 09:18 UTC, k.dohmann
Details | Splinter Review
rebased patch (949 bytes, patch)
2017-10-21 10:07 UTC, Adrian Johnson
Details | Splinter Review

Description k.dohmann 2017-10-09 15:02:05 UTC
The LZW decompression in Stream.cc / Stream.h uses a dictionary table with the fixed size of 4096 entries. But the compression is not per definition limited to this size. I have seen PDF files with tables up to a size of 32k entries.
The resulting decompression failure of course results in rendering issues.
Comment 1 Adrian Johnson 2017-10-10 08:30:38 UTC
> But the compression is not per definition limited to this size.

Actually, it is. From the PDF spec:

"Codes shall never be longer than 12 bits; therefore, entry 4095 is the last entry of the LZW table."

> I have seen PDF files with tables up to a size of 32k entries.

I'm surprised if this is the case. We output a specific error for this case and no one has previously reported it. We handle up to 4096 for the benefit of PDF generators that can't count.

Changing the size of the symbol table is not as simple as just changing the array size. The code needs to modified to increase the symbol bit size as the number of symbols increases. But the worst part is we can no longer copy the uncompressed LZW stream when outputting to a PDF or PS file. The stream needs to be uncompressed and recompressed to ensure our output conforms to the spec. That will have a performance impact on every file just to fix the very rare cases where the PDF is broken.

I'll leave it up to Albert to decide if he wants to change this.

In any case, we will need PDFs to test with. Preferably with a range of different symbol table sizes that covers the powers of two.
Comment 2 Albert Astals Cid 2017-10-10 19:59:42 UTC
Yes, we're going to need the file to be able to say something first.
Comment 3 k.dohmann 2017-10-17 09:48:00 UTC
Created attachment 134880 [details]
Test PDF
Comment 4 k.dohmann 2017-10-17 09:50:35 UTC
I added a test PDF. Acrobat and PDF.js show the connecting lines and the in the bottom right. XPDF and poppler based tools do not show the PDF correctly.
Comment 5 Adrian Johnson 2017-10-18 08:50:53 UTC
I tried increasing the dictionary size (included the code for increasing the bit size when codes cross powers of 2) but it still fails.
Comment 6 k.dohmann 2017-10-19 09:10:34 UTC
Ok, I investigated this a bit more. When I changed the table size, I did not also increase the bit size. I saw now, that all the new table entries are written, but never read (which is because the bit size staying at 12 bits, 2^12=4096 entries). I will post a patch which solves this for me.
Comment 7 k.dohmann 2017-10-19 09:18:54 UTC
Created attachment 134916 [details] [review]
patch on LZW decompression fixing this issue

Patch which does the following:
- do not clear table on nextCode bigger than table size
- do not overflow the table
Comment 8 Adrian Johnson 2017-10-21 10:07:38 UTC
Created attachment 134970 [details] [review]
rebased patch

For some reason the patch would not apply. I had to edit it in manually. I'm attaching your patch rebased to git master.

I tried your patch on the test case and a couple of other files containing LZW streams and it seems to work fine.

Albert, could you run this through the regtest.
Comment 9 Albert Astals Cid 2017-10-22 17:01:50 UTC
regtest seems to show nothing regressed.

k.dohmann@gmx.net could we get your name for proper copyright attribution?
Comment 10 k.dohmann 2017-10-23 09:10:07 UTC
Yes, my full name is Kay Dohmann.
Comment 11 Albert Astals Cid 2017-10-23 21:32:01 UTC
Pushed


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.