The LZW decompression in Stream.cc / Stream.h uses a dictionary table with a fixed size of 4096 entries. But the compression is not by definition limited to this size: I have seen PDF files with tables of up to 32k entries. The resulting decompression failure, of course, causes rendering issues.
> But the compression is not per definition limited to this size.

Actually, it is. From the PDF spec: "Codes shall never be longer than 12 bits; therefore, entry 4095 is the last entry of the LZW table."

> I have seen PDF files with tables up to a size of 32k entries.

I'd be surprised if this is the case. We output a specific error for this case and no one has previously reported it. We handle up to 4096 for the benefit of PDF generators that can't count.

Changing the size of the symbol table is not as simple as just changing the array size. The code needs to be modified to increase the symbol bit size as the number of symbols increases. But the worst part is that we can no longer copy the LZW stream without decompressing it when outputting to a PDF or PS file. The stream needs to be decompressed and recompressed to ensure our output conforms to the spec. That will have a performance impact on every file just to fix the very rare cases where the PDF is broken.

I'll leave it up to Albert to decide if he wants to change this. In any case, we will need PDFs to test with, preferably with a range of different symbol table sizes that covers the powers of two.
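A minimal sketch of the width rule such a change would need, assuming the PDF LZWDecode conventions: codes start at 9 bits, and with the default /EarlyChange value of 1 the width grows one code earlier than with 0. The function name `codeWidth` and its `maxBits` parameter are hypothetical; `maxBits = 12` is the spec limit, while a larger value models the non-conforming 32k-entry streams.

```cpp
// Hypothetical helper: number of bits the next LZW code occupies, given the
// next free table slot. earlyChange mirrors the PDF /EarlyChange parameter
// (1 = widen one code early, 0 = postpone as long as possible).
// maxBits = 12 is the spec limit; larger values model oversized tables.
int codeWidth(int nextCode, int earlyChange, int maxBits) {
    int n = nextCode + earlyChange;
    int bits = 9;  // LZWDecode codes start at 9 bits
    // Grow by one bit each time n reaches the next power of two, up to the cap.
    while (bits < maxBits && (1 << bits) <= n) {
        ++bits;
    }
    return bits;
}
```

With `maxBits = 12` the width stops growing at 12 bits, so only codes 0 to 4095 can ever be expressed; supporting a 32k-entry table means letting it reach 15 bits.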
Yes, we're going to need the file before we can say anything.
Created attachment 134880: Test PDF
I added a test PDF. Acrobat and PDF.js show the connecting lines and the in the bottom right. XPDF and poppler-based tools do not render the PDF correctly.
I tried increasing the dictionary size (including the code for increasing the bit size when codes cross powers of 2), but it still fails.
OK, I investigated this a bit more. When I changed the table size, I did not also increase the bit size. I now see that all the new table entries are written but never read, because the bit size stays at 12 bits (2^12 = 4096 entries). I will post a patch that solves this for me.
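Why the extra entries are written but never read: an LZW decoder pulls codes out of the stream as fixed-width, most-significant-bit-first fields, so a 12-bit read can never return a value above 4095. The `BitReader` class below is a hypothetical stand-alone illustration, not Poppler's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit reader of the kind an LZW decoder uses to pull
// fixed-width codes out of the compressed bytes. Returns -1 at end of input.
class BitReader {
public:
    explicit BitReader(const std::vector<uint8_t>& data) : data_(data), bitPos_(0) {}

    int read(int width) {
        int code = 0;
        for (int i = 0; i < width; ++i, ++bitPos_) {
            std::size_t byte = bitPos_ / 8;
            if (byte >= data_.size()) {
                return -1;  // ran out of input mid-code
            }
            int bit = (data_[byte] >> (7 - bitPos_ % 8)) & 1;
            code = (code << 1) | bit;  // most significant bit first
        }
        return code;
    }

private:
    const std::vector<uint8_t>& data_;
    std::size_t bitPos_;
};
```

With `width` fixed at 12, `read` tops out at `(1 << 12) - 1 = 4095`, so anything stored at index 4096 or above is unreachable, which matches the symptom described above.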
Created attachment 134916: patch on LZW decompression fixing this issue

Patch which does the following:
- do not clear the table when nextCode is larger than the table size
- do not overflow the table
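The two changes listed above can be sketched in a self-contained decoder. This is a hypothetical illustration, not the attached patch and not Poppler's `LZWStream`: the class name `LzwSketch`, the `maxBits` table-size parameter and the string-per-entry table are assumptions chosen for brevity. It follows the PDF LZWDecode conventions (clear code 256, end-of-data 257, 9-bit initial codes, optional early change), grows the code width with the table, and once the table is full it stops adding entries instead of clearing or overflowing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of a PDF-style LZW decoder whose table holds
// 1 << maxBits entries (12 = spec limit, 15 = 32k-entry streams).
class LzwSketch {
public:
    LzwSketch(const std::vector<uint8_t>& in, int maxBits, int earlyChange)
        : in_(in), maxBits_(maxBits), early_(earlyChange),
          table_(std::size_t(1) << maxBits) {}

    std::string decode() {
        std::string out;
        clearTable();
        int prev = -1;  // previous code, -1 right after a clear
        for (;;) {
            int code = readCode(width());
            if (code < 0 || code == 257) break;  // end of input / end of data
            if (code == 256) {
                clearTable();
                prev = -1;
                continue;
            }
            std::string seq;
            if (code < nextCode_) {
                seq = table_[code];
            } else if (code == nextCode_ && prev >= 0) {
                seq = table_[prev] + table_[prev][0];  // code not yet in table
            } else {
                break;  // corrupt stream
            }
            out += seq;
            // Change 1: never write past the end of the table.
            // Change 2: a full table is neither an error nor cleared; the
            // decoder keeps using the existing entries until a clear code.
            if (prev >= 0 && nextCode_ < static_cast<int>(table_.size())) {
                table_[nextCode_++] = table_[prev] + seq[0];
            }
            prev = code;
        }
        return out;
    }

private:
    void clearTable() {
        for (int c = 0; c < 256; ++c) table_[c] = std::string(1, static_cast<char>(c));
        nextCode_ = 258;
    }

    // Width grows with the table (one code early if earlyChange == 1), capped at maxBits.
    int width() const {
        int n = nextCode_ + early_;
        int bits = 9;
        while (bits < maxBits_ && (1 << bits) <= n) ++bits;
        return bits;
    }

    // MSB-first read of a `bits`-wide code; -1 at end of input.
    int readCode(int bits) {
        int code = 0;
        for (int i = 0; i < bits; ++i, ++bitPos_) {
            std::size_t byte = bitPos_ / 8;
            if (byte >= in_.size()) return -1;
            code = (code << 1) | ((in_[byte] >> (7 - bitPos_ % 8)) & 1);
        }
        return code;
    }

    const std::vector<uint8_t>& in_;
    int maxBits_;
    int early_;
    std::vector<std::string> table_;
    int nextCode_ = 258;
    std::size_t bitPos_ = 0;
};
```

For example, the nine-byte stream `80 0B 60 50 22 0C 0C 85 01` packs the 9-bit codes 256 45 258 258 65 259 66 257 and decodes to `-----A---B`. Storing a whole string per entry keeps the sketch short; a real decoder would more likely store each entry as a prefix code plus a final byte, so memory stays bounded as the table grows toward 32k entries.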
Created attachment 134970: rebased patch

For some reason the patch would not apply, so I had to apply the changes by hand. I'm attaching your patch rebased onto git master. I tried it on the test case and a couple of other files containing LZW streams, and it seems to work fine. Albert, could you run this through the regtest?
regtest seems to show no regressions. k.dohmann@gmx.net, could we get your name for proper copyright attribution?
Yes, my full name is Kay Dohmann.
Pushed