Bug 94941

Summary: Corrupted linearization hint table causes massive memory usage and several minute delay
Product: poppler Reporter: jmmorlan
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
Attachments: Example broken PDF file
Hints.cc patch

Description jmmorlan 2016-04-14 23:31:23 UTC
I recently encountered some PDF files that cause poppler-based tools (pdfinfo, pdftotext, evince) to allocate a large amount of memory (usually about 3 GB) and hang for several minutes. Acrobat Reader exhibits neither problem.

The cause is corrupted linearization hint tables: the program that wrote the PDFs did not align the start of the shared objects hint table on a byte boundary, so its header reads as:

firstSharedObjectNumber	00 00 00 00
firstSharedObjectOffset	00 00 00 00
nSharedGroupsFirst	00 00 00 01
nSharedGroups		10 00 00 01
nBitsNumObjects		10 00
groupLengthLeast	00 00 00 02
nBitsDiffGroupLength	80 01

Hints::readSharedObjectsTable allocates several giant arrays, and then spends ages trying to populate them (without checking that it's reached the end of the stream).
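
One way to avoid the oversized allocation is to compare the claimed entry count against what the stream could possibly hold before allocating anything. The following is only a sketch: `entriesFitInStream` and its parameters are hypothetical names, not part of poppler, and it assumes the caller knows how many bytes remain in the hint stream.

```cpp
#include <cstdint>

// Hypothetical helper, not poppler API: returns true if `count` entries of
// `bitsPerEntry` bits each could fit in the `remainingBytes` left in the stream.
// Assumes remainingBytes is a real stream length, so remainingBytes * 8 does
// not overflow.
bool entriesFitInStream(uint64_t count, unsigned bitsPerEntry, uint64_t remainingBytes)
{
    if (bitsPerEntry == 0) {
        return true; // zero-width fields consume no stream data
    }
    const uint64_t remainingBits = remainingBytes * 8;
    // Divide rather than multiply so count * bitsPerEntry cannot overflow.
    return count <= remainingBits / bitsPerEntry;
}
```

With the header above, nSharedGroups is 0x10000001 (about 268 million), which no hint stream of realistic size could back. Zero-width fields pass this check, so a separate cap on the count would still be needed for them.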

Since nBits* can't be more than 32, this hint table should just be rejected as invalid immediately.
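
A minimal sketch of such a check, assuming the header has already been read into a struct. The struct and function names are hypothetical; the field names follow the listing above, not necessarily poppler's source.

```cpp
#include <cstdint>

// Hypothetical in-memory form of the shared objects hint table header.
struct SharedObjectsHeader
{
    uint32_t firstSharedObjectNumber;
    uint32_t firstSharedObjectOffset;
    uint32_t nSharedGroupsFirst;
    uint32_t nSharedGroups;
    uint16_t nBitsNumObjects;
    uint32_t groupLengthLeast;
    uint16_t nBitsDiffGroupLength;
};

// Upper bound on any "number of bits" field.
constexpr unsigned kMaxHintBits = 32;

// Returns false if either bit-width field is out of range, so the caller
// can reject the table before allocating anything.
bool bitWidthsAreValid(const SharedObjectsHeader &h)
{
    return h.nBitsNumObjects <= kMaxHintBits && h.nBitsDiffGroupLength <= kMaxHintBits;
}
```

For the broken header above, nBitsNumObjects is 0x1000 (4096) and nBitsDiffGroupLength is 0x8001 (32769), so the table would be rejected.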
Comment 1 jmmorlan 2016-04-15 17:55:52 UTC
The PDFs were produced by "Aspose.Pdf for .NET 8.9.0", a library which is apparently quite widely used.
Comment 2 jmmorlan 2016-04-15 21:49:36 UTC
Created attachment 122979
Example broken PDF file
Comment 3 Albert Astals Cid 2016-04-19 23:08:13 UTC
You seem to know what you're talking about; maybe you can produce a patch?
Comment 4 jmmorlan 2016-08-02 20:08:56 UTC
Created attachment 125492
Hints.cc patch

I really don't know much about linearization, but here's a patch to try to fix a couple of problems that stand out:

1. If nBitsNumObjects or nBitsDiffGroupLength is greater than 32, bail out early.
2. Improve readBits efficiency (replace recursion with iteration; fix EOF detection to work on any bit, not just those where n is equal to 1 modulo 32).
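
To illustrate the second point, here is a rough standalone sketch of an iterative bit reader that checks for end of data on every bit. It is not the attached patch: `BitReader` is a hypothetical stand-in that reads from a byte vector rather than poppler's stream class, and it assumes bits are packed most significant bit first.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the hint stream, for illustration only.
class BitReader
{
public:
    explicit BitReader(std::vector<uint8_t> data) : data_(std::move(data)) { }

    // Reads n bits (0..32), most significant bit first, into *out.
    // Returns false if n is out of range or the data ends before n bits
    // have been read; *out is left untouched in that case.
    bool readBits(unsigned n, uint32_t *out)
    {
        if (n > 32) {
            return false;
        }
        uint32_t value = 0;
        for (unsigned i = 0; i < n; ++i) {
            if (bitPos_ >= data_.size() * 8) {
                return false; // EOF is detected on every bit, not only some
            }
            const uint8_t byte = data_[bitPos_ / 8];
            const unsigned bit = (byte >> (7 - bitPos_ % 8)) & 1u;
            value = (value << 1) | bit;
            ++bitPos_;
        }
        *out = value;
        return true;
    }

private:
    std::vector<uint8_t> data_;
    std::size_t bitPos_ = 0;
};
```

Because the loop runs at most 32 times per call and stops at the first missing bit, a corrupted table cannot make a single read recurse deeply or run past the end of the data.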
Comment 5 Albert Astals Cid 2016-11-29 23:32:15 UTC
Where in the spec does it say that those values have to be smaller than 33?
Comment 6 jmmorlan 2016-12-02 01:01:57 UTC
Right before the table of fields in the Page Offset Hint Table header, there's a note:
"All the items in Table F.3 that specify a number of bits needed, such as item 3, have values in the range 0 through 32. Although that range requires only 6 bits, 16-bit numbers shall be used."

It doesn't explicitly say this about the Shared Object Hint Table header (described in Table F.5), but there's no indication that it's different, nor can I think of any reason for it to be.
Comment 7 Albert Astals Cid 2016-12-07 21:39:53 UTC
Pushed the first part; the second part didn't apply (and was unrelated anyway).
