Bug 21590 - Unchecked code space ranges cause excessive memory allocations
Status: RESOLVED WONTFIX
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
Reported: 2009-05-06 03:05 UTC by Nick Jones
Modified: 2011-06-19 15:19 UTC

Attachments
limit code space range numbers to four hex digits (746 bytes, patch)
2009-05-06 03:07 UTC, Nick Jones

Description Nick Jones 2009-05-06 03:05:28 UTC
In a corrupted or improperly generated PDF document, code space range boundary values can contain too many digits and thus describe ranges that need an excessive number of CMapVectorEntry arrays just to represent them.

Some other PDF reader implementations limit the mapping hierarchy and lookup functions to two-byte (four hex digit) representations of range boundaries, and much of the documentation on PDF internals mentions only one- and two-byte ranges. I therefore felt it would make sense to check that each boundary value is at most four hex digits long.

Hexadecimal numbers from the PDF document are consumed using sscanf and stored in unsigned ints. Two hex numbers of six digits have the potential to cause huge allocations, and two hex numbers of eight digits will usually OOM the process (but see the first note below).

The attached patch contains a simple additional validation and common sense check.
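
A check of the kind described above could look roughly like the following. This is only a sketch and not the attached patch itself: isPlausibleRangeToken and isPlausibleCodeSpaceRange are hypothetical names, and it assumes the boundary tokens arrive as strings of the form "<hhhh>".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a poppler function): returns true if tok has the
// form "<hh...>" with an even number of hex digits, at least two and at most
// maxHexDigits.
bool isPlausibleRangeToken(const char *tok, std::size_t maxHexDigits) {
  assert(tok != nullptr);
  const std::size_t n = std::strlen(tok);
  if (n < 4 || tok[0] != '<' || tok[n - 1] != '>') return false;
  const std::size_t digits = n - 2;  // characters between '<' and '>'
  if (digits % 2 != 0 || digits > maxHexDigits) return false;
  for (std::size_t i = 1; i + 1 < n; ++i) {
    if (!std::isxdigit(static_cast<unsigned char>(tok[i]))) return false;
  }
  return true;
}

// Both boundaries must be well formed and describe the same number of bytes.
// The default limit of four hex digits is the limit proposed in this report.
bool isPlausibleCodeSpaceRange(const char *tok1, const char *tok2,
                               std::size_t maxHexDigits = 4) {
  return isPlausibleRangeToken(tok1, maxHexDigits) &&
         isPlausibleRangeToken(tok2, maxHexDigits) &&
         std::strlen(tok1) == std::strlen(tok2);
}
```

With the default limit, a two-byte range such as <8140> <FEFE> is accepted and an eight-digit range is rejected; passing 8 as the limit would accept four-byte ranges instead.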



Note 1:
The PDF spec states that the numbers used to represent code space range boundaries must be representable by an Integer, as defined in Appendix C of the same specification.  The range of this Integer type is -2^31 -> 2^31 - 1.  Assuming that negative values are not allowed, valid values for the boundary should be: <00000000> -> <7fffffff>

This represents a lookup hierarchy of four levels, which is valid according to the PDF specification, and could cause poppler to allocate huge amounts of memory.
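
To give a rough sense of scale, the sketch below counts how many 256-entry tables a per-byte recursive walk would allocate beneath the root table. It is an estimate under stated assumptions, not poppler code: countTables is a hypothetical name, each byte position is assumed to be handled as an independent [lo, hi] range, and the last byte is assumed to need no table of its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimator (not a poppler function): number of 256-entry tables
// allocated below the root for a code space range that is nBytes bytes wide.
std::uint64_t countTables(std::uint32_t start, std::uint32_t end,
                          unsigned nBytes) {
  assert(nBytes >= 1 && nBytes <= 4);
  std::uint64_t total = 0;
  std::uint64_t parents = 1;  // tables at the current level; the root is one
  for (unsigned level = nBytes; level > 1; --level) {
    const unsigned shift = 8 * (level - 1);
    const std::uint32_t lo = (start >> shift) & 0xff;
    const std::uint32_t hi = (end >> shift) & 0xff;
    if (lo > hi) break;  // empty byte range: nothing is allocated below here
    const std::uint64_t children = parents * (hi - lo + 1);
    total += children;
    parents = children;
  }
  return total;
}
```

Under these assumptions, the full range <00000000> -> <7fffffff> comes to 8,421,504 tables of 256 entries each. The memory cost in bytes depends on sizeof(CMapVectorEntry), which the sketch deliberately leaves out.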


Note 2:
While playing around with code in CMap.cc, I found the addCodeSpace function a little unclear.  Also, using sscanf to parse hex numbers had the potential to overflow the Guint type if the hex number had more than eight digits.  I came up with a revised version of this function in case you are interested, something like:
----
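void addCodeSpace2(CMapVectorEntry *vec, char* tok1, char* tok2)
  {
  // Recurse only while more than one byte (two hex digits) remains;
  // the last byte is the leaf level and needs no vector of its own.
  if (strlen(tok1) > 2)
    {
    unsigned int start = 0, end = 0;

    // Take the leading byte (two hex digits) of each boundary.
    char startByte[] = {tok1[0], tok1[1], '\0'};
    char endByte[] = {tok2[0], tok2[1], '\0'};

    sscanf(startByte, "%x", &start);
    sscanf(endByte, "%x", &end);

    // Walk the byte range in ascending order, whichever bound is larger.
    unsigned int lo = (start <= end) ? start : end;
    unsigned int hi = (start > end) ? start : end;
    for (unsigned int i = lo; i <= hi; ++i)
      {
      if (!vec[i].isVector)
        {
        vec[i].isVector = true;
        // Value-initialize so that isVector starts out false in every entry.
        vec[i].vector = new CMapVectorEntry[256]();
        }
      addCodeSpace2(vec[i].vector,  tok1 + 2, tok2 + 2);
      }
    }
  }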
----
    tok1[n1 - 1] = tok2[n1 - 1] = '\0';
    
    addCodeSpace2(cmap->vector, tok1 + 1, tok2 + 1);
----
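
On the overflow concern mentioned in Note 2, one way to avoid it is to parse the digits by hand and refuse anything longer than eight hex digits. The following is a minimal sketch of that idea; parseHex32 is a hypothetical helper, not an existing poppler function, and it assumes the surrounding '<' and '>' have already been stripped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a poppler function): parses one to eight hex
// digits into *out. Returns false for an empty string, a non-hex character,
// or more than eight digits, so the 32-bit result can never overflow.
bool parseHex32(const char *s, std::uint32_t *out) {
  assert(s != nullptr && out != nullptr);
  std::uint32_t value = 0;
  int digits = 0;
  for (; *s != '\0'; ++s, ++digits) {
    if (digits == 8) return false;  // a ninth digit would not fit in 32 bits
    int d;
    if (*s >= '0' && *s <= '9') d = *s - '0';
    else if (*s >= 'a' && *s <= 'f') d = *s - 'a' + 10;
    else if (*s >= 'A' && *s <= 'F') d = *s - 'A' + 10;
    else return false;
    value = (value << 4) | static_cast<std::uint32_t>(d);
  }
  if (digits == 0) return false;
  *out = value;
  return true;
}
```

Unlike sscanf with "%x", this reports failure explicitly instead of leaving the behaviour on out-of-range input to the C library.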


Note 3:
The corrupted PDF document had a codespacerange definition that looked like:
<81308130> <FE39FE39>
The repetition of the digits smacks of a bug in the PDF generator software.  The document seemed to be a scan of a printed document with a handwritten signature, output as PDF. The metadata in the document simply mentioned: Canon
Comment 1 Nick Jones 2009-05-06 03:07:20 UTC
Created attachment 25541
limit code space range numbers to four hex digits
Comment 2 Albert Astals Cid 2009-09-02 12:10:54 UTC
Hi, can we have the pdf that causes the problems?

Also, if the spec allows up to 8 hex digits, why shouldn't we follow that?
Comment 3 Albert Astals Cid 2011-06-19 15:19:25 UTC
2 years without an answer to my question. Closing the bug.

