13729 – Eats lots of memory with buggy CCITTFaxDecode image

Status:	RESOLVED FIXED

Alias:	None

Product:	poppler
Classification:	Unclassified
Component:	cairo backend
Version:	unspecified
Hardware:	Other Linux (All)

Importance:	medium normal
Assignee:	poppler-bugs

Reported:	2007-12-18 19:24 UTC by P. Henrique SIlva
Modified:	2012-11-22 08:52 UTC
CC List:	0 users

Attachments
pdf which eats lots of memory (204.60 KB, application/pdf) 2007-12-18 19:25 UTC, P. Henrique SIlva

Description P. Henrique SIlva 2007-12-18 19:24:34 UTC

I'm trying to render the attached PDF (Evince using the poppler cairo backend), and poppler uses a lot of memory (around 100 MB per page).

However, I'm not sure that Evince/poppler is actually at fault. This PDF uses images in CCITTFaxDecode format, and the declared dimensions of the image are huge:

/Type /XObject
/Subtype /Image
/Name /Im1
/Filter [ /CCITTFaxDecode ]
/Width 5120 /Height 6600 /BitsPerComponent 1
/ColorSpace /DeviceGray
/Length 5 0 R
/DecodeParms [ << /K -1 /Columns 5120 /Rows 6600 /EndOfBlock false /BlackIs1 false >> ]
>>
stream
...
endstream
endobj

5 0 obj
207775
endobj

Anyway, it looks like the dimensions were guessed by the producer, because the stream doesn't have enough samples: 207775 bytes * 8 bits/byte = 1662200 samples (grayscale colorspace), roughly a 1290x1290px image, not as huge as the dict claims. Is this calculation right? I'm really guessing at how CCITT works.
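To make the numbers concrete, here is a minimal sketch of the arithmetic. The width, height and stream length come from the dictionary above. The 32 bits per pixel figure is an assumption about how a renderer might expand the bitmap in memory, not something confirmed in this report. Keep in mind that a CCITTFaxDecode stream is compressed, so bytes * 8 counts compressed bits rather than decoded samples.

```python
# Values taken from the image dictionary and the /Length object (5 0 obj).
WIDTH, HEIGHT = 5120, 6600
STREAM_BYTES = 207775


def buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of a raster whose rows are rounded up to whole bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


compressed_bits = STREAM_BYTES * 8          # bits in the encoded stream
pixels = WIDTH * HEIGHT                     # samples the dictionary declares
one_bit = buffer_bytes(WIDTH, HEIGHT, 1)    # decoded at /BitsPerComponent 1
# Assumption: the bitmap is expanded to 32 bits per pixel for rendering.
expanded = buffer_bytes(WIDTH, HEIGHT, 32)

print(f"compressed bits : {compressed_bits}")
print(f"declared pixels : {pixels}")
print(f"1 bpp buffer    : {one_bit} bytes")
print(f"32 bpp buffer   : {expanded} bytes ({expanded / 2**20:.1f} MiB)")
```

Under that assumption the expanded buffer is about 129 MiB, which is the same order of magnitude as the per-page memory use reported here.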

Note that the stream's filter parameters contain "/EndOfBlock false", which may confirm that the producer guessed the dimensions of the image.

PDF Reference about "EndOfBlock":

"A flag indicating whether the filter expects the encoded data to be terminated by an end-of-block pattern, overriding the Rows parameter. **If false, the filter stops when it has decoded the number of lines indicated by Rows or when its data has been exhausted**, whichever occurs first."[...][emphasis added]
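The stop condition in the emphasized sentence can be sketched as follows. This only illustrates the rule for the case of EndOfBlock false with a positive Rows value; it is not poppler's decoder. The function name `rows_until_stop` and the idea of receiving already-decoded scan lines from an iterable are hypothetical and used for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator


def rows_until_stop(decoded_rows: Iterable[bytes], rows: int) -> Iterator[bytes]:
    """Yield scan lines until `rows` lines have been produced or the
    source runs out of data, whichever happens first."""
    return islice(decoded_rows, rows)


# A source that runs dry after 3 lines although Rows claims 6600.
short_source = iter([b"\x00" * 640] * 3)
print(len(list(rows_until_stop(short_source, 6600))))

# A source with more lines than Rows allows is cut off at Rows.
long_source = iter([b"\x00" * 640] * 10)
print(len(list(rows_until_stop(long_source, 4))))
```

With this rule a reader cannot know the real number of decoded lines until the stream has been consumed, so the declared Rows is only an upper bound.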

If the above calculation is right, I think it's just a matter of checking whether the dimensions of the image agree with the number of samples, and not allocating a huge buffer at CairoOutputDev.cc:1526 (SVN HEAD).

Any hints on how to implement this (assuming that my guesses are right)?

[This kind of PDF seems to be very common in the astronomy community (where I include myself), as many older (and not so old, as this one is from 1994) PDFs come with no OCR, only scanned images like this one]

Comment 1 P. Henrique SIlva 2007-12-18 19:25:26 UTC

Created attachment 13201 [details]
pdf which eats lots of memory

Comment 2 Albert Astals Cid 2007-12-19 12:21:11 UTC

Which poppler version are you using? I can't reproduce this behaviour here with poppler 0.6.3.

Comment 3 James Cloos 2007-12-19 12:57:20 UTC

CCITTFax is for bitmaps, not graymaps. (The dict even says that; cf. the BitsPerComponent value.)

5120x6600 is about right for a 600dpi scan; 8½″ × 11″ is 5100×6600.
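A quick check of that estimate, as a sketch: the helper name `scan_pixels` is made up for illustration, and the page size and resolution are the ones quoted in this comment.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


letter_600 = scan_pixels(8.5, 11, 600)
declared = (5120, 6600)  # /Width and /Height from the image dictionary

print("US Letter at 600 dpi:", letter_600)
print("declared in the PDF :", declared)
print("difference          :", (declared[0] - letter_600[0], declared[1] - letter_600[1]))
```

The declared height matches exactly and the width is only 20 pixels larger, so the dictionary values look like real scan dimensions rather than a guess.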

Comment 4 P. Henrique SIlva 2007-12-19 13:16:56 UTC

I see this behaviour with Evince on Ubuntu Gutsy (poppler 0.6.x?) and also with Evince/poppler HEAD.

It should be noted that this is not a leak. With this one-page PDF, you may not have noticed that 120MB was used to render the page. But with the original one (26 pages) and Evince's default rendering behaviour (render a page plus the next/previous 2 pages), you will certainly notice high memory pressure on the system, as around half a gigabyte or more is needed.

Comment 5 Adrian Johnson 2012-11-12 10:52:56 UTC

Testing the patch in bug 56858 with pdftocairo and this PDF reduces the peak memory usage from 147MB to 21MB.

Comment 6 Adrian Johnson 2012-11-22 08:52:42 UTC

(In reply to comment #5)
> Testing the patch in bug 56858 with pdftocairo and this PDF reduces the peak
> memory usage from 147MB to 21MB.

The patch has been pushed.
