Summary: | Eats lots of memory with buggy CCITTFaxDecode image | ||
---|---|---|---|
Product: | poppler | Reporter: | P. Henrique SIlva <ph.silva> |
Component: | cairo backend | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: | pdf which eats lots of memory |
Created attachment 13201: pdf which eats lots of memory
Which poppler version are you using? I can't reproduce this behaviour here with poppler 0.6.3.

CCITTFax is for bitmaps, not graymaps. (The dict even says that; cf. the BitsPerComponent value.) 5120x6600 is about right for a 600 dpi scan; 8½″ × 11″ is 5100×6600.

I got this behaviour with Evince on Ubuntu Gutsy (poppler 0.6.x?), but also with Evince/poppler HEAD. It should be noted that this is not a leak. On this one-page PDF you may not have noticed that 120MB was used to render the page, but with the original one (26 pages) and Evince's default rendering behaviour (render a page plus the next/previous 2 pages) you will certainly notice high memory pressure on the system, as more than half a gigabyte is needed.

Testing the patch in bug 56858 with pdftocairo and this PDF reduces the peak memory usage from 147MB to 21MB.

(In reply to comment #5)
> Testing the patch in bug 56858 with pdftocairo and this PDF reduces the peak
> memory usage from 147MB to 21MB.

The patch has been pushed.
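The two back-of-the-envelope figures in the comments above (the 600 dpi page size and the "half a gigabyte" for Evince's page cache) can be checked with a few lines of Python. This is only a sketch: the helper name is made up for illustration, and reading "a page plus next/previous 2 pages" as five cached pages is an assumption, not something confirmed in this thread.

```python
# Sketch only: pixels_at_dpi is a hypothetical helper, not part of poppler or Evince.

def pixels_at_dpi(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

# US Letter (8.5 x 11 inches) scanned at 600 dpi, as mentioned in the comment.
letter_600dpi = pixels_at_dpi(8.5, 11, 600)

# Assumption: Evince keeps the current page plus two before and two after.
PAGES_CACHED = 5
PER_PAGE_MB = 120  # figure quoted by the reporter for a single page
total_mb = PAGES_CACHED * PER_PAGE_MB

print(letter_600dpi, total_mb)
```

Under those assumptions the page size comes out at 5100×6600, close to the 5120×6600 in the image dictionary, and the cache needs about 600MB, which matches the reported memory pressure.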
I'm trying to render the attached PDF (Evince using the poppler cairo backend) and poppler eats lots and lots of memory (around 100MB per page). However, I don't know whether Evince/poppler is at fault at all. This PDF uses images in CCITTFaxDecode format, and the declared dimensions of the image are huge:

    /Type /XObject
    /Subtype /Image
    /Name /Im1
    /Filter [ /CCITTFaxDecode ]
    /Width 5120
    /Height 6600
    /BitsPerComponent 1
    /ColorSpace /DeviceGray
    /Length 5 0 R
    /DecodeParms [ << /K -1 /Columns 5120 /Rows 6600 /EndOfBlock false /BlackIs1 false >> ]
    >>
    stream
    ...
    endstream
    endobj
    5 0 obj
    207775
    endobj

It looks like the dimensions were guessed by the producer, because the stream doesn't seem to have enough samples: 207775 bytes * 8 bits/byte = 1662200 samples (grayscale colorspace), which is roughly a 1290x1290px image, not as huge as the dict says. Is this calculation right? I'm really guessing at how CCITT works.

Note that the stream filter parameters contain "/EndOfBlock false", which may confirm that the producer guessed the dimensions of the image. The PDF Reference says about "EndOfBlock":

"A flag indicating whether the filter expects the encoded data to be terminated by an end-of-block pattern, overriding the Rows parameter. **If false, the filter stops when it has decoded the number of lines indicated by Rows or when its data has been exhausted**, whichever occurs first."[...][emphasis added]

If the above calculation is right, I think it's just a matter of checking whether the dimensions of the image agree with the number of samples, and not allocating a huge buffer at CairoOutputDev.cc:1526 (SVN HEAD). Any hint about how to implement this (assuming that my guesses are right)?

[This kind of PDF seems to be very common in the astronomy community (where I include myself), as many older PDFs (and some not so old, as this one is from 1994) come with no OCR and only scanned images like this one.]
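As one of the comments points out, CCITTFax data is a compressed 1-bit bitmap (/K -1 selects Group 4 encoding), so the stream length does not directly give the number of samples. The sketch below compares the declared dimensions with the stream length and with the size of a decoded buffer. The 4 bytes per pixel figure assumes a full-resolution 32-bit image buffer; whether that is what the allocation at CairoOutputDev.cc:1526 actually does is an assumption here, not something verified against the poppler source.

```python
# Values taken from the image dictionary in the report.
WIDTH, HEIGHT = 5120, 6600
STREAM_LENGTH = 207775  # bytes of compressed CCITT data (object 5 0)

# Number of 1-bit samples the dictionary declares.
pixels = WIDTH * HEIGHT

# Uncompressed 1-bit bitmap, each row padded to a whole byte.
packed_bytes = ((WIDTH + 7) // 8) * HEIGHT

# Assumption: the renderer expands the image to 4 bytes per pixel.
argb32_bytes = pixels * 4

# Compression ratio the declared dimensions would imply.
ratio = packed_bytes / STREAM_LENGTH

print(f"declared samples:      {pixels}")
print(f"packed 1-bit bitmap:   {packed_bytes} bytes")
print(f"32-bit decoded buffer: {argb32_bytes / 2**20:.1f} MiB")
print(f"implied compression:   {ratio:.1f}:1")
```

The implied ratio of roughly 20:1 is plausible for a scanned text page, so the declared 5120x6600 size need not be a producer's guess, and a 32-bit buffer of that size would account for memory use in the 100MB to 150MB range reported in this bug.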