Bug 85140 - [pdftocairo] Memory leak, memory fills in seconds
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2014-10-17 13:30 UTC by MH
Modified: 2015-09-20 18:23 UTC

Attachments
memoryleak.pdf (340 bytes, application/pdf)
2014-10-17 13:30 UTC, MH

Description MH 2014-10-17 13:30:34 UTC
Created attachment 107989
memoryleak.pdf

OS: Fedora 20 (running in virtualbox)
Dependencies installed with: yum-builddep poppler
Version: GIT Master
Command line: master/pdftocairo -svg <attached.pdf> /dev/null

Also reproduces with pdftotext and pdfimages (other utilities not checked).

###########################################################################
GDB output:

Reading symbols from /home/foobar/poppler/utils/.libs/lt-pdftocairo...done.
Starting program: /home/foobar/poppler/utils/.libs/lt-pdftocairo -svg fullmemory-13-pdftocairofuzz-6.pdf /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Syntax Warning: No valid XRef size in trailer
^C
Program received signal SIGINT, Interrupt.
Catalog::cachePageTree (this=this@entry=0x6525b0, page=page@entry=1) at Catalog.cc:312
312           pageRefs[i].num = -1;
(gdb) print pagesSize
$1 = 213804087
Comment 1 MH 2014-10-17 13:30:55 UTC
Demonstration of rising memory use (interrupted after this):

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
12629 foobar    20   0 3452920 483652   7544 R 23.4 31.5   0:01.73 lt-pdftocairo
12629 foobar    20   0 3452920 894700   7544 R 17.4 58.3   0:03.17 lt-pdftocairo
12629 foobar    20   0 3452920 1.152g   7544 R 11.7 78.7   0:04.39 lt-pdftocairo
Comment 2 Adrian Johnson 2014-10-19 12:12:52 UTC
<<
	/Type /Pages
	/Kids [3 0 R]
	/Count 213804087
>>

The PDF claims to have 213804087 pages. Catalog.cc allocates the array of page pointers and the array of page Refs up front, and initializing both arrays takes a very long time. I don't see an easy fix. We could resize the arrays dynamically as pages are accessed so that we don't allocate all the memory up front, but I don't think that justifies the additional work, the risk of bugs, and the slight performance drop for the 99.9999% of PDFs that report the correct page count.
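
For illustration, the dynamic-resize idea mentioned above could look roughly like the sketch below. It is not poppler code: PageRef and LazyPageCache are hypothetical names, and PageRef only stands in for poppler's Ref. Assuming 8 bytes per page pointer and 8 bytes per Ref on a 64-bit build, two arrays of 213804087 entries come to roughly 3.4 GB, which is in line with the VIRT column reported in comment 1. The sketch allocates slots only for pages that are actually found.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for poppler's Ref; not the real type.
struct PageRef {
    int num;
    int gen;
    PageRef(int n = -1, int g = -1) : num(n), gen(g) {}
};

// Hypothetical cache that grows as pages are found while walking the
// page tree, instead of allocating one slot per claimed page up front.
class LazyPageCache {
public:
    explicit LazyPageCache(long long claimedCount) : claimed(claimedCount) {}

    // Record a page reference. Returns false if the index lies outside
    // the range the document claims to have.
    bool store(long long index, PageRef ref) {
        if (index < 0 || index >= claimed) {
            return false;
        }
        const std::size_t i = static_cast<std::size_t>(index);
        if (i >= refs.size()) {
            refs.resize(i + 1); // new slots default to {-1, -1}
        }
        assert(i < refs.size());
        refs[i] = ref;
        return true;
    }

    // Returns {-1, -1} for pages that were never cached.
    PageRef get(long long index) const {
        if (index < 0 || static_cast<std::size_t>(index) >= refs.size()) {
            return PageRef();
        }
        return refs[static_cast<std::size_t>(index)];
    }

    // Number of slots actually allocated, not the claimed /Count.
    std::size_t allocated() const { return refs.size(); }

private:
    long long claimed;
    std::vector<PageRef> refs;
};
```

With this layout, a file that claims 213804087 pages but contains a single page object would allocate one slot. The cost is the extra bounds check and occasional reallocation on every store, which is the performance trade-off the comment refers to.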
Comment 3 MH 2014-10-20 07:20:58 UTC
We could add a sanity check: refuse to load a PDF that claims to have more than 100,000 or 1 million pages, or check that the file size is at least X bytes.

A long-term solution, if such huge files are to be supported, would be to load the file in 100,000-page chunks, one chunk after another. I'm not sure how that would fit into your architecture, however.
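
As a rough sketch of the file-size variant of the sanity check suggested above: pageCountPlausible is a hypothetical helper, not a poppler function, and minBytesPerPage is an assumed tuning parameter (the X in the comment) whose right value is not established anywhere in this report. Heavily compressed object streams could make a very small value necessary.

```cpp
#include <cassert>

// Hypothetical helper, not part of poppler.
// Returns true if a document of fileSizeBytes could hold claimedCount
// pages, assuming every page costs at least minBytesPerPage bytes.
bool pageCountPlausible(long long claimedCount,
                        long long fileSizeBytes,
                        long long minBytesPerPage = 1) {
    assert(minBytesPerPage > 0);
    if (claimedCount < 0 || fileSizeBytes < 0) {
        return false;
    }
    // Divide rather than multiply so a huge claimed count cannot overflow.
    return claimedCount <= fileSizeBytes / minBytesPerPage;
}
```

For the attached 340-byte file, pageCountPlausible(213804087, 340) is false even with the most permissive setting of 1 byte per page. Unlike a fixed cap such as 100,000 pages, this bound scales with the size of the input, so a genuinely huge document is not rejected merely for being large.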
Comment 4 Albert Astals Cid 2014-10-26 21:17:27 UTC
I personally don't believe in that kind of sanity check. Sooner or later it comes back to bite you, because what was supposed to be impossible has become common.
Comment 5 Albert Astals Cid 2015-09-20 18:23:22 UTC
Will be fixed in next release.