| Summary: | [pdftocairo] Memory leak, memory fills in seconds | | |
|---|---|---|---|
| Product: | poppler | Reporter: | MH <ravdune+bugzilla> |
| Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | medium | CC: | fdo-bugs |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | All | | |
| Attachments: | memoryleak.pdf | | |
Demonstration of rising memory use (interrupted after this):

```
  PID USER   PR NI    VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
12629 foobar 20  0 3452920 483652 7544 R 23.4 31.5 0:01.73 lt-pdftocairo
12629 foobar 20  0 3452920 894700 7544 R 17.4 58.3 0:03.17 lt-pdftocairo
12629 foobar 20  0 3452920 1.152g 7544 R 11.7 78.7 0:04.39 lt-pdftocairo
```

The offending page tree object in the PDF:

```
<<
/Type /Pages
/Kids [3 0 R]
/Count 213804087
>>
```
The PDF claims to have 213804087 pages. Catalog.cc allocates the array of page pointers and the array of page Refs, and takes a very long time to initialize both. I don't see an easy fix. We could resize the arrays dynamically as pages are accessed, so we don't allocate all the memory upfront. But I don't see the benefit given the additional work, the risk of bugs, and the slight drop in performance for the 99.9999% of PDFs that report the correct page count.
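The dynamic-resizing idea mentioned above could be sketched like this. This is a hypothetical illustration, not poppler's actual code: `LazyPageRefs` and its growth policy are assumptions, and it stands in for the upfront-allocated Ref array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: instead of allocating pageRefs[declaredCount]
// upfront, grow the backing store only as pages are actually requested,
// so a bogus /Count cannot exhaust memory on its own.
struct LazyPageRefs {
    std::size_t declaredCount;   // /Count from the /Pages dictionary
    std::vector<int> refs;       // grows on demand; -1 means "not loaded yet"

    explicit LazyPageRefs(std::size_t count) : declaredCount(count) {}

    // Return the slot for page i, allocating lazily.
    int &at(std::size_t i) {
        if (i >= refs.size()) {
            // Grow geometrically, but never past the declared count.
            std::size_t newSize = std::max(refs.size() * 2, i + 1);
            if (newSize > declaredCount) newSize = declaredCount;
            refs.resize(newSize, -1);
        }
        return refs[i];
    }
};
```

With this shape, opening the attached PDF would allocate a handful of entries rather than 213 million, at the cost of a bounds check on each page access (the "slight drop in performance" the comment mentions).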
Could do a sanity check: don't load the PDF if it claims to be over 100,000 or 1 million pages, or check that the file size is at least X bytes. A long-term solution, if such huge files are to be supported, would be to load the file in 100,000-page chunks, then the next chunk, and so on. I'm not sure how that would fit into your architecture, however.

I don't personally believe in that kind of sanity check; there comes a moment when it bites you back, because what was supposed to be impossible becomes common.

Will be fixed in next release.
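The file-size sanity check suggested above could look roughly like this. A hypothetical sketch, not the actual fix that shipped; `MIN_BYTES_PER_PAGE` is an assumed constant (even a degenerate page object occupies some bytes in the file):

```cpp
#include <cassert>
#include <cstddef>

// Assumed lower bound on how many bytes of file a single page object
// must occupy. The exact value is a tuning choice.
constexpr std::size_t MIN_BYTES_PER_PAGE = 16;

// Reject a declared /Count that the file could not possibly hold.
bool plausiblePageCount(std::size_t declaredCount, std::size_t fileSizeBytes) {
    return declaredCount <= fileSizeBytes / MIN_BYTES_PER_PAGE;
}
```

For the attached file, a /Count of 213804087 against a file of a few kilobytes would fail this check immediately, illustrating both the appeal of the heuristic and the objection raised above: the bound is arbitrary and a legitimate outlier would be rejected.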
Created attachment 107989 [details]
memoryleak.pdf

OS: Fedora 20 (running in VirtualBox)
Dependencies installed with: yum-builddep poppler
Version: GIT master
Command line: master/pdftocairo -svg <attached.pdf> /dev/null

Also reproduces in pdftotext and pdfimages (did not check other utils).

GDB output:

```
Reading symbols from /home/foobar/poppler/utils/.libs/lt-pdftocairo...done.
Starting program: /home/foobar/poppler/utils/.libs/lt-pdftocairo -svg fullmemory-13-pdftocairofuzz-6.pdf /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Syntax Warning: No valid XRef size in trailer
^C
Program received signal SIGINT, Interrupt.
Catalog::cachePageTree (this=this@entry=0x6525b0, page=page@entry=1)
    at Catalog.cc:312
312             pageRefs[i].num = -1;
(gdb) print pagesSize
$1 = 213804087
```
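For context, the initialization the interrupt landed in has roughly this shape. A simplified sketch inferred from the GDB frame, not poppler's exact code (names and the vector-based allocation are assumptions):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Ref { int num; int gen; };

// Simplified sketch: pagesSize comes straight from the /Count entry, so
// the Ref array is allocated and initialized upfront. With
// /Count = 213804087 this loop alone touches gigabytes of memory,
// which matches the rising RES figures shown at the top of the report.
std::vector<Ref> initPageRefs(std::size_t pagesSize) {
    std::vector<Ref> pageRefs(pagesSize);
    for (std::size_t i = 0; i < pagesSize; ++i) {
        pageRefs[i].num = -1;   // the line visible in the backtrace
        pageRefs[i].gen = -1;
    }
    return pageRefs;
}
```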