Bug 95362 - pdftoppm takes forever on some files
Summary: pdftoppm takes forever on some files
Status: RESOLVED WONTFIX
Alias: None
Product: poppler
Classification: Unclassified
Component: splash backend
Version: unspecified
Hardware: All Mac OS X (All)
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-12 08:58 UTC by Christian Fellmann
Modified: 2016-05-23 05:17 UTC
0 users

See Also:


Attachments
The file which takes forever to convert to jpeg with pdftoppm (1.51 MB, application/pdf)
2016-05-12 08:58 UTC, Christian Fellmann

Description Christian Fellmann 2016-05-12 08:58:08 UTC
Created attachment 123641
The file which takes forever to convert to jpeg with pdftoppm

On Ubuntu 10.04.3 LTS and OSX 10.11.4 using Poppler 0.43.0

We use pdftoppm in a production application. Users can upload PDF files, which are then converted to JPEG thumbnails using pdftoppm.

Some files cause pdftoppm to use 100% CPU, and the conversion takes forever (we waited > 1h). I attached one of the problematic PDF files.

Any help would be much appreciated, thanks in advance!
Comment 1 Thomas Freitag 2016-05-12 16:12:28 UTC
I had a look at the PDF, and I fear that nobody here will be able to help you with it and pdftoppm in the near future: the PDF uses a lot of large stroked paths, and we know that the splash backend, and therefore pdftoppm, has performance problems with complex paths.
But here are at least some hints. The PDF has a very large page size (60 x 168 cm). That results in a JPEG with a huge number of pixels and lines when you render it at the default of 150 dpi. If you just need thumbnails, you can reduce the render time dramatically by lowering the resolution, e.g.

time ./utils/pdftoppm -png -cropbox -r 25 bug-poppler95362.pdf output/95362

real	32m38.676s
user	32m33.867s
sys	0m5.326s

OK, nearly 33 minutes is still not good, but it is much better than over an hour.
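To put the resolution hint into numbers, here is a small sketch that estimates the raster size for a 60 x 168 cm page at 150 dpi and at 25 dpi. The helper name pixel_size is made up for this illustration, and the rounding is only an approximation of what the renderer does; with -cropbox the real crop box may also differ slightly from the nominal page size.

```python
CM_PER_INCH = 2.54


def pixel_size(width_cm, height_cm, dpi):
    """Approximate raster size in pixels for a page of the given physical size.

    Hypothetical helper for illustration; the renderer's own rounding may differ.
    """
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return width_px, height_px


# Compare the pdftoppm default (150 dpi) with the reduced resolution (25 dpi).
for dpi in (150, 25):
    w, h = pixel_size(60, 168, dpi)
    print(f"{dpi:>3} dpi: {w} x {h} px, about {w * h / 1e6:.1f} megapixels")
```

This gives roughly 3543 x 9921 pixels at 150 dpi against 591 x 1654 pixels at 25 dpi, i.e. about 36 times fewer pixels to fill.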

And if all your PDFs are like this one (construction plans, for example), you could consider using pdftocairo instead, which is also a poppler tool but much faster with this kind of PDF:

time ./utils/pdftocairo  -jpeg -cropbox bug-poppler95362.pdf output/95362-cairo

real	2m13.642s
user	2m8.993s
sys	0m4.695s
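
For a production service that converts uploaded files, it may also be worth guarding each conversion with a timeout, so that a single pathological PDF cannot keep a worker busy for an hour. The following is only a sketch under assumptions: the function names, the 25 dpi default and the 120 second limit are made up for illustration, and only the -jpeg, -cropbox and -r options already shown in this report are used.

```python
import subprocess


def build_thumbnail_cmd(pdf_path, out_prefix, dpi=25, tool="pdftocairo"):
    """Build the argument list for a low-resolution JPEG rendering.

    Hypothetical helper; dpi=25 is an assumed thumbnail resolution.
    """
    return [tool, "-jpeg", "-cropbox", "-r", str(dpi), pdf_path, out_prefix]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return True on success, False on timeout or non-zero exit.

    subprocess.run kills the child process when the timeout expires.
    A missing executable still raises FileNotFoundError.
    """
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False
    except subprocess.CalledProcessError:
        return False


if __name__ == "__main__":
    # Assumed file names and limit, for illustration only.
    cmd = build_thumbnail_cmd("bug-poppler95362.pdf", "output/95362-thumb")
    if not run_with_timeout(cmd, timeout_s=120):
        print("conversion failed or timed out; fall back to a placeholder image")
```

The same wrapper works for pdftoppm by passing tool="pdftoppm", since it accepts the same three options.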
Comment 2 Christian Fellmann 2016-05-23 05:17:14 UTC
Thank you, Thomas, for your time and investigation. It helped me a lot: after your explanation and recommendation, I switched to pdftocairo, which was a huge relief.

My colleague also ran some tests with ImageMagick (which does the conversion with Ghostscript), and while ImageMagick seems to be slower for simple PDFs, it is a lot faster for complex PDFs like the one I attached.

So for our use case, ImageMagick seems to be the better tool. But we will run some more tests and decide later based on the results.

Again, thank you for your help.

