pdftoppm runs out of memory when trying to convert this PDF file:

    $ wget -q https://bitbucket.org/jwilk/pdf2djvu/issue-attachment/106/jwilk/pdf2djvu/1432051577.18/106/Page156.pdf
    $ ulimit -v 500000 # 500MB
    $ pdftoppm -r 300 Page156.pdf > /dev/null
    Out of memory

Apparently it's because it tries to allocate memory for a huge bitmap (2146x61483, whereas the output image size is only 2280x3071):

    #0  0x00007ffff5d4d620 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
    #1  0x00007ffff5ce8473 in _IO_new_file_write (f=0x7ffff6018060 <_IO_2_1_stderr_>, data=0x7ffff7b54770, n=14) at fileops.c:1253
    #2  0x00007ffff5ce7b33 in new_do_write (fp=fp@entry=0x7ffff6018060 <_IO_2_1_stderr_>, data=data@entry=0x7ffff7b54770 "Out of memory\n", to_do=to_do@entry=14) at fileops.c:530
    #3  0x00007ffff5ce8a86 in _IO_new_file_xsputn (f=0x7ffff6018060 <_IO_2_1_stderr_>, data=<optimized out>, n=14) at fileops.c:1335
    #4  0x00007ffff5cdeb0d in __GI__IO_fwrite (buf=<optimized out>, size=1, count=14, fp=0x7ffff6018060 <_IO_2_1_stderr_>) at iofwrite.c:43
    #5  0x00007ffff7ac376e in gmalloc (size=131942518, checkoverflow=false) at gmem.cc:111
    #6  0x00007ffff7ac395b in gmallocn (nObjs=2146, objSize=61483, checkoverflow=false) at gmem.cc:192
    #7  0x00007ffff7ac397f in gmallocn (nObjs=2146, objSize=61483) at gmem.cc:196
    #8  0x00007ffff7af5399 in SplashBitmap::SplashBitmap (this=0x67e9a0, widthA=2146, heightA=61483, rowPadA=1, modeA=splashModeRGB8, alphaA=true, topDown=true, separationListA=0x663350) at SplashBitmap.cc:119
    #9  0x00007ffff7aeaac7 in Splash::scaleImage (this=0x6665e0, src=0x7ffff799da4c <SplashOutputDev::tilingBitmapSrc(void*, unsigned char*, unsigned char*)>, srcData=0x7fffffffdab0, srcMode=splashModeRGB8, nComps=3, srcAlpha=true, srcWidth=2146, srcHeight=2148, scaledWidth=2146, scaledHeight=61483, interpolate=false, tilingPattern=false) at Splash.cc:4133
    #10 0x00007ffff7ae99eb in Splash::arbitraryTransformImage (this=0x6665e0, src=0x7ffff799da4c <SplashOutputDev::tilingBitmapSrc(void*, unsigned char*, unsigned char*)>, srcData=0x7fffffffdab0, srcMode=splashModeRGB8, nComps=3, srcAlpha=true, srcWidth=2146, srcHeight=2148, mat=0x7fffffffdae0, interpolate=false, tilingPattern=true) at Splash.cc:3934
    #11 0x00007ffff7ae8f03 in Splash::drawImage (this=0x6665e0, src=0x7ffff799da4c <SplashOutputDev::tilingBitmapSrc(void*, unsigned char*, unsigned char*)>, srcData=0x7fffffffdab0, srcMode=splashModeRGB8, srcAlpha=true, w=2146, h=2148, mat=0x7fffffffdae0, interpolate=false, tilingPattern=true) at Splash.cc:3799
    #12 0x00007ffff79a2aef in SplashOutputDev::tilingPatternFill (this=0x6497f0, state=0x67e180, gfxA=0x6601b0, catalog=0x6496a0, str=0x6d03f8, ptm=0x6d03c8, paintType=1, resDict=0x6709d0, mat=0x7fffffffdd20, bbox=0x6d0388, x0=0, y0=0, x1=1, y1=1, xStep=20, yStep=20) at SplashOutputDev.cc:4361
    #13 0x00007ffff79fe3a3 in Gfx::doTilingPatternFill (this=0x6601b0, tPat=0x6d0370, stroke=false, eoFill=true, text=false) at Gfx.cc:2283
    #14 0x00007ffff79fcc3b in Gfx::doPatternFill (this=0x6601b0, eoFill=true) at Gfx.cc:2020
    #15 0x00007ffff79fc674 in Gfx::opEOFill (this=0x6601b0, args=0x7fffffffe020, numArgs=0) at Gfx.cc:1906
    #16 0x00007ffff79f8000 in Gfx::execOp (this=0x6601b0, cmd=0x7fffffffe230, args=0x7fffffffe020, numArgs=0) at Gfx.cc:904
    #17 0x00007ffff79f7931 in Gfx::go (this=0x6601b0, topLevel=true) at Gfx.cc:763
    #18 0x00007ffff79f7765 in Gfx::display (this=0x6601b0, obj=0x7fffffffe340, topLevel=true) at Gfx.cc:729
    #19 0x00007ffff7a5d765 in Page::displaySlice (this=0x650240, out=0x6497f0, hDPI=300, vDPI=300, rotate=0, useMediaBox=true, crop=false, sliceX=0, sliceY=0, sliceW=2280, sliceH=3071, printing=false, abortCheckCbk=0x0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0x0, annotDisplayDecideCbkData=0x0, copyXRef=false) at Page.cc:599
    #20 0x00007ffff7a61088 in PDFDoc::displayPageSlice (this=0x648e90, out=0x6497f0, page=1, hDPI=300, vDPI=300, rotate=0, useMediaBox=true, crop=false, printing=false, sliceX=0, sliceY=0, sliceW=2280, sliceH=3071, abortCheckCbk=0x0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0x0, annotDisplayDecideCbkData=0x0, copyXRef=false) at PDFDoc.cc:504
    #21 0x00000000004018e0 in savePageSlice (doc=0x648e90, splashOut=0x6497f0, pg=1, x=0, y=0, w=2280, h=3071, pg_w=2279.5250000000001, pg_h=3070.8625000000002, ppmFile=0x0) at pdftoppm.cc:225
    #22 0x0000000000402778 in main (argc=2, argv=0x7fffffffe6c8) at pdftoppm.cc:532

Tested with Poppler 0.33.0.
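To see why the allocation fails, here is a rough estimate of the memory involved. It assumes (from general knowledge of Splash, not from this report) that splashModeRGB8 stores 3 bytes per pixel and that the alpha channel requested with alphaA=true is a separate plane of 1 byte per pixel. Under that assumption, the failing gmallocn(2146, 61483) call in frame #6 would be the alpha-plane allocation.

```python
# Rough memory estimate for the bitmaps in the backtrace above.
# ASSUMPTIONS (not confirmed by the report): splashModeRGB8 uses 3 bytes
# per pixel, the alpha channel is a separate 1-byte-per-pixel plane,
# and rowPad=1 adds no row padding.

BYTES_PER_RGB8_PIXEL = 3
BYTES_PER_ALPHA_PIXEL = 1


def bitmap_bytes(width: int, height: int, alpha: bool = True) -> int:
    """Approximate bytes for an RGB8 bitmap plus an optional alpha plane."""
    rgb = width * height * BYTES_PER_RGB8_PIXEL
    a = width * height * BYTES_PER_ALPHA_PIXEL if alpha else 0
    return rgb + a


# Same size as gmalloc(size=131942518) in frame #5.
alpha_plane = 2146 * 61483

tiled = bitmap_bytes(2146, 61483)   # pattern including all repetitions
output = bitmap_bytes(2280, 3071)   # final page bitmap at 300 dpi
ulimit_bytes = 500000 * 1024        # bash's ulimit -v counts 1024-byte units

print(f"alpha plane:   {alpha_plane:>13,} bytes")
print(f"tiled bitmap:  {tiled:>13,} bytes")
print(f"output bitmap: {output:>13,} bytes")
print(f"address limit: {ulimit_bytes:>13,} bytes")
print(f"tiled bitmap exceeds limit: {tiled > ulimit_bytes}")
```

Under these assumptions the tiled bitmap needs about 528 MB, which alone exceeds the roughly 512 MB allowed by ulimit -v 500000, whereas the final 2280x3071 page bitmap needs only about 28 MB.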
The huge bitmap holds the tiling pattern including all its repetitions, before it is scaled into the resulting bitmap. This algorithm was introduced in Poppler 0.17.0 and dramatically increased rendering speed when the repetition rate is high. In this case, however, the tiling pattern itself is huge, while the repetition count in the x and y directions is small: between 1 x 1 and 2 x 2 in nearly all of the many tiling patterns in this file.

So I tried falling back to the old algorithm, which renders the tiling pattern again for every repetition, but directly into the resulting bitmap, and measured the time:

a) Without my changes:

    time ./utils/pdftoppm -png -cropbox -r 300 90596.open/Page156.pdf output/90596
    real 0m29.039s
    user 0m28.395s
    sys  0m0.275s

b) With the fallback:

    time ./utils/pdftoppm -png -cropbox -r 300 90596.open/Page156.pdf output/90596-new
    real 0m25.754s
    user 0m25.592s
    sys  0m0.170s

So the fallback is even faster here. I therefore created a patch that falls back to the old algorithm if repeatX * repeatY <= 4, and measured the run time of my regression test:

a) Without changes: Refs created in 44 minutes and 32 seconds
b) With limit and fallback: Refs created in 41 minutes and 4 seconds

I will upload the patch once a fix for bug 94053 has been committed.
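The decision the patch makes can be sketched roughly as follows. This is a minimal, hypothetical illustration: it assumes the repetition counts are obtained by dividing the filled area by the pattern step (XStep/YStep) in device units, starting at the tile origin, and the names TilingDecision, choose_tiling_strategy and MAX_FALLBACK_REPEATS are invented here; they do not mirror Poppler's code.

```python
# Hypothetical sketch of the fallback heuristic; names are illustrative,
# not Poppler's API. Simplification: the fill area is assumed to start
# at the tile origin, so repeats = ceil(extent / step).
import math
from dataclasses import dataclass

MAX_FALLBACK_REPEATS = 4  # threshold from the comment: repeatX * repeatY <= 4


@dataclass
class TilingDecision:
    repeat_x: int
    repeat_y: int
    strategy: str


def count_repeats(fill_extent: float, step: float) -> int:
    """Number of tile repetitions needed to cover fill_extent."""
    return max(1, math.ceil(fill_extent / step))


def choose_tiling_strategy(fill_w: float, fill_h: float,
                           step_x: float, step_y: float) -> TilingDecision:
    rx = count_repeats(fill_w, step_x)
    ry = count_repeats(fill_h, step_y)
    if rx * ry <= MAX_FALLBACK_REPEATS:
        # Few repetitions: render the pattern once per repetition,
        # directly into the output bitmap (the Gfx-based fallback).
        strategy = "render-each-repetition"
    else:
        # Many repetitions: build one bitmap holding all repetitions
        # and scale it into the output (the approach since 0.17.0).
        strategy = "tiled-bitmap"
    return TilingDecision(rx, ry, strategy)
```

For example, a 100 x 100 fill with a 60-unit step needs 2 x 2 repetitions and takes the fallback path, while a 1000 x 1000 fill with a 20-unit step (50 x 50 repetitions) keeps the tiled-bitmap approach, where building one large bitmap pays off.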
Created attachment 121780
Fall back to Gfx implementation of tiling pattern if repetition rate is small

Here is the announced patch.
Patch committed.