Created attachment 84321 [details]
test pdf file

I saw on the poppler mailing list that a number of recent changes were supposed to make poppler run faster, so I tested a 20 Aug 2013 clone and found that pdftops was about 10% slower than last year. The times are fairly repeatable because I am reading from an SSD and writing to a RAM disk. This isn't a bug, but some of the recent changes might have had unintended consequences. I tested an unpatched 0.24.0 clone against a lightly patched 0.21.0 that I have been using for about a year.

William

$ ./pdftops -v
pdftops version 0.24.0
Copyright 2005-2013 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
$ time ./pdftops -eps -level1sep /u/displayads/1288986-xpdfshadebug.pdf /tmp/x1.eps

real    0m10.520s
user    0m10.392s
sys     0m0.113s

$ /usr/local/bin/pdftops -v
pdftops version 0.21.0
Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
$ time /usr/local/bin/pdftops -eps /u/displayads/1288986-xpdfshadebug.pdf /tmp/1288986-xpdfshadebug.eps

real    0m8.455s
user    0m8.355s
sys     0m0.087s
If you have time to git bisect the regression that would be very much appreciated.
I did the bisect. It was more complicated than I thought because my laptop slows the CPU as it warms up from the builds. (The laptop has a 2.5 GHz i5-2450M.)

It looks like the performance difference comes from a commit that implemented bilinear image scaling. I think that the scaling is important. When I disabled it by making Splash::isImageInterpolationRequired() return gFalse, pdftops runs faster, but the results are not as good. I have attached a screen capture. I added debug code: isImageInterpolationRequired() is called 124 times in my test file, and many of the images are scaled by almost 1.5 or almost 2. Is it worth the effort (or even possible) to make special cases for integral scaling?

William

Results of the bisect:

b112602334a5de84ae30c2e90d9bc6d4609f7f96  08 Oct 12  8.455  good
1f4a012f7570ffd2120e3e8c2236de5408f3dda3  01 Mar 13  9.373  bad
f8c1b55e764a6e79b0530fb1be9ee11917f4237e  01 Dec 12  9.476  bad
1969bcd693289eba1138fcaa74a684cb3ff2aefc  13 Nov 12  8.416  good
4ceb3f4f4ca0092e79bb36723a7332b071491666  23 Jun 12  8.484  good
85572b85950ed4e4421f1e61e704e5c250ca27d9  01 Dec 12  9.470  bad
ff9e211cfb60eb820b9b046da546352fa59d7df9  23 Jun 12  8.481  good
beff044e4fdf44e80ad7c75255cb71a83e70a293  27 Nov 12  8.425  good
e6806d893a9a104e3f23d69d0245ad0e4948a409  28 Nov 12  8.474  good
a97aead193a927b91a4e33d7b5d2ba7179e664df  30 Nov 12  9.494  bad

a97aead193a927b91a4e33d7b5d2ba7179e664df is the first bad commit
commit a97aead193a927b91a4e33d7b5d2ba7179e664df
Author: Adrian Johnson <ajohnson@redneon.com>
Date:   Fri Nov 30 21:30:19 2012 +0100

    Splash: Implement bilinear image scaling

    Bug #22138
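For readers unfamiliar with the commit that the bisect landed on: bilinear scaling blends neighboring source pixels with a fractional weight instead of replicating the nearest pixel, which is why it costs floating-point work per output pixel. A minimal 1-D sketch of the idea for a single gray row (an illustration of the technique only, not poppler's Splash code; the function name and layout are my own):

```cpp
#include <cstdint>
#include <vector>

// Linear interpolation along one row of 8-bit gray pixels, scaling
// src up to dstWidth samples (dstWidth must be >= 2).
// Illustration of the general technique, not Splash's implementation.
std::vector<uint8_t> scaleRowLinear(const std::vector<uint8_t>& src,
                                    int dstWidth) {
    int srcWidth = static_cast<int>(src.size());
    std::vector<uint8_t> dst(dstWidth);
    double step = static_cast<double>(srcWidth - 1) / (dstWidth - 1);
    for (int x = 0; x < dstWidth; ++x) {
        double pos = x * step;            // position in source coordinates
        int p = static_cast<int>(pos);    // left neighbor
        if (p >= srcWidth - 1) {          // clamp at the right edge
            dst[x] = src[srcWidth - 1];
            continue;
        }
        double frac = pos - p;            // weight for the right neighbor
        dst[x] = static_cast<uint8_t>(
            (1.0 - frac) * src[p] + frac * src[p + 1] + 0.5);
    }
    return dst;
}
```

The per-pixel multiply and round on doubles is the work that nearest-neighbor replication avoids, which matches the roughly 10% slowdown reported above on image-heavy files.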
Created attachment 84335 [details]
screen capture showing what happens when interpolation is disabled

The image on the left has interpolation disabled by making Splash::isImageInterpolationRequired() return gFalse. The image on the right is from an unpatched pdftops built from poppler at commit a97aead193a927b91a4e33d7b5d2ba7179e664df. The red arrows show some of the differences. I think that poppler is working correctly, and the extra CPU time is necessary for better looking images.
I guess that unless you want to try to optimize the bilinear scaling, you can close this bug.
Created attachment 88906 [details] [review]
patch for Splash::expandRow()

This makes my 1288986-xpdfshadebug.pdf test file about 1% faster with

pdftops -eps -level1sep 1288986-xpdfshadebug.pdf /tmp/x.eps

In this run, 4852 of 29329 calls to Splash::expandRow() had a step of 1/2. The patch adds a special case for steps of 1/2 in expandRow() that uses ints instead of doubles; it should not change the results. It probably isn't worth pursuing this further.

William
Hmmmm, isn't that just a special case for your file? Have you tried using memcpy instead of the for loop? Supposedly, when compiling with optimization, the compiler can substitute fast inline asm there if feasible, or just do a for loop if it finds that's cheaper than the library call.
Created attachment 89118 [details] [review]
patch for Splash::expandRow() with memcpy instead of a for loop

> Hmmmm, isn't that just special casing for your file?

Yes, I think that it might be over-tuning. I had expected that most of the expansions would be simple ratios that didn't need real-number interpolation, or possibly no interpolation at all, but when I put in debug code, all of my test files had complicated ratios. My test files are newspaper advertisements; the artists seem to make the bitmapped images fit into boxes, and the sizes of the images and boxes have no relation.

Only the first half of my new loop can be converted into a memcpy(). The second half, with

dstBuf[nComps*x + c] = (srcBuf[nComps*p + c] + srcBuf[nComps*(p+1) + c]) / 2

has to stay a loop, although it might be possible to use the PAVGB instruction.

You were right that memcpy was faster than a for loop. I am using gcc-4.7.2 on 64-bit Fedora 17, and I had thought that the optimizer would be able to figure it out. A for loop with only one statement in its body shouldn't be that hard for a compiler to optimize. Maybe gcc is afraid that the source and destination might overlap.

I repeated the runs a few times with all files and executables on a RAM disk. The real and user times in seconds for

pdftops -eps -level1sep /tmp/1288986-xpdfshadebug.pdf /tmp/x.eps

are:

original git source     10.573r  10.444u
my old for loop patch   10.461r  10.344u
my new memcpy patch     10.448r  10.308u

I have no special need to optimize this file specifically. It is only an example that shows the cost of the bilinear scaling. Most of my files don't exercise this code, or else run too quickly to give useful measurements. These patches are only an experiment to see if I could recover some of the performance penalty of the bilinear scaling by making special cases that don't need real-number calculations on each pixel, but it looks like the special cases are too rare to have a big effect.

William
Honestly, I don't see myself committing this change for a marginal improvement in a very specific file. Does anyone else have comments? Thomas? Adrian?
I don't think adding this special case is worthwhile for a 1% improvement in a small number of files. If we do find architectures where floating point is slower, there is an integer-only implementation of the scaling I did in bug 22138 comment 33 that will work for all images, instead of special casing only some images.

Another option for faster scaling would be to use a library like pixman, which has optimized versions for different architectures, e.g. SSE2 on x86. The downside is that pixman only supports 8 bits per component RGB and has some limitations on image sizes, so we would still need the poppler version for the cases it doesn't handle.
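An integer-only scaler of the kind mentioned here typically keeps the fractional source position in fixed point so that every per-pixel operation is an add, multiply, or shift. The following is a generic 16.16 fixed-point sketch of that idea (my assumed illustration of the technique, not the code from bug 22138 comment 33):

```cpp
#include <cstdint>
#include <vector>

// Linear row scaling with 16.16 fixed-point positions: no floating
// point in the per-pixel loop. dstWidth must be >= 2.
// Generic sketch of the integer-only approach, not poppler's code.
std::vector<uint8_t> scaleRowFixedPoint(const std::vector<uint8_t>& src,
                                        int dstWidth) {
    const int kShift = 16;
    const uint32_t kOne = 1u << kShift;
    int srcWidth = static_cast<int>(src.size());
    std::vector<uint8_t> dst(dstWidth);
    // Step between output samples, in 16.16 source coordinates.
    uint32_t step = static_cast<uint32_t>(
        (static_cast<uint64_t>(srcWidth - 1) << kShift) / (dstWidth - 1));
    uint32_t pos = 0;
    for (int x = 0; x < dstWidth; ++x) {
        uint32_t p = pos >> kShift;        // left neighbor index
        uint32_t frac = pos & (kOne - 1);  // fractional weight
        uint32_t p2 = (p + 1 < static_cast<uint32_t>(srcWidth)) ? p + 1 : p;
        // Integer blend ((1-f)*left + f*right), rounded; max intermediate
        // value is 65536 * 255, which fits comfortably in uint32_t.
        uint32_t v = ((kOne - frac) * src[p] + frac * src[p2]
                      + (kOne >> 1)) >> kShift;
        dst[x] = static_cast<uint8_t>(v);
        pos += step;
    }
    return dst;
}
```

This trades a bounded rounding error (at most 1/65536 of a position per step) for uniform integer cost on every image, which is why it can serve all scales rather than only special ratios like 1/2.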
I don't think that William's patch is worth the effort: scaledWidth is calculated with floats, so how often does the result really match the if condition? And is making the code less understandable worth it for so few exceptions and such a small optimization?
I agree that it probably isn't worth it, but there was no way to tell for sure without making the patch and trying it on a few files.

William
True. I'll close the bug for now, thanks a lot for the investigation :-)