68322 – pdftops 20 Aug 2013 is about 10% slower than last year

Bug 68322 - pdftops 20 Aug 2013 is about 10% slower than last year

Summary: pdftops 20 Aug 2013 is about 10% slower than last year

Status:	RESOLVED NOTABUG

Alias:	None

Product:	poppler
Classification:	Unclassified
Component:	utils (show other bugs)
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	poppler-bugs
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-08-20 10:01 UTC by William Bader
Modified:	2013-11-18 18:52 UTC (History)
CC List:	1 user (show)

See Also:
i915 platform:
i915 features:

Attachments
test pdf tile (2.88 MB, text/plain) 2013-08-20 10:01 UTC, William Bader	Details
screen capture showing what happens when interpolation is disabled (74.40 KB, image/gif) 2013-08-20 13:54 UTC, William Bader	Details
patch for Splash::expandRow() (1.17 KB, patch) 2013-11-08 18:40 UTC, William Bader	Details \| Splinter Review
patch for Splash::expandRow() with memcpy instead of a for loop (1.42 KB, patch) 2013-11-13 05:07 UTC, William Bader	Details \| Splinter Review
View All

Description William Bader 2013-08-20 10:01:04 UTC

Created attachment 84321 [details]
test pdf tile

I saw in the poppler mailing list that a number of recent changes were supposed to make poppler run faster, so I tested the a 20 Aug 2013 clone and found pdftops that it was about 10% slower than last year.  The times are relatively repeatable because I am reading from an SSD and writing to a RAM disk.
This isn't a bug, but some of the recent changes might have had unintended consequences.
I tested an unpatched 0.24.0 clone against a lightly patched 0.21.0 that I have been using for about a year.
William

$ ./pdftops -v
pdftops version 0.24.0
Copyright 2005-2013 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
$ time ./pdftops -eps -level1sep /u/displayads/1288986-xpdfshadebug.pdf /tmp/x1.eps
real    0m10.520s
user    0m10.392s
sys     0m0.113s

$ /usr/local/bin/pdftops -v
pdftops version 0.21.0
Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
$ time /usr/local/bin/pdftops -eps /u/displayads/1288986-xpdfshadebug.pdf /tmp/1288986-xpdfshadebug.eps
real    0m8.455s
user    0m8.355s
sys     0m0.087s

Comment 1 Albert Astals Cid 2013-08-20 10:26:30 UTC

If you have time to git bisect the regression that would be very much appreciated.

Comment 2 William Bader 2013-08-20 13:49:20 UTC

I did the bisect.  It was more complicated than I thought because my laptop slows the cpu as it warms up from the builds.  (The laptop has a 2.5 GHz i5-2450M.)

It looks like the performance difference comes from a commit that implemented bilinear image scaling.  I think that the scaling is important.  When I disabled it by making Splash::isImageInterpolationRequired() return gFalse, pdftops runs faster but the results are not as good.  I have attached a screen capture.
I added debug code, and isImageInterpolationRequired() is called 124 times in my test file, and many of the images are scaled by almost 1.5 or almost 2.  Is it worth the effort (or even possible) to make special cases for integral scaling?

William

results of the bisect

b112602334a5de84ae30c2e90d9bc6d4609f7f96 08 Oct 12 8.455 good
1f4a012f7570ffd2120e3e8c2236de5408f3dda3 01 Mar 13 9.373 bad
f8c1b55e764a6e79b0530fb1be9ee11917f4237e 01 Dec 12 9.476 bad
1969bcd693289eba1138fcaa74a684cb3ff2aefc 13 Nov 12 8.416 good
4ceb3f4f4ca0092e79bb36723a7332b071491666 23 Jun 12 8.484 good
85572b85950ed4e4421f1e61e704e5c250ca27d9 01 Dec 12 9.470 bad
ff9e211cfb60eb820b9b046da546352fa59d7df9 23 Jun 12 8.481 good
beff044e4fdf44e80ad7c75255cb71a83e70a293 27 Nov 12 8.425 good
e6806d893a9a104e3f23d69d0245ad0e4948a409 28 Nov 12 8.474 good
a97aead193a927b91a4e33d7b5d2ba7179e664df 30 Nov 12 9.494 bad

a97aead193a927b91a4e33d7b5d2ba7179e664df is the first bad commit
commit a97aead193a927b91a4e33d7b5d2ba7179e664df
Author: Adrian Johnson <ajohnson@redneon.com>
Date:   Fri Nov 30 21:30:19 2012 +0100
 Splash: Implement bilinear image scaling
 Bug #22138

Comment 3 William Bader 2013-08-20 13:54:44 UTC

Created attachment 84335 [details]
screen capture showing what happens when interpolation is disabled

The image on the left has interpolation disabled by making Splash::isImageInterpolationRequired() return gFalse.  The image on the right is from an unpatched pdftops from poppler at commit a97aead193a927b91a4e33d7b5d2ba7179e664df.
The red arrows show some of the differences.
I think that poppler is working correctly, and the extra cpu time is necessary for better looking images.

Comment 4 Albert Astals Cid 2013-08-20 17:56:24 UTC

I guess that unless you want to try to optimize the bilinear sclaing you can close this bug.

Comment 5 William Bader 2013-11-08 18:40:01 UTC

Created attachment 88906 [details] [review]
patch for Splash::expandRow()

This makes my 1288986-xpdfshadebug.pdf test file about 1% faster with
pdftops -eps -level1sep 1288986-xpdfshadebug.pdf /tmp/x.eps
In this run, 4852 of 29329 calls to Splash::expandRow() had a step of 1/2.
This patch makes a special case for steps of 1/2 in expandRow() to use ints instead of doubles and should not change the results.
It probably isn't worth pursuing this further.
William

Comment 6 Albert Astals Cid 2013-11-12 23:25:07 UTC

Hmmmm, isn't that just special casing for your file?

Have you tried using memcpy instead of the for? Supposedly when compiling with optimization the compiler can insert "fast asm calls" there if feasible or just do a for loop if it finds out it's more expensive than doing a syscall

Comment 7 William Bader 2013-11-13 05:07:47 UTC

Created attachment 89118 [details] [review]
patch for Splash::expandRow() with memcpy instead of a for loop

>Hmmmm, isn't that just special casing for your file?

Yes, I think that it might be over-tuning.  I had expected that most of the expansions would be simple ratios that didn't need real number interpolation or possibly not any interpolation at all, but when I put in debug code, all of my test files had complicated ratios.  My test files are newspaper advertisements, and the artists seem to make the bitmapped images fit into boxes, and the size of the images and boxes have no relation.

Only the first half of my new loop can be converted into a memcpy().
The second half with
dstBuf[nComps*x + c] = (srcBuf[nComps*p + c] + srcBuf[nComps*(p+1) + c]) / 2
has to stay a loop, although it might be possible to use the PAVGB instruction.

You were right that memcpy was faster than a for loop.  I am using gcc-4.7.2 on 64 bit Fedora 17, and I had thought that the optimizer would be able to figure it out.  A for loop with only one statement in its body shouldn't be that hard for a compiler to optimize.  Maybe gcc is afraid that the source and destination might overlap.

I repeated the runs a few times with all files and executables on a ram disk.
I have the real time and user time in seconds for pdftops -eps -level1sep /tmp/1288986-xpdfshadebug.pdf /tmp/x.eps

original git source   10.573r 10.444u
my old for loop patch 10.461r 10.344u 
my new memcpy patch   10.448r 10.308u

I have no special need to optimize this file specifically.  It is only an example that shows the cost of the bilinear scaling.  Most of my files don't exercise this code or else run too quickly to give useful measurements.

These patches are only an experiment to see if I could recover some of the performance penalty of the bilinear scaling by making some special cases that don't need real number calculations on each pixel, but it looks like the special cases are too rare to have a big effect.

William

Comment 8 Albert Astals Cid 2013-11-13 19:29:34 UTC

Honestly I don't see myself commiting this change for a marginal change in a very specific file. 

Anyone else has comments?

Thomas? Adrian?

Comment 9 Adrian Johnson 2013-11-13 21:22:54 UTC

I don't think adding this special case is worthwhile for a 1% improvement in a small number of files.

If we do find architectures where floating point is slower there is an integer only implementation of the scaling I did in bug 22138 comment 33 that will work for all images instead of special casing only some images.

Another option for faster scaling would be to use a library like pixman that has optimized versions for different architectures eg SSE2 on x86. The downside is pixman only supports 8 bits per component RGB and has some limitations on image sizes so we would still need the poppler version for other the cases it doesn't handle.

Comment 10 Thomas Freitag 2013-11-18 08:02:04 UTC

I don't think that William's patch is worth the effort: scaledWidth is calculated with floats, so how often does the result really fit to the if condition?? And making the code less understandable for such a few exceptions with such a poor optimization?

Comment 11 William Bader 2013-11-18 17:56:36 UTC

I agree that it probably isn't worth it, but there was no way to tell for sure without making the patch and trying it on a few files.
William

Comment 12 Albert Astals Cid 2013-11-18 18:52:20 UTC

True.

I'll close the bug for now, thanks a lot for the investigation :-)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.