Bug 68420 - Cache lcms transformed values for ICC colorspaces in getRGB(), getCMYK() and getGray()
Summary: Cache lcms transformed values for ICC colorspaces in getRGB(), getCMYK() and ...
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
: medium enhancement
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-22 10:15 UTC by Thomas Freitag
Modified: 2013-10-01 20:59 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
PDF-X version of ducks & roses which illustrates the effect (1.08 MB, application/pdf)
2013-08-22 10:15 UTC, Thomas Freitag
Details
cache cms values in getGray(), getRGB() and getCMYK() (5.46 KB, patch)
2013-08-26 10:30 UTC, Thomas Freitag
Details | Splinter Review

Description Thomas Freitag 2013-08-22 10:15:22 UTC
Created attachment 84442 [details]
PDF-X version of ducks & roses which illustrates the effect

In bug 66928 we speed up rendering of images in icc colorspaces with the usage of getRGBLine() instead of getRGB() for images. But this will not help for PDF, where shadings are in icc colorspaces.
I created a sample PDF from ducks & roses, which uses a lot of shadings, which I attach here. 
Without caching already transformed values we have the following result:

time ./utils/pdftoppm -png -cropbox bugzilla/iducksNroses1.pdf  output/ducks

real	0m6.395s
user	0m5.928s
sys	0m0.020s

and it uses 852,840,264 bytes (measured with valgrind).

With a simple cache we can increase that speed:

time ./utils/pdftoppm -png -cropbox bugzilla/iducksNroses1.pdf  output/ducks

real	0m3.367s
user	0m3.284s
sys	0m0.048s

without using much more memory: 853,832,752 bytes

If You're interested, I'll create a patch after bug 66928 and bug 34053 will be closed.
Comment 1 Albert Astals Cid 2013-08-25 17:44:52 UTC
I think that having a cache is always a good thing, i'm just scared at how much memory use it could potentially end up using, sure in your case is just a 1MB more over 900MB, wow we really need that much memory? Are you sure you're using valgrind right? Actually the correct tool for this would be massif.

Anyway, what i'm trying to say i'd be much happier with some kind of patch if the cache was limited, though obviously that means maitaining a proper cache and not just a std::something, which is both harder to code and may slow it down a bit.

Because if we don't limit the cache it can end up being 64M big, no?
Comment 2 Thomas Freitag 2013-08-26 08:08:52 UTC
(In reply to comment #1)
> I think that having a cache is always a good thing, i'm just scared at how
> much memory use it could potentially end up using, sure in your case is just
> a 1MB more over 900MB, wow we really need that much memory? Are you sure
> you're using valgrind right? Actually the correct tool for this would be
> massif.

Sorry about horrifying You with the memory usage, I should have explain that it is the sum of memory allocated during runtime, not the highwater value during runtime. I just wanted to point to the fact, that even for the attached PDF with a very huge using of different colors we just need that 1 MB of memory.
(Keep in mind: because we now use getRGBLine for images, getRGB is only used in fill and stroke operations now, and so shadings are these painting operations which are using a lot of getRGB calls with different colors)

> 
> Anyway, what i'm trying to say i'd be much happier with some kind of patch
> if the cache was limited, though obviously that means maitaining a proper
> cache and not just a std::something, which is both harder to code and may
> slow it down a bit.

We can still use the std::map as the cache container but limit it to certain number of values we cache. I'll make some tests to see how much this would reduce the speed and if how many values need to be cached by the attached PDF.

> 
> Because if we don't limit the cache it can end up being 64M big, no?

I don't think that we end up with 64M....
Comment 3 Thomas Freitag 2013-08-26 10:30:05 UTC
Created attachment 84642 [details] [review]
cache cms values in getGray(), getRGB() and getCMYK()

Some additional remarks:

1. I tested the cache with -png, -png -gray and -jpegcmyk, in all cases the rendering time took round about 2.5 seconds less than without cache on my laptop.
2. I encountered, that the test PDF which I produced use a different ICC colorspace for each shading. Therefore copying the cache is not necessary in the ::copy function.
3. The increasement of speed comes from the lot of hits in the cache, but the number of cached items is quite small. Therefore I choose a small limit (2048) for the cache!
Comment 4 Albert Astals Cid 2013-08-29 20:42:05 UTC
Is it ok if we talk about this once i'm back? Need to read guides to see what i should visit on my holiday :D
Comment 5 Albert Astals Cid 2013-09-25 20:13:43 UTC
I do get a reasonable increase too, from 2400ms to 1800ms

Two things:
 * It'd be cool if we could reuse the key and not calculate it twice, but then it'll probably make the code harder to read, so ignore

 * It'd be cool if we had "a proper cache" and not just cache the first 2048 items, so probably ignore too :D

Only thing, is i think i prefer #define CMSCACHE_LIMIT 2048 in the .cc and not in the header, but that's just me nitpicking.

So should i start the regtest just in case something bad went wrong or you prefer to get bug 34053 commited first as you say in the bug description?
Comment 6 Albert Astals Cid 2013-09-25 20:14:11 UTC
increase -> decrease :D (or increase of speed ;-))
Comment 7 Thomas Freitag 2013-09-26 07:18:51 UTC
(In reply to comment #5)
> 
> So should i start the regtest just in case something bad went wrong or you
> prefer to get bug 34053 commited first as you say in the bug description?

It's up to You. After You commit this or bug 34053 first, I'll create a rebased patch for the other one, if You need that.
Comment 8 Albert Astals Cid 2013-10-01 20:59:16 UTC
Commited.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.