In doing a performance test with cairo using evince, I notice that fbFetchTransformed uses 12.81% of the overall time loading a large PDF file. It uses 15.44% of overall time if you include functions called by fbFetchTransformed. In discussing this code with our performance experts here at Sun (James.Cheng@sun.com), the following issues were noticed: The code seems to be doing a lot of lookups into image data. It would be faster to build a lookup table. table[0] = 0, table[1] = 0x010101, ..., and replace out[i] = (in[i] >> 16) ... with a table lookup: out[i] = table[in[i]]; instead of doing logic like this. I notice that each of these 4 lines are consuming 1% of the overall time, so improving this lines would be significant: r = (((ft * idisty + fb * disty) >> 16) & 0xff); r |= (((ft * idisty + fb * disty) >> 8) & 0xff00); r |= (((ft * idisty + fb * disty)) & 0xff0000); r |= (((ft * idisty + fb * disty) << 8) & 0xff000000 At the very least, it would be better to compute (ft * idisty + fb * disty) once and reuse the variable rather than recomputing for each line. The code contains lines like the following - the lines setting tl, tr, bl, and br each consume 1%. tl = x1_out|y1_out ? 0 : fetch(b, x_off, indexed); tr = x2_out|y1_out ? 0 : fetch(b, x_off + 1, indexed); b += stride; bl = x1_out|y2_out ? 0 : fetch(b, x_off, indexed); br = x2_out|y2_out ? 0 : fetch(b, x_off + 1, indexed); Is it really necessary to do conditional checking for each call? Overall it seems that that a lot of pointer chasing is done in the for-loop which is unnecessary. Variables like pict->pDrawable->width, pict->pDrawable->height, pict->pDrawable->x, and pict->pDrawable->y are used over and over again in these functions. But these could be set to variables outside of the for loop to avoid chasing pointers. Also it seems that some computations are done over and over again. For example (x1 + pict->pDrawable->x), (x2 + pict->pDrawable->x), (y1 + pict->pDrawable->y), (y2 + pict->pDrawable->y), (ft * idisty + fb * disty). It would probably be better to compute each of the above once in the for loop and reuse a temporary variable rather than recomputing over and over. Examples of lines in the for-loop that use these variables: b = bits + (y1 + pict->pDrawable->y)*stride; b = bits + (y2 + pict->pDrawable->y)*stride; tl = fetch(b, x1 + pict->pDrawable->x, indexed); tr = fetch(b, x2 + pict->pDrawable->x, indexed); b = bits + (y2 + pict->pDrawable->y)*stride; bl = fetch(b, x1 + pict->pDrawable->x, indexed); br = fetch(b, x2 + pict->pDrawable->x, indexed); r = (((ft * idisty + fb * disty) >> 16) & 0xff); r |= (((ft * idisty + fb * disty) >> 8) & 0xff00); There are just some examples. The code contains many places where the same math operations are done over and over which could be avoided. Since this function consumes so much CPU, it seems there is some low-hanging fruit for improving the performance here.
I also notice that these macros are used extensively in this function. #define DIV(a,b) ((((a) < 0) == ((b) < 0)) ? (a) / (b) :\ ((a) - (b) + 1 - (((b) < 0) << 1)) / (b)) #define MOD(a, b) ((b) == 1 ? 0 : \ (a) >= 0 ? (a) % (b) : (b) - (-(a) - 1) % (b) - 1) I wonder whether they can be simplified and/or sped up. They look a bit labor intensive.
Created attachment 8984 [details] [review] patch removing pointer chasing I wrote the attached patch to remove all pointer chasing from fbcompose.c. I ran make test, and the tests all pass. I notice that this seems to improve the overall performance of running cairo-perf by about 1%.
Proposing blocker for 1.4.2.
Is someone working on patches for this?
I think the patch I already submitted addresses the most serious and obvious performance issues. I raise a few other questions about whether certain macros could be simplified, etc. These questions were mostly extracted from an email dialog I had with the Sun mediaLib team as we looked over this function looking for opportunities to improve its performance - since it is obviously a big time consumer in normal cases. But if these questions don't prompt any "A-ha!" ideas about how to improve things, I'd say just apply the provided patch and close the bug. And, in the future, maybe keep an eye out for ways to improve the performance in this area of the code.
Ok. The patch seems harmless. Just that it doesn't apply cleanly anymore. I will apply it right away if an updated one is attached. Thanks.
(In reply to comment #6) > Ok. The patch seems harmless. Just that it doesn't apply cleanly anymore. I > will apply it right away if an updated one is attached. It does look harmless. But it also doesn't seem to make a difference either. Brian said "this seems to improve the overall performance of running cairo-perf by about 1%." Overall performance of cairo-perf? That's pretty unusual, as many tests aren't even going to hit this code. In addition, the 1% is more likely noise. There is a reason that cairo-perf-diff-files ignores by default any speedup/slowdown of 5% or less. So, I personally think it would be strange to apply a 691 line patch for a performance improvement of 1%. But I've had a tough day, so maybe I'm being too tough. And I don't want to discourage anybody helping in the performance area. So...whatever. It _is_ harmless.
Created attachment 9480 [details] [review] this patch should work better Here's an updated patch. Some compilers may do some of this for you when you turn optimization up, but it's nicer, I think, if the code doesn't have so much pointer chasing. It should improve performance not to dereference pointers.
I'd advise not applying the patch. There could be different ways to justify a patch like this, neither of which has been sufficiently demonstrated: 1. The patch could show a real performance improvement. If so, there would be almost nothing to argue against. We always want the code to run faster. (And I'd even accept a patch that only help when cairo is compiled with a braindead compiler that can't handle the most elementary forms of common sub-expression elimination---though I sincerely hope such compilers don't even exist.) But as discussed above, there has been no satisfactory demonstration of a performance benefit. (Any discussion of "overall" improvement to cairo-perf doesn't even make sense. A change like this to very specific code should show improvments to tests that exercise this code, and have no impact on other tests). 2. The patch could be an obvious clean-up, (for better aesthetics or easier maintainability). The patch is already adding more lines of code (165) than it removes (127) which is a strike against it in terms of aesthetics and maintainability. There are a few cases where the code does look marginally better, (like the first hunk where 11 occurrences of "pict->transform" get replaced with a simpler-to-read "transform"). And yes, a change like that will add more code than is removed, but still improve the code. But other portions of the code actually appear to add noise (to my eye at least). Now, I won't argue that the pixman code is actually aesthetically pleasing--- it's definitey not. Expressions with two sets of arrow operators, (such as pict->pDrawable->height), are definitely ugly, so I appreciate the effort to clean those up. But the suggested replacements are problematic ("pixels" for pict->pDrawable suggests there's mismatched naming here---though the problem could be that pDrawable should be named pixels), and (pix_height has an unfortunate abberviation, which is something we like to avoid in cairo). Finally, the most important thing to understand is that a lot of the poor aesthetics of pixman, (compared to most of the cairo code), comes from the legacy of this code coming from the X server fb implementation, (that's where such an ugly name as pDrawable comes from, for example). And there's a effort happening right now to try to get the divergent pixman and fb implementations merged back together so the X server can share a common implementation of this code with cairo. So gratuitous changes, (even good style cleanups), should be avoided in pixman right now as they will make that merge effort more difficult. So, that's a NACK from me on this patch for now. But I definitely appreciate people looking closely at cairo---especially efforts to improve the performance. So please continue that, and if you can present compelling evidence of a performance improvement, please let us know and we'll be glad to accept it. Thanks, -Carl
(In reply to comment #9) > So gratuitous changes, (even good style cleanups), should be avoided in > pixman right now as they will make that merge effort more difficult. I should point out that there's really no good way for a potential cairo contributor to know about this particular issue. Hopefully the pixman/fb merge work will happen in short order, (though the problem has existed for years---so we'll see I guess), and this barrier to improving the code will go away. I know that I'm quite anxious to do a lot of style cleanups inside of pixman. The lingering StudlyCaps (and horrific mixed_underscore_and_StudlyCaps stuff drives me batty). I stay sane for now by just not looking into pixman whenever possible... -Carl
This seems to be essentially a bug against pixman.
WONTFIX'ing, because the patch is obsolete and no performance improvements were ever demonstrated.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.