Bug 9994

Summary: performance opportunities in cairo
Product: pixman Reporter: Brian Cameron <brian.cameron>
Component: pixmanAssignee: Søren Sandmann Pedersen <soren.sandmann>
Status: RESOLVED WONTFIX QA Contact: Søren Sandmann Pedersen <soren.sandmann>
Severity: normal    
Priority: medium    
Version: git master   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: patch removing pointer chasing
this patch should work better

Description Brian Cameron 2007-02-15 20:14:12 UTC
In doing a performance test with cairo using evince, I notice that fbFetchTransformed uses 12.81% of the overall time loading a large PDF file.  It uses 15.44% of overall time if you include functions called by fbFetchTransformed.  In discussing this code with our performance experts here at Sun (James.Cheng@sun.com), the following issues were noticed:

The code seems to be doing a lot of lookups into image data.  It would be faster to build a lookup table.  table[0] = 0, table[1] = 0x010101, ..., and
replace out[i] = (in[i] >> 16) ... with a table lookup:

    out[i] = table[in[i]];

instead of doing logic like this.  I notice that each of these 4 lines are consuming 1% of the overall time, so improving this lines would be significant:

           r = (((ft * idisty + fb * disty) >> 16) & 0xff);
           r |= (((ft * idisty + fb * disty) >> 8) & 0xff00);
           r |= (((ft * idisty + fb * disty)) & 0xff0000);
           r |= (((ft * idisty + fb * disty) << 8) & 0xff000000 

At the very least, it would be better to compute (ft * idisty + fb * disty) once and reuse the variable rather than recomputing for each line.

The code contains lines like the following - the lines setting tl, tr, bl, and br each consume 1%.

                    tl = x1_out|y1_out ? 0 : fetch(b, x_off, indexed);
                    tr = x2_out|y1_out ? 0 : fetch(b, x_off + 1, indexed);
                    b += stride;
                    bl = x1_out|y2_out ? 0 : fetch(b, x_off, indexed);
                    br = x2_out|y2_out ? 0 : fetch(b, x_off + 1, indexed);

Is it really necessary to do conditional checking for each call?

Overall it seems that that a lot of pointer chasing is done in the for-loop which is unnecessary.  Variables like pict->pDrawable->width, pict->pDrawable->height, pict->pDrawable->x, and pict->pDrawable->y are used over and over again in these functions.  But these could be set to variables outside of the for loop to avoid chasing pointers.

Also it seems that some computations are done over and over again.  For example 
(x1 + pict->pDrawable->x), (x2 + pict->pDrawable->x), (y1 + pict->pDrawable->y), (y2 + pict->pDrawable->y), (ft * idisty + fb * disty).

It would probably be better to compute each of the above once in the for loop and reuse a temporary variable rather than recomputing over and over.  Examples
of lines in the for-loop that use these variables:

             b = bits + (y1 + pict->pDrawable->y)*stride;
             b = bits + (y2 + pict->pDrawable->y)*stride;
             tl = fetch(b, x1 + pict->pDrawable->x, indexed);
             tr = fetch(b, x2 + pict->pDrawable->x, indexed);
             b = bits + (y2 + pict->pDrawable->y)*stride;
             bl = fetch(b, x1 + pict->pDrawable->x, indexed);
             br = fetch(b, x2 + pict->pDrawable->x, indexed);
             r = (((ft * idisty + fb * disty) >> 16) & 0xff);
             r |= (((ft * idisty + fb * disty) >> 8) & 0xff00);

There are just some examples.  The code contains many places where the same math operations are done over and over which could be avoided.  Since this function consumes so much CPU, it seems there is some low-hanging fruit for improving the performance here.
Comment 1 Brian Cameron 2007-03-01 23:02:29 UTC
I also notice that these macros are used extensively in this function.

#define DIV(a,b) ((((a) < 0) == ((b) < 0)) ? (a) / (b) :\
        ((a) - (b) + 1 - (((b) < 0) << 1)) / (b))

#define MOD(a, b) ((b) == 1 ? 0 : \
        (a) >= 0 ? (a) % (b) : (b) - (-(a) - 1) % (b) - 1)

I wonder whether they can be simplified and/or sped up.  They look a bit labor intensive.
Comment 2 Brian Cameron 2007-03-05 06:00:03 UTC
Created attachment 8984 [details] [review]
patch removing pointer chasing


I wrote the attached patch to remove all pointer chasing from fbcompose.c.  I ran make test, and the tests all pass.  I notice that this seems to improve the overall performance of running cairo-perf by about 1%.
Comment 3 Behdad Esfahbod 2007-03-05 11:41:19 UTC
Proposing blocker for 1.4.2.
Comment 4 Behdad Esfahbod 2007-04-03 16:38:00 UTC
Is someone working on patches for this?
Comment 5 Brian Cameron 2007-04-03 19:37:45 UTC
I think the patch I already submitted addresses the most serious and obvious performance issues.  I raise a few other questions about whether certain macros could be simplified, etc.  These questions were mostly extracted from an email dialog I had with the Sun mediaLib team as we looked over this function looking for opportunities to improve its performance - since it is obviously a big time consumer in normal cases.  But if these questions don't prompt any "A-ha!" ideas about how to improve things, I'd say just apply the provided patch and close the bug.  And, in the future, maybe keep an eye out for ways to improve the performance in this area of the code.
Comment 6 Behdad Esfahbod 2007-04-04 14:11:05 UTC
Ok.  The patch seems harmless.  Just that it doesn't apply cleanly anymore.  I will apply it right away if an updated one is attached.

Thanks.
Comment 7 Daniel Amelang 2007-04-04 22:41:29 UTC
(In reply to comment #6)
> Ok.  The patch seems harmless.  Just that it doesn't apply cleanly anymore.  I
> will apply it right away if an updated one is attached.

It does look harmless. But it also doesn't seem to make a difference either. Brian said "this seems to improve the overall performance of running cairo-perf by about 1%." Overall performance of cairo-perf? That's pretty unusual, as many tests aren't even going to hit this code.

In addition, the 1% is more likely noise. There is a reason that cairo-perf-diff-files ignores by default any speedup/slowdown of 5% or less.

So, I personally think it would be strange to apply a 691 line patch for a performance improvement of 1%. But I've had a tough day, so maybe I'm being too tough. And I don't want to discourage anybody helping in the performance area. So...whatever. It _is_ harmless.
Comment 8 Brian Cameron 2007-04-05 01:44:43 UTC
Created attachment 9480 [details] [review]
this patch should work better


Here's an updated patch.  Some compilers may do some of this for you when you turn optimization up, but it's nicer, I think, if the code doesn't have so much pointer chasing.  It should improve performance not to dereference pointers.
Comment 9 Carl Worth 2007-04-05 09:22:26 UTC
I'd advise not applying the patch.

There could be different ways to justify a patch like this, neither of which
has been sufficiently demonstrated:

1. The patch could show a real performance improvement.

   If so, there would be almost nothing to argue against. We always want the
   code to run faster. (And I'd even accept a patch that only help when cairo
   is compiled with a braindead compiler that can't handle the most elementary
   forms of common sub-expression elimination---though I sincerely hope such
   compilers don't even exist.)

   But as discussed above, there has been no satisfactory demonstration of a
   performance benefit. (Any discussion of "overall" improvement to cairo-perf
   doesn't even make sense. A change like this to very specific code should
   show improvments to tests that exercise this code, and have no impact on
   other tests).

2. The patch could be an obvious clean-up, (for better aesthetics or easier
   maintainability).

   The patch is already adding more lines of code (165) than it removes (127)
   which is a strike against it in terms of aesthetics and maintainability.

   There are a few cases where the code does look marginally better, (like the
   first hunk where 11 occurrences of "pict->transform" get replaced with a
   simpler-to-read "transform"). And yes, a change like that will add more code
   than is removed, but still improve the code.

   But other portions of the code actually appear to add noise (to my eye at
   least).

   Now, I won't argue that the pixman code is actually aesthetically pleasing---
   it's definitey not. Expressions with two sets of arrow operators, (such
   as pict->pDrawable->height), are definitely ugly, so I appreciate the effort
   to clean those up. But the suggested replacements are problematic ("pixels"
   for pict->pDrawable suggests there's mismatched naming here---though the
   problem could be that pDrawable should be named pixels), and (pix_height has
   an unfortunate abberviation, which is something we like to avoid in cairo).

   Finally, the most important thing to understand is that a lot of the poor
   aesthetics of pixman, (compared to most of the cairo code), comes from the
   legacy of this code coming from the X server fb implementation, (that's where
   such an ugly name as pDrawable comes from, for example). And there's a
   effort happening right now to try to get the divergent pixman and fb
   implementations merged back together so the X server can share a common
   implementation of this code with cairo.

   So gratuitous changes, (even good style cleanups), should be avoided in
   pixman right now as they will make that merge effort more difficult.

So, that's a NACK from me on this patch for now.

But I definitely appreciate people looking closely at cairo---especially efforts
to improve the performance. So please continue that, and if you can present
compelling evidence of a performance improvement, please let us know and we'll
be glad to accept it.

Thanks,

-Carl


Comment 10 Carl Worth 2007-04-05 09:41:19 UTC
(In reply to comment #9)
>    So gratuitous changes, (even good style cleanups), should be avoided in
>    pixman right now as they will make that merge effort more difficult.

I should point out that there's really no good way for a potential cairo contributor to know about this particular issue. Hopefully the pixman/fb merge work will happen in short order, (though the problem has existed for years---so we'll see I guess), and this barrier to improving the code will go away.

I know that I'm quite anxious to do a lot of style cleanups inside of pixman. The lingering StudlyCaps (and horrific mixed_underscore_and_StudlyCaps stuff drives me batty).

I stay sane for now by just not looking into pixman whenever possible...

-Carl
Comment 11 Søren Sandmann Pedersen 2008-07-16 12:25:15 UTC
This seems to be essentially a bug against pixman.
Comment 12 Søren Sandmann Pedersen 2009-05-22 08:11:30 UTC
WONTFIX'ing, because the patch is obsolete and no performance improvements were ever demonstrated.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.