Bug 10151 - Add mediaLib support
Summary: Add mediaLib support
Status: RESOLVED NOTABUG
Alias: None
Product: pixman
Classification: Unclassified
Component: pixman (show other bugs)
Version: git master
Hardware: SPARC All
: medium normal
Assignee: Søren Sandmann Pedersen
QA Contact: cairo-bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-01 23:14 UTC by Brian Cameron
Modified: 2009-06-05 03:33 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
mediaLib patch for cairo (4.71 KB, patch)
2007-03-01 23:15 UTC, Brian Cameron
Details | Splinter Review
updated patch (4.77 KB, text/x-patch)
2007-03-02 00:42 UTC, Brian Cameron
Details
updated patch (4.77 KB, text/x-patch)
2007-03-02 00:44 UTC, Brian Cameron
Details
updated patch (4.77 KB, patch)
2007-03-02 00:45 UTC, Brian Cameron
Details | Splinter Review
performance output with mediaLib (83.65 KB, text/plain)
2007-03-05 03:14 UTC, Brian Cameron
Details
output without mediaLib (83.65 KB, patch)
2007-03-05 03:37 UTC, Brian Cameron
Details | Splinter Review
Updated patch (4.78 KB, patch)
2007-04-03 22:29 UTC, Brian Cameron
Details | Splinter Review
updated patch for cairo 1.4.6 (4.78 KB, patch)
2007-05-03 00:46 UTC, Brian Cameron
Details | Splinter Review
cairo-perf report (45.47 KB, text/plain)
2007-11-28 12:18 UTC, Brian Cameron
Details
comparing two runs (6.44 KB, text/x-patch)
2007-11-28 22:57 UTC, Brian Cameron
Details
updated patch (3.31 KB, patch)
2007-11-29 15:42 UTC, Brian Cameron
Details | Splinter Review
output of running "make check" (103.74 KB, text/plain)
2007-11-29 15:49 UTC, Brian Cameron
Details
performance report for updated patch (86.31 KB, text/plain)
2007-11-29 15:58 UTC, Brian Cameron
Details
report for just function #1 (fbStore_a8) (4.42 KB, text/plain)
2007-11-29 16:39 UTC, Brian Cameron
Details
report for just function #2 (fbCombineOverU) (23.90 KB, text/plain)
2007-11-29 16:40 UTC, Brian Cameron
Details

Description Brian Cameron 2007-03-01 23:14:56 UTC
I wrote the attached patch that adds mediaLib support to cairo and makes a big difference speeding up evince.  The following is some performance stats showing percentage of time in various functions while starting up evince with a big complicated pdf document and scrolling down about 50 pages.

before

15.19% fbCombineInU
10.30% fbFetchTransformed
 9.15% GfxImageColorMap::getRGBLine
 6.51% fbCombineOverU
 5.12% GfxDeviceGrayColorSpace::getRGBLine
 3.26% fbFetch_a8
 2.64% fbFetchPixel_x8r8g8b8
 2.62% fbCombineMaskU
 1.35% fbStore_a8 

after also now doing inline

16.82 fbFetchTransformed
 9.90 GfxImageColorMap
 6.97 GfxDeviceGrayColorSpace
 6.85 mlib_ImageBlend_ZERO_DA
 3.07 fbFetchPixel_x8r8g8b8
 2.23 fbFetch_a8
 1.67 mlib_VideoColorRGBint_to_BGRAint
 1.26 mlib_ImageBlend_ONE_OMDA
 0.06 mlib_ImageSetStruct
 0.05 mlib_ImageChannelExtract
 0.02 mlib_ImageBlend_SA_ZERO

I suspect that before the maintainers would accept this patch that they would want the mediaLib functions to be separated more like the fbmmx.[ch] files.  But before I do that extra work, I wanted to see if you would accept this patch.

Note that I am also working on a GTK+ mediaLib integration patch which the GTK maintainers say they will accept sometime soon.

  http://bugzilla.gnome.org/show_bug.cgi?id=344813

I reference this only because some of the discussion in that bug explains answers to questions you may have about what mediaLib is and that it is appropriate for integration.
Comment 1 Brian Cameron 2007-03-01 23:15:17 UTC
Created attachment 8938 [details] [review]
mediaLib patch for cairo
Comment 2 Brian Cameron 2007-03-02 00:42:48 UTC
Created attachment 8943 [details]
updated patch


This patch properly surrounds the #pragma with "#ifdef __SUNPRO_C" since this minor performance tuning (not related to mediaLib) works with other compilers.  Caught by James Cheng (from the mediaLib team) in review.
Comment 3 Brian Cameron 2007-03-02 00:44:19 UTC
Created attachment 8945 [details]
updated patch


This patch properly surrounds the #pragma with "#ifdef __SUNPRO_C" since this minor performance tuning (not related to mediaLib) works with other compilers.  Caught by James Cheng (from the mediaLib team) in review.
Comment 4 Brian Cameron 2007-03-02 00:45:39 UTC
Created attachment 8946 [details] [review]
updated patch


This patch properly surrounds the #pragma with "#ifdef __SUNPRO_C" since this minor performance tuning (not related to mediaLib) works with other compilers.  Caught by James Cheng (from the mediaLib team) in review.
Comment 5 Brian Cameron 2007-03-02 00:53:48 UTC
Oh sorry for adding the same patch 3 times - my browser went a bit wonky
Comment 6 Daniel Amelang 2007-03-02 21:15:14 UTC
Sounds like good stuff. What did the built-in cairo performance test suite say about any speedups/slowdowns as a result of this change?
Comment 7 Brian Cameron 2007-03-05 03:14:02 UTC
Created attachment 8980 [details]
performance output with mediaLib

performance output with mediaLib patch
Comment 8 Brian Cameron 2007-03-05 03:37:22 UTC
Created attachment 8981 [details] [review]
output without mediaLib


I have attached the performance output with and without mediaLib.  You can see that mediaLib does improve some functions.

I do notice that the functions that have MEDIALIB tunings aren't exercised a great deal by this performance suite: fbCombineOverU called 3.67% of the time,  fbCombineMaskU 1.67% of the time, fbStore_a8 0.03% of the time, and fbCombineInU not at all.

In fact, the performance test looks like this:

pixman_gradient_color                10.79%
fbFetchSourcePict                    10.49%
fbCombineOverU                        3.67%
gradientPixel                         2.64%
sqrt                                  2.56%
_active_edges_to_traps                2.15%
memcpy                                2.05%
fbCombineMaskU                        1.67%
fbRasterizeEdges8                     1.58%
_cairo_bo_sweep_line_validate         1.36%
_cairo_bo_edge_start_or_continue_trap 1.17%
_cairo_pixman_render_edge_init        1.17%
fbCombineOutReverseU                  1.13%
UDiv                                  1.07%
pixman_fill_rect_32bpp                0.95%
qsort                                 0.93%
fbCombineAddU                         0.91%

So, almost 50% of the total time is in just the above functions.  I wonder if the performance test coverage is a bit poor?

For reference, using mediaLib the percentages change as follows:

fbFetchSourcePict       16.26%
pixman_gradient_color   15.28%
gradientPixel            3.94%
sqrt                     3.80%
_active_edges_to_traps   1.98%
memcpy                   1.92%
fbRasterizeEdges8        1.48%
_cairo_bo_sweep_line_validate         1.12%
_cairo_bo_edge_start_or_continue_trap 1.11%
_cairo_pixman_render_edge_init        1.05%
mlib_ImageBlend_ONE_OMDA 1.05%
_cairo_uint64x64_128_mul 1.01%
UDiv                     0.91%
mlib_ImageBlend_ZERO_DA  0.91%
fbCombineAddU            0.87%
Total runtime 86.531 seconds
Comment 9 Carl Worth 2007-03-05 08:42:41 UTC
(In reply to comment #8)
> I do notice that the functions that have MEDIALIB tunings aren't exercised a
> great deal by this performance suite: fbCombineOverU called 3.67% of the time, 
> fbCombineMaskU 1.67% of the time, fbStore_a8 0.03% of the time, and
> fbCombineInU not at all.
...
> So, almost 50% of the total time is in just the above functions.  I wonder if
> the performance test coverage is a bit poor?

The general strategy we've been taking is to allow the performance suite to grow according to specific performance issues that people encounter or work to improve.

Perctange of time spent in any given function across the entire performance test suite is not an interesting metric at all. We have not done any work to balance that, nor do we report any metrics across the entire test suite.

Instead, as you work to improve specific aspects of cairo's performance, you should be writing new performance test cases, (see cairo/perf/README), to exercise the aspects of interest. After adding the test cases, then you can add your fixes and cairo-perf-diff will then demonstrate the improvement. And, with this approach, the existence of the test case will help us to avoid any future performance regression in this area.

Thanks for your efforts to improve cairo.

-Carl
Comment 10 Brian Cameron 2007-04-03 22:29:46 UTC
Created attachment 9471 [details] [review]
Updated patch


In testing we found that the previous patch didn't work on Sparc because it was using "#ifdef __BIG_ENDIAN__" instead of the correct "#ifdef _BIG_ENDIAN".

This updated patch fixes the problem and works properly on x86 and Sparc.

No comments on this bug report for a while.  Can this mediaLib support go into cairo?
Comment 11 Brian Cameron 2007-05-03 00:46:10 UTC
Created attachment 9845 [details] [review]
updated patch for cairo 1.4.6

The last version of the patch does not apply against cairo 1.4.6 due to a change in pixman/src/Makefile.am.  This patch fixes this minor problem so it applies cleanly against the latest cairo.
Comment 12 Brian Cameron 2007-05-15 18:37:05 UTC
Note that the patch to add mediaLib support to GTK+ was recently accepted and commited, so it would be nice to get this patch upstream as well so mediaLib is better supported out-of-the-box by cairo as well.  Refer to:

http://bugzilla.gnome.org/show_bug.cgi?id=344813
Comment 13 Daniel Amelang 2007-05-15 20:47:23 UTC
A couple things:

1) pixman is currently being extracted from cairo into its own separate library that will be shared with the xserver. So expect to move your efforts in that direction and update your patches accordingly.

2) As Carl said above, it's important to use cairo-perf to report any speedups you've gained. That way, the advantages are clear. So far, the only numbers reported here are relative percentages, which aren't nearly as convincing. In addition, you're not leveraging the statistical methods that Carl has spent a good amount of time on to make cairo-perf numbers as meaningful as possible.

That said, I don't doubt that you'll see some nice speedups since we currently have no SIMD optimized code for Sun architectures. Can't wait until Sun open sources the "fast" version of mediaLib :)
Comment 14 Brian Cameron 2007-07-30 07:28:11 UTC
How do you run cairo-perf?  I've never used it before.  The test case that I notice significant performance improvement is with using many documents in evince.  The tuned functions seem to be hit a lot when viewing PDF documents.  Can you run cairo-perf against evince?

Also, FYI, the SSE2/VIS/etc. mediaLib code is planned to be released under an open license in the near future.  The mediaLib team is working through a few final issues that are necessary before this can happen, but they are working on it.  They aren't just releasing the C code just to be rude.  :)
Comment 15 Carl Worth 2007-07-30 08:40:49 UTC
(In reply to comment #14)
> How do you run cairo-perf?  I've never used it before.

The cairo-perf program is included with the cairo source distribution
in the cairo/perf directory. You can build and run it by simply typing
"make perf" at the top level of the cairo source distribution. Or you
can read cairo/perf/README for details on other ways of running it.

>                                                        The test case that I
> notice significant performance improvement is with using many documents in
> evince.  The tuned functions seem to be hit a lot when viewing PDF documents. 
> Can you run cairo-perf against evince?

No, there's no way to run cairo-perf against evince. It's not that
kind of tool. Instead, cairo-perf is a set of micro-benchmarks that
serve as the performance regression test suite for cairo. We use is to
attempt to ensure that no slowdowns are introduced from one release to
the next. And we also use it to quantify the benefits of any proposed
performance optimizations.

So, if you've got a proposed performance optimization, please run
cairo-perf to quantify its benefit, (and see the discussion of
cairo-perf-diff in the README for more hints on how to do this).

And if you run into a case where you're certain that your change has
some benefit that cairo-perf isn't showing, then it's quite easy to
augment cairo-perf, (and again the README has pointers on how to do
this). So, if you end up needing to do that, you should be able to use
evince to figure out what unique thing it might be doing that might
not be currently exercised by the cairo-perf suite. And if you google
for libcairowrap you might also find it helpful in extracting a
sequence of cairo calls from a complete program.

I hope that's helpful.

-Carl
Comment 16 Brian Cameron 2007-11-28 08:54:05 UTC
The mediaLib team recently released it under CDDL, including all VIS/VIS2/MMX/SSE2 code.  So now, people on Linux might consider seeing how well these mediaLib tunings improve things on Linux.  I have heard that mediaLib builds and runs well on many Linux distros.

The mediaLib 2.5 source code is now available for download
from:
    http://www.opensolaris.org/os/downloads/devpro/

The tarball devpro-mlib-src-20071106.tar.bz2 consists of
source code for building all versions of libmlib.so.2 and
libmlib_sse2.so.2 under /usr/lib.  In other words, all VIS/
VIS2 and MMX/SSE2 code for mediaLib is now open sourced.
It has over 4000 files, or 2.4 million lines of code.

More information about mediaLib can be found at:

    http://www.sun.com/processors/vis/mlib.html

Please send your questions and comments on mediaLib to: 
Comment 17 Brian Cameron 2007-11-28 12:18:43 UTC
Created attachment 12795 [details]
cairo-perf report


Attached please find a diff report generated by running "make check" in the perf directory with the following two test cases:

1) cairo without USE_MEDIALIB defined
2) cairo with USE_MEDIALIB defined

Then I ran cairo-perf-diff-files to generate the attachment. 

I think it shows that many (about 30) functions have 5-14% speedup, and many 
other functions have smaller speedups. 

It seems two functions are slightly slower (paint_similar_rgb_source-512 is 6.27x slower and stroke_solid_rgb_source-128 is 4.02x slower though going from 9.39 to 9.47 doesn't really seem that significant to me).  There are a few other slower functions, all below 1.57x slower, which is probably not significant.

I think this shows that the patch, overall, improves performance significantly.  Can this now go upstream?
Comment 18 Daniel Amelang 2007-11-28 15:30:03 UTC
(In reply to comment #17)
> I think it shows that many (about 30) functions have 5-14% speedup, and many 
> other functions have smaller speedups. 

Actually, it says 5-14 _times_ faster, not 5-14 percent faster. Don't shortchange yourself :)

> It seems two functions are slightly slower (paint_similar_rgb_source-512 is
> 6.27x slower and stroke_solid_rgb_source-128 is 4.02x slower though going from
> 9.39 to 9.47 doesn't really seem that significant to me).  There are a few
> other slower functions, all below 1.57x slower, which is probably not
> significant.

Yea, what the heck is going on with cairo-perf calling a 9.39 -> 9.47 change a "4.02x slowdown"? Carl?

And notice that the top 4 slowdowns are all on the xlib surface, while the majority of the speedups are on the image surface. The 6x slowdown is pretty serious, Maybe it has to do with a inefficient xlib fallback-to-image-surface occuring (especially since the modifications only change pixman, and thus should only show up when an image surface comes into play). Might this have something to do with the fact that mlib images are now being created and destroyed very frequently? So that's something you probably shouldn't ignore.

Along those lines, remember what Carl said here:

https://bugs.freedesktop.org/show_bug.cgi?id=11529#c15

>  Can this now go upstream?
 
FYI, Soeren is the pixman maintainer, so he makes the final call.

Oh, and you have run the regression tests, right?
Comment 19 Brian Cameron 2007-11-28 22:57:45 UTC
Created attachment 12820 [details]
comparing two runs

Thanks for the comments.  Note that I am using Solaris x86, which has RENDER
support.  This bug is not related to the issues with running Solaris on Sparc,
where Xrender is not available (due to using the Xsun Xserver on Sparc).  So
Xrender shouldn't be a factor with any differences discussed on this report.

I think the differences in the slowdown's are red herrings, and perhaps due to
something odd in how the Xserver behaves on Solaris.  I ran two tests, both
with mediaLib enabled and am attaching this. 

---

The attached comparison is checking cairo compiled without mediaLib in both
cases. 

 xlib-rgba      fill_solid_rgb_source-256
    19.94 0.50% ->  19.96 64.96%:  7.68x speedup
 xlib-rgba paint-with-alpha_solid_rgb_source-256
    42.20 0.36% ->  42.24 64.99%:  7.66x speedup

 xlib-rgb   paint_similar_rgba_source-256
    0.28 84.96% ->   1.50 1.12%:  6.98x slowdown
 xlib-rgba      fill_solid_rgb_source-128
    5.86 60.16% ->   5.95 1.64%:  5.92x slowdown

--- 

The last report I shared with you shows the following:

 xlib-rgba paint-with-alpha_solid_rgb_source-512
     167.66 0.22% -> 167.70 65.14%:  7.66x speedup
 xlib-rgba   paint_similar_rgb_source-256
     1.55 3.52% ->   0.37 79.66%:  5.95x speedup

 xlib-rgba   paint_similar_rgb_source-512
     0.80 84.14% ->   4.93 1.46%:  6.27x slowdown
 xlib-rgba    stroke_solid_rgb_source-128
     9.39 52.67% ->   9.47 1.88%:  4.02x slowdown

---

I also ran a 3rd comparison and got these results:

 xlib-rgba paint-with-alpha_solid_rgb_source-256
     42.20 0.36% ->  42.18 64.69%:  7.60x speedup
 xlib-rgba   paint_similar_rgb_source-256
     1.50 2.79% ->   0.30 80.80%:  6.07x speedup

 xlib-rgb   paint_similar_rgba_source-256
     0.28 84.96% ->   1.49 1.68%:  6.93x slowdown
 xlib-rgba      fill_solid_rgb_source-128
     5.86 60.16% ->   5.78 1.62%:  5.79x slowdown

---

Note the following

- In all 3 cases there are two functions that are faster and two
  functions that are slower by roughly the same amount.
- Note that if the test is "xlib-rgb" then the functions are "*rgba*"
  while if the test is "xlib-rgba", then the functions are "*rgb*" (not
  "*rgba*")
- Note that while the functions in the 3 examples tend to be the
  same subset of functions, they aren't the same for each test. 

---

So, I think the changes in these xlib-rgb/xlib-rgba functions are
note related to the patch.  My guess is there is some Xserver behavior
that perhaps causes this?  Like perhaps the Xserver skips frames in
some situations when you test it hard, causing some tests to vary
like this?

Also the fact that the functions that are patched to use MEDIALIB are
not even used in the xlib-rgb/xlib-rgba case.  So, I think we can
ignore the xlib-rgb/xlib-rgba functions as being an artifact of 
something not related to this patch.   

The reports do show, however, that the functions that do use MEDIALIB
show consistant speed-up, some of them fairly significant.  So I think
it is okay to accept this patch.
Comment 20 Daniel Amelang 2007-11-29 13:11:51 UTC
(In reply to comment #19)
> I think the differences in the slowdown's are red herrings, and perhaps due to
> something odd in how the Xserver behaves on Solaris.

That would be my guess, too. You could try running your non-medialib version multiple times to see if you get the strange behavior, in which case you'd know more certainly that your patch doesn't introduce this behavior.

> Also the fact that the functions that are patched to use MEDIALIB are
> not even used in the xlib-rgb/xlib-rgba case.

Probably not, unless you hit an image fallback, which AFAIK only happens with the xlib surface when RENDER isn't available. But you've already pointed out that it is in this case.

> The reports do show, however, that the functions that do use MEDIALIB
> show consistant speed-up, some of them fairly significant.  So I think
> it is okay to accept this patch.

Sure, although it is a little strange that no one has mentioned having run the regression tests with this patch. You have, haven't you?
Comment 21 Brian Cameron 2007-11-29 15:42:50 UTC
Created attachment 12840 [details] [review]
updated patch


Sorry, I didn't realize that there are regression tests.  I went ahead and ran them.  I found problems with 2 of the mediaLib tunings.  This updated patch removes those tunings for now, leaving two others.  I notice that the regression tests generate the same output running without the patch and with this patch.  Though some tests do fail in both cases.  I'll attach the output of the regression tests to this bug report.  Perhaps you can let me know if this may indicate Solaris specific bug(s) we should look into?

I still notice significant performance improvement even though this updated patch has two tunings removed.  I'll attach the performance improvement report next.
Comment 22 Brian Cameron 2007-11-29 15:49:06 UTC
Created attachment 12841 [details]
output of running "make check"


Here is the output from "make check" without mediaLib.  The output when the patch is applied is the same.

I do notice that the ft-text-vertical-layout-type1 and ft-text-vertical-layout-type-3 tests always fail.

Is this expected, or do you have any ideas why these tests might be not working?
I notice that to get "make check" in the src directory to pass, I need to change check-def.sh and check-plt.sh to have "#!/bin/bash" at the top rather than having "#!/bin/sh".  This is because on Linux /bin/sh is a symlink to /bin/bash.  Can this change be made to these two files to make them more portable.  Otherwise I get these errors:

./check-def.sh: !: not found
make[2]: Entering directory `/export/home/packages/BUILD/SUNWgnome-base-libs-2.21.2/i386/cairo-1.4.12/src'
make[2]: `cairo.def' is up to date.
make[2]: Leaving directory `/export/home/packages/BUILD/SUNWgnome-base-libs-2.21.2/i386/cairo-1.4.12/src'
./check-def.sh: bad substitution
FAIL: check-def.sh
./check-plt.sh: !: not found

I also notice that perf/cairo-perf-diff has the same issue and needs to have "#!/bin/sh" changed to "#!/bin/bash" to work on Solrais.
Comment 23 Brian Cameron 2007-11-29 15:58:26 UTC
Created attachment 12842 [details]
performance report for updated patch


Here is the performance report for the updated patch.  You'll notice that many functions report (about 29) report over 6x speedup and 12 more report over 4x speedup.

The slowdown information also looks better than in the last report.
Comment 24 Brian Cameron 2007-11-29 16:39:27 UTC
Created attachment 12851 [details]
report for just function #1 (fbStore_a8)


Here is the performance report for just tuning the fbStore_a8 function.  Ignore the speedup/slowdown for xlib-rgb* functions.  It doesn't look like this really makes much of a difference.  Perhaps I should resubmit the patch without this?

The next report shows that pretty much all of the performance improvement comes from the other change to fbCombineOverU
Comment 25 Brian Cameron 2007-11-29 16:40:08 UTC
Created attachment 12852 [details]
report for just function #2 (fbCombineOverU)


Here is the performance report for fbCombineOverU.  Note that this is where the significant changes are.
Comment 26 Brian Cameron 2007-11-29 16:43:35 UTC
Okay, thanks for being patient with me, and helping me get going with the regression tests and verifying the performance benfits with cairo-perf.

I think based on these tests, I recommend that we add this patch upstream, but only tune the function fbCombineOverU.  This adds no regressions according to the regression tests and improves performance significantly according to cairo-perf.  

Does this make sense?  

I will touch base with the mediaLib team.  The problems I am seeing with the two functions that break the regression tests (fbCombineMaskU and fbCombineInU) may be due to bugs in the mediaLib code.  If so, perhaps I can work with them to fix those bugs, and propose these changes again later.  Or perhaps we can figure out a way to make the code conditionally use the mediaLib functions for the cases where it doesn't cause regressions.  It seems that the mediaLib implementations only cause regressions in the argb case.

Comment 27 Daniel Amelang 2007-11-29 20:23:35 UTC
(In reply to comment #21)
> Created an attachment (id=12840) [details]
> updated patch

Notice what I said 5 months ago (1st bullet):

http://bugs.freedesktop.org/show_bug.cgi?id=10151#c13

Comment 28 Brian Cameron 2007-12-13 17:23:56 UTC
With some help from James Cheng from the mediaLib team I found out why the two 
places where my mediaLib patch was breaking the regression tests (fbCombineMaskU and fbCombineInU).  The problem is that both of these functions use the macro 
FbByteMul, which is defined as follows:

/*
 x_c = (x_c * a) / 255 + y
*/
#define FbByteMulAdd(x, a, y) do {                                \
       CARD32 t = ((x & 0xff00ff) * a) + 0x800080;               \
       t = (t + ((t >> 8) & 0xff00ff)) >> 8;                     \
       t &= 0xff00ff;                                            \
       t += y & 0xff00ff;                                        \
       t |= 0x1000100 - ((t >> 8) & 0xff00ff);                   \
       t &= 0xff00ff;                                            \
                                                                 \
       x = (((x >> 8) & 0xff00ff) * a) + 0x800080;                 \
       x = (x + ((x >> 8) & 0xff00ff)) >> 8;                       \
       x &= 0xff00ff;                                              \
       x += (y >> 8) & 0xff00ff;                                   \
       x |= 0x1000100 - ((t >> 8) & 0xff00ff);                     \
       x &= 0xff00ff;                                              \
       x <<= 8;                                                    \
       x += t;                                                     \
   } while (0) 

  As far as I can see, it actually has +0.5 for rounding, i.e.,

    d1 = (x * a)/255.0 + 0.5;

  Currently, the mediaLib function does this:

    d2 = (x * a)/256.0;

  This is the result

    max12 = 2, max13 = 0, max23 = 2

  i.e., the maximum difference is 2 between d1 and d2.

  BTW, if d1 doesn't have +0.5, then max12 = max13 = 1. 

Looking into why the mediaLib integration for fbCombineOverU does work without
causing any reguression tests, here is some information:

  As far as I can see, it is actually equivalent to the following:

    x_c = (x_c * a) / 255.0 + y + 0.5;
    if (x_c > 255) x_c = 255;

  In other words, fbCombineOverU does something like this (simplified
  for one channel):

    d1 = (d * (255 - a))/255.0 + s + 0.5;
    if (d1 > 0xff) d1 = 0xff;

  while the mediaLib function mlib_ImageBlend_OMSA_ONE_Inp does this:

    d2 = (d * (256 - a))/256.0 + s;    /* mediaLib */
    if (d2 > 0xff) d2 = 0xff;

  Here is the result of test2:

    max12 = 1, max13 = 0, max23 = 1

  i.e., the maximum difference between d1 and d2 is 1, better than the
   other case.  

There is a performance benefit for doing /256 instead of /255, 
especially for VIS implementation.  So is not doing +0.5.  FbByteMul
seems to be a relatively expensive macro for doing /255, although it
might be faster than doing /255 directly on most platforms, which
explains why a mediaLib function call could beat a macro on performance
in some cases.  As you know, mediaLib is mostly opt to performance when
there is a trade-off between precision and performance. 

So it seems that the regression tests are failing due to reasonable off-by-one issues.  Would it make sense to fix the regression tests to allow a bit more fuzz in these cases where these mediaLib tests cause failure?  Or might it make
sense for cairo to adopt the "/256" method since it should work faster in 
general, according to James?

Any thoughs appreciated.
Comment 29 Brian Cameron 2007-12-14 15:02:39 UTC
That last comment was crafted by me from an email thread discussion with a lot of editing.  I asked James to review and he had some additional comments:

Mostly correct, except that the last sentence might cause some
misunderstanding.

"Or might it make
sense for cairo to adopt the "/256" method since it should work faster in
general, according to James?"

I didn't suggest that cairo should adopt the "/256" method.
It is faster, probably obvious to many programmers, but it's
kind of approximation.  The author of the pixman appeared to
have spent quite some effort to make it more accurate, i.e.,
using 255 as 1.0 of alpha.  The MMX/SSE code in pixman uses
similar algorithms so it can keep the same precision.  We
probably can change our SSE2 mediaLib code to do something
like that to get better performance, but it might be too
expensive for the VIS version to use similar algorithms.

You don't have to modify your question there.  But I guess the
answer would be "No" or even "Absolutely NOT!"
Comment 30 Chris Wilson 2008-10-10 07:44:10 UTC
Not sure what the current status is, but this belongs to pixman.
Comment 31 Søren Sandmann Pedersen 2009-06-05 03:33:42 UTC
This bug is obsolete by now.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.