Bug 23184 - cairo's performance downgrades 4X with server master than server-1.6
Status: VERIFIED FIXED
Product: xorg
Classification: Unclassified
Component: Server/General
Version: unspecified
Hardware: Other Linux (All)
Importance: high normal
Assigned To: Carl Worth
QA Contact: Xorg Project Team
Reported: 2009-08-06 18:52 UTC by zhao jian
Modified: 2009-09-24 19:53 UTC (History)
Attachments
xorg.0.log (34.53 KB, text/plain)
2009-08-06 18:52 UTC, zhao jian

Description zhao jian 2009-08-06 18:52:05 UTC
Created attachment 28410
xorg.0.log

System Environment:
----------------------
Platform:       G41
Arch:           x86_64
OSD:            Fedora release 9 (Sulphur)
Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
Kernel:         (drm-intel-next)0c2e39525b3b53a97a0202c5f35058147e53977e

Bug Description:
---------------------
I tested with cairo-perf on G41 and found a regression with swfdec-fill-rate-2xaa.trace and swfdec-fill-rate-4xaa.trace. They may be the same issue. The performance data I get with the newest code (20090729) is 3 times slower than the data from our Q2 release code. I found that the regression is caused by the xserver: if I change only the xserver from master to the 1.6 branch, it performs much better.
With the code of 20090729:
swfdec-fill-rate-2xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   36.754   36.769   0.02%    5/6
[  0]     xlib        swfdec-fill-rate-2xaa  184.982  194.616   2.23%    6/6
swfdec-fill-rate-4xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-4xaa  135.698  135.709   0.01%    6/6
[  0]     xlib        swfdec-fill-rate-4xaa  743.517  744.167   0.05%    6/6
With only the xserver changed to server-1.6-branch (606f6dba16d42e3546a82a386d5a01087467b511):
swfdec-fill-rate-2xaa.trace.KMS
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   36.642   36.643   0.00%    4/6
[  0]     xlib        swfdec-fill-rate-2xaa   51.859   51.883   0.06%    5/6
swfdec-fill-rate-4xaa.trace.KMS
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-4xaa  135.807  135.948   0.30%    6/6
[  0]     xlib        swfdec-fill-rate-4xaa  199.334  199.804   0.12%    6/6
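
As a rough check of the size of the regression, the xlib medians from the tables above can be divided directly. This is only a minimal sketch: the dictionary and function names are made up for illustration, and the numbers are copied by hand from the tables.

```python
# Median xlib times in seconds, copied from the tables above.
master_median = {
    "swfdec-fill-rate-2xaa": 194.616,
    "swfdec-fill-rate-4xaa": 744.167,
}
branch_1_6_median = {
    "swfdec-fill-rate-2xaa": 51.883,
    "swfdec-fill-rate-4xaa": 199.804,
}


def slowdown(test):
    """Return how many times slower xlib is on master than on server-1.6."""
    return master_median[test] / branch_1_6_median[test]


for test in master_median:
    print(f"{test}: {slowdown(test):.2f}x slower on master")
```

Both traces come out at roughly 3.7x, which sits between the "3 times" in the description and the "4X" in the summary.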

Reproduce Steps:
---------------------
1. xinit &
2. cairo-perf-trace swfdec-fill-rate-2xaa.trace (or swfdec-fill-rate-4xaa.trace)
Comment 1 Carl Worth 2009-09-14 14:45:04 UTC
(In reply to comment #0)
> System Environment:
> ----------------------
> Platform:       G41
> Arch:           x86_64
> OSD:            Fedora release 9 (Sulphur)
> Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
> Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
> Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
> Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
> Kernel:         (drm-intel-next)0c2e39525b3b53a97a0202c5f35058147e53977e

Thanks for the bug report.

The details above showing the versions at which the regression
first appeared are very appreciated. Thanks!

What's missing is the previously tested versions at which things were
last seen to be working. From a separate report, I believe these are
the working versions:

Last known versions without regression
--------------------------------------
Libdrm:         (master)30449829c0347dc7dbe29acb13e49e2f2cb72ae9
Mesa:           (master)506bacb8e40b0a170a4b620113506925d2333735
Xserver:                (master)b1c3dc6ae226db178420e3b5f297b94afc87c94c
Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
Kernel_unstable:    (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9

Let me know if I didn't get those right.

-Carl
Comment 2 zhao jian 2009-09-14 23:35:48 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > System Environment:
> > ----------------------
> > Platform:       G41
> > Arch:           x86_64
> > OSD:            Fedora release 9 (Sulphur)
> > Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
> > Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
> > Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
> > Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
> > Kernel:         (drm-intel-next)0c2e39525b3b53a97a0202c5f35058147e53977e
> Thanks for the bug report.
> The details above showing the versions at which the regression
> first appeared are very appreciated. Thanks!
> What's missing is the previously tested versions at which things were
> last seen to be working. From a separate report, I believe these are
> the working versions:
> Last known versions without regression
> --------------------------------------
> Libdrm:         (master)30449829c0347dc7dbe29acb13e49e2f2cb72ae9
> Mesa:           (master)506bacb8e40b0a170a4b620113506925d2333735
> Xserver:                (master)b1c3dc6ae226db178420e3b5f297b94afc87c94c
> Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
> Kernel_unstable:    (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9
> Let me know if I didn't get those right.
> -Carl

No. Carl, maybe you can pay more attention on my bug description. :)  
I first found this regression with the code of 20090729 compared with our Q2 release. I then found that it was caused by the xserver: if I change only 20090729's xserver from master to server-1.6-branch, it works well. You can just try the commits of 20090729 and compare the server on the master branch with the server on server-1.6-branch.
The code of 20090729 is:
Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d        (bad)
Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
Kernel:         (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9 
Xserver:   (server-1.6-branch) 606f6dba16d42e3546a82a386d5a01087467b511 (good)
Comment 3 Carl Worth 2009-09-16 09:34:23 UTC
(In reply to comment #2)
> No. Carl, maybe you can pay more attention on my bug description. :)

Yes, clearly I need to do that.

To help me avoid mistakes in the future, it still would be helpful to have both "before" and "after" git commit IDs for any regressions identified. Thanks!

-Carl
Comment 4 Carl Worth 2009-09-16 16:55:43 UTC
Here are the results of my attempt to reproduce this:

System environment
------------------
Platform:		GM965 (Lenovo Thinkpad x200s)
Arch:			x86
OSD:			Debian unstable
xf86-video-intel:	master: b8c5c996e888485c3a16d645c8490592534a7882
cairo:			master: 56c9b2de7a2b93b2e0c59cf98326d8c0d4d508ba
cairo-traces:		master: b889dfc97c585d737b1b6ab139c0dbcd1ef01cf4

I tested with cairo-perf-trace. I first trimmed down the testcases of
interest with:

./csi-trace --trim=10 < full/swfdec-fill-rate-2xaa.trace > benchmark/swfdec-fill-rate-2xaa.trace
./csi-trace --trim=10 < full/swfdec-fill-rate-4xaa.trace > benchmark/swfdec-fill-rate-4xaa.trace

That makes each trace take only about 10 seconds on the image backend,
which makes it faster to go through many runs.

I then tested two X server versions (master and 1.6) with the
following results:

xserver master (b8c5c996e888485c3a16d645c8490592534a7882)
---------------------------------------------------------
$ CAIRO_TEST_TARGET="image,xlib" ./cairo-perf-trace -i 3 ./cairo-traces/benchmark/swfdec-fill-rate-2xaa.trace ./cairo-traces/benchmark/swfdec-fill-rate-4xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   18.682   18.683   0.13%    3/3
[  1]    image        swfdec-fill-rate-4xaa   19.388   19.396   0.02%    3/3
[  0]     xlib        swfdec-fill-rate-2xaa   33.758   34.072   2.91%    3/3
[  1]     xlib        swfdec-fill-rate-4xaa   37.228   37.324   0.28%    3/3

xserver 1.6 (606f6dba16d42e3546a82a386d5a01087467b511)
------------------------------------------------------
$ CAIRO_TEST_TARGET="image,xlib" ./cairo-perf-trace -i 3 ./cairo-traces/benchmark/swfdec-fill-rate-2xaa.trace ./cairo-traces/benchmark/swfdec-fill-rate-4xaa.trace 
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   18.569   18.639   0.19%    3/3
[  1]    image        swfdec-fill-rate-4xaa   19.232   19.238   0.03%    3/3
[  0]     xlib        swfdec-fill-rate-2xaa   20.165   20.168   0.44%    3/3
[  1]     xlib        swfdec-fill-rate-4xaa   24.654   24.816   0.43%    3/3

So on this system I have reproduced a slowdown with the current master
X server (though not quite as dramatic as the 4x of the original bug
report). The smaller factor could be due to a different CPU speed, to
the trimming, etc.
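
To put a number on "not quite as dramatic", the two result tables above can be parsed and the xlib medians compared. The sketch below is an illustration only: `ROW`, `parse_medians` and `slowdown_ratios` are hypothetical helpers, not part of cairo-perf-trace, and the regular expression assumes the column layout shown in the tables above.

```python
import re

# One result row, e.g.
# "[  0]     xlib        swfdec-fill-rate-2xaa   33.758   34.072   2.91%    3/3"
# The header row starts with "[ # ]" and therefore does not match.
ROW = re.compile(
    r"^\[\s*\d+\]\s+(?P<backend>\S+)\s+(?P<test>\S+)\s+"
    r"(?P<min>[\d.]+)\s+(?P<median>[\d.]+)\s+[\d.]+%\s+\d+/\d+\s*$"
)

MASTER = """\
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   18.682   18.683   0.13%    3/3
[  1]    image        swfdec-fill-rate-4xaa   19.388   19.396   0.02%    3/3
[  0]     xlib        swfdec-fill-rate-2xaa   33.758   34.072   2.91%    3/3
[  1]     xlib        swfdec-fill-rate-4xaa   37.228   37.324   0.28%    3/3
"""

SERVER_1_6 = """\
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   18.569   18.639   0.19%    3/3
[  1]    image        swfdec-fill-rate-4xaa   19.232   19.238   0.03%    3/3
[  0]     xlib        swfdec-fill-rate-2xaa   20.165   20.168   0.44%    3/3
[  1]     xlib        swfdec-fill-rate-4xaa   24.654   24.816   0.43%    3/3
"""


def parse_medians(report):
    """Map (backend, test) to the median time in seconds for each result row."""
    medians = {}
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            medians[(match["backend"], match["test"])] = float(match["median"])
    return medians


def slowdown_ratios(bad_report, good_report, backend="xlib"):
    """Return {test: bad median / good median} for the given backend."""
    bad = parse_medians(bad_report)
    good = parse_medians(good_report)
    return {
        test: bad[(b, test)] / good[(b, test)]
        for (b, test) in bad
        if b == backend and (b, test) in good
    }


for test, ratio in slowdown_ratios(MASTER, SERVER_1_6).items():
    print(f"{test}: {ratio:.2f}x slower on master")
```

On these tables the xlib slowdown is about 1.7x for the 2xaa trace and about 1.5x for the 4xaa trace, while the image backend is essentially unchanged.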

I'll bisect the xserver next to identify a commit introducing the
performance regression.

-Carl
Comment 5 zhao jian 2009-09-16 21:39:01 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > No. Carl, maybe you can pay more attention on my bug description. :)
> Yes, clearly I need to do that.
> To help me avoid mistakes in the future, it still would be helpful to have both
> "before" and "after" git commit IDs for any regressions identified. Thanks!
> -Carl

OK. I will list both "before" and "after" git commit IDs. :) 
Comment 6 Carl Worth 2009-09-22 11:31:20 UTC
I bisected this change through the X server and found that the commit
causing the performance regression was simply the commit changing the
version number of the X server.
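
For readers unfamiliar with bisecting, it is a binary search over the commit history for the first commit where the test goes bad. The following is a minimal sketch of the idea, not the tooling actually used here: `first_bad`, the commit labels and the `is_bad` predicate are all hypothetical, and in practice `is_bad` would mean building that X server and checking whether the xlib median from cairo-perf-trace is slow.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit in an ordered history.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return bad


# Hypothetical history of ten commits where the regression lands in "c6".
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda commit: int(commit[1:]) >= 6)
print(history[culprit])
```

Each step halves the remaining range, so even a long history needs only a handful of build-and-benchmark runs.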

The issue is that cairo is querying the X server and changing its
behavior depending on the X server version. So in one sense, this
isn't a driver bug at all, since it's the application that is actually
doing something different.

But Chris Wilson took a different approach and said, "But still, with
the new X server cairo is doing what it *should* have been doing all
along (and simply wasn't doing to avoid X server bugs). So why is it
actually slower?"

Chris then answered this with the following commit:

commit 57fc09cef28bad2e3e8455b93ef2927118f8a3a3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Sep 20 01:02:39 2009 +0100

    Avoid fallbacks for a1 src/mask

    Carl Worth did the hard work in identifying that the regression in
    cairo between X.org 1.6 and 1.7 was caused by cairo sending an a1
    mask to the server in 1.7 whereas in 1.6 cairo used local fallbacks
    (as the source was using RepeatPad, which triggers cairo's
    'buggy_pad_reflect' fallback for X.org 1.6). This was causing the driver
    to do a fallback to handle the a1 mask instead, which due to the GPU
    pipeline stall is much more expensive than the equivalent fallback in
    cairo.

    Reference:
      cairo's performance downgrades 4X with server master than server-1.6.
      https://bugs.freedesktop.org/show_bug.cgi?id=23184

    The fix is a relatively simple extension of the current
    uxa_picture_from_pixman_image() to use CompositePicture() instead of
    CopyArea() when we need to convert to a new format.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
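
The decision described in the commit message can be summarised as a small toy model. This is purely illustrative and is not cairo's or the driver's real code: the function name, its parameters and the version tuples are assumptions, and the 1.7 cut-off is taken only from the commit message's wording.

```python
# Toy model only: names and the version encoding are made up for illustration.
def render_path(server_version, source_uses_repeat_pad, driver_accelerates_a1):
    """Return which component ends up handling the a1 mask composite."""
    # Per the commit message, cairo treats RepeatPad as buggy on X.org 1.6.
    buggy_pad_reflect = server_version < (1, 7)
    if source_uses_repeat_pad and buggy_pad_reflect:
        # cairo renders locally, so the server never sees the a1 mask.
        return "cairo fallback"
    if not driver_accelerates_a1:
        # The a1 mask reaches the driver, which falls back in software and
        # stalls the GPU pipeline; this is the slow case in this bug.
        return "driver fallback"
    # After the fix the driver converts the a1 picture and stays on the GPU.
    return "accelerated"


print(render_path((1, 6), True, False))  # server-1.6, old driver
print(render_path((1, 7), True, False))  # server master, old driver
print(render_path((1, 7), True, True))   # server master, fixed driver
```

The three calls correspond to the fast server-1.6 case, the regression on master, and the behaviour after the fix.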
Comment 7 zhao jian 2009-09-24 19:53:14 UTC
With the newest code, the xlib performance of swfdec-fill-rate-2xaa.trace and swfdec-fill-rate-4xaa.trace improves dramatically, by 10X~15X: from about 5X slower than the image backend to about 2X faster than the image backend. So verified.
The data with 0917's code:
swfdec-fill-rate-2xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   26.087   26.137   0.12%    6/6
[  0]     xlib        swfdec-fill-rate-2xaa  125.475  125.515   0.03%    6/6
swfdec-fill-rate-4xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-4xaa   93.333   93.498   0.08%    6/6
[  0]     xlib        swfdec-fill-rate-4xaa  501.693  501.919   0.04%    6/6
The data with 0924's code:
swfdec-fill-rate-2xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-2xaa   26.122   26.129   0.08%    5/6
[  0]     xlib        swfdec-fill-rate-2xaa   10.566   10.687   0.68%    6/6
swfdec-fill-rate-4xaa.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image        swfdec-fill-rate-4xaa   93.288   93.325   0.02%    5/6
[  0]     xlib        swfdec-fill-rate-4xaa   34.080   34.162   0.16%    5/6
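
The factors quoted in this comment can be reproduced from the medians in the tables above. A minimal sketch, with the numbers copied by hand and all names chosen only for illustration:

```python
# Median times in seconds from the tables above, as (image, xlib).
code_0917 = {"2xaa": (26.137, 125.515), "4xaa": (93.498, 501.919)}
code_0924 = {"2xaa": (26.129, 10.687), "4xaa": (93.325, 34.162)}


def summarize(test):
    """Return the three ratios discussed in the comment for one trace."""
    image_before, xlib_before = code_0917[test]
    image_after, xlib_after = code_0924[test]
    return {
        "xlib_speedup": xlib_before / xlib_after,
        "xlib_slower_than_image_before": xlib_before / image_before,
        "xlib_faster_than_image_after": image_after / xlib_after,
    }


for test in code_0917:
    ratios = summarize(test)
    print(test, {name: round(value, 1) for name, value in ratios.items()})
```

The xlib speedup works out to roughly 12x and 15x for the two traces, with xlib going from about 5x slower than image to somewhat more than 2x faster.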