Bug 16647

Summary: XDrawLine performance regression
Product: xorg
Reporter: Clemens Eisserer <linuxhippy>
Component: Server/General
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: ewalsh, mcepl, pcjc2, pedretti.fabio, zdenek.kabelac
Version: git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                          Flags
Profile running Xorg-1.3                             none
Profile running Xorg-1.5                             none
workload2 on 1.3                                     none
workload2 on 1.5                                     none
The "benchmark" itself                               none
minimal benchmark 2.0                                none
Possible solution for second benchmark               none
oprofile results of the line benchmark with patch    none
a real-world workload, with patch                    none
array-index based devPrivates implementation         none

Description Clemens Eisserer 2008-07-09 02:28:14 UTC
Created attachment 17598 [details]
Profile running Xorg-1.3

I have Xorg-1.3 running on Fedora 8 (intel-2.1.1) and Xorg-7.4.903 running on Rawhide with intel-2.2.1.

On the Xorg-1.5 system XDrawLine is faster to screen and a lot faster to an 8-bit pixmap.

Xorg-1.5 - 100.000 lines:
Screen: 460ms
8-bit pixmap: 650ms

Xorg-1.3 - 100.000 lines:
Screen: 360ms
8-bit pixmap: 170ms


I attached profiles created by oprofile.
Comment 1 Clemens Eisserer 2008-07-09 02:29:14 UTC
Created attachment 17599 [details]
Profile running Xorg-1.5
Comment 2 Michel Dänzer 2008-07-09 02:41:33 UTC
Looks like a lot of cycles are burnt in dixLookupPrivate() with 1.5. CC'ing Eamon Walsh, who said he's going to fix this known regression.
Comment 3 Clemens Eisserer 2008-07-09 03:04:52 UTC
Thanks a lot :)

PS: The original report is a bit confusing. I am using Xorg-server-1.4.903 of course, not Xorg-7.4.903.
And "On the Xorg-1.5 system..." should have been "On the Xorg-server-1.3 system", sorry about the confusion.
Comment 4 Clemens Eisserer 2008-07-09 03:50:47 UTC
Created attachment 17603 [details]
workload2 on 1.3
Comment 5 Clemens Eisserer 2008-07-09 03:51:10 UTC
Created attachment 17604 [details]
workload2 on 1.5
Comment 6 Clemens Eisserer 2008-07-09 03:51:36 UTC
Created attachment 17605 [details]
The "benchmark" itself
Comment 7 Clemens Eisserer 2008-07-09 03:52:34 UTC
I tested a different workload, where I draw 12800 lines and 12800 single-height rects to an 8-bit pixmap (this is quite close to what my real-world workload will look like).

This time not many cycles are spent in dixLookupPrivate, but a lot of time is spent in pixman itself.

I attached the profiles, as well as my "benchmark".

Thanks, Clemens
Comment 8 Michel Dänzer 2008-07-09 04:18:26 UTC
(In reply to comment #7)
> This time not many cycles are spent in dixLookupPrivate, but a lot of time is
> spent in pixman itself.

The 'Msecs ellapsed' value varies wildly here, but I don't see any lasting CPU usage that would allow for a useful profile. Is that different from what you're seeing?

Either way, this seems likely to be a different issue from what you reported here initially, so it should probably be tracked separately.

It would also be nice if you could try these with the xserver master branch, which has some EXA optimizations over the 1.5 branch.

> I attached the profiles, as well as my "benchmark".

Next time, please include a Makefile instead of an x86 binary. :) (and add a toplevel directory to the tarball)
Comment 9 Clemens Eisserer 2008-07-09 05:17:11 UTC
Created attachment 17606 [details]
minimal benchmark 2.0
Comment 10 Clemens Eisserer 2008-07-09 05:27:04 UTC
The new version of the benchmark is stripped down to just draw lines and rects and composite the result, nothing more.

There is now an infinite benchmark loop. The strange results that show up from time to time occur because the timing code is flawed (it does not cope with overflow).
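
The overflow problem mentioned above is a common pitfall when elapsed time is kept in a narrow integer or derived from only the sub-second field of a timestamp. A minimal sketch of one way to avoid it, assuming a POSIX system that provides clock_gettime; the helper names are hypothetical and this is not the code from the attached benchmark:

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to whole milliseconds using 64-bit arithmetic,
 * so the result cannot wrap during any realistic benchmark run. */
static int64_t timespec_to_ms(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Elapsed milliseconds between two readings. Because both the seconds
 * and the nanoseconds fields are folded in before subtracting, this is
 * correct even when end.tv_nsec is smaller than start.tv_nsec. */
static int64_t elapsed_ms(struct timespec start, struct timespec end)
{
    return timespec_to_ms(end) - timespec_to_ms(start);
}

/* Read the monotonic clock, which is not affected by wall-clock changes. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}
```

A benchmark loop would call now_monotonic() before and after the drawing requests (plus whatever synchronisation with the server it needs) and report elapsed_ms(start, end).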

These are the results I get:
Xorg-1.3: 80ms
Xorg-1.5: 230ms
(GeForce6600: 40ms with nvidia's closed driver, not mine)

> Next time, please include a Makefile instead of an x86 binary. :) (and add a
> toplevel directory to the tarball)
Sorry about that, tarballs without a toplevel directory also annoy me every time I encounter one.
I created a bash compile script; sorry, I don't know make.

> Either way, this seems likely to be a different issue from what you reported
> here initially, so it should probably be tracked separately.
I don't know, maybe you could have a look again at the new benchmark?

Thanks a lot, Clemens
Comment 11 Michel Dänzer 2008-07-11 01:11:00 UTC
Created attachment 17632 [details] [review]
Possible solution for second benchmark

Does this patch help for the second benchmark? It greatly reduces the valid region tracking overhead with it here and puts dixLookupPrivate back to the top of the profile. So this probably should have been a separate report, but that may be moot now anyway. :)
Comment 12 Clemens Eisserer 2008-07-11 10:20:20 UTC
Created attachment 17638 [details]
oprofile results of the line benchmark with patch
Comment 13 Clemens Eisserer 2008-07-11 10:20:53 UTC
Created attachment 17639 [details]
a real-world workload, with patch
Comment 14 Clemens Eisserer 2008-07-11 10:23:24 UTC
Thanks a lot for the patch, performance is now on par with Xorg-1.3, even with dixLookupPrivate eating 25% of total cycles.
Sorry that I did not open a separate report; I was not sure how closely the two issues are connected.

Thanks again for fixing it, Clemens
Comment 15 Eamon Walsh 2008-07-11 11:03:21 UTC
I'm working on the dixLookupPrivate issue, hope to have a solution sometime soon.

I have an O(1) implementation done but the callers have to be changed slightly to accommodate it.
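
To illustrate the difference between a list walk and an O(1) lookup, here is a minimal sketch. It is not the X server's code: the struct and function names are hypothetical, and it only assumes that the slow path searches a per-object list by key while the fast path uses the key to find an integer slot index.

```c
#include <stddef.h>

/* Hypothetical list-based store: every lookup walks the list, O(n). */
struct private_node {
    const void *key;
    void *value;
    struct private_node *next;
};

static void *lookup_list(const struct private_node *head, const void *key)
{
    for (; head != NULL; head = head->next)
        if (head->key == key)
            return head->value;
    return NULL;
}

/* Hypothetical array-based store: the key points to an int that holds
 * the slot index, so a lookup is a single array access, O(1). */
#define MAX_PRIVATES 16

struct private_array {
    void *slots[MAX_PRIVATES];
};

static int next_index = 0;

/* Assign the next free slot index to a key. Returns 0 on success,
 * -1 if all slots are taken. */
static int register_key(int *key)
{
    if (next_index >= MAX_PRIVATES)
        return -1;
    *key = next_index++;
    return 0;
}

static void *lookup_array(const struct private_array *p, const int *key)
{
    if (*key < 0 || *key >= MAX_PRIVATES)
        return NULL;
    return p->slots[*key];
}
```

The caller-side change in this sketch is that a key must be backed by integer storage and registered once before use, which is the kind of adjustment callers would need to make.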
Comment 16 Zdenek Kabelac 2008-08-29 01:16:13 UTC
Anything new about this bug? dixLookupPrivate seems to take over 50% of Xorg while rendering glxgears, according to oprofile.
Comment 17 Daniel Stone 2008-08-29 07:28:24 UTC
On Fri, Aug 29, 2008 at 01:16:13AM -0700, bugzilla-daemon@freedesktop.org wrote:
> Anything new about this bug - dixLookupPrivate seems to take over 50% of Xorg
> while rendering glxgears  according to oprofile

We're in the middle of fixing this.
Comment 18 Eamon Walsh 2008-08-29 13:33:41 UTC
Created attachment 18573 [details] [review]
array-index based devPrivates implementation
Comment 19 Eamon Walsh 2008-08-29 13:34:48 UTC
Please apply the attached patch to the current git master (including the changes from yesterday) and run your performance test again.

You'll need to be using in-tree drivers because out-of-tree drivers may not have changed the devPrivate keys to point to integer storage yet.
Comment 20 Maarten Maathuis 2008-09-12 15:10:52 UTC
I tested this patch, and it seems to bring down the CPU usage of dixLookupPrivate to a more acceptable level. Is anything special keeping it from being applied to master?
Comment 21 Eamon Walsh 2008-09-12 17:07:12 UTC
I went ahead and committed the patch, and sent a notice to the Xorg mailing list.

It could affect out-of-tree drivers that need to adjust to it, so I was waiting for some confirmation that it did in fact address the performance issues.
Comment 22 Fabio Pedretti 2008-10-09 03:03:40 UTC
This also appears to have been backported to the 1.5 branch:
http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078

Should this be marked as fixed?
Comment 23 Julien Cristau 2008-10-20 05:19:14 UTC
> --- Comment #22 from Fabio <fabio.ped@libero.it>  2008-10-09 03:03:40 PST ---
> This also appears to have been backported to the 1.5 branch:
> http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078
> 
It's been reverted as breaking ABI.
Comment 24 Peter Clifton 2008-12-26 09:48:44 UTC
Michel Dänzer's patch was never applied. Does it still help the performance any?

W.r.t. the dixLookupPrivate issues, which seem to be more the focus of this bug report: I'm still seeing about 11% of world time used up in dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running. (It uses cairo, not XDrawLine.)
Comment 25 Michel Dänzer 2009-02-15 08:51:06 UTC
(In reply to comment #24)
> Michel Dänzer's patch was never applied.

I pushed it to the master branch.


> W.r.t, the dixLookupPrivate issues - which seems to be more the focus of this
> bug report, I'm still seeing about 11% of world time used up in
> dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running.
> (Uses cairo, not XDrawLine)

I'm working on reducing the dixLookupPrivate calls in EXA by passing around the private pointers internally where possible. I'll hopefully have it ready for review soon, but I'm not sure how much of the overhead you're seeing it'll eliminate.
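
The general idea of passing private pointers around internally can be sketched as follows. This is only an illustration under stated assumptions, not EXA code: all names are hypothetical, and get_private() stands in for any lookup that is costly enough to be worth doing once per request instead of once per primitive.

```c
#include <stddef.h>

struct pixmap_private { int lines_drawn; };
struct pixmap { struct pixmap_private *priv; };

/* Counts how often the (stand-in) lookup is performed. */
static unsigned long lookup_count = 0;

/* Stand-in for a costly private lookup. */
static struct pixmap_private *get_private(struct pixmap *pix)
{
    lookup_count++;
    return pix->priv;
}

/* Naive helper: performs its own lookup on every call. */
static void draw_one_line_naive(struct pixmap *pix)
{
    get_private(pix)->lines_drawn++;
}

/* Helper that receives the already looked-up private pointer. */
static void draw_one_line(struct pixmap_private *priv)
{
    priv->lines_drawn++;
}

/* One lookup per line. */
static void draw_lines_naive(struct pixmap *pix, size_t n)
{
    for (size_t i = 0; i < n; i++)
        draw_one_line_naive(pix);
}

/* One lookup per request; the pointer is passed down to the helper. */
static void draw_lines_cached(struct pixmap *pix, size_t n)
{
    struct pixmap_private *priv = get_private(pix);
    for (size_t i = 0; i < n; i++)
        draw_one_line(priv);
}
```

In this sketch, drawing n lines costs n lookups in the naive variant and a single lookup in the cached one; how much real overhead such a change removes depends on where the remaining lookups sit.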
