Bug 16647 - XDrawLine performance regression
Summary: XDrawLine performance regression
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-09 02:28 UTC by Clemens Eisserer
Modified: 2009-02-15 08:51 UTC

Attachments
Profile running Xorg-1.3 (1005 bytes, text/plain)
2008-07-09 02:28 UTC, Clemens Eisserer
Profile running Xorg-1.5 (655 bytes, text/plain)
2008-07-09 02:29 UTC, Clemens Eisserer
workload2 on 1.3 (677 bytes, text/plain)
2008-07-09 03:50 UTC, Clemens Eisserer
workload2 on 1.5 (749 bytes, text/plain)
2008-07-09 03:51 UTC, Clemens Eisserer
The "benchmark" itself (40.01 KB, application/x-gzip)
2008-07-09 03:51 UTC, Clemens Eisserer
minimal benchmark 2.0 (10.00 KB, application/x-tar)
2008-07-09 05:17 UTC, Clemens Eisserer
Possible solution for second benchmark (3.54 KB, patch)
2008-07-11 01:11 UTC, Michel Dänzer
oprofile results of the line benchmark with patch (581 bytes, text/plain)
2008-07-11 10:20 UTC, Clemens Eisserer
a real-world workload, with patch (1023 bytes, text/plain)
2008-07-11 10:20 UTC, Clemens Eisserer
array-index based devPrivates implementation (7.72 KB, patch)
2008-08-29 13:33 UTC, Eamon Walsh

Description Clemens Eisserer 2008-07-09 02:28:14 UTC
Created attachment 17598 [details]
Profile running Xorg-1.3

I have Xorg-1.3 running on Fedora 8 (intel-2.1.1) and Xorg-7.4.903 running on Rawhide with Intel-2.2.1.

On the Xorg-1.5 system XDrawLine is faster to screen and a lot faster to an 8-bit pixmap.

Xorg-1.5 - 100,000 lines:
Screen: 460ms
8-bit pixmap: 650ms

Xorg-1.3 - 100,000 lines:
Screen: 360ms
8-bit pixmap: 170ms


I attached profiles created by oprofile.
Comment 1 Clemens Eisserer 2008-07-09 02:29:14 UTC
Created attachment 17599 [details]
Profile running Xorg-1.5
Comment 2 Michel Dänzer 2008-07-09 02:41:33 UTC
Looks like a lot of cycles are burnt in dixLookupPrivate() with 1.5. CC'ing Eamon Walsh, who said he's going to fix this known regression.
Comment 3 Clemens Eisserer 2008-07-09 03:04:52 UTC
Thanks a lot :)

PS: The original report is a bit confusing. I am using Xorg-server-1.4.903 of course, not Xorg-7.4.903.
And "On the Xorg-1.5 system..." should have been "On the Xorg-server-1.3 system", sorry about the confusion.
Comment 4 Clemens Eisserer 2008-07-09 03:50:47 UTC
Created attachment 17603 [details]
workload2 on 1.3
Comment 5 Clemens Eisserer 2008-07-09 03:51:10 UTC
Created attachment 17604 [details]
workload2 on 1.5
Comment 6 Clemens Eisserer 2008-07-09 03:51:36 UTC
Created attachment 17605 [details]
The "benchmark" itself
Comment 7 Clemens Eisserer 2008-07-09 03:52:34 UTC
I tested a different workload, where I draw 12800 lines and 12800 single-height rects to an 8-bit pixmap (this is close to what my real-world workload will look like).

This time not many cycles are spent in dixLookupPrivate, but a lot of time is spent in pixman itself.

I attached the profiles, as well as my "benchmark".

Thanks, Clemens
Comment 8 Michel Dänzer 2008-07-09 04:18:26 UTC
(In reply to comment #7)
> This time not many cycles are spent in dixLookupPrivate, but a lot of time is
> spent in pixman itself.

The 'Msecs ellapsed' value varies wildly here, but I don't see any lasting CPU usage that would allow for a useful profile. Is that different from what you're seeing?

Either way, this seems likely to be a different issue from what you reported here initially, so it should probably be tracked separately.

It would also be nice if you could try these with the xserver master branch, which has some EXA optimizations over the 1.5 branch.

> I attached the profiles, as well as my "benchmark".

Next time, please include a Makefile instead of an x86 binary. :) (and add a toplevel directory to the tarball)
Comment 9 Clemens Eisserer 2008-07-09 05:17:11 UTC
Created attachment 17606 [details]
minimal benchmark 2.0
Comment 10 Clemens Eisserer 2008-07-09 05:27:04 UTC
The new version of the benchmark is stripped down to just draw lines and rects and composite the result, nothing more.

There is now an infinite benchmark loop. The occasional strange results occur because the timing code is flawed (it does not cope with overflow).
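
The benchmark's actual timing code is not shown in this report, so the following is only a sketch of one way to avoid that kind of overflow. It assumes POSIX clock_gettime() with CLOCK_MONOTONIC; the helper names elapsed_ms and now_monotonic are made up for illustration. The idea is to combine the seconds and sub-second fields in 64-bit arithmetic before dividing, so a wrap of the sub-second field cannot produce a bogus result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a clock that never jumps backwards. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;               /* keeps NDEBUG builds free of unused warnings */
    return ts;
}

/* Hypothetical helper: whole milliseconds between two samples.
 * Both fields are folded into one 64-bit nanosecond count first, so a
 * tv_nsec that wrapped past a second boundary is handled correctly. */
static int64_t elapsed_ms(struct timespec start, struct timespec end)
{
    int64_t sec  = (int64_t)end.tv_sec  - (int64_t)start.tv_sec;
    int64_t nsec = (int64_t)end.tv_nsec - (int64_t)start.tv_nsec;
    int64_t total_ns = sec * INT64_C(1000000000) + nsec;
    return total_ns / INT64_C(1000000);
}
```

A benchmark loop would call now_monotonic() before and after each iteration and print elapsed_ms() of the two samples.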

These are the results I get:
Xorg-1.3: 80ms
Xorg-1.5: 230ms
(GeForce6600: 40ms with nvidia's closed driver, not mine)

> Next time, please include a Makefile instead of an x86 binary. :) (and add a
> toplevel directory to the tarball)
Sorry about that; tarballs without a toplevel directory also annoy me every time I encounter one.
I created a bash compile script; sorry, I don't know make.

> Either way, this seems likely to be a different issue from what you reported
> here initially, so it should probably be tracked separately.
I don't know, maybe you could have a look again at the new benchmark?

Thanks a lot, Clemens
Comment 11 Michel Dänzer 2008-07-11 01:11:00 UTC
Created attachment 17632 [details] [review]
Possible solution for second benchmark

Does this patch help for the second benchmark? It greatly reduces the valid region tracking overhead with it here and puts dixLookupPrivate back to the top of the profile. So this probably should have been a separate report, but that may be moot now anyway. :)
Comment 12 Clemens Eisserer 2008-07-11 10:20:20 UTC
Created attachment 17638 [details]
oprofile results of the line benchmark with patch
Comment 13 Clemens Eisserer 2008-07-11 10:20:53 UTC
Created attachment 17639 [details]
a real-world workload, with patch
Comment 14 Clemens Eisserer 2008-07-11 10:23:24 UTC
Thanks a lot for the patch, performance is now on par with Xorg-1.3, even with dixLookupPrivate eating 25% of total cycles.
Sorry that I did not open a separate report; I was not sure how closely the two issues are connected.

Thanks again for fixing it, Clemens
Comment 15 Eamon Walsh 2008-07-11 11:03:21 UTC
I'm working on the dixLookupPrivate issue and hope to have a solution sometime soon.

I have an O(1) implementation done but the callers have to be changed slightly to accommodate it.
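
To illustrate the difference between a lookup that searches and an O(1) lookup, here is a minimal sketch. It is not the X server's code: every type and function name below is hypothetical, and the fixed table size is an assumption made to keep the example short. Variant A walks a linked list comparing key addresses; variant B has each key point at an int that holds a slot number, so a lookup is a single array index. The change from an opaque key to a key that points at integer storage is the kind of adjustment callers would have to make.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the X server's API. */

/* Variant A: privates kept in a linked list, searched by key address. */
typedef struct list_private {
    const void *key;
    void *value;
    struct list_private *next;
} list_private;

static void *list_lookup(const list_private *head, const void *key)
{
    for (const list_private *p = head; p != NULL; p = p->next)
        if (p->key == key)
            return p->value;      /* cost grows with the list length */
    return NULL;
}

/* Variant B: each key points at an int that stores its slot number. */
#define MAX_PRIVATES 64           /* assumed limit, for the sketch only */
static int next_slot = 1;         /* slot 0 means "key not registered" */

typedef struct {
    void *slots[MAX_PRIVATES];    /* must start out zero-initialized */
} array_privates;

/* Assign a slot on first use; returns 0 if the table is full. */
static int register_key(int *key)
{
    if (*key == 0) {
        if (next_slot >= MAX_PRIVATES)
            return 0;
        *key = next_slot++;
    }
    return *key;
}

static void array_set(array_privates *privs, int *key, void *value)
{
    int slot = register_key(key);
    assert(slot != 0);
    privs->slots[slot] = value;
}

static void *array_lookup(const array_privates *privs, const int *key)
{
    /* One index operation; an unregistered key reads slot 0, which
     * is never written and therefore stays NULL. */
    return privs->slots[*key];
}
```

A caller would declare a key as a static int, store a value with array_set(), and read it back with array_lookup().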
Comment 16 Zdenek Kabelac 2008-08-29 01:16:13 UTC
Anything new about this bug? dixLookupPrivate seems to take over 50% of Xorg's CPU time while rendering glxgears, according to oprofile.
Comment 17 Daniel Stone 2008-08-29 07:28:24 UTC
On Fri, Aug 29, 2008 at 01:16:13AM -0700, bugzilla-daemon@freedesktop.org wrote:
> Anything new about this bug - dixLookupPrivate seems to take over 50% of Xorg
> while rendering glxgears  according to oprofile

We're in the middle of fixing this.
Comment 18 Eamon Walsh 2008-08-29 13:33:41 UTC
Created attachment 18573 [details] [review]
array-index based devPrivates implementation
Comment 19 Eamon Walsh 2008-08-29 13:34:48 UTC
Please apply the attached patch to the current git master (including the changes from yesterday) and run your performance test again.

You'll need to be using in-tree drivers because out-of-tree drivers may not have changed the devPrivate keys to point to integer storage yet.
Comment 20 Maarten Maathuis 2008-09-12 15:10:52 UTC
I tested this patch, and it seems to bring the CPU usage of dixLookupPrivate down to a more acceptable level. Is anything in particular keeping it from being applied to master?
Comment 21 Eamon Walsh 2008-09-12 17:07:12 UTC
I went ahead and committed the patch, and sent a notice to the Xorg mailing list.

It could affect out-of-tree drivers that need to adjust to it, so I was waiting for some confirmation that it did in fact address the performance issues.
Comment 22 Fabio Pedretti 2008-10-09 03:03:40 UTC
This also appears to have been backported to the 1.5 branch:
http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078

Should this be marked as fixed?
Comment 23 Julien Cristau 2008-10-20 05:19:14 UTC
> --- Comment #22 from Fabio <fabio.ped@libero.it>  2008-10-09 03:03:40 PST ---
> This also appears to have been bakported to 1.5 branch:
> http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078
> 
It's been reverted as breaking ABI.
Comment 24 Peter Clifton 2008-12-26 09:48:44 UTC
Michel Dänzer's patch was never applied. Does it still help the performance any?

W.r.t. the dixLookupPrivate issues, which seem to be more the focus of this bug report: I'm still seeing about 11% of world time used up in dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running. (It uses cairo, not XDrawLine.)
Comment 25 Michel Dänzer 2009-02-15 08:51:06 UTC
(In reply to comment #24)
> Michel Dänzer's patch was never applied.

I pushed it to the master branch.


> W.r.t, the dixLookupPrivate issues - which seems to be more the focus of this
> bug report, I'm still seeing about 11% of world time used up in
> dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running.
> (Uses cairo, not XDrawLine)

I'm working on reducing the dixLookupPrivate calls in EXA by passing around the private pointers internally where possible. I'll hopefully have it ready for review soon, but I'm not sure how much of the overhead you're seeing it'll eliminate.
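
The general pattern of passing a private pointer around instead of looking it up repeatedly can be sketched as follows. This is not EXA's code: the types, the helpers, and the call counter are all hypothetical and exist only to make the saving visible. The "slow" path repeats the lookup in every helper; the "fast" path looks the pointer up once at the entry point and hands it down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not EXA's real types or functions. */
typedef struct { int accel_enabled; } drv_private;
typedef struct { drv_private *priv; } pixmap;

static unsigned lookup_calls;     /* counts lookups for the comparison */

/* Stands in for a lookup whose cost we want to avoid repeating. */
static drv_private *lookup_private(const pixmap *pix)
{
    lookup_calls++;
    return pix->priv;
}

/* Before: every helper performs its own lookup. */
static int helper_a_slow(const pixmap *pix)
{
    return lookup_private(pix)->accel_enabled;
}

static int helper_b_slow(const pixmap *pix)
{
    return lookup_private(pix)->accel_enabled + 1;
}

static int draw_slow(const pixmap *pix)
{
    return helper_a_slow(pix) + helper_b_slow(pix);
}

/* After: one lookup at the entry point, pointer passed to helpers. */
static int helper_a_fast(const drv_private *priv)
{
    return priv->accel_enabled;
}

static int helper_b_fast(const drv_private *priv)
{
    return priv->accel_enabled + 1;
}

static int draw_fast(const pixmap *pix)
{
    const drv_private *priv = lookup_private(pix);
    return helper_a_fast(priv) + helper_b_fast(priv);
}
```

Both entry points compute the same result; only the number of lookups differs, and the gap widens as more internal helpers are involved.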

