| Summary: | XDrawLine performance regression | | |
|---|---|---|---|
| Product: | xorg | Reporter: | Clemens Eisserer <linuxhippy> |
| Component: | Server/General | Assignee: | Xorg Project Team <xorg-team> |
| Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
| Severity: | normal | | |
| Priority: | medium | CC: | ewalsh, mcepl, pcjc2, pedretti.fabio, zdenek.kabelac |
| Version: | git | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
Created attachment 17599 [details]
Profile running Xorg-1.5

Looks like a lot of cycles are burnt in dixLookupPrivate() with 1.5. CC'ing Eamon Walsh, who said he's going to fix this known regression. Thanks a lot :)

PS: The original report is a bit confusing. I am using Xorg-server-1.4.903 of course, not Xorg-7.4.903. And "On the Xorg-1.5 system..." should have been "On the Xorg-server-1.3 system", sorry about the confusion.

Created attachment 17603 [details]
workload2 on 1.3
Created attachment 17604 [details]
workload2 on 1.5
Created attachment 17605 [details]
The "benchmark" itself

I tested a different workload, where I draw 12800 lines and 12800 single-height rects to an 8-bit pixmap (this is close to what my real-world workload will look like). This time not many cycles are spent in dixLookupPrivate, but a lot of time is spent in pixman itself. I attached the profiles, as well as my "benchmark".
Thanks, Clemens

(In reply to comment #7)
> This time not many cycles are spent in dixLookupPrivate, but a lot of time is
> spent in pixman itself.

The 'Msecs ellapsed' value varies wildly here, but I don't see any lasting CPU usage that would allow for a useful profile. Is that different from what you're seeing? Either way, this seems likely to be a different issue from what you reported here initially, so it should probably be tracked separately. It would also be nice if you could try these with the xserver master branch, which has some EXA optimizations over the 1.5 branch.

> I attached the profiles, as well as my "benchmark".

Next time, please include a Makefile instead of an x86 binary. :) (and add a toplevel directory to the tarball)

Created attachment 17606 [details]
minimal benchmark 2.0

The new version of the benchmark is stripped down to just do line drawing and rects and composite the result, nothing more. There is now an infinite benchmark loop; the strange results from time to time occur because the timing code is flawed (it does not cope with overflow). These are the results I get:
Xorg-1.3: 80ms
Xorg-1.5: 230ms
(GeForce6600: 40ms with nvidia's closed driver, not mine)

> Next time, please include a Makefile instead of an x86 binary. :) (and add a
> toplevel directory to the tarball)

Sorry about that, the toplevel-tarballs also annoy me every time I encounter one. I created a bash compile script, sorry I don't know make.

> Either way, this seems likely to be a different issue from what you reported
> here initially, so it should probably be tracked separately.

I don't know, maybe you could have a look again at the new benchmark?
Thanks a lot, Clemens

Created attachment 17632 [details] [review]
Possible solution for second benchmark

Does this patch help for the second benchmark? Here it greatly reduces the valid region tracking overhead and puts dixLookupPrivate back at the top of the profile. So this probably should have been a separate report, but that may be moot now anyway. :)

Created attachment 17638 [details]
oprofile results of the line benchmark with patch
Created attachment 17639 [details]
a real-world workload, with patch
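
The benchmark's timing code is not shown in this thread, so the following is only a sketch of one common way to make elapsed-time code cope with overflow. It assumes the timestamps come from a 32-bit millisecond counter that wraps around; the helper name `elapsed_ms` is hypothetical and not taken from the attachment.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed milliseconds between two readings of a
 * 32-bit millisecond counter that wraps to 0 after UINT32_MAX.
 * Assumption: the counter wrapped at most once between the readings. */
static uint32_t elapsed_ms(uint32_t start, uint32_t end)
{
    /* Unsigned subtraction is performed modulo 2^32, so the difference
     * stays correct even when `end` is numerically smaller than `start`. */
    return end - start;
}
```

With this form, a reading taken just before the wrap and one taken just after it still yield the true interval instead of a huge or negative value.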

Thanks a lot for the patch, performance is now on par with Xorg-1.3, even with dixLookupPrivate eating 25% of total cycles. Sorry that I did not open a separate report, I was not sure how much both issues are connected.
Thanks again for fixing it, Clemens

I'm working on the dixLookupPrivate issue, hope to have a solution sometime soon. I have an O(1) implementation done but the callers have to be changed slightly to accommodate it.

Anything new about this bug? dixLookupPrivate seems to take over 50% of Xorg while rendering glxgears according to oprofile.

On Fri, Aug 29, 2008 at 01:16:13AM -0700, bugzilla-daemon@freedesktop.org wrote:
> Anything new about this bug - dixLookupPrivate seems to take over 50% of Xorg
> while rendering glxgears according to oprofile

We're in the middle of fixing this.

Created attachment 18573 [details] [review]
array-index based devPrivates implementation

Please apply the attached patch to the current git master (including the changes from yesterday) and run your performance test again. You'll need to be using in-tree drivers because out-of-tree drivers may not have changed the devPrivate keys to point to integer storage yet.

I tested this patch, and it seems to bring down the CPU usage of dixLookupPrivate to a more acceptable level. Is anything special keeping it from being applied to master?

I went ahead and committed the patch, and sent a notice to the Xorg mailing list. It could affect out-of-tree drivers that need to adjust to it, so I was waiting for some confirmation that it did in fact address the performance issues.

This also appears to have been backported to the 1.5 branch:
http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078
Should be marked as fixed?

> --- Comment #22 from Fabio <fabio.ped@libero.it> 2008-10-09 03:03:40 PST ---
> This also appears to have been backported to 1.5 branch:
> http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078
>
It's been reverted as breaking ABI.
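
To illustrate why an array-index based devPrivates lookup can be O(1) while a key-matching walk is not, here is a minimal sketch. It is not the X server's code: all type and function names (`ListPrivate`, `ArrayPrivates`, `register_key`, and so on) and the fixed capacity are hypothetical. The only detail taken from the thread is that each key points to integer storage, which is used here to hold a slot index.

```c
#include <stddef.h>

#define MAX_PRIVATES 16 /* hypothetical fixed capacity for this sketch */

/* List style: every lookup walks the list comparing keys, O(n). */
typedef struct ListPrivate {
    const void *key;
    void *value;
    struct ListPrivate *next;
} ListPrivate;

static void *list_lookup(const ListPrivate *head, const void *key)
{
    for (; head != NULL; head = head->next)
        if (head->key == key)
            return head->value;
    return NULL;
}

/* Array style: the key points to integer storage holding a slot index. */
typedef int *IndexKey;

typedef struct {
    void *slots[MAX_PRIVATES];
} ArrayPrivates;

static int next_index = 0;

/* Assign the next free slot to a key; returns 0 on success, -1 if full. */
static int register_key(IndexKey key)
{
    if (next_index >= MAX_PRIVATES)
        return -1;
    *key = next_index++;
    return 0;
}

static void array_set(ArrayPrivates *p, IndexKey key, void *value)
{
    p->slots[*key] = value;
}

/* Lookup is a single indexed load, O(1), regardless of how many keys exist. */
static void *array_lookup(const ArrayPrivates *p, IndexKey key)
{
    return p->slots[*key];
}
```

The trade-off visible even in this sketch is that every key must be registered before use and must provide integer storage, which matches the remark above that callers and out-of-tree drivers need adjusting.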

Michel Dänzer's patch was never applied. Does it still help the performance any?

W.r.t. the dixLookupPrivate issues, which seem to be more the focus of this bug report: I'm still seeing about 11% of world time used up in dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running. (Uses cairo, not XDrawLine)

(In reply to comment #24)
> Michel Dänzer's patch was never applied.

I pushed it to the master branch.

> W.r.t, the dixLookupPrivate issues - which seems to be more the focus of this
> bug report, I'm still seeing about 11% of world time used up in
> dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running.
> (Uses cairo, not XDrawLine)

I'm working on reducing the dixLookupPrivate calls in EXA by passing around the private pointers internally where possible. I'll hopefully have it ready for review soon, but I'm not sure how much of the overhead you're seeing it'll eliminate.
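
The idea of passing private pointers around internally, instead of looking them up again in each helper, can be sketched as follows. This is not EXA code: the types, the functions, and the `lookup_calls` counter are all hypothetical and exist only to show how the number of lookups drops from one per helper call to one per operation.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a drawable and its private data. */
typedef struct { int bytes_per_pixel; } SketchPriv;
typedef struct { SketchPriv *priv; } SketchPixmap;

static unsigned lookup_calls = 0; /* counts lookups, for illustration only */

/* Stand-in for an expensive private lookup. */
static SketchPriv *sketch_lookup_private(const SketchPixmap *pix)
{
    lookup_calls++;
    return pix->priv;
}

/* Before: the helper performs its own lookup on every call. */
static size_t row_bytes_with_lookup(const SketchPixmap *pix, int width)
{
    return (size_t)width * (size_t)sketch_lookup_private(pix)->bytes_per_pixel;
}

static size_t total_bytes_naive(const SketchPixmap *pix, int width, int rows)
{
    size_t total = 0;
    for (int i = 0; i < rows; i++)
        total += row_bytes_with_lookup(pix, width);
    return total;
}

/* After: look up once at the entry point and pass the pointer down. */
static size_t row_bytes(const SketchPriv *priv, int width)
{
    return (size_t)width * (size_t)priv->bytes_per_pixel;
}

static size_t total_bytes_passing(const SketchPixmap *pix, int width, int rows)
{
    const SketchPriv *priv = sketch_lookup_private(pix);
    size_t total = 0;
    for (int i = 0; i < rows; i++)
        total += row_bytes(priv, width);
    return total;
}
```

Both variants compute the same result; only the number of lookups differs. How much of the measured overhead this removes in practice depends on how many lookups come from such internal call chains, which the thread leaves open.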

Created attachment 17598 [details]
Profile running Xorg-1.3

I've Xorg-1.3 running on Fedora 8 (intel-2.1.1) and Xorg-7.4.903 running on Rawhide with Intel-2.2.1. On the Xorg-1.5 system XDrawLine is faster to screen and a lot faster to an 8-bit pixmap.

Xorg-1.5 - 100.000 lines:
Screen: 460ms
8-bit pixmap: 650ms

Xorg-1.3 - 100.000 lines:
Screen: 360ms
8-bit pixmap: 170ms

I attached profiles created by oprofile.