Bug 30996 - [bisected drm-intel-next]Performance of 3D GAME urbanterror had a regression
Summary: [bisected drm-intel-next]Performance of 3D GAME urbanterror had a regression
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: medium normal
Assignee: Chris Wilson
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-19 23:09 UTC by wang,jinjin
Modified: 2016-12-07 16:16 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description wang,jinjin 2010-10-19 23:09:50 UTC
System Environment:
--------------------------
Libdrm:		(master)2.4.22-3-g09b1062628f2cbddb3ebae20e7b3b8a0a93acebf
 Mesa:		(master)9e96b695b0bc59e01e69fd266f542dc3948114ad
 Xserver:		(master)xorg-server-1.9.0-148-g5aff712a8d2eb9f965ecbb93216cc0bcdc327ae6
 Xf86_video_intel:		(master)2.12.902-23-ga1c54f69643671ce296c57d132852e9846cc41d3
 Cairo:		(master)0d93468efc7f7337b63c0cd746d5185e14d345f1
 Kernel:	(drm-intel-next) 12f889c548c7521bcb21b3439ac6a1bb507357d4

Detailed description:
--------------------------------
With the newest commit 12f889c548c7521bcb21b3439ac6a1bb507357d4, the performance of urbenterror had a regression from 53fps down to 35fps. I found the culprit was kernel and bisected it to get the first bad commit.
[The first bad commit message:
commit 2549d6c26ce1c85a76990b972a2c7e8f440455cd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Oct 14 12:10:41 2010 +0100

    drm/i915: Avoid vmallocing a buffer for the relocations]

Reproduce steps:
----------------
1.xinit& 
2. vblank_mode=0 urbanterror +timedemo 1 +set demodone 'quit' +set demoloop1 'demo pts1; set nextdemo vstr demodone' +vstr demoloop1 +set r_customwidth 1024 +set r_customheight 768
Comment 1 wang,jinjin 2010-10-19 23:17:14 UTC
some information:

1. That bug also affects the performance of openarena and cairo-perf to down
2. In the process of bisect, I found the #bug30764 may be fixed by a commit. For with the comimit 
[Kernel: (drm-intel-next)f684960ed5b902994ba6540138d910f5caf7ea2a]
urbenterror run over with no X error .
Comment 2 Chris Wilson 2010-10-20 01:41:30 UTC
Yikes, it was a serious throughput win here. Since we go from a vmalloc+memcpy to nxmemcpy, it should have always been a win....

Which platforms? And cairo-perf using xlib?
Comment 3 Chris Wilson 2010-10-20 08:32:08 UTC
I see a 20% perf decrease in firefox-planet-gnome on g45 but no variation on q35. Judging from the perf profile the discrepancy appears in pwrite and not the vmalloc. Given the current state of the i965 ddx, I'm not that concerned that its bad behaviour is further penalised, though I still want to understand just where the perf is going since the delta in the profile does not seem to be 20%...


The more important question is what happened with urbanterror?
Comment 4 Chris Wilson 2010-10-20 11:05:02 UTC
I see a 54->47fps drop on pts-urbanterror g45. Not the change I was expecting.
Comment 5 Chris Wilson 2010-10-20 12:13:10 UTC
Watching the profile, it looks like before the commit, urbanterror is often CPU bound with the vmalloc being at the top of the profile (% CPU time at least). After the patch, the vmalloc related work is eliminated and urbanterror is not as CPU bound. Yet is slower.
Comment 6 Chris Wilson 2010-10-20 12:59:35 UTC
That was the clue necessary to look at perf trace i915_gem_request_wait_begin and wonder why we were mapping the relocation tree back into the GTT domain every time...

commit b5dc608c98d929abbf2fe932ed07b3c868d83342
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 20 20:59:57 2010 +0100

    drm/i915: Copy the updated reloc->presumed_offset back to the user
    
    If the userspace driver is using a constant relocation array with a
    static buffer, they will pass the same relocation array back to the
    kernel. So we *do* need to update the presumed offset value in those
    relocations to reflect the current object so that they remain correct
    with future batchbuffers and we avoid the necessity of having to suspend
    execution and perform redundant relocations.
    
    Fixes the regression introduced by 12f889c for applications using
    absolute addressing on trees of buffer (i.e. the current consumers of
    libdrm_intel.so).
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30996
    Reported-by: Wang, Jinjin <jinjin.wang@intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 7 wang,jinjin 2010-10-22 00:23:22 UTC
Thanks for your efficiency, Chris.
Comment 8 Jari Tahvanainen 2016-12-07 16:16:48 UTC
Closing old verified+fixed.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.