System Environment: -------------------------- Libdrm: (master)2.4.22-3-g09b1062628f2cbddb3ebae20e7b3b8a0a93acebf Mesa: (master)9e96b695b0bc59e01e69fd266f542dc3948114ad Xserver: (master)xorg-server-1.9.0-148-g5aff712a8d2eb9f965ecbb93216cc0bcdc327ae6 Xf86_video_intel: (master)2.12.902-23-ga1c54f69643671ce296c57d132852e9846cc41d3 Cairo: (master)0d93468efc7f7337b63c0cd746d5185e14d345f1 Kernel: (drm-intel-next) 12f889c548c7521bcb21b3439ac6a1bb507357d4 Detailed description: -------------------------------- With the newest commit 12f889c548c7521bcb21b3439ac6a1bb507357d4, the performance of urbenterror had a regression from 53fps down to 35fps. I found the culprit was kernel and bisected it to get the first bad commit. [The first bad commit message: commit 2549d6c26ce1c85a76990b972a2c7e8f440455cd Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Oct 14 12:10:41 2010 +0100 drm/i915: Avoid vmallocing a buffer for the relocations] Reproduce steps: ---------------- 1.xinit& 2. vblank_mode=0 urbanterror +timedemo 1 +set demodone 'quit' +set demoloop1 'demo pts1; set nextdemo vstr demodone' +vstr demoloop1 +set r_customwidth 1024 +set r_customheight 768
some information: 1. That bug also affects the performance of openarena and cairo-perf to down 2. In the process of bisect, I found the #bug30764 may be fixed by a commit. For with the comimit [Kernel: (drm-intel-next)f684960ed5b902994ba6540138d910f5caf7ea2a] urbenterror run over with no X error .
Yikes, it was a serious throughput win here. Since we go from a vmalloc+memcpy to nxmemcpy, it should have always been a win.... Which platforms? And cairo-perf using xlib?
I see a 20% perf decrease in firefox-planet-gnome on g45 but no variation on q35. Judging from the perf profile the discrepancy appears in pwrite and not the vmalloc. Given the current state of the i965 ddx, I'm not that concerned that its bad behaviour is further penalised, though I still want to understand just where the perf is going since the delta in the profile does not seem to be 20%... The more important question is what happened with urbanterror?
I see a 54->47fps drop on pts-urbanterror g45. Not the change I was expecting.
Watching the profile, it looks like before the commit, urbanterror is often CPU bound with the vmalloc being at the top of the profile (% CPU time at least). After the patch, the vmalloc related work is eliminated and urbanterror is not as CPU bound. Yet is slower.
That was the clue necessary to look at perf trace i915_gem_request_wait_begin and wonder why we were mapping the relocation tree back into the GTT domain every time... commit b5dc608c98d929abbf2fe932ed07b3c868d83342 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 20 20:59:57 2010 +0100 drm/i915: Copy the updated reloc->presumed_offset back to the user If the userspace driver is using a constant relocation array with a static buffer, they will pass the same relocation array back to the kernel. So we *do* need to update the presumed offset value in those relocations to reflect the current object so that they remain correct with future batchbuffers and we avoid the necessity of having to suspend execution and perform redundant relocations. Fixes the regression introduced by 12f889c for applications using absolute addressing on trees of buffer (i.e. the current consumers of libdrm_intel.so). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30996 Reported-by: Wang, Jinjin <jinjin.wang@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Thanks for your efficiency, Chris.
Closing old verified+fixed.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.