Summary: | [bisected drm-intel-next]Performance of 3D GAME urbanterror had a regression | ||
---|---|---|---|
Product: | DRI | Reporter: | wang,jinjin <jinjin.wang> |
Component: | DRM/Intel | Assignee: | Chris Wilson <chris> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | jbarnes |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
wang,jinjin
2010-10-19 23:09:50 UTC
Some information:
1. The same regression also lowers the performance of openarena and cairo-perf.
2. While bisecting, I found that bug #30764 may have been fixed by a commit: with commit [Kernel: (drm-intel-next)f684960ed5b902994ba6540138d910f5caf7ea2a], urbanterror runs to completion with no X error.

Yikes, it was a serious throughput win here. Since we go from a vmalloc+memcpy to n x memcpy, it should always have been a win... Which platforms? And is that cairo-perf using xlib?

I see a 20% perf decrease in firefox-planet-gnome on g45 but no variation on q35. Judging from the perf profile, the discrepancy appears in pwrite and not in the vmalloc. Given the current state of the i965 ddx, I'm not that concerned that its bad behaviour is further penalised, though I still want to understand just where the perf is going, since the delta in the profile does not seem to be 20%...

The more important question is what happened with urbanterror. I see a 54 -> 47 fps drop on pts-urbanterror on g45. Not the change I was expecting.

Watching the profile, it looks like before the commit urbanterror is often CPU bound, with the vmalloc at the top of the profile (in % CPU time at least). After the patch, the vmalloc-related work is eliminated and urbanterror is not as CPU bound. Yet it is slower. That was the clue needed to look at the perf trace of i915_gem_request_wait_begin and wonder why we were mapping the relocation tree back into the GTT domain every time...

    commit b5dc608c98d929abbf2fe932ed07b3c868d83342
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Oct 20 20:59:57 2010 +0100

        drm/i915: Copy the updated reloc->presumed_offset back to the user

        If the userspace driver is using a constant relocation array with a
        static buffer, they will pass the same relocation array back to the
        kernel. So we *do* need to update the presumed offset value in those
        relocations to reflect the current object so that they remain correct
        with future batchbuffers and we avoid the necessity of having to
        suspend execution and perform redundant relocations.

        Fixes the regression introduced by 12f889c for applications using
        absolute addressing on trees of buffer (i.e. the current consumers of
        libdrm_intel.so).

        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30996
        Reported-by: Wang, Jinjin <jinjin.wang@intel.com>
        Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks for your efficiency, Chris.

Closing old verified+fixed. |
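The commit message in this report explains the fix in prose. What follows is a toy model in C, not i915 code: `struct reloc_entry`, `apply_relocs()` and `submit_twice()` are hypothetical names, and the real execbuffer path (binding objects, changing domains, waiting on the GPU, copying results back to userspace) is reduced to plain array writes. It only illustrates why, when the same static relocation array is submitted repeatedly, not writing the updated presumed offset back forces every submission to redo every relocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation entry. It borrows the idea of a
 * "presumed offset" but is NOT the real i915 uAPI structure. */
struct reloc_entry {
    uint32_t offset;          /* byte offset in the batch holding the address */
    uint32_t target;          /* index of the target buffer object */
    uint64_t presumed_offset; /* where userspace believes the target is bound */
};

/* Patch every relocation whose presumed offset is stale. If write_back is
 * nonzero, the corrected presumed_offset is stored back into the caller's
 * relocation array (the behaviour the commit restores). Returns the number
 * of relocations that had to be patched. */
static int apply_relocs(uint8_t *batch, size_t batch_size,
                        struct reloc_entry *relocs, size_t nrelocs,
                        const uint64_t *bound_offsets, int write_back)
{
    int patched = 0;

    for (size_t i = 0; i < nrelocs; i++) {
        uint64_t actual = bound_offsets[relocs[i].target];

        /* Fast path: the address already in the batch is correct. */
        if (relocs[i].presumed_offset == actual)
            continue;

        /* Slow path: rewrite the address inside the batch. */
        assert((size_t)relocs[i].offset + sizeof actual <= batch_size);
        memcpy(batch + relocs[i].offset, &actual, sizeof actual);
        patched++;

        if (write_back)
            relocs[i].presumed_offset = actual;
    }
    return patched;
}

/* Submit the same static batch and relocation array twice while the target
 * objects stay where they are, and return the total number of patches. */
static int submit_twice(int write_back)
{
    uint8_t batch[64] = {0};
    struct reloc_entry relocs[2] = {
        { .offset = 0,  .target = 0, .presumed_offset = 0 },
        { .offset = 16, .target = 1, .presumed_offset = 0 },
    };
    /* Hypothetical GTT offsets of the two target objects. */
    const uint64_t bound[2] = { 0x100000, 0x200000 };
    int total = 0;

    for (int pass = 0; pass < 2; pass++)
        total += apply_relocs(batch, sizeof batch, relocs, 2, bound, write_back);
    return total;
}
```

With the targets left in place, `submit_twice(0)` returns 4 (both relocations are patched on both submissions), while `submit_twice(1)` returns 2 (only the first submission patches anything, because the second one finds the presumed offsets already correct).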