Created attachment 98105 [details]
dmesg

System Environment:
--------------------------
Platform: BDW
Kernel: (drm-intel-nightly) 11ddb598492d1f97f894495eea398febb41a9eb0

Bug detailed description:
-----------------------------
igt/gem_exec_lut_handle is slow on -nightly (11ddb598492d1f97f894495eea398febb41a9eb0) and -next-queued (b7c0d9df97c10ec5693a838df2fd53058f8e9e96). It passes on -fixes (7f1950fbb989e8fc5463b307e062b4529d51c862). The run takes about 15 minutes.

output:
date; ./gem_exec_lut_handle; date
Mon Apr 28 14:30:05 EDT 2014
IGT-Version: 1.6-ga595a40 (x86_64) (Linux: 3.15.0-rc2_drm-intel-nightly_11ddb5_20140428+ x86_64)
relocation: buffers=   1: old= 125316 + 117.9*reloc, lut= 146850 + 112.0*reloc (ns)
relocation: buffers=   2: old= 406548 + 141.1*reloc, lut= 419875 + 140.6*reloc (ns)
relocation: buffers=   4: old= 800835 + 139.2*reloc, lut= 816025 + 137.0*reloc (ns)
relocation: buffers=   8: old=1166923 + 135.5*reloc, lut=1232500 + -75.5*reloc (ns)
relocation: buffers=  16: old= 192457 + 81.2*reloc, lut= 205708 + 65.6*reloc (ns)
relocation: buffers=  32: old= 192791 + 122.6*reloc, lut= 209907 + 119.2*reloc (ns)
relocation: buffers=  64: old= 506904 + 148.8*reloc, lut= 542884 + 60.0*reloc (ns)
relocation: buffers= 128: old= 191845 + 132.5*reloc, lut= 202870 + 130.7*reloc (ns)
relocation: buffers= 256: old= 526010 + 175.1*reloc, lut= 550133 + 139.0*reloc (ns)
relocation: buffers= 512: old= 359939 + 120.2*reloc, lut= 364441 + 101.7*reloc (ns)
relocation: buffers=1024: old= 666040 + 193.0*reloc, lut= 659301 + 164.3*reloc (ns)
relocation: buffers=2048: old=1392698 + 90.6*reloc, lut=1326707 + 31.1*reloc (ns)
skip-relocs: buffers=   1: old=  99454 + 77.9*reloc, lut= 109764 + 76.7*reloc (ns)
skip-relocs: buffers=   2: old= 404212 + 101.5*reloc, lut= 418515 + 100.4*reloc (ns)
skip-relocs: buffers=   4: old= 824913 + -9.5*reloc, lut= 864003 + -114.4*reloc (ns)
skip-relocs: buffers=   8: old= 100407 + 78.6*reloc, lut= 110857 + 77.6*reloc (ns)
skip-relocs: buffers=  16: old= 411945 + 102.9*reloc, lut= 427001 + 101.5*reloc (ns)
skip-relocs: buffers=  32: old= 815200 + 95.8*reloc, lut= 830975 + 92.6*reloc (ns)
skip-relocs: buffers=  64: old=1165378 + 89.7*reloc, lut=1178587 + 87.6*reloc (ns)
skip-relocs: buffers= 128: old=1524780 + 90.8*reloc, lut=1537187 + 89.3*reloc (ns)
skip-relocs: buffers= 256: old=1906707 + 92.3*reloc, lut=1918720 + 89.7*reloc (ns)
skip-relocs: buffers= 512: old=2337270 + 105.8*reloc, lut=2347068 + 91.7*reloc (ns)
skip-relocs: buffers=1024: old=2053583 + -544.2*reloc, lut=1952384 + -534.9*reloc (ns)
skip-relocs: buffers=2048: old= 680265 + 113.4*reloc, lut= 652852 + 68.2*reloc (ns)
no-relocs: buffers=   1: old= 304519 + 32.3*reloc, lut= 331179 + -24.8*reloc (ns)
no-relocs: buffers=   2: old=  97862 + 64.8*reloc, lut= 108347 + 65.1*reloc (ns)
no-relocs: buffers=   4: old= 414029 + 94.3*reloc, lut= 430559 + 93.7*reloc (ns)
no-relocs: buffers=   8: old= 838338 + 75.1*reloc, lut= 852344 + 74.6*reloc (ns)
no-relocs: buffers=  16: old=1185574 + 70.9*reloc, lut=1198192 + 70.3*reloc (ns)
no-relocs: buffers=  32: old=1516555 + 70.3*reloc, lut=1528722 + 69.6*reloc (ns)
no-relocs: buffers=  64: old=1847106 + 69.1*reloc, lut=1859239 + 69.6*reloc (ns)
no-relocs: buffers= 128: old=2190399 + 75.2*reloc, lut=2205177 + 71.6*reloc (ns)
no-relocs: buffers= 256: old=2566325 + 71.3*reloc, lut=2578291 + 72.2*reloc (ns)
no-relocs: buffers= 512: old= 730893 + -143.7*reloc, lut= 671261 + -119.0*reloc (ns)
no-relocs: buffers=1024: old= 660928 + -76.5*reloc, lut= 667962 + -86.1*reloc (ns)
no-relocs: buffers=2048: old= 689709 + 79.9*reloc, lut= 683225 + 77.5*reloc (ns)
Mon Apr 28 14:44:33 EDT 2014

Reproduce steps:
----------------------------
1. ./gem_exec_lut_handle
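Note on reading the output: each line reports a fitted linear model, elapsed time ≈ base + slope*nr_relocs, in nanoseconds (the occasional negative slope is presumably noise in the fit). As an illustration only, a minimal least-squares sketch that produces a line in that shape; the sample points below are invented and this is not gem_exec_lut_handle's own code:

/* fit_sketch.c - least-squares fit of time = base + slope*reloc, the
 * model each output line above reports. The measurements are made-up
 * sample data; this is not the test's own code. */
#include <stdio.h>

int main(void)
{
    /* (nr_relocs, elapsed ns) measurements -- invented numbers */
    double x[] = { 0, 256, 512, 1024, 2048 };
    double y[] = { 125000, 155000, 185000, 246000, 366000 };
    int n = sizeof(x) / sizeof(x[0]);
    double sx = 0, sy = 0, sxx = 0, sxy = 0;

    for (int i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double base  = (sy - slope * sx) / n;

    /* prints in the same shape as the test output */
    printf("old=%7.0f + %.1f*reloc (ns)\n", base, slope);
    return 0;
}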
You should be able to bisect this based on the first line (rather than wait 15 minutes).
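If it helps, here is a hypothetical (untested) helper for "git bisect run": it launches the test, parses the base cost out of the first relocation line, and exits non-zero once it regresses past a threshold. The 200000 ns cutoff and the exact parse format are assumptions drawn from the numbers in comment 0; adjust per machine.

/* bisect_check.c - hypothetical helper for "git bisect run ./bisect_check".
 * The 200000 ns threshold is an assumed cutoff between the good (-fixes)
 * and bad (-nightly) base costs reported above. */
#include <stdio.h>

#define THRESHOLD_NS 200000L

int main(void)
{
    FILE *p = popen("./gem_exec_lut_handle 2>&1", "r");
    char line[512];
    int buffers;
    long base;

    if (!p)
        return 125; /* tell git bisect to skip this revision */

    while (fgets(line, sizeof(line), p)) {
        /* matches e.g. "relocation: buffers=   1: old=  125316 + ..." */
        if (sscanf(line, "relocation: buffers= %d: old= %ld",
                   &buffers, &base) == 2 && buffers == 1) {
            /* Exit without pclose(): closing our end of the pipe sends
             * the still-running test SIGPIPE on its next write, so we
             * do not sit through the remaining ~15 minutes. */
            return base > THRESHOLD_NS; /* 1 = bad commit, 0 = good */
        }
    }

    pclose(p);
    return 125; /* no parsable output: skip this revision */
}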
9403eb1064168ea7b47c5ccd04ec17b98ca9a0de is the first bad commit

commit 9403eb1064168ea7b47c5ccd04ec17b98ca9a0de
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Mon Mar 17 12:21:55 2014 +0000
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Fri Apr 25 16:18:01 2014 +0200

    drm/i915: Do not call retire_requests from wait_for_rendering

    A common issue we have is that retiring requests causes recursion
    through GTT manipulation or page table manipulation which we can only
    handle at very specific points. However, to maintain internal
    consistency (enforced through our sanity checks on write_domain at
    various points in the GEM object lifecycle) we do need to retire the
    object prior to marking it with a new write_domain, and also clear
    the write_domain for the implicit flush following a batch.

    Note that this then allows the unbound objects to still be on the
    active lists, and so care must be taken when removing objects from
    unbound lists (similar to the caveats we face processing the bound
    lists).

    v2: Fix i915_gem_shrink_all() to handle updated object lifetime
    rules, by refactoring it to call into __i915_gem_shrink().

    v3: Missed an object-retire prior to changing cache domains in
    i915_gem_object_set_cache_level()

    v4: Rebase

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 469c92ff9c22309ca81fa43a75c249c1112b69da a036ee8486228ba2ddc646c607595011a31cdde7 M	drivers
You only see this on BDW, right? It doesn't seem to affect the platforms here.
(In reply to comment #3)
> You only see this on BDW, right? It doesn't seem to affect the platforms
> here.

This bug can still be reproduced on PNV and HSW. Thanks.
The root cause is ppgtt:

commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 27 22:43:07 2014 +0000

    drm/i915: Prevent recursion by retiring requests when the ring is full

    As the VM do not track activity of objects and instead use a large
    hammer to forcibly idle and evict all of their associated objects
    when one is released, it is possible for that to cause a recursion
    when we need to wait for free space on a ring and call retire
    requests.
    (intel_ring_begin -> intel_ring_wait_request ->
     i915_gem_retire_requests_ring -> i915_gem_context_free ->
     i915_gem_evict_vm -> i915_gpu_idle -> intel_ring_begin etc)

    In order to remove the requirement for calling retire-requests from
    intel_ring_wait_request, we have to inline a couple of steps from
    retiring requests, notably we have to record the position of the
    request we wait for and use that to update the available ring space.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

This removes the i915_gem_retire_requests_ring() call that used to run when we run out of ring space; the test does no explicit request retiring and so depends upon that implicit retire.
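To make the cycle quoted above concrete, here is a self-contained toy illustration of the re-entry. The function names mirror the i915 ones from the commit message, but the bodies are stand-ins, not kernel code:

/* recursion_demo.c - toy illustration of the call cycle above; names
 * mirror drivers/gpu/drm/i915 functions, bodies are stand-ins only. */
#include <stdio.h>
#include <stdbool.h>

static bool ring_full = true; /* pretend there is no free ring space */
static int depth;             /* re-entry counter, for the demo only */

static void intel_ring_begin(void);

/* idling the GPU submits flushes, which themselves need ring space */
static void i915_gpu_idle(void)          { intel_ring_begin(); }

/* releasing a VM forcibly idles and evicts everything in it */
static void i915_gem_evict_vm(void)      { i915_gpu_idle(); }

/* retiring the last request on a context frees it */
static void i915_gem_context_free(void)  { i915_gem_evict_vm(); }

static void i915_gem_retire_requests_ring(void)
{
    i915_gem_context_free();
}

static void intel_ring_wait_request(void)
{
    /* the pre-1f70999f code retired requests here to reclaim space */
    i915_gem_retire_requests_ring();
}

static void intel_ring_begin(void)
{
    if (++depth > 1) {
        printf("recursed back into intel_ring_begin() (depth %d)\n", depth);
        return; /* the real fix (1f70999f) instead removed the retire
                 * from this path -- which is what made the test slow */
    }
    if (ring_full)
        intel_ring_wait_request();
}

int main(void)
{
    intel_ring_begin();
    return 0;
}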
*** Bug 78246 has been marked as a duplicate of this bug. ***
commit 1cf0ba14740d96fbf6f58a201f000a34b74f4725
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 5 09:07:33 2014 +0100

    drm/i915: Flush request queue when waiting for ring space

    During the review of
    commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Jan 27 22:43:07 2014 +0000

        drm/i915: Prevent recursion by retiring requests when the ring is full

    Ville raised the point that our interaction with request->tail was
    likely to foul up other uses elsewhere (such as hang check comparing
    ACTHD against requests). However, we also need to restore the
    implicit retire requests that certain test cases depend upon (e.g.
    igt/gem_exec_lut_handle), this raises the spectre that the ppgtt
    will randomly call i915_gpu_idle() and recurse back into
    intel_ring_begin().

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023
    Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
    [danvet: Remove now unused 'tail' variable as spotted by Brad.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
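For intuition, a toy model of the behaviour this restores. This is an illustration only, not the actual patch (which reworks the intel_ring_buffer wait path): completed requests get retired while waiting for ring space, so a workload like this test, which never retires explicitly, no longer stalls on a ring full of dead requests.

/* retire_sketch.c - conceptual toy model of the fix in 1cf0ba14740d;
 * NOT the kernel code. Shows the implicit retire on the wait path. */
#include <stdio.h>

#define RING_SIZE 8

struct toy_ring {
    int used;      /* slots occupied by requests */
    int completed; /* of those, how many the GPU has finished */
};

/* Retire everything the GPU already completed, reclaiming ring space.
 * This is the implicit retire the fix restores on the wait path. */
static void retire_completed(struct toy_ring *ring)
{
    ring->used -= ring->completed;
    ring->completed = 0;
}

static int wait_for_space(struct toy_ring *ring, int need)
{
    if (RING_SIZE - ring->used >= need)
        return 0;

    retire_completed(ring); /* flush the request queue first */

    if (RING_SIZE - ring->used >= need)
        return 0;
    return -1;              /* would block on the GPU here */
}

int main(void)
{
    /* Ring is full, but all the work is done: without the retire on
     * the wait path this would stall; with it, space is reclaimed. */
    struct toy_ring ring = { .used = RING_SIZE, .completed = RING_SIZE };

    printf("wait_for_space: %s\n",
           wait_for_space(&ring, 2) == 0 ? "ok" : "stalled");
    return 0;
}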
Fixed on latest -nightly (2be456541ea41728002ccca2de5235f48d14326e). Thanks.
Closing verified+fixed.