Created attachment 56439 [details]
An example of the error state

Having updated sna to prefer the BLT for fills when we have semaphores (it is far faster than the current shaders on RENDER), I am now encountering a reproducible hang whilst running cairo-traces (though not always in the same spot).

sna: commit df4e1059a4e09998334dde6aa1c8ccfe76e442c0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 31 19:19:13 2012 +0000

    sna/gen6: Prefer to do fills using the BLT
Various magic patches tried so far: finish-gpu, o_l_r seqno fix, rc6=0, fbc=0, forcewaked
Also in case it matters, this is an mba4.1 (11") with an i5-2467m. Now where's the stepping info?... ;-)
Created attachment 56445 [details]
Sample error state

Updated error state to include the inter-ring sync registers, as supposedly used by the semaphore mbox.
Created attachment 56453 [details] [review]
always bind to ppgtt (patch against current my-next)

There are too many stern warnings that not having 1:1 global:per-process gtt bindings can lead to hilarity, and one of the saner theories to explain the error_state is to assume that the gpu read a bunch of MI_NOOPs from the render ring (which it would if it read through the ppgtt). So let's try to rule that out first.
Good news! No effect with forcing the 1:1 mapping of ggtt:ppgtt.
The light at the end of the tunnel seems to be not using LLC caching on the ring buffers... But first let's see if it survives just one complete cycle!
No, it didn't make it to the end of the cairo-traces.
The reported head position through the HWS is just a random value:

[ 55.365394] ring 0: bogus return from HWS: 9023488 > 2099072
[ 55.530138] ring 0: bogus return from HWS: 113971200 > 2142592
[ 55.770873] ring 0: bogus return from HWS: 94113792 > 2208832
[ 56.012128] ring 0: bogus return from HWS: 146026496 > 4236360
[ 56.037306] ring 0: bogus return from HWS: 269692928 > 4241160
[ 56.042295] ring 0: bogus return from HWS: 9060352 > 4242120
[ 56.043003] ring 0: bogus return from HWS: 9060352 > 4242428
Created attachment 56463 [details] [review]
No more HEAD please, I'm British!

My theory as to why this requires semaphores and became easier to hit with the pipe-control w/a is that we need to exhaust the ring buffer often enough to read a bogus and dangerous value. Using semaphores (or UXA) generates lots of BLT/RENDER switches and lots of tiny batches, and the pipe-control w/a magnified the amount of space required for every batch.
commit 6aa56062eaba67adfb247cded244fd877329588d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 29 21:44:37 2010 +0100

    drm/i915/ringbuffer: Use the HEAD auto-reporting mechanism

... the offending commit that introduced the issue.
For the record, the debugging patch was equivalent to:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 023739f..54420ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1065,12 +1065,18 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 	 * fallback to the slow and accurate path.
 	 */
 	head = intel_read_status_page(ring, 4);
+#if 0
 	if (head > ring->head) {
 		ring->head = head;
 		ring->space = ring_space(ring);
 		if (ring->space >= n)
 			return 0;
 	}
+#else
+	if (head > I915_READ_HEAD(ring))
+		printk("ring %d: read of reported HEAD %d is in advance of actual HEAD %d\n",
+		       ring->id, head, I915_READ_HEAD(ring));
+#endif
 
 	trace_i915_ring_wait_begin(ring);
 	if (drm_core_check_feature(dev, DRIVER_GEM))
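For context, the ring_space() used in that fast path was, from memory, roughly the following (treat this as a sketch rather than a verbatim quote of the kernel source):

	static int ring_space(struct intel_ring_buffer *ring)
	{
		/* Free bytes between head and tail; the +8 keeps tail from
		 * ever catching up to head exactly (hw requirement). */
		int space = (ring->head & HEAD_ADDR) - (ring->tail + 8);
		if (space < 0)
			space += ring->size;	/* head is behind tail: wrapped */
		return space;
	}

So a garbage autoreported head doesn't just misreport free space once; it poisons ring->head for every later computation until something corrects it.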
It does come with a consequence for *GPU-bound* traces:

Slowdowns
=========
xlib firefox-fishtank   2324.51 -> 12551.04:  5.40x slowdown
xlib firefox-paintball   882.43 -> 19323.76: 21.90x slowdown
Created attachment 56479 [details] [review]
Return tail from ring->add_request()
Created attachment 56480 [details] [review]
Use request->head to guess current ring->head
Created attachment 56481 [details] [review]
Keep using request->head until we run out of requests

Ok, using the request to estimate HEAD prevents the performance regression, and things appear to be stable again on SNB. Testing on PNV now.
Argh. Looks like I fluffed the testing, and the perf-cases I have exploit this bug... Bizarre, very bizarre...
Created attachment 56522 [details] [review]
Use request->tail to guess current ring->head

Current patch that seems non-buggy, but no perf improvement (despite the old changelog!).
*** Bug 45273 has been marked as a duplicate of this bug. ***
A sample of autoreported heads from snb:

[ 141.970680] head now 00e10000, actual 00008238, bogus? 1
[ 141.974062] head now 02734000, actual 000083c8, bogus? 1
[ 141.974425] head now 00e10000, actual 00008488, bogus? 1
[ 141.980374] head now 032b5000, actual 000088b8, bogus? 1
[ 141.980885] head now 03271000, actual 00008950, bogus? 1
[ 142.040628] head now 02101000, actual 00008b40, bogus? 1
[ 142.180173] head now 02734000, actual 00009050, bogus? 1
[ 142.181090] head now 00000000, actual 00000ae0, bogus? 0
[ 142.183737] head now 02734000, actual 00009050, bogus? 1
[ 142.183747] head now 00000000, actual 00000b30, bogus? 0
[ 142.183755] head now 02734000, actual 00009050, bogus? 1
[ 142.190904] head now 03271000, actual 00009110, bogus? 1
[ 142.216101] head now 02101000, actual 00009290, bogus? 1
[ 142.242984] head now 00e54000, actual 00009420, bogus? 1
[ 142.280395] head now 00e10000, actual 000095a0, bogus? 1
[ 142.299506] head now 02185000, actual 00009660, bogus? 1
[ 142.314656] head now 05642000, actual 00009720, bogus? 1
[ 142.353926] head now 02734000, actual 000098a0, bogus? 1
From q35:

[ 217.975608] head now 00010000, actual 00010000, bogus? 0
[ 436.725613] head now 00200000, actual 00200000, bogus? 0
[ 462.956033] head now 00210000, actual 00210010, bogus? 0
[ 485.501409] head now 00400000, actual 00400020, bogus? 0
[ 508.064280] head now 00410000, actual 00410000, bogus? 0
[ 530.576078] head now 00600000, actual 00600020, bogus? 0

The misses there are equally worrying, but at least they don't cause us to blow up. Also, no evidence yet that this is the long-sought-after all-generation bug.
Ok, the failure on q35 is starting to look extremely systematic:

[ 217.975608] head now 00010000, actual 00010000, bogus? 0
[ 436.725613] head now 00200000, actual 00200000, bogus? 0
[ 462.956033] head now 00210000, actual 00210010, bogus? 0
[ 485.501409] head now 00400000, actual 00400020, bogus? 0
[ 508.064280] head now 00410000, actual 00410000, bogus? 0
[ 530.576078] head now 00600000, actual 00600020, bogus? 0
[ 553.273489] head now 00610000, actual 00610018, bogus? 0
[ 575.732116] head now 00800000, actual 00800028, bogus? 0
[ 598.396240] head now 00810000, actual 00810058, bogus? 0
[ 620.868948] head now 00a00000, actual 00a00008, bogus? 0
[ 726.153771] head now 00a10000, actual 00a10008, bogus? 0

That is, either the wrap counter is being incremented twice, or we miss the notifications for the odd wrap values.
Created attachment 56749 [details] [review]
Revert use of autoreported head
Created attachment 56750 [details] [review]
Record tail at each request and use to estimate ring->head.
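The gist of the request-tracking approach, for anyone following along: instead of trusting the autoreported HEAD, record ring->tail into each request as it is emitted; then, when we need space, walk the outstanding requests and wait for the oldest one whose retirement guarantees enough room. A rough sketch of the idea follows; the function name wait_for_ring_space is hypothetical and the exact fields/helpers may differ from the patch as landed:

	/* Sketch only: estimate ring->head from outstanding requests rather
	 * than the HWS autoreport. Once a request has retired, HEAD must be
	 * at least at the tail recorded when that request was emitted. */
	static int wait_for_ring_space(struct intel_ring_buffer *ring, int n)
	{
		struct drm_i915_gem_request *request;
		u32 seqno = 0, tail = 0;
		int ret;

		/* Find the oldest request whose retirement frees >= n bytes. */
		list_for_each_entry(request, &ring->request_list, list) {
			int space = request->tail - (ring->tail + 8);
			if (space < 0)
				space += ring->size;
			if (space >= n) {
				seqno = request->seqno;
				tail = request->tail;
				break;
			}
		}

		if (seqno == 0)
			return -ENOSPC;	/* even retiring everything isn't enough */

		/* Wait for that request to complete (stand-in for the real
		 * i915 seqno-wait helper)... */
		ret = i915_wait_request(ring, seqno);
		if (ret)
			return ret;

		/* ...and now HEAD is known to have advanced at least this far,
		 * without ever reading the autoreported value. */
		ring->head = tail;
		ring->space = ring_space(ring);
		return 0;
	}

The win over a plain register read is that we only fall back to polling the hardware HEAD when there are no requests left to wait on.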
Created attachment 56765 [details]
A sample error state showing requests
Boo, hiss! With proper waiting, we go from 12Mglyphs/s to 5.2Mglyphs/s on i5-2520m. :( I was so happy to have broken the 10Mglyphs/s barrier and thrashed nvidia into submission. Oh hardware, how I hate thee.
*** Bug 46575 has been marked as a duplicate of this bug. ***
Note that i-g-t/tests/gem_ringfill was enhanced to exercise this bug.
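The gist of that test is simply to queue far more tiny batches than the ring can hold, forcing the kernel through its wait-for-ring-space path where this bug lived. A rough sketch of the shape of such a test; emit_tiny_blt() and gem_sync() here are hypothetical stand-ins for the actual i-g-t plumbing, not the real test code:

	/* Sketch of the idea behind i-g-t/tests/gem_ringfill: with the ring
	 * only a couple of MiB, submitting this many small batches forces
	 * the kernel to repeatedly wait for ring space to be freed by
	 * retiring old requests. */
	#define NUM_BATCHES (128 * 1024)

	static void fill_ring(int fd)
	{
		int i;

		for (i = 0; i < NUM_BATCHES; i++)
			emit_tiny_blt(fd);	/* hypothetical: queue a minimal blit batch */

		/* Wait for everything to retire; a hang here reproduces the bug. */
		gem_sync(fd);
	}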
On the way to Linus, currently sitting in airlied/drm-fixes:

commit 5d031e5b633d910f35e6e0abce94d9d842390006
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 8 13:34:13 2012 +0000

    drm/i915: Remove use of the autoreported ringbuffer HEAD position

    This is a revert of 6aa56062eaba67adfb247cded244fd877329588d.

    This was originally introduced to workaround reads of the ringbuffer
    registers returning 0 on SandyBridge causing hangs due to ringbuffer
    overflow. The root cause here was reads through the GT powerwell
    require the forcewake dance, something we only learnt of later. Now
    it appears that reading the reported head position from the HWS is
    returning garbage, leading once again to hangs.

    For example, on q35 the autoreported head reports:
    [ 217.975608] head now 00010000, actual 00010000
    [ 436.725613] head now 00200000, actual 00200000
    [ 462.956033] head now 00210000, actual 00210010
    [ 485.501409] head now 00400000, actual 00400020
    [ 508.064280] head now 00410000, actual 00410000
    [ 530.576078] head now 00600000, actual 00600020
    [ 553.273489] head now 00610000, actual 00610018
    which appears reasonably sane. In contrast, if we look at snb:
    [ 141.970680] head now 00e10000, actual 00008238
    [ 141.974062] head now 02734000, actual 000083c8
    [ 141.974425] head now 00e10000, actual 00008488
    [ 141.980374] head now 032b5000, actual 000088b8
    [ 141.980885] head now 03271000, actual 00008950
    [ 142.040628] head now 02101000, actual 00008b40
    [ 142.180173] head now 02734000, actual 00009050
    [ 142.181090] head now 00000000, actual 00000ae0
    [ 142.183737] head now 02734000, actual 00009050

    In addition, the automatic reporting of the head position is
    scheduled to be defeatured in the future. It has no more utility,
    remove it.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45492
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Tested-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>