Summary: | [snb] semaphores deadlock -- testing improved deadlock breaker | |
---|---|---|---
Product: | DRI | Reporter: | Stefan Huber <shuber>
Component: | DRM/Intel | Assignee: | Rodrigo Vivi <rodrigo.vivi>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | medium | CC: | intel-gfx-bugs
Version: | XOrg git | |
Hardware: | Other | |
OS: | All | |

Description
Stefan Huber
2014-06-30 11:15:24 UTC
Try:

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 0edc97f..9e5a295 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2852,7 +2852,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct intel_engine_cs *signaller;
-	u32 seqno, ctl;
+	u32 seqno;

 	ring->hangcheck.deadlock++;

@@ -2860,19 +2860,20 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 	if (signaller == NULL)
 		return -1;

+	printk("%s waiting on %s [recursion depth %d], seqno 0x%x [current 0x%x]\n",
+	       ring->name, signaller->name, signaller->hangcheck.deadlock,
+	       seqno, signaller->get_seqno(signaller, false));
+
 	/* Prevent pathological recursion due to driver bugs */
 	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
 		return -1;

-	/* cursory check for an unkickable deadlock */
-	ctl = I915_READ_CTL(signaller);
-	if (ctl & RING_WAIT_SEMAPHORE && semaphore_passed(signaller) < 0)
-		return -1;
-
 	if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno))
 		return 1;

-	if (signaller->hangcheck.deadlock)
+	/* cursory check for an unkickable deadlock */
+	if (I915_READ_CTL(signaller) & RING_WAIT_SEMAPHORE &&
+	    semaphore_passed(signaller) < 0)
 		return -1;

 	return 0;

(In reply to comment #1)
I have upgraded to 3.15.2 and applied the second patch too. I will ping you when/if the error occurs next. (According to my logs I had GPU crashes on Feb 5, Feb 20, Apr 4, Apr 9, Apr 21, May 13, May 16, Jun 3, Jun 23, Jun 30.)

So far so good, no crashes with the proposed patch until now.

It should emit a warning when it fires, could you check your logs to see if you have had such an event?

(In reply to comment #4)
> It should emit a warning when it fires, could you check your logs to see if
> you have had such an event?

# zcat messages-* | cat - messages | grep "waiting on" -A 8
Jul 18 16:33:09 euklid kernel: [ 9291.280145] render ring waiting on blitter ring [recursion depth 0], seqno 0x801bd [current 0x801bd]
Jul 18 16:33:09 euklid kernel: [ 9291.280639] [drm] GPU HANG: ecode -1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
Jul 18 16:33:09 euklid kernel: [ 9291.280640] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 18 16:33:09 euklid kernel: [ 9291.280641] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 18 16:33:09 euklid kernel: [ 9291.280642] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 18 16:33:09 euklid kernel: [ 9291.280642] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 18 16:33:09 euklid kernel: [ 9291.280643] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 18 16:33:09 euklid kernel: [ 9291.280669] blitter ring waiting on render ring [recursion depth 0], seqno 0x801c1 [current 0x801bf]

Interesting, I cannot remember that there was a crash yesterday.

Created attachment 103100
/sys/class/drm/card0/error from Jul 18

(In reply to comment #5)
> (In reply to comment #4)
> > It should emit a warning when it fires, could you check your logs to see if
> > you have had such an event?
>
> # zcat messages-* | cat - messages | grep "waiting on" -A 8
> Jul 18 16:33:09 euklid kernel: [ 9291.280145] render ring waiting on blitter
> ring [recursion depth 0], seqno 0x801bd [current 0x801bd]
> Jul 18 16:33:09 euklid kernel: [ 9291.280669] blitter ring waiting on render
> ring [recursion depth 0], seqno 0x801c1 [current 0x801bf]
>
> Interesting, I cannot remember that there was a crash yesterday.

You weren't meant to! Thanks, that shows that the patch did the trick.

commit a0d036b074b4a5a933e37fcb9bdd6b3cc80a0387
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jul 19 12:40:42 2014 +0100

    drm/i915: Reorder the semaphore deadlock check, again

    commit 4be173813e57c7298103a83155c2391b5b167b4c
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Fri Jun 6 10:22:29 2014 +0100

        drm/i915: Reorder semaphore deadlock check

    did the majority of the work, but it missed one crucial detail:

    The check for the unkickable deadlock on this ring must come after the
    check whether the ring that we are waiting on has already passed its
    target seqno.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80709
    Tested-by: Stefan Huber <shuber@sthu.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
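
For readers who want to see why the order of the two checks matters, here is a rough user-space sketch of the logic, fed with the seqno values from the Jul 18 log. It is not the driver code: `struct model_ring`, `MODEL_NUM_RINGS`, `passed_old_order()`, `passed_new_order()` and `query_logged_state()` are hypothetical names made up for this illustration, the register read is replaced by a plain flag, and the assumption that both rings had RING_WAIT_SEMAPHORE set at the time is mine, not something the log states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for I915_NUM_RINGS; the exact value does not matter here. */
#define MODEL_NUM_RINGS 5

/* Hypothetical user-space model of one ring; not the kernel's struct. */
struct model_ring {
	struct model_ring *signaller; /* ring we wait on, NULL if none */
	uint32_t wait_seqno;          /* seqno we wait for the signaller to reach */
	uint32_t current_seqno;       /* last seqno this ring has completed */
	bool wait_semaphore;          /* models RING_WAIT_SEMAPHORE being set */
	int deadlock;                 /* models hangcheck.deadlock */
};

/* Wrap-safe "seq1 is at or past seq2", the same idea as i915_seqno_passed(). */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Order removed by the patch: deadlock check first, seqno check second. */
static int passed_old_order(struct model_ring *ring)
{
	struct model_ring *signaller = ring->signaller;

	ring->deadlock++;
	if (signaller == NULL)
		return -1;
	if (signaller->deadlock >= MODEL_NUM_RINGS)
		return -1;
	if (signaller->wait_semaphore && passed_old_order(signaller) < 0)
		return -1;
	if (seqno_passed(signaller->current_seqno, ring->wait_seqno))
		return 1;
	if (signaller->deadlock)
		return -1;
	return 0;
}

/* Order introduced by the patch: seqno check first, deadlock check second. */
static int passed_new_order(struct model_ring *ring)
{
	struct model_ring *signaller = ring->signaller;

	ring->deadlock++;
	if (signaller == NULL)
		return -1;
	if (signaller->deadlock >= MODEL_NUM_RINGS)
		return -1;
	if (seqno_passed(signaller->current_seqno, ring->wait_seqno))
		return 1;
	if (signaller->wait_semaphore && passed_new_order(signaller) < 0)
		return -1;
	return 0;
}

typedef int (*check_fn)(struct model_ring *);

/* Rebuild the two-ring state from the Jul 18 log and query one ring. */
static int query_logged_state(check_fn check, bool ask_render)
{
	/* render waits for blitter to reach 0x801bd; render itself is at 0x801bf */
	struct model_ring render = { .wait_seqno = 0x801bd,
				     .current_seqno = 0x801bf,
				     .wait_semaphore = true };
	/* blitter waits for render to reach 0x801c1; blitter itself is at 0x801bd */
	struct model_ring blitter = { .wait_seqno = 0x801c1,
				      .current_seqno = 0x801bd,
				      .wait_semaphore = true };

	render.signaller = &blitter;
	blitter.signaller = &render;
	return check(ask_render ? &render : &blitter);
}

/* Results of the model for the logged state. */
static void self_check(void)
{
	assert(query_logged_state(passed_new_order, true) == 1);  /* render: passed, kickable */
	assert(query_logged_state(passed_new_order, false) == 0); /* blitter: still waiting */
	assert(query_logged_state(passed_old_order, true) == -1); /* render: reported as deadlock */
}
```

In this model the old order recurses between the two mutually waiting rings until the recursion guard trips and reports -1 for the render ring, even though the blitter ring has already reached the seqno the render ring waits for. With the seqno check first, the same state yields 1, which matches the "Kicking stuck semaphore on render ring" message in the log. Calling `self_check()` from a small `main()` exercises all three cases.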