Summary: | [CI][CFL] Occasional hang in drv_selftest@live_workarounds | ||
---|---|---|---|
Product: | DRI | Reporter: | Tomi Sarvela <tomi.p.sarvela> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs, martin.peres |
Version: | DRI git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | CFL | i915 features: |
Description
Tomi Sarvela
2018-07-11 08:47:47 UTC
My impression is that this is the same bug that affects live_hangcheck on execlists, in that it looks to be the restart from reset that freezes. Unlike live_hangcheck, we don't have a timer in the background to kick live_workarounds in case of reset failure. I should fix that.

This should turn the incompletes into fails:

commit cb4dc8daf4cb72d7833148a6087b425b5c20e903
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jul 11 13:29:52 2018 +0100

    drm/i915/selftests: Add a safety net to live_workarounds

    Since live_workarounds poke around the w/a registers and checks to see if they survive across a reset, we are prone to fouling the machine and leaving it in a non-recoverable state. Wrap the probe inside a timeout to abort the test if the reset fails.

    v2: Include GEM_TRACE on declaring wedged.
    v3: Add a few includes to make the header look standalone.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180711122952.18448-1-chris@chris-wilson.co.uk

*** Bug 107220 has been marked as a duplicate of this bug. ***

Found a subsequent BUG_ON (following the act of wedging the driver) that makes this worse than just the reset (live_hangcheck) failure.

*** Bug 107292 has been marked as a duplicate of this bug. ***

commit a99b32a6fff7e482a267c72e565c8c410ce793d7 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Aug 14 18:18:57 2018 +0100

    drm/i915: Clear stop-engine for a pardoned reset

    If we pardon a per-engine reset, we may leave the STOP_RING bit asserted in RING_MI_MODE resulting in the engine hanging. Unconditionally clear it on the per-engine exit path as we know that either we skipped the reset and so need the cancellation, or the reset was successful and the cancellation is a no-op, or there was an error and we will follow up with a full-reset or wedging (both of which will stop the engines again as required).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106560
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180814171857.24673-1-chris@chris-wilson.co.uk

Last seen 1 month ago. Closing the bug.

This bug used to appear roughly once every 1-20 rounds; it has not appeared in the last 217 rounds. Closing the bug.