Summary: | [CI][BAT] igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | intel-gfx-bugs |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | ICL | i915 features: | GEM/Other |
Description
Martin Peres
2019-05-15 08:28:27 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html

The HW bogosity is:

<7> [533.332880] [drm:execlists_resume [i915]] STOP_RING still set in RING_MI_MODE

One occurrence, attributed to HW issues. Lowering priority and let's keep watching.

(In reply to Francesco Balestrieri from comment #3)
> One occurrence, attributed to HW issues. Lowering priority and let's keep
> watching.

It has occurred a few times. Not always the same test, but if reset fails, this error tends to show up.

One small breakthrough: https://patchwork.freedesktop.org/patch/318310/?series=63752&rev=1

commit c30d5dc653cbc78f9b634b7b72e25057a68c527c (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jul 16 13:49:28 2019 +0100

    drm/i915/gt: Push engine stopping into reset-prepare

    Push the engine stop into the back reset_prepare (where it already was!)
    This allows us to avoid dangerously setting the RING registers to 0 for
    logical contexts. If we clear the register on a live context, those
    invalid register values are recorded in the logical context state and
    replayed (with hilarious results).

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-2-chris@chris-wilson.co.uk

commit fff8102aaed59014cb2d8034bdca231185496b16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jul 16 13:49:29 2019 +0100

    drm/i915/execlists: Process interrupted context on reset

    By stopping the rings, we may trigger an arbitration point resulting in
    a premature context-switch (i.e. a completion event before the request
    is actually complete). This clears the active context before the reset,
    but we must remember to rewind the incomplete context for replay upon
    resume.

    Fixes: 1863e3020ab5 ("drm/i915/execlists: Always reset the context's RING registers")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-3-chris@chris-wilson.co.uk

Hopefully. The mystery of STOP_RING persisting over reset remains, but I expect that to be mere noise; an incidental HW detail (unless it means that the reset itself didn't happen...)

Not seen on drm-tip. So closing and archiving this.

The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.
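For anyone who wants to keep watching for the `STOP_RING still set in RING_MI_MODE` message in saved dmesg output, here is a minimal sketch of a line parser. It assumes every line has the same shape as the single line quoted in this report (`<level> [timestamp] [source] message`); real CI logs may differ. The names `DmesgEntry`, `parse_dmesg_line` and `find_stop_ring` are hypothetical helpers for illustration, not part of IGT or the CI tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class DmesgEntry(NamedTuple):
    level: int        # kernel log level, e.g. 7 for debug
    timestamp: float  # seconds since boot
    source: str       # e.g. "drm:execlists_resume [i915]"
    message: str


# Assumed format, taken from the one line quoted in this report:
#   <7> [533.332880] [drm:execlists_resume [i915]] STOP_RING still set ...
_LINE_RE = re.compile(
    r"^<(?P<level>\d)>\s*\[\s*(?P<ts>\d+\.\d+)\]\s*"
    r"\[(?P<source>[^\[\]]+(?:\[[^\]]*\])?)\]\s*(?P<msg>.*)$"
)

# Message text copied verbatim from the report.
STOP_RING_MSG = "STOP_RING still set in RING_MI_MODE"


def parse_dmesg_line(line: str) -> Optional[DmesgEntry]:
    """Parse one log line; return None if it does not match the assumed format."""
    m = _LINE_RE.match(line.strip())
    if m is None:
        return None
    return DmesgEntry(int(m["level"]), float(m["ts"]), m["source"].strip(), m["msg"])


def find_stop_ring(lines: Iterable[str]) -> List[DmesgEntry]:
    """Return the parsed entries whose message contains the STOP_RING warning."""
    entries = (parse_dmesg_line(line) for line in lines)
    return [e for e in entries if e is not None and STOP_RING_MSG in e.message]
```

Passing the lines of a saved log to `find_stop_ring` returns the matching entries with their timestamps, which can then be compared against the time of the failed reset.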