https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html

<3> [533.539575] i915_reset_engine(rcs0:self-priority): failed to complete request after reset
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html
The HW bogosity is:

<7> [533.332880] [drm:execlists_resume [i915]] STOP_RING still set in RING_MI_MODE
One occurrence, attributed to HW issues. Lowering priority and let's keep watching.
(In reply to Francesco Balestrieri from comment #3)
> One occurrence, attributed to HW issues. Lowering priority and let's keep
> watching.

It has occurred a few times. Not always the same test, but if reset fails, this error tends to show up.
One small breakthrough: https://patchwork.freedesktop.org/patch/318310/?series=63752&rev=1
commit c30d5dc653cbc78f9b634b7b72e25057a68c527c (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:28 2019 +0100

    drm/i915/gt: Push engine stopping into reset-prepare

    Push the engine stop into the back reset_prepare (where it already was!)
    This allows us to avoid dangerously setting the RING registers to 0 for
    logical contexts. If we clear the register on a live context, those
    invalid register values are recorded in the logical context state and
    replayed (with hilarious results).

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-2-chris@chris-wilson.co.uk

commit fff8102aaed59014cb2d8034bdca231185496b16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:29 2019 +0100

    drm/i915/execlists: Process interrupted context on reset

    By stopping the rings, we may trigger an arbitration point resulting in
    a premature context-switch (i.e. a completion event before the request
    is actually complete). This clears the active context before the reset,
    but we must remember to rewind the incomplete context for replay upon
    resume.

    Fixes: 1863e3020ab5 ("drm/i915/execlists: Always reset the context's RING registers")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-3-chris@chris-wilson.co.uk

Hopefully. The mystery of STOP_RING persisting over reset remains, but I expect that to be mere noise; an incidental HW detail (unless it means that the reset itself didn't happen...)
Not seen on drm-tip. So closing and archiving this.
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.