Bug 110683

Summary: [CI][BAT] igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ICL
i915 features: GEM/Other

Description Martin Peres 2019-05-15 08:28:27 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html

<3> [533.539575] i915_reset_engine(rcs0:self-priority): failed to complete request after reset
Comment 1 CI Bug Log 2019-05-15 08:30:16 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html
Comment 2 Chris Wilson 2019-05-15 08:31:04 UTC
The HW bogosity is:

<7> [533.332880] [drm:execlists_resume [i915]] STOP_RING still set in RING_MI_MODE
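
For anyone triaging similar CI runs, the two messages quoted in this report can be correlated with a small script. This is only a sketch, not part of the CI tooling: the `<level> [timestamp] message` line layout is assumed from the excerpts above, and the names `LINE_RE`, `PATTERNS` and `scan` are hypothetical helpers introduced for illustration.

```python
import re

# Assumed layout, taken from the excerpts in this report:
#   "<3> [533.539575] message text"
LINE_RE = re.compile(r"^<(?P<level>\d)>\s*\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")

# Substrings of the two messages quoted in this bug (labels are hypothetical).
PATTERNS = {
    "stop_ring": "STOP_RING still set in RING_MI_MODE",
    "reset_fail": "failed to complete request after reset",
}


def scan(lines):
    """Return (timestamp, level, label) tuples for matching lines, oldest first."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not in the assumed format, skip it
        for label, needle in PATTERNS.items():
            if needle in m.group("msg"):
                hits.append((float(m.group("ts")), int(m.group("level")), label))
    return sorted(hits)


# The two lines quoted in this report, used as sample input.
SAMPLE = [
    "<3> [533.539575] i915_reset_engine(rcs0:self-priority): failed to complete request after reset",
    "<7> [533.332880] [drm:execlists_resume [i915]] STOP_RING still set in RING_MI_MODE",
]

if __name__ == "__main__":
    for ts, level, label in scan(SAMPLE):
        print(f"{ts:.6f} level={level} {label}")
```

Sorting by timestamp makes it easy to see whether the STOP_RING message precedes the reset failure in a given log, as it does in the run linked above.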
Comment 3 Francesco Balestrieri 2019-06-03 05:42:18 UTC
One occurrence, attributed to HW issues. Lowering priority and let's keep watching.
Comment 4 Chris Wilson 2019-06-14 18:55:34 UTC
(In reply to Francesco Balestrieri from comment #3)
> One occurrence, attributed to HW issues. Lowering priority and let's keep
> watching.

It has occurred a few times. Not always the same test, but if reset fails, this error tends to show up.
Comment 5 Chris Wilson 2019-07-16 14:10:13 UTC
One small breakthrough: https://patchwork.freedesktop.org/patch/318310/?series=63752&rev=1
Comment 6 Chris Wilson 2019-07-17 17:53:56 UTC
commit c30d5dc653cbc78f9b634b7b72e25057a68c527c (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:28 2019 +0100

    drm/i915/gt: Push engine stopping into reset-prepare
    
    Push the engine stop into the back reset_prepare (where it already was!)
    This allows us to avoid dangerously setting the RING registers to 0 for
    logical contexts. If we clear the register on a live context, those
    invalid register values are recorded in the logical context state and
    replayed (with hilarious results).
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-2-chris@chris-wilson.co.uk

commit fff8102aaed59014cb2d8034bdca231185496b16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:29 2019 +0100

    drm/i915/execlists: Process interrupted context on reset
    
    By stopping the rings, we may trigger an arbitration point resulting in
    a premature context-switch (i.e. a completion event before the request
    is actually complete). This clears the active context before the reset,
    but we must remember to rewind the incomplete context for replay upon
    resume.
    
    Fixes: 1863e3020ab5 ("drm/i915/execlists: Always reset the context's RING registers")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-3-chris@chris-wilson.co.uk

Hopefully. The mystery of STOP_RING persisting over reset remains, but I expect that to be mere noise; an incidental HW detail (unless it means that the reset itself didn't happen...)
Comment 7 swathi.dhanavanthri 2019-11-22 23:41:03 UTC
Not seen on drm-tip, so closing and archiving this.
Comment 8 CI Bug Log 2019-11-22 23:41:12 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
