Bug 110683 - [CI][BAT] igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other, OS: All
Importance: high priority, normal severity
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-15 08:28 UTC by Martin Peres
Modified: 2019-07-17 17:53 UTC
CC List: 1 user

See Also:
i915 platform: ICL
i915 features: GEM/Other


Description Martin Peres 2019-05-15 08:28:27 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html

<3> [533.539575] i915_reset_engine(rcs0:self-priority): failed to complete request after reset
Comment 1 CI Bug Log 2019-05-15 08:30:16 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* ICL: igt@i915_selftest@live_hangcheck - dmesg-fail - failed to complete request after reset
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4983/fi-icl-y/igt@i915_selftest@live_hangcheck.html
Comment 2 Chris Wilson 2019-05-15 08:31:04 UTC
The HW bogosity is:

<7> [533.332880] [drm:execlists_resume [i915]] STOP_RING still set in RING_MI_MODE
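For context on that message: RING_MI_MODE is a masked register, where the upper 16 bits of a write select which of the lower 16 bits may change, and STOP_RING is the bit the driver sets to halt the ring ahead of a reset. The sketch is a hypothetical software model, not i915 code; the bit position (bit 8) and the names masked_write and clear_stale_stop_ring are assumptions for illustration. It shows what "still set" means: after the reset the bit should read back clear, and if it does not, a masked write is needed to clear it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: STOP_RING is bit 8 of MI_MODE; illustrative only. */
#define STOP_RING (1u << 8)

/* Masked-register write values: the upper 16 bits choose which of the
 * lower 16 bits the write is allowed to change. */
#define MASKED_BIT_ENABLE(b)  (((uint32_t)(b) << 16) | (uint32_t)(b))
#define MASKED_BIT_DISABLE(b) ((uint32_t)(b) << 16)

/* Hypothetical software stand-in for a 16-bit masked register. */
struct masked_reg {
	uint16_t value;
};

/* Apply a masked write: only bits selected by the mask are updated. */
static void masked_write(struct masked_reg *reg, uint32_t val)
{
	uint16_t mask = (uint16_t)(val >> 16);
	uint16_t bits = (uint16_t)(val & 0xffffu);

	reg->value = (uint16_t)((reg->value & ~mask) | (bits & mask));
}

/* Hypothetical resume-time check: if STOP_RING survived the reset, log it
 * and clear it. Returns true when a stale bit was found. */
static bool clear_stale_stop_ring(struct masked_reg *mi_mode)
{
	if (!(mi_mode->value & STOP_RING))
		return false;

	fprintf(stderr, "STOP_RING still set in RING_MI_MODE\n");
	masked_write(mi_mode, MASKED_BIT_DISABLE(STOP_RING));
	assert(!(mi_mode->value & STOP_RING));
	return true;
}
```

In this model, stopping the engine before reset corresponds to masked_write(&mi_mode, MASKED_BIT_ENABLE(STOP_RING)). The reset is expected to leave the bit clear, which is why finding it set on resume is odd.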
Comment 3 Francesco Balestrieri 2019-06-03 05:42:18 UTC
One occurrence, attributed to HW issues. Lowering priority and let's keep watching.
Comment 4 Chris Wilson 2019-06-14 18:55:34 UTC
(In reply to Francesco Balestrieri from comment #3)
> One occurrence, attributed to HW issues. Lowering priority and let's keep
> watching.

It has occurred a few times. Not always the same test, but if reset fails, this error tends to show up.
Comment 5 Chris Wilson 2019-07-16 14:10:13 UTC
One small breakthrough: https://patchwork.freedesktop.org/patch/318310/?series=63752&rev=1
Comment 6 Chris Wilson 2019-07-17 17:53:56 UTC
commit c30d5dc653cbc78f9b634b7b72e25057a68c527c (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:28 2019 +0100

    drm/i915/gt: Push engine stopping into reset-prepare
    
    Push the engine stop into the back reset_prepare (where it already was!)
    This allows us to avoid dangerously setting the RING registers to 0 for
    logical contexts. If we clear the register on a live context, those
    invalid register values are recorded in the logical context state and
    replayed (with hilarious results).
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-2-chris@chris-wilson.co.uk
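The hazard this commit describes can be illustrated with a small, hypothetical model (not i915 code; all struct and function names are invented). With logical contexts, the hardware saves the live RING registers into the context image on switch-out and reloads them on switch-in, so zeroing them while a context is live gets recorded and later replayed. Stopping the engine without touching those registers, as the commit does by leaving the stop in reset_prepare, keeps the image intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified RING register set. */
struct ring_regs {
	uint32_t head;
	uint32_t tail;
	uint32_t ctl;	/* nonzero: ring valid/enabled in this model */
};

/* Logical context: the image saved on switch-out, reloaded on switch-in. */
struct logical_context {
	struct ring_regs image;
};

struct engine {
	struct ring_regs live;		/* registers the hardware is using */
	struct logical_context *active;
	bool stopped;
};

static void switch_in(struct engine *e, struct logical_context *ce)
{
	e->live = ce->image;		/* restore saved state */
	e->active = ce;
}

static void switch_out(struct engine *e)
{
	assert(e->active != NULL);
	e->active->image = e->live;	/* whatever is live gets recorded */
	e->active = NULL;
}

/* Risky: zero the RING registers while a context is still live. */
static void stop_by_clearing(struct engine *e)
{
	memset(&e->live, 0, sizeof(e->live));
}

/* Safer, in the spirit of the commit: only stop the engine (think
 * STOP_RING in reset_prepare) and leave the RING registers alone. */
static void stop_engine(struct engine *e)
{
	e->stopped = true;
}
```

After the stop_by_clearing sequence, switching the context back in would reload head, tail and ctl as zero, which is the replayed garbage the commit message warns about.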

commit fff8102aaed59014cb2d8034bdca231185496b16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:29 2019 +0100

    drm/i915/execlists: Process interrupted context on reset
    
    By stopping the rings, we may trigger an arbitration point resulting in
    a premature context-switch (i.e. a completion event before the request
    is actually complete). This clears the active context before the reset,
    but we must remember to rewind the incomplete context for replay upon
    resume.
    
    Fixes: 1863e3020ab5 ("drm/i915/execlists: Always reset the context's RING registers")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-3-chris@chris-wilson.co.uk

Hopefully that fixes it. The mystery of STOP_RING persisting over reset remains, but I expect that to be mere noise, an incidental HW detail (unless it means that the reset itself didn't happen...)
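The point of "Process interrupted context on reset" (a completion event can arrive before the request is actually complete, so the reset must still rewind the interrupted context) can be sketched as a loose, hypothetical model. The structs, the inflight field and all function names are illustrative assumptions, not i915's. Completion is judged by comparing a hardware-written seqno with each request's seqno, a common pattern but not necessarily the exact check the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: breadcrumb seqno plus the ring offset where its
 * commands start. */
struct request {
	uint32_t seqno;
	uint32_t head;
};

struct context {
	struct request *requests;	/* in submission order */
	size_t count;
	uint32_t ring_head;		/* where execution resumes */
};

struct engine {
	struct context *active;		/* cleared by a context-switch event */
	struct context *inflight;	/* hypothetical: last context submitted */
};

/* Wrap-safe 32-bit seqno comparison against the hardware-written value. */
static int request_completed(const struct request *rq, uint32_t hw_seqno)
{
	return (int32_t)(hw_seqno - rq->seqno) >= 0;
}

/* Stopping the ring can produce a switch-out event before the request has
 * finished; the model simply forgets the active context. */
static void on_context_switch_event(struct engine *e)
{
	e->active = NULL;
}

/* Rewind the ring to the first incomplete request so it is replayed on
 * resume. Returns that request, or NULL if all requests completed. */
static struct request *rewind_incomplete(struct context *ce, uint32_t hw_seqno)
{
	for (size_t i = 0; i < ce->count; i++) {
		struct request *rq = &ce->requests[i];

		if (!request_completed(rq, hw_seqno)) {
			ce->ring_head = rq->head;
			return rq;
		}
	}
	return NULL;
}

/* Reset path: do not rely on e->active, which a premature event may have
 * cleared; rewind the context that was last submitted instead. */
static void reset_rewind(struct engine *e, uint32_t hw_seqno)
{
	if (e->inflight != NULL)
		rewind_incomplete(e->inflight, hw_seqno);
}
```

With hw_seqno = 1, the first request is done but the second is not, so the ring head rewinds to that request's start even though the premature event already cleared active. A reset that consulted only active would skip the rewind and lose the request.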

