Bug 108954

Summary: [CI][SHARDS] igt@i915_selftest@live_workarounds - dmesg-fail - *ERROR* rcs0 workaround lost on before reset!
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Mika Kuoppala <mika.kuoppala>
Status: RESOLVED NOTOURBUG
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ICL
i915 features: GEM/Other

Description Martin Peres 2018-12-05 14:22:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5255/shard-iclb3/igt@i915_selftest@live_workarounds.html

<3> [140.850285] [drm:wa_list_verify [i915]] *ERROR* rcs0 workaround lost on before reset! (b118=0/0, expected 200040, mask=200040)
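The bracketed values in the log line read as: register offset (b118), raw readback (0), masked readback (0), expected value (200040) and mask (200040). The mask 0x200040 selects bits 21 and 6, and both read back as clear. The Python model below is a sketch, not the i915 code. It assumes the check flags a workaround when any bit selected by the mask differs between the readback and the expected value. It also assumes the message comes from a fixed template with a stage label such as "before reset" spliced in, which would explain the odd "lost on before reset" wording. `wa_lost` and `format_report` are hypothetical names.

```python
def wa_lost(cur: int, expected: int, mask: int) -> bool:
    """True if any bit covered by mask differs between readback and expected value."""
    return ((cur ^ expected) & mask) != 0


def format_report(engine: str, stage: str, reg: int, cur: int, expected: int, mask: int) -> str:
    # Assumed message layout: reg=raw/masked, expected value, mask (all hex).
    return (f"{engine} workaround lost on {stage}! "
            f"({reg:x}={cur:x}/{cur & mask:x}, expected {expected:x}, mask={mask:x})")


# Values taken from the dmesg line in the description.
print(wa_lost(0x0, 0x200040, 0x200040))  # True
print(format_report("rcs0", "before reset", 0xB118, 0x0, 0x200040, 0x200040))
# rcs0 workaround lost on before reset! (b118=0/0, expected 200040, mask=200040)
```

Because only masked bits are compared, a readback of 0x200041 would still pass: bit 0 is outside the mask.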
Comment 1 Chris Wilson 2018-12-05 15:18:17 UTC
Even more confusing is that it stopped occurring. Still the same machine (icl6).
Comment 2 Chris Wilson 2018-12-05 15:55:25 UTC
Failure in reading; it has failed twice now on iclb3.

Passed on iclb5, iclb6 and iclb7 so far. It could just be that one machine, which would fit the pre-production theory.
Comment 3 Chris Wilson 2018-12-06 08:28:55 UTC
And now it is appearing out of the blue on BAT fi-icl-u3.
Comment 4 Chris Wilson 2018-12-06 16:49:51 UTC
(In reply to Chris Wilson from comment #3)
> And now it is appearing out of the blue on BAT fi-icl-u3.

That at least appears to have been a bad merge now fixed up by Tvrtko.
Comment 5 Francesco Balestrieri 2018-12-18 11:42:33 UTC
Not seen in 6 days; it used to occur very frequently. Hopefully fixed, but let's keep monitoring it...
Comment 6 Francesco Balestrieri 2019-01-08 09:25:37 UTC
Keeps happening, here is the latest occurrence from 5 days ago:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5357/shard-iclb3/igt@i915_selftest@live_workarounds.html
Comment 7 Francesco Balestrieri 2019-01-08 13:07:41 UTC
Only seen on iclb3 in the past month, but it failed on iclb1, iclb7 and iclb5 before that.
Comment 8 Francesco Balestrieri 2019-01-09 08:25:08 UTC
Mika, could you take a look?
Comment 9 Chris Wilson 2019-01-11 12:57:58 UTC
*** Bug 109306 has been marked as a duplicate of this bug. ***
Comment 10 CI Bug Log 2019-01-11 13:23:42 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@i915_selftest@live_workarounds - dmesg-fail - *ERROR* rcs0 workaround lost on before reset! -}
{+ ICL: igt@i915_selftest@live_workarounds - dmesg-fail - *ERROR* rcs0(_REF)? workaround lost on before reset! +}

 No new failures caught with the new filter
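The updated filter makes the `_REF` suffix on the engine name optional, so it also catches messages reported against `rcs0_REF`. Assuming the filter behaves like a regular expression searched against each dmesg line (the CI Bug Log's actual matching rules are not shown in this bug), the optional group can be checked with Python's `re` module. `PATTERN` is a hypothetical reconstruction with the literal parts escaped, and the second log line is a made-up example built from the filter, not a captured failure.

```python
import re

# Literal text is escaped; only the engine name uses the optional (_REF)? group.
PATTERN = re.compile(
    r"\*ERROR\* rcs0(_REF)? " + re.escape("workaround lost on before reset!")
)

old = ("[drm:wa_list_verify [i915]] *ERROR* rcs0 workaround lost on before reset! "
       "(b118=0/0, expected 200040, mask=200040)")
new = "[drm:wa_list_verify [i915]] *ERROR* rcs0_REF workaround lost on before reset!"  # hypothetical

print(bool(PATTERN.search(old)), bool(PATTERN.search(new)))  # True True
```

Any other suffix, such as `rcs0_FOO`, still fails to match, because the group is either `_REF` or nothing before the following space.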
Comment 11 Francesco Balestrieri 2019-02-06 07:21:17 UTC
Latest occurrence from today: 

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5543/shard-iclb3/igt@i915_selftest@live_workarounds.html
Comment 12 Francesco Balestrieri 2019-03-01 08:16:44 UTC
Still happening, and only on iclb3.
Comment 13 Francesco Balestrieri 2019-03-12 12:56:19 UTC
Now it's everywhere except iclb3 :D
Comment 14 Chris Wilson 2019-04-17 10:05:15 UTC
This still fails sporadically across the different icl in the shards!

commit 769f0dab622c58e3158fc55d761b62a61e7fa2e5 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 17 08:56:29 2019 +0100

    drm/i915: Make workaround verification *optional*
    
    Sometimes the HW doesn't even play fair, and completely forgets about
    register writes. Skip verifying known troublemakers.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=108954
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190417075657.19456-4-chris@chris-wilson.co.uk

We really need to follow up with a hsd...
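The commit above skips verification for registers the hardware is known to forget. One way to express that, assuming each workaround entry carries its own verification mask (an illustration, not a description of the actual patch), is to let a zero mask turn the readback check into a no-op. `Workaround`, `verify_all` and register 0x1234 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Workaround:
    reg: int        # register offset
    value: int      # bits the workaround writes
    read_mask: int  # bits checked on readback; 0 means "do not verify"


def verify_all(wal: List[Workaround], read_reg: Callable[[int], int]) -> List[Workaround]:
    """Return the entries whose verified bits did not stick."""
    lost = []
    for wa in wal:
        cur = read_reg(wa.reg)
        # With read_mask == 0 this expression is always 0, so the entry never fails.
        if (cur ^ wa.value) & wa.read_mask:
            lost.append(wa)
    return lost


# Simulated register file: 0xB118 reads back as 0, as in the description's log.
hw = {0xB118: 0x0, 0x1234: 0x10}  # 0x1234 is a made-up register
wal = [
    Workaround(reg=0xB118, value=0x200040, read_mask=0),  # known troublemaker: not verified
    Workaround(reg=0x1234, value=0x10, read_mask=0x10),   # ordinary entry: verified
]
print(verify_all(wal, hw.__getitem__))  # []
```

In this model a zero `read_mask` only silences the readback check; the entry stays in the list, so code that applies the list would still program the register.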
