https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4711/shard-kbl6/igt@gem_eio@reset-stress.html

(gem_eio:1302) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:1302) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:1302) CRITICAL: Wake up following reset+wedge took 3545.628ms
Subtest reset-stress failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-blb-e6850/igt@gem_eio@wait-10ms.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-bwr-2160/igt@gem_eio@wait-10ms.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-gdg-551/igt@gem_eio@wait-10ms.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-pnv-d510/igt@gem_eio@wait-10ms.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4589/shard-hsw7/igt@gem_eio@wait-10ms.html

(gem_eio:11661) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:11661) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:11661) CRITICAL: Wake up following reset+wedge took 3947.832ms
Subtest wait-10ms failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-bwr-2160/igt@gem_eio@wait-1us.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-gdg-551/igt@gem_eio@wait-1us.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-pnv-d510/igt@gem_eio@wait-1us.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-blb-e6850/igt@gem_eio@wait-1us.html

(gem_eio:1171) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:1171) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:1171) CRITICAL: Wake up following reset+wedge took 3417.919ms
Subtest wait-1us failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4660/shard-apl2/igt@gem_eio@reset-stress.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4713/shard-kbl7/igt@gem_eio@reset-stress.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4711/shard-kbl6/igt@gem_eio@reset-stress.html

(gem_eio:1302) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:1302) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:1302) CRITICAL: Wake up following reset+wedge took 3545.628ms
Subtest reset-stress failed.
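
For readers unfamiliar with the failing check: the logs above are consistent with elapsed being measured in nanoseconds, so the 250e6 bound would correspond to 250 ms, and a 3545.628 ms wake-up exceeds it by more than an order of magnitude. The following is only a minimal sketch under that assumption. It is not the code from tests/gem_eio.c, and WAKE_LIMIT_NS, timespec_diff_ns and wake_within_limit are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical limit mirroring "elapsed < 250e6", assuming nanoseconds
 * (250e6 ns == 250 ms). */
#define WAKE_LIMIT_NS 250000000LL

/* Nanoseconds between two timespec samples (end - start). */
int64_t timespec_diff_ns(const struct timespec *start,
                         const struct timespec *end)
{
    assert(start && end);
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
           (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Returns 1 if the wake-up latency is under the limit, 0 otherwise. */
int wake_within_limit(int64_t elapsed_ns)
{
    return elapsed_ns < WAKE_LIMIT_NS;
}
```

In a real measurement the two samples would typically come from clock_gettime(CLOCK_MONOTONIC, ...) taken before triggering the reset and after the waiter returns; a latency of 3545628000 ns, as in the reset-stress log, makes wake_within_limit() return 0.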
These are conflating errors. The missed breadcrumb should be fixed by

commit a4a717010f4e8cacaa3f0cae8a22f25c39ae1d41
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 8 11:51:00 2018 +0100

    drm/i915: Unmask user interrupts writes into HWSP on snb/ivb/vlv/hsw

    An oddity occurs on Sandybridge, Ivybridge and Haswell (and presumably
    Valleyview) in that for the period following the GPU restart after a
    reset, there are no GT interrupts received. From Ville's notes, bit 0 in
    the HWSTAM corresponds to the render interrupt, and if we unmask it we
    do see immediate resumption of GT interrupt delivery (via the master irq
    handler) after the reset.

    v2: Limit the w/a to the render interrupt from rcs

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107500
    Fixes: c5498089463b ("drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode")
    References: d420a50c21ef ("drm/i915: Clean up the HWSTAM mess")
    Testcase: igt/gem_eio/reset-stress
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-2-chris@chris-wilson.co.uk

then

commit d6fee0dee09317d5e83e9b855316cb779dd679cf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 14 11:40:56 2018 +0100

    drm/i915: Kick waiters on resetting legacy rings

    This reapplies commit 39f3be162c46 ("drm/i915: Kick waiters on resetting
    legacy rings") after the improved gem_eio was run across all machines we
    found that gen3 and early gen4 still lost the immediate interrupt
    following reset, and the HWSTAM w/a applied to gen6+ is inadequate.

    Unlike the later gen, on gen3/4 the principle (and only tests to fail so
    far) are the wait vs reset test cases, whereas the reset stress case
    works fine (which was the predominantly failing case for gen6+). That is
    enough to suggest the underlying issue is sufficiently different to
    support the difference in HWSTAM efficacy.

    Testcase: igt/gem_eio/wait-10ms
    References: 39f3be162c46 ("drm/i915: Kick waiters on resetting legacy rings")
    References: a69ab52b0358 ("drm/i915: Remove extra waiter kick on legacy resets")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180814104056.27001-1-chris@chris-wilson.co.uk

However, there are later results reported here that do not have an explanation (nothing reported at all in dmesg for the missing interval).