Bug 107711 - [BAT] igt@gem_eio@(reset-stress|wait-10ms|wait-1us) - fail - Failed assertion: elapsed < 250e6
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-28 12:04 UTC by Martin Peres
Modified: 2018-09-18 08:48 UTC

See Also:
i915 platform: ALL
i915 features: GEM/Other


Description Martin Peres 2018-08-28 12:04:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4711/shard-kbl6/igt@gem_eio@reset-stress.html
	
(gem_eio:1302) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:1302) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:1302) CRITICAL: Wake up following reset+wedge took 3545.628ms
Subtest reset-stress failed.
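
For readers unfamiliar with the assertion text: elapsed appears to be measured in nanoseconds, so 250e6 corresponds to 250ms, and the 3545.628ms reported in the log is roughly 14 times over that budget. The following is a minimal sketch of that kind of check, not the actual check_wait() from tests/gem_eio.c. The names WAKEUP_LIMIT_NS, elapsed_ns and wakeup_within_limit are hypothetical, and the nanosecond unit is an assumption inferred from the log message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumed budget taken from the assertion text: 250e6 ns = 250 ms. */
#define WAKEUP_LIMIT_NS 250000000LL

/* Hypothetical helper: nanoseconds between two timestamps. */
static int64_t elapsed_ns(const struct timespec *start,
                          const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
           (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Hypothetical helper: true if the wake-up latency is strictly under the budget. */
static bool wakeup_within_limit(int64_t ns)
{
    return ns < WAKEUP_LIMIT_NS;
}
```

In a real measurement the two timestamps would typically be sampled with clock_gettime(CLOCK_MONOTONIC, ...) immediately before triggering the reset and immediately after the waiter wakes up. Passing the logged value (3545628000 ns) to wakeup_within_limit() returns false, which matches the failure above.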


https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-blb-e6850/igt@gem_eio@wait-10ms.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-bwr-2160/igt@gem_eio@wait-10ms.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-gdg-551/igt@gem_eio@wait-10ms.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-pnv-d510/igt@gem_eio@wait-10ms.html

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4589/shard-hsw7/igt@gem_eio@wait-10ms.html

(gem_eio:11661) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:11661) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:11661) CRITICAL: Wake up following reset+wedge took 3947.832ms
Subtest wait-10ms failed.


https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-bwr-2160/igt@gem_eio@wait-1us.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-gdg-551/igt@gem_eio@wait-1us.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-pnv-d510/igt@gem_eio@wait-1us.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-blb-e6850/igt@gem_eio@wait-1us.html

(gem_eio:1171) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:1171) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:1171) CRITICAL: Wake up following reset+wedge took 3417.919ms
Subtest wait-1us failed.
Comment 1 Martin Peres 2018-08-28 12:06:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4660/shard-apl2/igt@gem_eio@reset-stress.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4713/shard-kbl7/igt@gem_eio@reset-stress.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4711/shard-kbl6/igt@gem_eio@reset-stress.html

(gem_eio:1302) CRITICAL: Test assertion failure function check_wait, file ../tests/gem_eio.c:258:
(gem_eio:1302) CRITICAL: Failed assertion: elapsed < 250e6
(gem_eio:1302) CRITICAL: Wake up following reset+wedge took 3545.628ms
Subtest reset-stress failed.
Comment 2 Chris Wilson 2018-08-28 12:24:53 UTC
This report conflates separate errors. The missed breadcrumb should be fixed by

commit a4a717010f4e8cacaa3f0cae8a22f25c39ae1d41
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 8 11:51:00 2018 +0100

    drm/i915: Unmask user interrupts writes into HWSP on snb/ivb/vlv/hsw
    
    An oddity occurs on Sandybridge, Ivybridge and Haswell (and presumably
    Valleyview) in that for the period following the GPU restart after a
    reset, there are no GT interrupts received. From Ville's notes, bit 0 in
    the HWSTAM corresponds to the render interrupt, and if we unmask it we
    do see immediate resumption of GT interrupt delivery (via the master irq
    handler) after the reset.
    
    v2: Limit the w/a to the render interrupt from rcs
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107500
    Fixes: c5498089463b ("drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode")
    References: d420a50c21ef ("drm/i915: Clean up the HWSTAM mess")
    Testcase: igt/gem_eio/reset-stress
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-2-chris@chris-wilson.co.uk

then

commit d6fee0dee09317d5e83e9b855316cb779dd679cf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 14 11:40:56 2018 +0100

    drm/i915: Kick waiters on resetting legacy rings
    
    This reapplies commit 39f3be162c46 ("drm/i915: Kick waiters on resetting
    legacy rings") after the improved gem_eio was run across all machines we
    found that gen3 and early gen4 still lost the immediate interrupt
    following reset, and the HWSTAM w/a applied to gen6+ is inadequate.
    
    Unlike the later gen, on gen3/4 the principle (and only tests to fail so
    far) are the wait vs reset test cases, whereas the reset stress case
    works fine (which was the predominantly failing case for gen6+). That is
    enough to suggest the underlying issue is sufficiently different to
    support the difference in HWSTAM efficacy.
    
    Testcase: igt/gem_eio/wait-10ms
    References: 39f3be162c46 ("drm/i915: Kick waiters on resetting legacy rings")
    References: a69ab52b0358 ("drm/i915: Remove extra waiter kick on legacy resets")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180814104056.27001-1-chris@chris-wilson.co.uk

However, there are later results reported here that do not have an explanation (nothing reported at all in dmesg for the missing interval).

