Bug 106215

Summary: [CI] igt@gem_wait@basic-wait-all - fail - Failed assertion: !"GPU hung"
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BXT i915 features: GEM/Other

Description Martin Peres 2018-04-24 14:38:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_25/fi-bxt-dsi/igt@gem_wait@basic-wait-all.html

(gem_wait:1729) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481:
(gem_wait:1729) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wait-all failed.
Comment 1 Chris Wilson 2018-05-02 21:40:22 UTC
The test was running for less than 2s before the hang was declared. Very premature. I think we are confusing hangcheck by resetting the seqno frequently (so it sees the same seqno over and over again).
Comment 2 Chris Wilson 2018-05-03 09:46:28 UTC
I think this should fix up the spurious hangs:

commit ea491b23b2ffba069537a8216060d4d3400931a7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 2 23:03:12 2018 +0100

    drm/i915: Reset the hangcheck timestamp before repeating a seqno
    
    In the unusual circumstance where we reuse a seqno (for example, in
    igt), make sure that we reset the hangcheck timestamp before it sees the
    same seqno again.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106215
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-1-chris@chris-wilson.co.uk
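To illustrate the failure mode described above, here is a minimal toy model in Python. It is not the i915 code: the class, field names, the 1.5s timeout, and the one-second sampling interval are all assumptions chosen for illustration. It only shows why a checker that equates "same seqno as last sample" with "no progress" can declare a hang when the seqno is rewound between samples, and why resetting the timestamp on reuse avoids that.

```python
from dataclasses import dataclass


@dataclass
class ToyHangcheck:
    """Hypothetical hangcheck: flags a hang if the sampled seqno stays the same too long."""
    last_seqno: int = -1
    action_timestamp: float = 0.0
    timeout: float = 1.5  # assumed value, seconds

    def reset_timestamp(self, now: float) -> None:
        # Models the idea of the fix: forget how long we have been "stuck".
        self.action_timestamp = now

    def sample(self, seqno: int, now: float) -> bool:
        """Return True if a hang would be declared at time `now`."""
        if seqno != self.last_seqno:
            # Progress observed: remember the new seqno and restart the clock.
            self.last_seqno = seqno
            self.action_timestamp = now
            return False
        return now - self.action_timestamp > self.timeout


def spurious_hang(reset_on_reuse: bool) -> bool:
    """Sample once per second while the workload keeps rewinding the seqno to 1."""
    hc = ToyHangcheck()
    hung = False
    for tick in range(5):
        now = float(tick)
        # Work completes between samples, but the seqno is reused, so the
        # checker always observes the value 1.
        if reset_on_reuse:
            hc.reset_timestamp(now)
        hung = hung or hc.sample(1, now)
    return hung


if __name__ == "__main__":
    print("without timestamp reset:", spurious_hang(False))
    print("with timestamp reset:   ", spurious_hang(True))
```

In this model the first case reports a hang after about two seconds even though work is completing, which matches the "less than 2s" observation only loosely, since the timeout here is made up.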
Comment 3 Martin Peres 2018-05-22 20:36:06 UTC
(In reply to Chris Wilson from comment #2)
> I think this should fix up the spurious hangs:

It is hard to know whether this really fixed it, given that the test only failed once, but it makes sense that it would behave better now. Let's close it, thanks!
