Bug 106215 - [CI] igt@gem_wait@basic-wait-all - fail - Failed assertion: !"GPU hung"
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Reported: 2018-04-24 14:38 UTC by Martin Peres
Modified: 2018-05-22 20:36 UTC
i915 platform: BXT
i915 features: GEM/Other


Description Martin Peres 2018-04-24 14:38:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_25/fi-bxt-dsi/igt@gem_wait@basic-wait-all.html

(gem_wait:1729) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481:
(gem_wait:1729) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wait-all failed.
Comment 1 Chris Wilson 2018-05-02 21:40:22 UTC
The test was running for less than 2s before the hang was declared. Very premature. I think we are confusing hangcheck by resetting the seqno frequently (so it sees the same seqno over and over again).
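
The suspected failure mode can be illustrated with a toy model. This is only a sketch under stated assumptions, not the i915 hangcheck code: the `Hangcheck` class, the `run_short_tests` helper, the 6-second threshold and the 2-second test duration are all hypothetical. It assumes the detector declares a hang once the seqno it samples has stayed unchanged for longer than a threshold, and that every short test leaves the engine at the same seqno.

```python
from dataclasses import dataclass


@dataclass
class Hangcheck:
    """Toy hang detector (hypothetical, not the i915 implementation).

    It remembers the last seqno it sampled and the time at which that
    value was first seen.
    """
    timeout: float
    last_seqno: int = -1
    last_change: float = 0.0

    def sample(self, seqno: int, now: float) -> bool:
        """Return True if a hang is declared at time `now`."""
        if seqno != self.last_seqno:
            # Progress was observed: remember the new value and when we saw it.
            self.last_seqno = seqno
            self.last_change = now
            return False
        # Same seqno as before: a hang is declared once it has been
        # unchanged for longer than the timeout.
        return now - self.last_change > self.timeout


def run_short_tests(hc: Hangcheck, n_tests: int, test_duration: float):
    """Run back-to-back tests that each end at the same seqno.

    Returns the index of the first test blamed for a hang, or None.
    """
    now = 0.0
    for i in range(n_tests):
        now += test_duration
        # Assumption: the seqno is rewound between tests, so the detector
        # samples the value 1 at the end of every test.
        if hc.sample(seqno=1, now=now):
            return i
    return None
```

With `Hangcheck(timeout=6.0)` and ten 2-second tests, `run_short_tests` returns 4: the fifth test is blamed even though it ran for only 2 seconds, because the detector measures time since the seqno last changed, not time since the test started.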
Comment 2 Chris Wilson 2018-05-03 09:46:28 UTC
I think this should fix up the spurious hangs:

commit ea491b23b2ffba069537a8216060d4d3400931a7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 2 23:03:12 2018 +0100

    drm/i915: Reset the hangcheck timestamp before repeating a seqno
    
    In the unusual circumstance where we reuse a seqno (for example, in
    igt), make sure that we reset the hangcheck timestamp before it sees the
    same seqno again.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106215
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-1-chris@chris-wilson.co.uk
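
The idea in the commit message, resetting the hangcheck timestamp before the same seqno is seen again, can be sketched with a small simulation. This is a hedged illustration and not the kernel patch: `Hangcheck`, `on_seqno_reset`, `first_hang` and the timing values are hypothetical names and numbers chosen for the example.

```python
class Hangcheck:
    """Toy hang detector (hypothetical, not the i915 implementation)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.seqno = None
        self.timestamp = 0.0

    def on_seqno_reset(self, now):
        """Hypothetical hook called when the seqno is about to be reused.

        Refreshing the timestamp restarts the "no progress" clock.
        """
        self.timestamp = now

    def sample(self, seqno, now):
        """Return True if a hang is declared at time `now`."""
        if seqno != self.seqno:
            # Progress was observed: record the new value and the time.
            self.seqno, self.timestamp = seqno, now
            return False
        return now - self.timestamp > self.timeout


def first_hang(n_tests, test_duration, timeout, reset_timestamp):
    """Return the index of the first test blamed for a hang, or None."""
    hc = Hangcheck(timeout)
    now = 0.0
    for i in range(n_tests):
        if reset_timestamp:
            # Model of the fix: reset the timestamp before the repeat.
            hc.on_seqno_reset(now)
        now += test_duration
        # Assumption: every test ends at the same seqno value.
        if hc.sample(seqno=1, now=now):
            return i
    return None
```

In this model, `first_hang(10, 2.0, 6.0, reset_timestamp=False)` returns 4, while the same run with `reset_timestamp=True` returns `None`. A seqno that stays stuck without any reset is still reported once the timeout elapses, so the sketch keeps genuine hang detection intact.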
Comment 3 Martin Peres 2018-05-22 20:36:06 UTC
(In reply to Chris Wilson from comment #2)
It is hard to know whether this really fixed the issue, given that the test only failed once, but it makes sense that it would work better now. Let's close it, thanks!

