Bug 105957 - [CI] igt@gem_eio@* - fail - Test assertion failure function trigger_reset - Failed assertion: igt_seconds_elapsed(&ts) < 2
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other, OS: All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-09 12:39 UTC by Marta Löfstedt
Modified: 2019-06-03 06:13 UTC
CC List: 1 user

See Also:
i915 platform: ALL
i915 features: GEM/Other


Description Marta Löfstedt 2018-04-09 12:39:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_15/fi-byt-n2820/igt@gem_eio@unwedge-stress.html

(gem_eio:1302) CRITICAL: Test assertion failure function trigger_reset, file ../tests/gem_eio.c:81:
(gem_eio:1302) CRITICAL: Failed assertion: igt_seconds_elapsed(&ts) < 2
Subtest unwedge-stress failed.
Comment 1 Martin Peres 2018-04-20 12:29:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_23/fi-cfl-u/igt@gem_eio@in-flight-suspend.html
	
(gem_eio:1624) CRITICAL: Test assertion failure function trigger_reset, file ../tests/gem_eio.c:81:
(gem_eio:1624) CRITICAL: Failed assertion: igt_seconds_elapsed(&ts) < 2
Subtest in-flight-suspend failed.
Comment 2 Chris Wilson 2018-05-17 22:34:10 UTC
commit 89ae332745e31a075747a63ac5acc5baccf75769
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 11 18:58:59 2018 +0100

    tests/gem_eio: Only wait-for-idle inside trigger_reset()
    
    trigger_reset() imposes a tight time constraint (2s) so that we verify
    that the reset itself completes quickly. In the middle of this check, we
    call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
    clear out the freed memory (DROP_FREED). Those barriers may have
    unbounded latency pushing beyond the 2s timeout, so restrict the
    operation to only wait-for-idle (DROP_ACTIVE).
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Optimistically marking as fixed to see what happens. I doubt the rcu_barrier alone is causing the grief, so I suspect an outside timing influence: as far as I can tell, the driver is doing the right thing and is not itself causing the delay.
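The timing pattern the commit above describes can be sketched in plain C. This is a minimal sketch, not IGT code: it uses clock_gettime(CLOCK_MONOTONIC) instead of IGT's igt_seconds_elapsed(), and fast_reset()/slow_reset() are hypothetical stand-ins for "trigger a reset, then wait for idle". What it illustrates is that only the operation under test sits inside the timed window, so slow cleanup such as the rcu_barrier() calls behind DROP_FREED cannot eat into the 2 s budget.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Seconds elapsed since *start, on the monotonic clock so that
 * wall-clock adjustments cannot distort the measurement. */
static double seconds_elapsed(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (double)(now.tv_sec - start->tv_sec) +
	       (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Time only the operation under test against a fixed budget. Anything
 * with unbounded latency (e.g. freeing memory behind RCU barriers)
 * belongs outside this window. */
static bool completes_within(void (*op)(void), double budget_s)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	op();
	return seconds_elapsed(&ts) < budget_s;
}

/* Hypothetical stand-in for a quick "reset + wait for idle": 10 ms. */
static void fast_reset(void)
{
	struct timespec d = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	nanosleep(&d, NULL);
}

/* Hypothetical stand-in for a slow one: 200 ms. */
static void slow_reset(void)
{
	struct timespec d = { .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000 };

	nanosleep(&d, NULL);
}
```

For example, completes_within(fast_reset, 2.0) returns true, while completes_within(slow_reset, 0.1) returns false because the 200 ms sleep overruns the 100 ms budget.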
Comment 4 Chris Wilson 2018-05-22 08:32:39 UTC
(In reply to Martin Peres from comment #3)
> It was definitely not fixed:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_46/fi-kbl-7567u/igt@gem_eio@in-flight-suspend.html

But that isn't the same bug. In that case there was an unexpected GPU hangcheck after resume.
Comment 6 Lakshmi 2018-10-19 15:29:28 UTC
Update: Last seen CI_DRM_4943_full (1 week, 6 days / 164 runs ago).
Comment 7 Andi 2018-12-07 16:54:01 UTC
This has a very low failure rate: I have been running the test list from IGT_4727 for quite a long time without a single failure.

So far I have run the test 236 times over more than 48 hours.

Is it OK to lower the "importance" of this bug to "lowest"?
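As a rough check on what a clean streak of 236 runs can establish, the standard zero-failure bound applies. This assumes the runs are independent and share one unknown per-run failure probability p, neither of which the comment states.

```latex
% Assumption: n independent runs, each failing with the same probability p.
P(\text{0 failures in } n \text{ runs}) = (1-p)^n
% 95% upper confidence bound p_u: largest p that still leaves a 5% chance of a clean streak
(1-p_{\mathrm{u}})^n = 0.05 \;\Rightarrow\; p_{\mathrm{u}} = 1 - 0.05^{1/n} \approx \frac{3}{n}
% For n = 236:
p_{\mathrm{u}} = 1 - 0.05^{1/236} \approx 0.0126
```

So 236 passes rule out per-run failure rates above roughly 1.3% at 95% confidence, but cannot distinguish a rarer intermittent failure, such as one CI had not hit in 164 runs (comment 6), from a fixed bug.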
Comment 8 Francesco Balestrieri 2018-12-11 09:13:38 UTC
Last seen 2 weeks ago on GLK:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5207/shard-glk7/igt@gem_eio@wait-wedge-immediate.html

Before that, it happened about once a week.
Comment 9 Chris Wilson 2019-02-15 15:43:51 UTC
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 12 20:40:34 2019 +0000

    i915/gem_eio: Check average reset times
    
    As we have moved to rcu/srcu to serialise the resets, individual resets
    are subject to small variations in system grace periods. Allow for this
    by only expecting the median reset time to be within our target, thereby
    excluding noisy outliers from perturbing our results (but keep the
    maximum capped to prevent horrid failures!)
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
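The median-plus-cap rule in this commit message can be sketched as follows. This is a hedged illustration, not the actual gem_eio change: median() and reset_times_ok() are hypothetical helpers, the samples are assumed to be per-reset durations in seconds (at least one sample), and the target and cap are caller-supplied because the commit message gives no values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of n >= 1 samples; sorts a private copy so the caller's array
 * is untouched. For even n, averages the two middle values. */
static double median(const double *samples, size_t n)
{
	double *tmp = malloc(n * sizeof(*tmp));
	double m;

	if (!tmp)
		abort();
	memcpy(tmp, samples, n * sizeof(*tmp));
	qsort(tmp, n, sizeof(*tmp), cmp_double);
	m = (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
	free(tmp);
	return m;
}

/* Pass if the typical (median) reset meets the target and no single
 * reset exceeds the hard cap, so noisy outliers are tolerated but
 * pathological ones are not. */
static bool reset_times_ok(const double *samples, size_t n,
			   double target_s, double cap_s)
{
	double max = samples[0];

	for (size_t i = 1; i < n; i++)
		if (samples[i] > max)
			max = samples[i];
	return median(samples, n) < target_s && max < cap_s;
}
```

With samples {0.10, 0.12, 3.0, 0.11, 0.09}, a 0.5 s target and a 5 s cap (both illustrative), the single 3 s outlier no longer fails the check because the median is 0.11 s, yet a 6 s reset would still be rejected by the cap.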
Comment 10 CI Bug Log 2019-03-12 11:16:19 UTC
A CI Bug Log filter associated to this bug has been updated:

{- all machines: igt@gem_eio@* - fail - Failed assertion: igt_seconds_elapsed(&ts) < 2 -}
{+ all machines: igt@gem_eio@* - fail - Failed assertion: igt_seconds_elapsed(&ts) < 2 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5729/shard-glk4/igt@gem_eio@context-create.html
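The filter above pairs a test-name glob, a result status and an assertion string. A hypothetical sketch of that kind of matching in C follows; it is not CI Bug Log's actual implementation, struct bug_filter and filter_matches() are invented names, and the "all machines" scope is left out.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical filter: a failure is attributed to the bug when the test
 * name matches the glob, the status matches exactly, and the log
 * contains the assertion text. */
struct bug_filter {
	const char *test_glob;  /* e.g. "igt@gem_eio@*" */
	const char *status;     /* e.g. "fail" */
	const char *log_substr; /* e.g. the failed-assertion message */
};

static bool filter_matches(const struct bug_filter *f, const char *test,
			   const char *status, const char *log)
{
	return fnmatch(f->test_glob, test, 0) == 0 &&
	       strcmp(f->status, status) == 0 &&
	       strstr(log, f->log_substr) != NULL;
}
```

Under this sketch, igt@gem_eio@context-create failing with the assertion text is caught, while the same text from a test outside igt@gem_eio@*, or a passing result, is not.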
Comment 11 Lakshmi 2019-03-12 11:17:10 UTC
This bug was reopened due to this failure:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5729/shard-glk4/igt@gem_eio@context-create.html

Starting subtest: context-create
(gem_eio:2574) CRITICAL: Test assertion failure function trigger_reset, file ../tests/i915/gem_eio.c:82:
(gem_eio:2574) CRITICAL: Failed assertion: igt_seconds_elapsed(&ts) < 2
Subtest context-create failed.
**** DEBUG ****
(gem_eio:2574) i915/gem_context-DEBUG: Test requirement passed: gem_has_contexts(fd)
(gem_eio:2574) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_eio:2574) DEBUG: Disabling GPU reset
(gem_eio:2574) DEBUG: Test requirement passed: fd >= 0
(gem_eio:2574) DEBUG: Test requirement passed: i915_reset_control(false)
(gem_eio:2574) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_eio:2574) DEBUG: Enabling GPU reset
(gem_eio:2574) DEBUG: Test requirement passed: fd >= 0
(gem_eio:2574) igt_gt-DEBUG: Triggering GPU reset
(gem_eio:2574) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_eio:2574) DEBUG: Checking that the GPU recovered
(gem_eio:2574) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_eio:2574) CRITICAL: Test assertion failure function trigger_reset, file ../tests/i915/gem_eio.c:82:
(gem_eio:2574) CRITICAL: Failed assertion: igt_seconds_elapsed(&ts) < 2
(gem_eio:2574) igt_core-INFO: Stack trace:
(gem_eio:2574) igt_core-INFO:   #0 ../lib/igt_core.c:1474 __igt_fail_assert()
(gem_eio:2574) igt_core-INFO:   #1 ../tests/i915/gem_eio.c:83 trigger_reset()
(gem_eio:2574) igt_core-INFO:   #2 ../tests/i915/gem_eio.c:132 test_context_create()
(gem_eio:2574) igt_core-INFO:   #3 ../tests/i915/gem_eio.c:835 __real_main814()
(gem_eio:2574) igt_core-INFO:   #4 ../tests/i915/gem_eio.c:814 main()
(gem_eio:2574) igt_core-INFO:   #5 ../csu/libc-start.c:344 __libc_start_main()
(gem_eio:2574) igt_core-INFO:   #6 [_start+0x2a]
****  END  ****
Comment 12 Francesco Balestrieri 2019-06-03 06:13:39 UTC
Latest occurrence from two weeks ago: 

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6087/shard-glk1/igt@gem_eio@wait-10ms.html

