Bug 105954

Summary: [CI] igt@gem_eio@* - fail - Failed assertion: __check_wait(fd, obj.handle, 100e3) == 0
Product: DRI
Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs, martin.peres
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BSW/CHT, KBL
i915 features: GEM/Other

Description Marta Löfstedt 2018-04-09 12:11:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-bsw-n3050/igt@gem_eio@in-flight-contexts-immediate.html

(gem_eio:1711) CRITICAL: Test assertion failure function test_inflight_contexts, file ../tests/gem_eio.c:473:
(gem_eio:1711) CRITICAL: Failed assertion: __check_wait(fd, obj[1].handle, wait) == 0
(gem_eio:1711) CRITICAL: error: -62 != 0
Subtest in-flight-contexts-immediate failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-bsw-n3050/igt@gem_eio@unwedge-stress.html

(gem_eio:1391) CRITICAL: Test assertion failure function test_reset_stress, file ../tests/gem_eio.c:634:
(gem_eio:1391) CRITICAL: Failed assertion: __check_wait(fd, obj.handle, 100e3) == 0
(gem_eio:1391) CRITICAL: error: -62 != 0
Subtest unwedge-stress failed.
Comment 3 Martin Peres 2018-06-19 22:16:43 UTC
Used to happen at least once a week. Last seen was CI_DRM_4165_full (1 month, 1 week / 683 runs ago). Closing!
Comment 4 Martin Peres 2018-07-13 12:42:36 UTC
Guess who's back:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4435/shard-glk4/igt@gem_eio@in-flight-contexts-immediate.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4437/shard-glk7/igt@gem_eio@in-flight-contexts-immediate.html

(gem_eio:1188) CRITICAL: Test assertion failure function test_inflight_contexts, file ../tests/gem_eio.c:518:
(gem_eio:1188) CRITICAL: Failed assertion: __check_wait(fd, obj[1].handle, wait) == 0
(gem_eio:1188) CRITICAL: error: -62 != 0
Subtest in-flight-contexts-immediate failed.
Comment 5 Chris Wilson 2018-07-27 09:55:30 UTC
The key clue here is that we spawn a thread to inject the GPU reset, and in that thread we have a debug telltale. That debug output is missing before we time out, ergo the reset thread is not being run. NOTOURFAULT, but we still need to be more graceful.
Comment 6 Chris Wilson 2018-07-27 13:09:02 UTC
*** Bug 107404 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2018-08-03 14:28:03 UTC
commit 5d78c73d871525ec9caecd88ad7d9abe36637314 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 27 12:10:52 2018 +0100

    igt/gem_eio: Measure reset delay from thread
    
    We assert that we complete a wedge within 250ms. However, when we use a
    thread to delay the wedging until after we start waiting, that thread
    itself is delayed longer than our wait timeout. This results in a false
    positive error where we fail the test before we even trigger the reset.
    
    Reorder the test so that we only ever measure the delay from triggering
    the reset until we wakeup, and assert that is in a timely fashion
    (less than 250ms).
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105954
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
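
For readers following along, the reordering described in the commit message can be illustrated with a small userspace sketch. This is not the code from gem_eio.c: the `fake_gpu` struct, the helper names and the condition variable standing in for the GPU wait are all hypothetical, and only the 250ms budget comes from the commit message. The point it shows is that the timestamp is taken inside the delayed thread at the moment it triggers the "reset", so a late-running thread no longer counts against the waiter. (The `-62` in the logs above is ETIME on Linux, i.e. the wait timed out.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define RESET_BUDGET_NS 250000000LL /* 250ms, the limit named in the commit message */

/* Hypothetical stand-in for the device: the "reset" just flips a flag. */
struct fake_gpu {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	bool wedged;
	unsigned int delay_us;    /* how long the thread sleeps before resetting */
	struct timespec reset_at; /* stamped by the thread when it triggers the reset */
};

static int64_t elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/* Delayed reset injector: sleep, then stamp the time and wake the waiter. */
static void *reset_thread(void *arg)
{
	struct fake_gpu *gpu = arg;
	struct timespec delay = {
		.tv_sec = gpu->delay_us / 1000000,
		.tv_nsec = (gpu->delay_us % 1000000) * 1000L,
	};

	nanosleep(&delay, NULL); /* scheduling delay of any length is harmless here */

	pthread_mutex_lock(&gpu->lock);
	clock_gettime(CLOCK_MONOTONIC, &gpu->reset_at);
	gpu->wedged = true;
	pthread_cond_signal(&gpu->idle);
	pthread_mutex_unlock(&gpu->lock);
	return NULL;
}

/* Returns ns from reset trigger to waiter wakeup, or -1 if the thread could not start. */
static int64_t measure_reset_to_wakeup_ns(unsigned int delay_us)
{
	struct fake_gpu gpu = { .delay_us = delay_us };
	struct timespec woke;
	pthread_t thread;

	pthread_mutex_init(&gpu.lock, NULL);
	pthread_cond_init(&gpu.idle, NULL);
	if (pthread_create(&thread, NULL, reset_thread, &gpu))
		return -1;

	/* Wait without a deadline that started before the reset was triggered. */
	pthread_mutex_lock(&gpu.lock);
	while (!gpu.wedged)
		pthread_cond_wait(&gpu.idle, &gpu.lock);
	clock_gettime(CLOCK_MONOTONIC, &woke);
	pthread_mutex_unlock(&gpu.lock);

	pthread_join(thread, NULL);
	pthread_cond_destroy(&gpu.idle);
	pthread_mutex_destroy(&gpu.lock);
	return elapsed_ns(&gpu.reset_at, &woke);
}

/* Assert only on the reset-to-wakeup interval, not on the thread's start-up delay. */
static void check_timely_reset(unsigned int delay_us)
{
	int64_t ns = measure_reset_to_wakeup_ns(delay_us);

	assert(ns >= 0 && ns < RESET_BUDGET_NS);
}
```

Calling `check_timely_reset(300000)` makes the injector thread sleep for longer than the 250ms budget, which is the situation that produced the false failures; the check is still expected to pass on an otherwise idle machine because only the interval after the trigger is measured.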
Comment 8 Francesco Balestrieri 2018-08-04 09:26:26 UTC
Martin, OK to close?
Comment 9 Martin Peres 2018-09-03 11:41:54 UTC
(In reply to Francesco Balestrieri from comment #8)
> Martin, OK to close?

Yep! Thanks!
Comment 10 Lakshmi 2018-10-04 17:17:35 UTC
This bug is fixed. Last seen 3 months ago.
