Bug 103240 - [CI] igt@gem_eio@in-flight-suspend
Summary: [CI] igt@gem_eio@in-flight-suspend
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 103289
Depends on:
Blocks:
 
Reported: 2017-10-12 11:45 UTC by Marta Löfstedt
Modified: 2017-11-10 10:26 UTC

See Also:
i915 platform: SNB
i915 features: GEM/Other


Description Marta Löfstedt 2017-10-12 11:45:25 UTC
CI_DRM_3215, CI_DRM_3222 SNB-shards incomplete on igt@gem_eio@in-flight-external

Note: since SNB doesn't support pstore, we have very little info on what happened:

Last dmesgs show signs of corruption:
CI_DRM_3215
<6>[   17.219831] Console: switching to colour dummy device 80x25
<7>[   17.219902] [IGT] prime_mmap_coherency: executing
<7>[   17.247285] [IGT] prime_mmap_coherency: starting subtest write-and-fail
<7>[   17.247473] [IGT] prime_mmap_coherency: exiting, ret=77
<6>[   17.291054] Console: switching to colour frame buffer device 128x48
^@^@^@^@^@^@

CI_DRM_3222
<7>[  130.453554] [IGT] gem_exec_store: starting subtest pages-blt
<7>[  130.491475] [IGT] gem_exec_store: exiting, ret=0
<6>[  130.521308] Console: switching to colour frame buffer device 128x48
<6>[  130.688792] Console: switching to colour dummy device 80x25
<7>[  130.688856] [IGT] gem_eio: executing
^@^@^@^@^@^@

The dmesg timestamps when this happens:
* do not suggest a system timeout (external timeout is 22 minutes)
* do not suggest an owatch timeout (owatch timeout is 6 minutes on shards)
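
The timestamp reasoning above can be checked mechanically. This is a minimal sketch, assuming the dmesg prefix format seen in the excerpts above; the helper names (`last_timestamp`, `exceeds_timeout`) are illustrative and not part of the CI tooling. Since dmesg timestamps count seconds from boot, the last one is only an upper bound on how long the test could have been running.

```python
import re

# Matches the "<6>[   17.291054]" prefix seen in the dmesg excerpts above.
STAMP = re.compile(r"^<\d+>\[\s*(\d+\.\d+)\]")

EXTERNAL_TIMEOUT_S = 22 * 60  # external timeout: 22 minutes
OWATCH_TIMEOUT_S = 6 * 60     # owatch timeout on shards: 6 minutes

# Tail of the CI_DRM_3222 log quoted above, including the corrupted last line.
CI_DRM_3222_TAIL = [
    "<6>[  130.688792] Console: switching to colour dummy device 80x25",
    "<7>[  130.688856] [IGT] gem_eio: executing",
    "^@^@^@^@^@^@",
]


def last_timestamp(lines):
    """Return the last parseable dmesg timestamp in seconds, or None."""
    last = None
    for line in lines:
        match = STAMP.match(line)
        if match:
            last = float(match.group(1))
    return last


def exceeds_timeout(uptime_s):
    """Report, per timeout, whether the uptime is long enough for it to fire."""
    return {
        "external": uptime_s >= EXTERNAL_TIMEOUT_S,
        "owatch": uptime_s >= OWATCH_TIMEOUT_S,
    }


if __name__ == "__main__":
    stamp = last_timestamp(CI_DRM_3222_TAIL)
    print(stamp, exceeds_timeout(stamp))
```

Both logs stop well short of either limit, which is consistent with the machine dying rather than being killed by a watchdog.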

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3215/shard-snb4/igt@gem_eio@in-flight-external.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3222/shard-snb1/igt@gem_eio@in-flight-external.html
Comment 1 Chris Wilson 2017-10-12 23:54:37 UTC
One possibility: https://patchwork.freedesktop.org/series/31848/
Comment 2 Chris Wilson 2017-10-14 08:58:22 UTC
(In reply to Chris Wilson from comment #1)
> One possibility: https://patchwork.freedesktop.org/series/31848/

Swing and a miss.
Comment 3 Chris Wilson 2017-10-16 07:53:43 UTC
*** Bug 103289 has been marked as a duplicate of this bug. ***
Comment 4 Chris Wilson 2017-10-16 08:18:07 UTC
Fwiw, the fix is https://patchwork.freedesktop.org/series/31987/
Comment 5 Chris Wilson 2017-10-17 09:47:00 UTC
commit 6f74b36b92cf9ee6450258fa341cff7c455a138f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Oct 15 15:37:25 2017 +0100

    drm/i915: Skip HW reinitialisation on resume if still wedged
    
    If we fail to recover the HW state upon resume (i.e. our attempt to
    clear the wedged bit and reset during i915_gem_sanitize() fails), then
    skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
    HW restart when successfully unwedging and resetting the HW later,
    but attempting to restore a wedged device upon resume is risky as the HW
    is in an unknown state.
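
The control flow described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Device` class and the Python functions below are hypothetical stand-ins for illustration, not the i915 code, and they only loosely mirror the roles of i915_gem_sanitize() and i915_gem_init_hw() named above.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy stand-in for driver state; not the real i915 structures."""
    wedged: bool
    reset_works: bool
    log: list = field(default_factory=list)


def sanitize(dev):
    """Try to clear the wedged bit via a reset; leave it set on failure."""
    if dev.wedged and dev.reset_works:
        dev.wedged = False
        dev.log.append("reset ok")
    elif dev.wedged:
        dev.log.append("reset failed")


def init_hw(dev):
    """Restart the hardware unless the device is still wedged."""
    if dev.wedged:
        dev.log.append("skip hw init")
        return
    dev.log.append("hw init")


def resume(dev):
    """Resume path: sanitize first, then (conditionally) restart the HW."""
    sanitize(dev)
    init_hw(dev)
    return dev.log


if __name__ == "__main__":
    print(resume(Device(wedged=True, reset_works=False)))
    print(resume(Device(wedged=True, reset_works=True)))
```

In this model the hardware restart is deferred whenever the reset fails, which matches the commit's intent of not touching hardware that is in an unknown state.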
Comment 6 Chris Wilson 2017-10-17 22:12:28 UTC
And CI appears to be hitting a completely different issue, or a second issue, compared to my machine.
Comment 7 Marta Löfstedt 2017-10-25 09:11:41 UTC
CI_DRM_3277 shard-kbl3 igt@gem_eio@in-flight-suspend fail

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3277/shard-kbl3/igt@gem_eio@in-flight-suspend.html
Comment 11 Marta Löfstedt 2017-11-01 07:28:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3301/shard-glkb3/igt@gem_eio@in-flight-suspend.html

(gem_eio:9748) igt-aux-CRITICAL: Test assertion failure function suspend_via_sysfs, file igt_aux.c:816:
(gem_eio:9748) igt-aux-CRITICAL: Failed assertion: igt_sysfs_set(power_dir, "state", suspend_state_name[state])
(gem_eio:9748) igt-aux-CRITICAL: Last errno: 16, Device or resource busy
Subtest in-flight-suspend failed.
Comment 12 Marta Löfstedt 2017-11-02 06:28:24 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-hsw5/igt@gem_eio@in-flight-suspend.html

(gem_eio:3803) igt-aux-CRITICAL: Test assertion failure function suspend_via_sysfs, file igt_aux.c:816:
(gem_eio:3803) igt-aux-CRITICAL: Failed assertion: igt_sysfs_set(power_dir, "state", suspend_state_name[state])
(gem_eio:3803) igt-aux-CRITICAL: Last errno: 16, Device or resource busy
Subtest in-flight-suspend failed.
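
Both failures above report errno 16 from the write to the `state` file. The following is a minimal sketch, assuming Linux errno numbering, of how such a write can be attempted and its errno decoded. `write_state` and `describe_errno` are hypothetical helpers, not IGT's igt_sysfs_set(), and the idea that `power_dir` refers to /sys/power is an assumption.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def write_state(path, value):
    """Write value to path; return 0 on success or the errno on failure."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError as exc:
        return exc.errno
    try:
        os.write(fd, value.encode())
        return 0
    except OSError as exc:
        # A suspend request refused by the kernel would surface here.
        return exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Decode the errno from the logs above without touching real sysfs files.
    print(describe_errno(16))
```

On Linux, errno 16 is EBUSY, which matches the "Device or resource busy" text in the logs. Running `write_state` against the real power state file would attempt to suspend the machine, so the example only decodes the error.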
Comment 13 Marta Löfstedt 2017-11-07 08:06:05 UTC
The SNB-shards incomplete should be fixed by:

commit b9f2abda9503bd55690cf3c2ccf2f20e8fc19ab3
Author: Petri Latvala <petri.latvala@intel.com>
Date:   Mon Oct 30 11:48:19 2017 +0200

    tests/gem_eio: Nerf in-flight-suspend
    
    Use TEST_NONE instead of TEST_DEVICES to prevent a machine death that
    happens on a particular model of SNB (2600 is affected, 2520m is
    not). Reset is unreliable, but the exact setup to trigger the death
    and how to work around it are not found at this time. There is some
    kind of a race lurking, and this commit is a workaround that avoids
    it, leaving the test still exercising some of the codepaths.
    
    References: https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_eio@in-flight-suspend.html
    References: https://bugs.freedesktop.org/show_bug.cgi?id=103289
    Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Petri Latvala <petri.latvala@intel.com>
    CC: Daniel Vetter <daniel.vetter@ffwll.ch>
    CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    CC: "Lofstedt, Marta" <marta.lofstedt@intel.com>
    CC: Martin Peres <martin.peres@linux.intel.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

This was integrated into CI_DRM_3317. So, this will have to marinate ~10 runs before I archive.
Comment 14 Marta Löfstedt 2017-11-10 10:26:35 UTC
This has looked good since integration. Also note that the failures on APL, KBL and GLK are bug 103375.

