CI_DRM_3215, CI_DRM_3222 SNB-shards incomplete on igt@gem_eio@in-flight-external

Note: since SNB doesn't support pstore, we have very little info on what happened. The last dmesgs show signs of corruption:

CI_DRM_3215
<6>[ 17.219831] Console: switching to colour dummy device 80x25
<7>[ 17.219902] [IGT] prime_mmap_coherency: executing
<7>[ 17.247285] [IGT] prime_mmap_coherency: starting subtest write-and-fail
<7>[ 17.247473] [IGT] prime_mmap_coherency: exiting, ret=77
<6>[ 17.291054] Console: switching to colour frame buffer device 128x48
^@^@^@^@^@^@

CI_DRM_3222
<7>[ 130.453554] [IGT] gem_exec_store: starting subtest pages-blt
<7>[ 130.491475] [IGT] gem_exec_store: exiting, ret=0
<6>[ 130.521308] Console: switching to colour frame buffer device 128x48
<6>[ 130.688792] Console: switching to colour dummy device 80x25
<7>[ 130.688856] [IGT] gem_eio: executing
^@^@^@^@^@^@

The dmesg timestamps at the point where this happens:
* do not suggest a system timeout (external timeout is 22 minutes)
* do not suggest an owatch timeout (owatch timeout is 6 minutes on shards)

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3215/shard-snb4/igt@gem_eio@in-flight-external.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3222/shard-snb1/igt@gem_eio@in-flight-external.html
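For anyone triaging similar logs, the two checks in the report (trailing NUL bytes, which appear as ^@ in caret notation, and a last dmesg timestamp well below both timeouts) can be sketched as follows. This is only an illustration: the function names are made up here and are not part of IGT or the CI tooling, and it assumes the machine was not rebooted partway through the captured log, so that the dmesg timestamp (seconds since boot) bounds how long the run could have lasted.

```python
import re

# Matches a dmesg prefix such as "<7>[ 130.688856]" and captures the timestamp.
DMESG_TS = re.compile(r"<\d>\[\s*(\d+\.\d+)\]")

# Timeouts quoted in the report, converted to seconds.
EXTERNAL_TIMEOUT_S = 22 * 60
OWATCH_TIMEOUT_S = 6 * 60


def last_timestamp(log: str) -> float:
    """Return the last kernel timestamp (seconds since boot) found in the log."""
    stamps = DMESG_TS.findall(log)
    if not stamps:
        raise ValueError("no dmesg timestamps found")
    return float(stamps[-1])


def ends_in_nul_padding(raw: bytes) -> bool:
    """True if the capture ends with NUL bytes, ignoring trailing newlines."""
    return raw.rstrip(b"\n").endswith(b"\x00")


# Tail of the CI_DRM_3222 dmesg from the report, with six NUL bytes appended.
sample = (
    "<6>[ 130.688792] Console: switching to colour dummy device 80x25\n"
    "<7>[ 130.688856] [IGT] gem_eio: executing\n"
)
raw_capture = sample.encode() + b"\x00" * 6

t = last_timestamp(sample)
print("last timestamp:", t)
print("below owatch timeout:", t < OWATCH_TIMEOUT_S)
print("below external timeout:", t < EXTERNAL_TIMEOUT_S)
print("ends in NUL padding:", ends_in_nul_padding(raw_capture))
```

A log that ends in NUL padding with a timestamp far below both timeouts is consistent with the machine dying abruptly rather than being killed by a watchdog, which is the reading given above.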
One possibility: https://patchwork.freedesktop.org/series/31848/
(In reply to Chris Wilson from comment #1)
> One possibility: https://patchwork.freedesktop.org/series/31848/

Swing and a miss.
*** Bug 103289 has been marked as a duplicate of this bug. ***
Fwiw, the fix is https://patchwork.freedesktop.org/series/31987/
commit 6f74b36b92cf9ee6450258fa341cff7c455a138f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Oct 15 15:37:25 2017 +0100

    drm/i915: Skip HW reinitialisation on resume if still wedged

    If we fail to recover the HW state upon resume (i.e. our attempt to clear
    the wedged bit and reset during i915_gem_sanitize() fails), then skip the
    HW restart inside i915_gem_init_hw(). We will ultimately do the HW restart
    when successfully unwedging and resetting the HW later, but attempting to
    restore a wedged device upon resume is risky as the HW is in an unknown
    state.
And CI appears to be hitting a completely different issue, or a second issue, compared to my machine.
CI_DRM_3277 shard-kbl3 igt@gem_eio@in-flight-suspend fail https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3277/shard-kbl3/igt@gem_eio@in-flight-suspend.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3297/shard-apl6/igt@gem_eio@in-flight-suspend.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3297/shard-kbl4/igt@gem_eio@in-flight-suspend.html
Also, note: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3300/shard-apl5/igt@gem_eio@in-flight-suspend.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3301/shard-glkb3/igt@gem_eio@in-flight-suspend.html

(gem_eio:9748) igt-aux-CRITICAL: Test assertion failure function suspend_via_sysfs, file igt_aux.c:816:
(gem_eio:9748) igt-aux-CRITICAL: Failed assertion: igt_sysfs_set(power_dir, "state", suspend_state_name[state])
(gem_eio:9748) igt-aux-CRITICAL: Last errno: 16, Device or resource busy
Subtest in-flight-suspend failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-hsw5/igt@gem_eio@in-flight-suspend.html

(gem_eio:3803) igt-aux-CRITICAL: Test assertion failure function suspend_via_sysfs, file igt_aux.c:816:
(gem_eio:3803) igt-aux-CRITICAL: Failed assertion: igt_sysfs_set(power_dir, "state", suspend_state_name[state])
(gem_eio:3803) igt-aux-CRITICAL: Last errno: 16, Device or resource busy
Subtest in-flight-suspend failed.
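As a side note for reading these failures: errno 16 is EBUSY, i.e. the write to the sysfs power "state" file was refused because something was still busy. A small sketch for pulling the errno out of an IGT failure log and naming it is below. The helper name decode_last_errno is hypothetical and not part of IGT; the symbolic name comes from Python's errno table for the platform the script runs on, which is assumed to be Linux here.

```python
import errno
import os
import re


def decode_last_errno(log: str):
    """Return (number, symbolic name, message) for the first 'Last errno' line.

    Returns None if the log contains no such line.
    """
    match = re.search(r"Last errno: (\d+)", log)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to names for the current platform.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Failure output copied from the shard-hsw5 run above.
failure_log = (
    "(gem_eio:3803) igt-aux-CRITICAL: Test assertion failure function "
    "suspend_via_sysfs, file igt_aux.c:816:\n"
    "(gem_eio:3803) igt-aux-CRITICAL: Failed assertion: "
    'igt_sysfs_set(power_dir, "state", suspend_state_name[state])\n'
    "(gem_eio:3803) igt-aux-CRITICAL: Last errno: 16, Device or resource busy\n"
    "Subtest in-flight-suspend failed.\n"
)

print(decode_last_errno(failure_log))
```

The same errno appears in both the GLK and HSW runs, so the two failures look like the same refusal to suspend rather than the SNB machine death.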
The SNB-shards incomplete should be fixed by:

commit b9f2abda9503bd55690cf3c2ccf2f20e8fc19ab3
Author: Petri Latvala <petri.latvala@intel.com>
Date:   Mon Oct 30 11:48:19 2017 +0200

    tests/gem_eio: Nerf in-flight-suspend

    Use TEST_NONE instead of TEST_DEVICES to prevent a machine death that
    happens on a particular model of SNB (2600 is affected, 2520m is not).
    Reset is unreliable, but the exact setup to trigger the death and how to
    work around it are not found at this time. There is some kind of a race
    lurking, and this commit is a workaround that avoids it, leaving the test
    still exercising some of the codepaths.

    References: https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_eio@in-flight-suspend.html
    References: https://bugs.freedesktop.org/show_bug.cgi?id=103289
    Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Petri Latvala <petri.latvala@intel.com>
    CC: Daniel Vetter <daniel.vetter@ffwll.ch>
    CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    CC: "Lofstedt, Marta" <marta.lofstedt@intel.com>
    CC: Martin Peres <martin.peres@linux.intel.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

This was integrated into CI_DRM_3317. So, this will have to marinate ~10 runs before I archive.
This has looked good since integration. Also note that the failures on APL, KBL and GLK are bug 103375.