Bug 103880 - [CI][SNB only] igt@* - incomplete - timeout/system hang
Summary: [CI][SNB only] igt@* - incomplete - timeout/system hang
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Marta Löfstedt
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-24 07:17 UTC by Marta Löfstedt
Modified: 2018-03-02 07:47 UTC

See Also:
i915 platform: SNB
i915 features:


Description Marta Löfstedt 2017-11-24 07:17:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4009/shard-snb1/igt@drv_selftest@live_hangcheck.html

dmesg:
<5>[ 1124.912672] owatch: Using watchdog device /dev/watchdog0
<5>[ 1124.912727] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1124.913273] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<4>[ 1221.407301] i915: probe of 0000:00:02.0 failed with error -25
<7>[ 1221.436150] [IGT] drv_selftest: exiting, ret=0
<7>[ 1221.498380] [IGT] drv_selftest: executing

run.log:
running: igt/drv_selftest/live_hangcheck

[24/43] pass: 23, dmesg-warn: 1 |       
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3376/shard-snb1/35 : FAILURE
CI_IGT_test runtime 451 seconds
Rebooting shard-snb1

Note: SNB doesn't support pstore, so an NMI or kernel oops cannot be ruled out. However, if owatch had triggered, it would be visible in run.log.
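The triage reasoning above (the run dies with a lost connection, and owatch is not mentioned in run.log) can be sketched as a small parser. This is only an illustrative sketch: `summarize_run_log` and its field names are hypothetical, the patterns are taken from the excerpt above, and it assumes that an owatch trigger would show up in run.log as a line containing the word "owatch", which the excerpt does not confirm.

```python
import re

# run.log excerpt copied from the description above.
RUN_LOG = """\
running: igt/drv_selftest/live_hangcheck

[24/43] pass: 23, dmesg-warn: 1 |
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3376/shard-snb1/35 : FAILURE
CI_IGT_test runtime 451 seconds
Rebooting shard-snb1
"""


def summarize_run_log(text):
    """Collect the fields visible in a run.log excerpt (hypothetical helper)."""
    summary = {
        "last_test": None,         # last "running: ..." entry seen
        "progress": None,          # (tests done, tests total) from "[24/43]"
        "connection_lost": False,  # java.io.EOFException present
        "result": None,            # PASS/FAILURE from the "Completed" line
        "runtime_s": None,         # reported CI_IGT_test runtime
        "mentions_owatch": False,  # ASSUMPTION: owatch would be named in the log
    }
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"running: (\S+)", line)
        if m:
            summary["last_test"] = m.group(1)
        m = re.match(r"\[(\d+)/(\d+)\]", line)
        if m:
            summary["progress"] = (int(m.group(1)), int(m.group(2)))
        m = re.match(r"Completed CI_IGT_test (\S+) : (\w+)", line)
        if m:
            summary["result"] = m.group(2)
        m = re.match(r"CI_IGT_test runtime (\d+) seconds", line)
        if m:
            summary["runtime_s"] = int(m.group(1))
        if "java.io.EOFException" in line:
            summary["connection_lost"] = True
        if "owatch" in line.lower():
            summary["mentions_owatch"] = True
    return summary


if __name__ == "__main__":
    for key, value in summarize_run_log(RUN_LOG).items():
        print(f"{key}: {value}")
```

A run that lost its connection mid-test without any owatch mention would then be a candidate for this bug rather than a watchdog-triggered reboot.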
Comment 1 Chris Wilson 2017-11-24 10:45:37 UTC
live_hangcheck should be capable of causing a machine hang in snb (because snb).
Comment 2 Marta Löfstedt 2017-12-01 12:37:04 UTC
I will use this bug to collect SNB incompletes (timeout/system hang), as is done for other machines.
Comment 3 Marta Löfstedt 2017-12-01 12:38:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4030/shard-snb6/igt@kms_flip@vblank-vs-hang.html

<5>[ 5961.400179] owatch: Using watchdog device /dev/watchdog0
<5>[ 5961.400224] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 5961.400855] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 6167.786992] [drm:verify_connector_state.isra.76 [i915]] [CONNECTOR:48:VGA-1]
<7>[ 6167.787021] [drm:intel_atomic_commit_tail [i915]] [CRTC:47:pipe B]
<7>[ 6167.787060] [drm:verify_single_dpll_state.isra.77 [i915]] PCH DPLL A
followed by "stray"

run.log:
running: igt/kms_flip/vblank-vs-hang

[59/75] skip: 35, pass: 24 \        
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3427/shard-snb6/20 : FAILURE
CI_IGT_test runtime 677 seconds
Rebooting shard-snb6
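As a rough cross-check of the dmesg excerpt in this comment, the kernel timestamps can be compared against the 370 second owatch timeout. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the timeout counts from the "timeout for /dev/watchdog0 set to" line without being refreshed afterwards, which the log does not confirm.

```python
import re

# dmesg excerpt copied from this comment ("..." marks elided lines).
DMESG = """\
<5>[ 5961.400179] owatch: Using watchdog device /dev/watchdog0
<5>[ 5961.400224] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 5961.400855] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 6167.786992] [drm:verify_connector_state.isra.76 [i915]] [CONNECTOR:48:VGA-1]
<7>[ 6167.787021] [drm:intel_atomic_commit_tail [i915]] [CRTC:47:pipe B]
<7>[ 6167.787060] [drm:verify_single_dpll_state.isra.77 [i915]] PCH DPLL A
"""

# "<level>[ seconds.micros] message"
DMESG_LINE = re.compile(r"<(\d)>\[\s*(\d+\.\d+)\]\s(.*)")


def parse_dmesg(text):
    """Return (loglevel, timestamp_seconds, message) for each well-formed line."""
    entries = []
    for line in text.splitlines():
        m = DMESG_LINE.match(line)
        if m:  # elided "..." lines do not match and are skipped
            entries.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return entries


def seconds_since_watchdog_set(entries):
    """Seconds between the last 'timeout ... set' line and the final entry."""
    armed = [ts for _, ts, msg in entries if msg.startswith("owatch: timeout for")]
    if not armed:
        return None
    return entries[-1][1] - armed[-1]


if __name__ == "__main__":
    gap = seconds_since_watchdog_set(parse_dmesg(DMESG))
    print(f"last dmesg line is {gap:.1f} s after the watchdog timeout was set")
```

For this excerpt the gap is about 206 seconds, below the 370 second timeout, so under the stated assumption the log stops before the watchdog would have expired.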
Comment 4 Marta Löfstedt 2017-12-04 07:06:50 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3439/shard-snb4/igt@gem_eio@suspend.html

dmesg:
<7>[ 1079.345572] [IGT] gem_eio: executing
<4>[ 1079.361745] Setting dangerous option reset - tainting kernel
<7>[ 1079.362293] [drm:i915_reset_device [i915]] resetting chip
<5>[ 1079.362470] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 1079.362647] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 0
<7>[ 1079.363232] [IGT] gem_eio: starting subtest suspend
<4>[ 1079.363287] Setting dangerous option reset - tainting kernel
<7>[ 1079.363823] [drm:i915_reset_device [i915]] resetting chip
<5>[ 1079.363887] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 1079.364026] [drm:i915_reset [i915]] GPU reset disabled

run.log:
Completed CI_IGT_test CI_DRM_3439/shard-snb4/33 : FAILURE
CI_IGT_test runtime 154 seconds
Rebooting shard-snb4
Comment 6 Marta Löfstedt 2018-01-25 12:47:56 UTC
This is a meta bug to capture all unexplained incompletes on SNB.
Comment 7 Marta Löfstedt 2018-03-02 07:47:51 UTC
Last seen: CI_DRM_3700: 2018-01-30 / 259 runs ago

