Bug 112042 - [CI][BAT] igt@i915_selftest@live_gem_contexts - dmesg-fail - igt_shared_ctx_exec failed with error -5
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-10-17 20:26 UTC by Lakshmi
Modified: 2019-11-29 19:41 UTC
i915 platform: ICL
i915 features: GEM/Other

Description Lakshmi 2019-10-17 20:26:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7099/fi-icl-dsi/igt@i915_selftest@live_gem_contexts.html

(i915_selftest:5079) igt_kmod-WARNING: i915/i915_gem_context_live_selftests: igt_shared_ctx_exec failed with error -5
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling always-on
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling DC off
(i915_selftest:5079) igt_kmod-WARNING: [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_combo_phy_init [i915]] Combo PHY A already enabled, won't reprogram it.
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_combo_phy_init [i915]] Combo PHY B already enabled, won't reprogram it.
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling power well 2
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling power well 3
(i915_selftest:5079) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling power well 4
(i915_selftest:5079) igt_kmod-WARNING: i915: probe of 0000:00:02.0 failed with error -5
(i915_selftest:5079) igt_kmod-CRITICAL: Test assertion failure function igt_kselftest_execute, file ../lib/igt_kmod.c:548:
(i915_selftest:5079) igt_kmod-CRITICAL: Failed assertion: err == 0
(i915_selftest:5079) igt_kmod-CRITICAL: kselftest "i915 igt__31__live_gem_contexts=1 live_selftests=-1 disable_display=1 st_filter=" failed: Input/output error [5]
Subtest live_gem_contexts failed.
Comment 2 Chris Wilson 2019-10-17 20:36:45 UTC
The basic pattern is like

<7> [659.573044] hangcheck rcs0
<7> [659.573066] hangcheck 	Awake? 6
<7> [659.573070] hangcheck 	Hangcheck: 5954 ms ago
<7> [659.573074] hangcheck 	Reset count: 0 (global 0)
<7> [659.573077] hangcheck 	Requests:
<7> [659.573084] hangcheck 	MMIO base:  0x00002000
<7> [659.573879] hangcheck 	RING_START: 0x001d0000
<7> [659.573886] hangcheck 	RING_HEAD:  0x00000028
<7> [659.573893] hangcheck 	RING_TAIL:  0x00000068
<7> [659.573903] hangcheck 	RING_CTL:   0x00003001
<7> [659.573925] hangcheck 	RING_MODE:  0x00000000
<7> [659.574685] hangcheck 	RING_IMR: 00000000
<7> [659.574700] hangcheck 	ACTHD:  0x00000000_001773e4
<7> [659.574714] hangcheck 	BBADDR: 0x00000000_001773e5
<7> [659.574755] hangcheck 	DMA_FADDR: 0x00000000_001775c0
<7> [659.574762] hangcheck 	IPEIR: 0x00000000
<7> [659.574770] hangcheck 	IPEHR: 0xf77d32ef
<7> [659.575542] hangcheck 	Execlist status: 0x00202098 00000020, entries 12
<7> [659.575548] hangcheck 	Execlist CSB read 2, write 2, tasklet queued? no (enabled)
<7> [659.575561] hangcheck 		Active[0]: ring:{start:001d4000, hwsp:fae192c0, seqno:00000000}, rq:  20fd7:2  prio=3 @ 7541ms: [i915]
<7> [659.575567] hangcheck 		Active[1]: rq:  20fd6:4!+  prio=2 @ 7541ms: signaled
<7> [659.575734] hangcheck 		E  20fd7:2  prio=3 @ 7541ms: [i915]
<7> [659.575814] hangcheck 		Queue priority hint: 3
<7> [659.575820] hangcheck 		Q  20fd8:2  prio=3 @ 7540ms: [i915]
<7> [659.575826] hangcheck 		Q  20fd9:2  prio=3 @ 7540ms: [i915]
<7> [659.575832] hangcheck 		Q  20fda:2  prio=3 @ 7540ms: [i915]
<7> [659.575838] hangcheck 		Q  20fdb:2  prio=3 @ 7538ms: [i915]
<7> [659.575844] hangcheck 		Q  20fd7:4-  prio=2 @ 7541ms: [i915]
<7> [659.575850] hangcheck 		Q  20fd8:4  prio=2 @ 7540ms: [i915]
<7> [659.575855] hangcheck 		Q  20fd9:4  prio=2 @ 7540ms: [i915]
<7> [659.575861] hangcheck 		...skipping 2 queued requests...
<7> [659.575867] hangcheck 		Q  20fdc:2  prio=2 @ 7538ms: [i915]
<7> [659.575897] hangcheck HWSP:
<7> [659.575905] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [659.575909] hangcheck *
<7> [659.575915] hangcheck [0040] 10008002 00000000 10008002 00000000 10008002 00000020 10000014 00000060
<7> [659.575921] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000020 10000001 00000000
<7> [659.575926] hangcheck [0080] 10008002 00000040 10000014 00000040 10008002 00000060 10000014 00000060
<7> [659.575931] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000002
<7> [659.575937] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [659.575941] hangcheck *
<7> [659.575946] hangcheck Idle? no

The GPU is not executing the context that is marked active (RING_START is 0x001d0000, while Active[0] reports start:001d4000), and the HEAD is off in nowhere land. Naturally, it dies.
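
As an illustration only, the mismatch can be spotted mechanically by comparing the RING_START register with the ring start reported for Active[0]. The sketch below is not part of i915 or IGT; the helper names (`hw_ring_start`, `active_ring_start`, `context_mismatch`) are hypothetical, and it assumes the dump keeps the line layout shown above.

```python
import re

# Two lines copied from the hangcheck dump above (whitespace simplified).
DUMP = """\
<7> [659.573879] hangcheck  RING_START: 0x001d0000
<7> [659.575561] hangcheck  Active[0]: ring:{start:001d4000, hwsp:fae192c0, seqno:00000000}, rq:  20fd7:2  prio=3 @ 7541ms: [i915]
"""


def hw_ring_start(dump):
    """Return the RING_START register value from the dump, or None if absent."""
    m = re.search(r"RING_START:\s*0x([0-9a-fA-F]+)", dump)
    return int(m.group(1), 16) if m else None


def active_ring_start(dump):
    """Return the ring start recorded for Active[0], or None if absent."""
    m = re.search(r"Active\[0\]: ring:\{start:([0-9a-fA-F]+)", dump)
    return int(m.group(1), 16) if m else None


def context_mismatch(dump):
    """True when both values are present and the hardware ring differs
    from the ring of the request the driver considers active."""
    hw = hw_ring_start(dump)
    sw = active_ring_start(dump)
    return hw is not None and sw is not None and hw != sw


if __name__ == "__main__":
    print(hex(hw_ring_start(DUMP)), hex(active_ring_start(DUMP)))
    print("mismatch:", context_mismatch(DUMP))
```

Run against a saved dmesg log, a check like this would flag only the ring-start disagreement; it says nothing about why the hardware ended up on the other ring.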
Comment 3 Francesco Balestrieri 2019-11-01 06:02:50 UTC
Last seen two weeks ago, with a reproduction rate of 5/142 runs (3.5%).
Comment 4 Martin Peres 2019-11-29 19:41:44 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/522.