https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4297/shard-glkb2/igt@gem_ctx_isolation@rcs0-reset.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3829/shard-apl3/igt@gem_ctx_isolation@vcs0-reset.html

pstores are mostly gem_execlist_submission ftraces
We hit an assert, but I can't see which one, and the trace looks like correct behaviour afaict.
(In reply to Chris Wilson from comment #1)
> We hit an assert, can't see which and the trace looks like correct behaviour
> afaict.

Here is a new one:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4299/shard-kbl2/igt@gem_ctx_isolation@bcs0-reset.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4303/shard-kbl5/igt@gem_ctx_isolation@vcs1-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4306/shard-kbl3/igt@gem_ctx_isolation@vcs1-reset.html

<0>[ 412.300836] i915/sig-562 3..s2 412290976us : execlists_submission_tasklet: vcs1 out[0]: ctx=13.1, seqno=e, prio=0
<0>[ 412.300844] ---------------------------------
<4>[ 412.300848] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp crct10dif_pclmul crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel e1000e snd_pcm mei_me mei prime_numbers
<4>[ 412.300880] CPU: 3 PID: 562 Comm: i915/signal:3 Tainted: G U 4.16.0-rc2-CI-CI_DRM_3838+ #1
<4>[ 412.300887] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4>[ 412.300906] RIP: 0010:execlists_submission_tasklet+0x5ee/0xeb0 [i915]
<4>[ 412.300912] RSP: 0018:ffff88027ed83ea8 EFLAGS: 00010296
<4>[ 412.300917] RAX: 0000000000000027 RBX: 0000000000000004 RCX: 0000000000000103
<4>[ 412.300923] RDX: 0000000080000103 RSI: ffffffff8211c277 RDI: 00000000ffffffff
<4>[ 412.300928] RBP: ffff88027ed83f20 R08: 0000000000000001 R09: 0000000000000000
<4>[ 412.300933] R10: ffff8802713c5ec0 R11: 0000000000000000 R12: ffff880269060008
<4>[ 412.300939] R13: ffff880260fda040 R14: ffff880269060010 R15: ffff880269060008
<4>[ 412.300944] FS: 0000000000000000(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
<4>[ 412.300951] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 412.300955] CR2: 00007f941f40d9e0 CR3: 0000000005210002 CR4: 00000000003606e0
<4>[ 412.300961] Call Trace:
<4>[ 412.300965] <IRQ>
<4>[ 412.300970] tasklet_hi_action+0x89/0x110
<4>[ 412.300976] __do_softirq+0xc1/0x4aa
<4>[ 412.300982] irq_exit+0xa4/0xb0
<4>[ 412.300985] do_IRQ+0x67/0x120
<4>[ 412.300990] common_interrupt+0x84/0x84
<4>[ 412.300994] </IRQ>
<4>[ 412.300997] RIP: 0010:_raw_spin_unlock_irq+0x2a/0x50
<4>[ 412.301001] RSP: 0018:ffffc90000543db0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffdd
<4>[ 412.301008] RAX: ffff88026841a840 RBX: ffff88027eda1740 RCX: 0000000000000001
<4>[ 412.301013] RDX: 0000000000000000 RSI: ffffffff8210fc21 RDI: 0000000000000001
<4>[ 412.301019] RBP: ffffc90000543e00 R08: 0000000000000001 R09: 0000000000000001
<4>[ 412.301024] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88027537d040
<4>[ 412.301030] R13: ffff88027276f740 R14: ffff88026841a840 R15: 0000000000000001
<4>[ 412.301038] finish_task_switch+0x98/0x240
<4>[ 412.301043] ? finish_task_switch+0x6a/0x240
<4>[ 412.301047] ? __clear_rsb+0x15/0x3d
<4>[ 412.301051] ? __switch_to_asm+0x1d/0x30
<4>[ 412.301056] __schedule+0x3cf/0xb00
<4>[ 412.301061] ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[ 412.301066] ? __kthread_parkme+0x39/0x90
<4>[ 412.301070] schedule+0x37/0x90
<4>[ 412.301074] __kthread_parkme+0x3e/0x90
<4>[ 412.301093] ? intel_breadcrumbs_signaler+0x59/0x4c0 [i915]
<4>[ 412.301112] ? intel_breadcrumbs_signaler+0x59/0x4c0 [i915]
<4>[ 412.301131] intel_breadcrumbs_signaler+0x4af/0x4c0 [i915]
<4>[ 412.301138] kthread+0xfb/0x130
<4>[ 412.301155] ? __intel_engine_remove_signal+0xb0/0xb0 [i915]
<4>[ 412.301160] ? _kthread_create_on_node+0x30/0x30
<4>[ 412.301166] ret_from_fork+0x3a/0x50
<4>[ 412.301171] Code: 7d c8 89 c7 c1 ef 08 83 e7 07 89 fb 41 89 bf c4 03 00 00 e9 e5 fa ff ff 48 c7 c6 59 d6 29 a0 48 c7 c7 b1 d4 29 a0 e8 97 b1 f1 e0 <0f> 0b 48 89 75 a8 4c 89 55 b0 e8 33 73 f3 e0 49 2b 84 24 08 15
<1>[ 412.301231] RIP: execlists_submission_tasklet+0x5ee/0xeb0 [i915] RSP: ffff88027ed83ea8
<4>[ 412.301250] ---[ end trace 5f6a45705eaa4f7f ]---
<0>[ 413.999230] Kernel panic - not syncing: Fatal exception in interrupt
<0>[ 413.999245] Dumping ftrace buffer:
<0>[ 413.999250] (ftrace buffer empty)
<0>[ 413.999254] Kernel Offset: disabled
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-kbl-r/igt@gem_ctx_isolation@bcs0-reset.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-skl-guc/igt@gem_ctx_isolation@rcs0-reset.html

no pstore "header" however first call trace:
<4>[ 43.775799] Call Trace:
<4>[ 43.775806] <IRQ>
<4>[ 43.775831] guc_submission_tasklet+0x37b/0x940 [i915]
<4>[ 43.775837] tasklet_hi_action+0x8e/0x110
<4>[ 43.775842] __do_softirq+0xc1/0x4aa
<4>[ 43.775846] irq_exit+0xa4/0xb0
<4>[ 43.775849] do_IRQ+0x67/0x120
<4>[ 43.775854] common_interrupt+0xf/0xf
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4372/shard-kbl5/igt@gem_ctx_isolation@bcs0-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_3/fi-bdw-5557u/igt@gem_ctx_isolation@bcs0-s3.html

run.log:
pass: igt/gem_ctx_isolation/bcs0-s3
[15/97] skip: 8, pass: 7 -
FATAL: command execution failed
...
Completed CI_IGT_test drmtip_3/fi-bdw-5557u/34 : FAILURE
CI_IGT_test runtime 240 seconds
Rebooting fi-bdw-5557u

last dmesg:
<4>[ 40.493107] Setting dangerous option reset - tainting kernel
<7>[ 40.497149] [IGT] gem_ctx_isolation: starting subtest bcs0-S3
<6>[ 40.567914] PM: suspend entry (deep)
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3903/fi-cnl-y3/igt@gem_ctx_isolation@vecs0-reset.html

run.log:
running: igt/gem_ctx_isolation/vecs0-reset
[65/98] skip: 29, pass: 34, fail: 2 /
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3903/fi-cnl-y3/23 : FAILURE
CI_IGT_test runtime 843 seconds
Rebooting fi-cnl-y3

Last dmesg:
<7>[ 529.085780] [drm:verify_single_dpll_state.isra.79 [i915]] DPLL 1
<6>[ 529.119263] Console: switching to colour frame buffer device 480x135
<6>[ 529.269811] Console: switching to colour dummy device 80x25
<7>[ 529.269862] [IGT] gem_ctx_isolation: executing

Followed by stray
This should be fixed by

commit 0f36a85c3bd5e0dfcbb49af203a96a933dae86cf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Mar 22 07:35:33 2018 +0000

    drm/i915: Flush pending interrupt following a GPU reset
Patch integrated in CI_DRM_3969. I will monitor and hopefully close this; it will take some time, since BAT machines are affected through the shardlist on BAT runs.
(In reply to Chris Wilson from comment #10)
> This should be fixed by
> commit 0f36a85c3bd5e0dfcbb49af203a96a933dae86cf
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Mar 22 07:35:33 2018 +0000
>
> drm/i915: Flush pending interrupt following a GPU reset

Ah, that was only the set-wedge path. Reset path:
https://patchwork.freedesktop.org/series/40550/
(In reply to Chris Wilson from comment #12)
> (In reply to Chris Wilson from comment #10)
> > This should be fixed by
> > commit 0f36a85c3bd5e0dfcbb49af203a96a933dae86cf
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Thu Mar 22 07:35:33 2018 +0000
> >
> > drm/i915: Flush pending interrupt following a GPU reset
>
> Ah, that was only the set-wedge path. Reset path:
> https://patchwork.freedesktop.org/series/40550/

Is more coming, or should the bug be set to fixed?
I'll mark the bug as fixed when that patch lands. Hopefully today, so we can start to get results over the w/e.
commit 46b3617dfec875c1414c6ccbfcab371c97735562
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Mar 23 10:18:24 2018 +0000

    drm/i915: Actually flush interrupts on reset not just wedging

    Commit 0f36a85c3bd5 ("drm/i915: Flush pending interrupt following a GPU
    reset") got confused and only applied the flush to the set-wedge path
    (which itself is proving troublesome), but we also need the
    serialisation on the regular reset path. Oops.

    Move the interrupt into reset_irq() and make it common to the reset and
    final set-wedge.

    v2: reset_irq() after port cancellation, as we assert that
    execlists->active is sane for cancellation (and is being reset by
    reset_irq).

    References: 0f36a85c3bd5 ("drm/i915: Flush pending interrupt following a GPU reset")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Michel Thierry <michel.thierry@intel.com>
    Cc: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Jeff McGee <jeff.mcgee@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180323101824.14645-1-chris@chris-wilson.co.uk
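
For anyone reading this later, here is a rough toy model of the ordering the commit message describes. It is not i915 code. ToyEngine, its fields, cancel_ports(), tasklet() and survives_reset() are all hypothetical names made up for illustration; only reset_irq() and the notion of execlists->active come from the commit message. It assumes the failure mode is an interrupt raised before the reset being handled by the tasklet after the reset has emptied the ports.

```python
from dataclasses import dataclass, field


class StaleInterrupt(RuntimeError):
    """Raised when the toy tasklet handles an event with nothing in flight."""


@dataclass
class ToyEngine:
    # All fields are hypothetical stand-ins, not real i915 structures.
    ports: list = field(default_factory=list)  # requests currently in flight
    active: bool = False                       # loosely models execlists->active
    pending_irq: bool = False                  # interrupt raised, not yet handled

    def submit(self, request):
        self.ports.append(request)
        self.active = True

    def raise_irq(self):
        self.pending_irq = True

    def cancel_ports(self):
        # Cancellation checks that the active flag still matches the ports.
        if self.active != bool(self.ports):
            raise RuntimeError("active flag out of sync during cancellation")
        self.ports.clear()

    def reset_irq(self):
        # Discard any interrupt left over from before the reset and clear
        # the active flag, so the tasklet starts from a clean state.
        self.pending_irq = False
        self.active = False

    def tasklet(self):
        if not self.pending_irq:
            return
        self.pending_irq = False
        if not self.ports:
            raise StaleInterrupt("interrupt handled with no request in flight")
        self.ports.pop(0)
        self.active = bool(self.ports)


def reset(engine, flush=True):
    # Cancel first, then flush: cancel_ports() checks the active flag,
    # which reset_irq() clears.
    engine.cancel_ports()
    if flush:
        engine.reset_irq()


def set_wedged(engine):
    # Shares the same helper, so both paths flush the interrupt.
    engine.cancel_ports()
    engine.reset_irq()


def survives_reset(flush):
    engine = ToyEngine()
    engine.submit("rq")
    engine.raise_irq()           # interrupt arrives just before the reset
    reset(engine, flush=flush)
    try:
        engine.tasklet()         # tasklet runs after the reset
    except StaleInterrupt:
        return False
    return True
```

In this model, survives_reset(flush=False) trips the check in tasklet(), while survives_reset(flush=True) does not. Calling reset_irq() before cancel_ports() trips the cancellation check instead, which is the ordering concern the v2 note raises.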