https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4539/fi-skl-6770hq/igt@gem_exec_nop@basic-parallel.html

<0>[ 204.711005] <idle>-0 0..s1 204771804us : execlists_submission_tasklet: rcs0 awake?=1, active=0
<0>[ 204.711042] <idle>-0 0d.s2 204771889us : __execlists_submission_tasklet: __execlists_submission_tasklet:1121 GEM_BUG_ON(!engine->i915->gt.awake)
<0>[ 204.711055] ---------------------------------
<4>[ 204.711061] ---[ end trace 2cdd8fa3fccc65dc ]---
<4>[ 204.821158] RIP: 0010:__execlists_submission_tasklet+0xc8/0xcb0 [i915]
<4>[ 204.821165] Code: e7 12 dc e0 48 8b 35 3f b1 19 00 49 c7 c0 69 df 41 a0 b9 61 04 00 00 48 c7 c2 20 41 40 a0 48 c7 c7 d3 8f 33 a0 e8 b8 a3 e2 e0 <0f> 0b 4c 8b bb 78 04 00 00 48 8d 83 70 04 00 00 4c 8b ab 40 04 00
<4>[ 204.821222] RSP: 0018:ffff8804bec03e98 EFLAGS: 00010082
<4>[ 204.821229] RAX: 000000000000000f RBX: ffff88049fa9a158 RCX: 0000000000000000
<4>[ 204.821236] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8804acc2fa38
<4>[ 204.821243] RBP: ffff8804bec03ef8 R08: 000000000118fab5 R09: ffff8804acc36000
<4>[ 204.821250] R10: ffff8804bec03e80 R11: ffff8804acc2fa38 R12: 0000000000000000
<4>[ 204.821257] R13: ffff8804bec15540 R14: 0000000000000000 R15: 0000000000000001
<4>[ 204.821264] FS: 0000000000000000(0000) GS:ffff8804bec00000(0000) knlGS:0000000000000000
<4>[ 204.821272] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 204.821278] CR2: 00007f80cdab2228 CR3: 00000004a631e004 CR4: 00000000003606f0
<4>[ 204.821285] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 204.821292] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<0>[ 204.821299] Kernel panic - not syncing: Fatal exception in interrupt
<0>[ 205.892200] Shutting down cpus with NMI
<0>[ 205.892208] Dumping ftrace buffer:
<0>[ 205.892212] (ftrace buffer empty)
<0>[ 205.892216] Kernel Offset: disabled

This may be related to this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=106953
Nope, this is a separate race between the tasklet and idle work.

<0>[ 204.710681] kworker/-7 2.... 204771752us : i915_gem_park:
<0>[ 204.710712] kworker/-7 2.... 204771768us : i915_gem_switch_to_kernel_context: awake?=yes
<0>[ 204.710746] kworker/-7 2.... 204771769us : i915_gem_idle_work_handler: active_requests=0 (after switch-to-kernel-context)
<0>[ 204.710785] kworker/-7 2.... 204771771us : execlists_submission_tasklet: rcs0 awake?=1, active=5
<0>[ 204.710822] kworker/-7 2d..1 204771772us : process_csb: rcs0 cs-irq head=5, tail=0
<0>[ 204.710859] kworker/-7 2d..1 204771773us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000000, active=0x5
<0>[ 204.710898] kworker/-7 2d..1 204771773us : process_csb: rcs0 out[0]: ctx=0.1, global=35845 (fence 5:43) (current 35845), prio=-1024
<0>[ 204.710936] kworker/-7 2d..1 204771779us : process_csb: rcs0 completed ctx=0
<0>[ 204.710970] kworker/-7 2.... 204771790us : i915_gem_idle_work_handler:
<0>[ 204.711005] <idle>-0 0..s1 204771804us : execlists_submission_tasklet: rcs0 awake?=1, active=0
<0>[ 204.711042] <idle>-0 0d.s2 204771889us : __execlists_submission_tasklet: __execlists_submission_tasklet:1121 GEM_BUG_ON(!engine->i915->gt.awake)
It shouldn't be possible... It requires us to set gt.awake=false while the tasklet is running, but before we do we park the engines and flush the tasklets in the process. We must have kicked off another tasklet_schedule after intel_engines_park.

I think...

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db5351e6a3a5..6921406a7250 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1074,7 +1074,7 @@ static void execlists_submission_tasklet(unsigned long data)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
-	if (engine->i915->gt.awake) /* we may be delayed until after we idle! */
+	if (engine->execlists.active) /* we may be delayed until after idle! */
 		__execlists_submission_tasklet(engine);
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
commit d78d3343dce7787a5f7fd0b3d522a3510fd26ef9 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 19 08:50:29 2018 +0100

    drm/i915/execlists: Move the assertion we have the rpm wakeref down

    There's a race between idling the engine and finishing off the last
    tasklet (as we may kick the tasklets after declaring an individual
    engine idle). However, since we do not need to access the device until
    we try to submit to the ELSP register (processing the CSB just requires
    normal CPU access to the HWSP, and when idle we should not need to
    submit!) we can defer the assertion unto that point. The assertion is
    still useful as it does verify that we do hold the longterm GT wakeref
    taken from request allocation until request completion.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107274
    Fixes: 9512f985c32d ("drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180719075029.28643-1-chris@chris-wilson.co.uk
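
For readers following the commit message, here is a rough userspace sketch of why moving the assertion helps. It is a toy model, not the i915 code: the struct model_engine, its fields, and the functions tasklet_assert_early() and tasklet_assert_at_submit() are all hypothetical names invented for illustration, and the "BUG" is modelled as a return value instead of GEM_BUG_ON. The only things taken from the thread are the ideas that processing the CSB is plain CPU work and that the wakeref is only needed at the point of submission.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result code standing in for GEM_BUG_ON firing or not. */
enum tasklet_result { TASKLET_OK, TASKLET_BUG };

/* Hypothetical toy state; NOT the real i915 structures. */
struct model_engine {
	bool gt_awake;          /* models i915->gt.awake */
	unsigned int active;    /* models engine->execlists.active */
	unsigned int pending;   /* requests still waiting to be submitted */
	unsigned int submitted; /* requests written to the (modelled) ELSP */
};

/* CSB processing only touches CPU-visible state, so no wakeref is needed. */
void model_process_csb(struct model_engine *e)
{
	e->active = 0;
}

/* Old placement: assert at the top, even if the tasklet has nothing to submit. */
enum tasklet_result tasklet_assert_early(struct model_engine *e)
{
	if (!e->gt_awake)
		return TASKLET_BUG; /* fires for a late tasklet on an idle GT */

	model_process_csb(e);
	if (e->pending) {
		e->submitted += e->pending;
		e->pending = 0;
		e->active = 1;
	}
	return TASKLET_OK;
}

/* New placement: assert only when we are about to touch the device. */
enum tasklet_result tasklet_assert_at_submit(struct model_engine *e)
{
	model_process_csb(e);
	if (e->pending) {
		if (!e->gt_awake)
			return TASKLET_BUG; /* submitting without the wakeref is still a bug */
		e->submitted += e->pending;
		e->pending = 0;
		e->active = 1;
	}
	return TASKLET_OK;
}
```

In this model a tasklet that runs late on an already parked GT with nothing pending trips the early check but passes the deferred one, while an actual submission without the wakeref is still caught.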
Martin, OK to close?
Not seen for a month, closing.