Bug 107274

Summary: [CI][SHARDS] igt@gem_exec_nop@basic-parallel - incomplete - GEM_BUG_ON(!engine->i915->gt.awake)
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: SKL
i915 features: GEM/execlists

Description Martin Peres 2018-07-18 11:46:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4539/fi-skl-6770hq/igt@gem_exec_nop@basic-parallel.html

<0>[  204.711005]   <idle>-0       0..s1 204771804us : execlists_submission_tasklet: rcs0 awake?=1, active=0
<0>[  204.711042]   <idle>-0       0d.s2 204771889us : __execlists_submission_tasklet: __execlists_submission_tasklet:1121 GEM_BUG_ON(!engine->i915->gt.awake)
<0>[  204.711055] ---------------------------------
<4>[  204.711061] ---[ end trace 2cdd8fa3fccc65dc ]---
<4>[  204.821158] RIP: 0010:__execlists_submission_tasklet+0xc8/0xcb0 [i915]
<4>[  204.821165] Code: e7 12 dc e0 48 8b 35 3f b1 19 00 49 c7 c0 69 df 41 a0 b9 61 04 00 00 48 c7 c2 20 41 40 a0 48 c7 c7 d3 8f 33 a0 e8 b8 a3 e2 e0 <0f> 0b 4c 8b bb 78 04 00 00 48 8d 83 70 04 00 00 4c 8b ab 40 04 00 
<4>[  204.821222] RSP: 0018:ffff8804bec03e98 EFLAGS: 00010082
<4>[  204.821229] RAX: 000000000000000f RBX: ffff88049fa9a158 RCX: 0000000000000000
<4>[  204.821236] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8804acc2fa38
<4>[  204.821243] RBP: ffff8804bec03ef8 R08: 000000000118fab5 R09: ffff8804acc36000
<4>[  204.821250] R10: ffff8804bec03e80 R11: ffff8804acc2fa38 R12: 0000000000000000
<4>[  204.821257] R13: ffff8804bec15540 R14: 0000000000000000 R15: 0000000000000001
<4>[  204.821264] FS:  0000000000000000(0000) GS:ffff8804bec00000(0000) knlGS:0000000000000000
<4>[  204.821272] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  204.821278] CR2: 00007f80cdab2228 CR3: 00000004a631e004 CR4: 00000000003606f0
<4>[  204.821285] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  204.821292] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<0>[  204.821299] Kernel panic - not syncing: Fatal exception in interrupt
<0>[  205.892200] Shutting down cpus with NMI
<0>[  205.892208] Dumping ftrace buffer:
<0>[  205.892212]    (ftrace buffer empty)
<0>[  205.892216] Kernel Offset: disabled

This may be related to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=106953
Comment 1 Chris Wilson 2018-07-18 11:55:52 UTC
Nope, this is a separate race between the tasklet and idle work.

<0>[  204.710681] kworker/-7       2.... 204771752us : i915_gem_park: 
<0>[  204.710712] kworker/-7       2.... 204771768us : i915_gem_switch_to_kernel_context: awake?=yes
<0>[  204.710746] kworker/-7       2.... 204771769us : i915_gem_idle_work_handler: active_requests=0 (after switch-to-kernel-context)
<0>[  204.710785] kworker/-7       2.... 204771771us : execlists_submission_tasklet: rcs0 awake?=1, active=5
<0>[  204.710822] kworker/-7       2d..1 204771772us : process_csb: rcs0 cs-irq head=5, tail=0
<0>[  204.710859] kworker/-7       2d..1 204771773us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000000, active=0x5
<0>[  204.710898] kworker/-7       2d..1 204771773us : process_csb: rcs0 out[0]: ctx=0.1, global=35845 (fence 5:43) (current 35845), prio=-1024
<0>[  204.710936] kworker/-7       2d..1 204771779us : process_csb: rcs0 completed ctx=0
<0>[  204.710970] kworker/-7       2.... 204771790us : i915_gem_idle_work_handler: 
<0>[  204.711005]   <idle>-0       0..s1 204771804us : execlists_submission_tasklet: rcs0 awake?=1, active=0
<0>[  204.711042]   <idle>-0       0d.s2 204771889us : __execlists_submission_tasklet: __execlists_submission_tasklet:1121 GEM_BUG_ON(!engine->i915->gt.awake)
Comment 2 Chris Wilson 2018-07-18 12:03:18 UTC
It shouldn't be possible... It requires us to set gt.awake=false while the tasklet is running, but before we do we park the engines and flush the tasklets in the process. We must have kicked off another tasklet_schedule after intel_engines_park.

I think...
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db5351e6a3a5..6921406a7250 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1074,7 +1074,7 @@ static void execlists_submission_tasklet(unsigned long data)
 
        spin_lock_irqsave(&engine->timeline.lock, flags);
 
-       if (engine->i915->gt.awake) /* we may be delayed until after we idle! */
+       if (engine->execlists.active) /* we may be delayed until after idle! */
                __execlists_submission_tasklet(engine);
 
        spin_unlock_irqrestore(&engine->timeline.lock, flags);
Comment 3 Chris Wilson 2018-07-19 12:26:58 UTC
commit d78d3343dce7787a5f7fd0b3d522a3510fd26ef9 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 19 08:50:29 2018 +0100

    drm/i915/execlists: Move the assertion we have the rpm wakeref down
    
    There's a race between idling the engine and finishing off the last
    tasklet (as we may kick the tasklets after declaring an individual
    engine idle). However, since we do not need to access the device until
    we try to submit to the ELSP register (processing the CSB just requires
    normal CPU access to the HWSP, and when idle we should not need to
    submit!) we can defer the assertion unto that point. The assertion is
    still useful as it does verify that we do hold the longterm GT wakeref
    taken from request allocation until request completion.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107274
    Fixes: 9512f985c32d ("drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180719075029.28643-1-chris@chris-wilson.co.uk
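
Editorial sketch (not part of the commit): the idea in the commit message above, checking for the wakeref only at the point where the device is actually written to, can be illustrated with a small user-space model. Everything here is hypothetical: the toy_* names, the struct layout, and the counters are invented for illustration and do not reproduce the real i915 code. The model only assumes what the commit message states, namely that processing the CSB needs plain CPU access and that an idle engine has nothing to submit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for driver state. */
struct toy_gt {
	bool awake;		/* models the long-term GT wakeref */
};

struct toy_engine {
	struct toy_gt *gt;
	int csb_pending;	/* completion events not yet processed */
	int queued;		/* requests waiting to be submitted */
	int elsp_writes;	/* simulated writes to the submission port */
};

/* Reading completion events is CPU-only in this model: no wakeref check. */
static void toy_process_csb(struct toy_engine *e)
{
	e->csb_pending = 0;
}

/* The only place the model touches the "device", so the check lives here. */
static void toy_submit(struct toy_engine *e)
{
	assert(e->gt->awake);
	e->elsp_writes++;
	e->queued--;
}

/* Tasklet body: drain completions, then submit whatever is queued. */
static void toy_tasklet(struct toy_engine *e)
{
	toy_process_csb(e);
	while (e->queued > 0)
		toy_submit(e);
}

/* Late tasklet: the idle worker has already parked the GT, nothing queued. */
int toy_late_tasklet_after_park(void)
{
	struct toy_gt gt = { .awake = true };
	struct toy_engine e = { .gt = &gt, .csb_pending = 1, .queued = 0 };

	gt.awake = false;	/* idle worker wins the race */
	toy_tasklet(&e);	/* late tasklet only drains the CSB */
	return e.elsp_writes;
}

/* Normal case: GT is awake and two requests are queued. */
int toy_tasklet_while_awake(void)
{
	struct toy_gt gt = { .awake = true };
	struct toy_engine e = { .gt = &gt, .csb_pending = 1, .queued = 2 };

	toy_tasklet(&e);
	return e.elsp_writes;
}
```

In this model a tasklet that runs after parking completes without tripping the assertion, because it never reaches toy_submit(). If the same check sat at the top of toy_tasklet(), the late run would abort, which is the shape of the failure reported in this bug.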
Comment 4 Francesco Balestrieri 2018-08-04 09:20:08 UTC
Martin, OK to close?
Comment 5 Francesco Balestrieri 2018-08-07 08:00:02 UTC
Not seen for a month, closing.
