https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4460/fi-kbl-7500u/igt@drv_selftest@live_hangcheck.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4456/fi-kbl-7560u/igt@drv_selftest@live_hangcheck.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4465/fi-kbl-7567u/igt@drv_selftest@live_hangcheck.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4472/fi-glk-dsi/igt@drv_selftest@live_hangcheck.html

<4>[ 639.552375] ------------[ cut here ]------------
<2>[ 639.552379] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:1040!
<4>[ 639.552419] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4>[ 639.552434] CPU: 3 PID: 31 Comm: ksoftirqd/3 Tainted: G U 4.18.0-rc4-CI-CI_DRM_4472+ #1
<4>[ 639.552454] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 639.552600] RIP: 0010:process_csb+0x4b3/0x770 [i915]
<4>[ 639.552614] Code: bc 09 c5 e0 48 8b 35 4c b8 19 00 49 c7 c0 00 c8 5a a0 b9 10 04 00 00 48 c7 c2 50 4f 57 a0 48 c7 c7 de 99 4a a0 e8 8d 9a cb e0 <0f> 0b 48 8b 75 d0 4c 8d a6 30 16 00 00 4c 89 e7 e8 28 48 49 e1 48
<4>[ 639.552739] RSP: 0018:ffffc9000016bd38 EFLAGS: 00010082
<4>[ 639.552752] RAX: 000000000000000d RBX: ffff88016f26c2a8 RCX: 0000000000000000
<4>[ 639.552768] RDX: 0000000000000000 RSI: 000000000000004c RDI: 0000000000000000
<4>[ 639.552782] RBP: ffffc9000016bda0 R08: ffffffffa05ac800 R09: 0000000000000001
<4>[ 639.552798] R10: ffffc9000016bd90 R11: 0000000000000000 R12: ffff88016e72605c
<4>[ 639.552813] R13: 0000000000000003 R14: ffff88016e726058 R15: ffff88016e726040
<4>[ 639.552829] FS: 0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
<4>[ 639.552846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 639.552860] CR2: 000055abb55db5e8 CR3: 000000016943c000 CR4: 0000000000340ee0
<4>[ 639.552875] Call Trace:
<4>[ 639.552968] __execlists_submission_tasklet+0x32/0xc00 [i915]
<4>[ 639.553066] execlists_submission_tasklet+0x55/0x70 [i915]
<4>[ 639.553088] tasklet_action_common.isra.5+0x47/0xb0
<4>[ 639.553102] ? smpboot_thread_fn+0x6b/0x280
<4>[ 639.553117] __do_softirq+0xd9/0x505
<4>[ 639.553129] ? smpboot_thread_fn+0x23/0x280
<4>[ 639.553142] ? smpboot_thread_fn+0x6b/0x280
<4>[ 639.553153] run_ksoftirqd+0x29/0x50
<4>[ 639.553164] smpboot_thread_fn+0x1d3/0x280
<4>[ 639.553176] ? sort_range+0x20/0x20
<4>[ 639.553187] kthread+0x119/0x130
<4>[ 639.553199] ? kthread_flush_work_fn+0x10/0x10
<4>[ 639.553213] ret_from_fork+0x3a/0x50
I think this is the same bug as 106560 but with different symptoms. Should all be cleared up real soon now (tm).
Memory is hazy, but I do think we closed this BUG loophole. Hmm, iirc, it was a double wedge. Ok, not quite closed yet, as I still have a patch to prevent double wedges.
Hmm, we have

commit 3970c65c2b47c450f917bc8a29c5849563a95dfe
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 23 15:53:35 2018 +0100

    drm/i915: Skip repeated calls to i915_gem_set_wedged()

    If we already wedged, i915_gem_set_wedged() becomes a complicated
    no-op.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=107343
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180723145335.24579-1-chris@chris-wilson.co.uk

+

commit f1a498fa549e8e86895cda37e3fca867aae955b7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 16 09:03:30 2018 +0100

    drm/i915/execlists: Disable submission tasklet upon wedging

    If we declare the driver wedged before the GPU truly is, then we may
    see the GPU complete some CS events following our cancellation. This
    leaves us quite confused as we deleted all the bookkeeping and thus
    complain about the inconsistent state. We can just ignore the
    remaining events and let the GPU idle by not feeding it, and so avoid
    trying to racily overwrite shared state. We rely on there being a full
    GPU reset before unwedging, giving us the opportunity to reset the
    shared state.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180716080332.32283-4-chris@chris-wilson.co.uk

I think that accounts for it.
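For anyone reading along, the idea behind those two commit messages can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the i915 code: struct fake_gpu, fake_gpu_init(), fake_set_wedged() and fake_handle_cs_event() are all hypothetical names, and the real driver's locking, tasklet handling and ordering are not modelled. It shows a wedge that only takes effect once, and an event handler that ignores late events after the bookkeeping has been discarded.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; NOT the real i915 structures. */
struct fake_gpu {
    atomic_bool wedged;          /* set once the device is declared unusable */
    atomic_bool tasklet_enabled; /* cleared so that late events are ignored */
    int inflight;                /* bookkeeping that is discarded on wedge */
    int cancel_count;            /* how many times cancellation actually ran */
};

static void fake_gpu_init(struct fake_gpu *gpu, int inflight)
{
    atomic_init(&gpu->wedged, false);
    atomic_init(&gpu->tasklet_enabled, true);
    gpu->inflight = inflight;
    gpu->cancel_count = 0;
}

/* Returns true only for the call that actually performed the wedge. */
static bool fake_set_wedged(struct fake_gpu *gpu)
{
    /* atomic_exchange returns the old value: true means already wedged. */
    if (atomic_exchange(&gpu->wedged, true))
        return false; /* repeated call: skip, nothing left to cancel */

    /* Stop processing events before throwing the bookkeeping away. */
    atomic_store(&gpu->tasklet_enabled, false);
    gpu->inflight = 0;
    gpu->cancel_count++;
    return true;
}

/* Returns true if the event was processed, false if it was ignored. */
static bool fake_handle_cs_event(struct fake_gpu *gpu)
{
    if (!atomic_load(&gpu->tasklet_enabled))
        return false; /* wedged: ignore whatever the hardware still reports */

    if (gpu->inflight > 0)
        gpu->inflight--;
    return true;
}
```

With this sketch, a second fake_set_wedged() call returns false without touching the state, and an event arriving after the wedge is dropped rather than checked against bookkeeping that no longer exists.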
Closed, as this was last seen 1 month ago.
This bug used to occur after 2-23 rounds. It has not been seen in the last 221 rounds. Closing this issue.