https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4817/shard-glk2/igt@gem_eio@in-flight-suspend.html

<3> [505.276217] process_csb:988 GEM_BUG_ON(!execlists_is_active(execlists, 0))
<4> [505.276399] ------------[ cut here ]------------
<2> [505.276402] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:988!
<4> [505.276438] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [505.276449] CPU: 1 PID: 2943 Comm: gem_eio Tainted: G U 4.19.0-rc3-CI-CI_DRM_4817+ #1
<4> [505.276462] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
<4> [505.276527] RIP: 0010:process_csb+0x4a8/0x780 [i915]
<4> [505.276536] Code: 57 70 f9 e0 48 8b 35 7f c5 19 00 49 c7 c0 90 6e 26 a0 b9 dc 03 00 00 48 c7 c2 d0 f4 22 a0 48 c7 c7 d3 2d 16 a0 e8 f8 fe ff e0 <0f> 0b 48 8b 75 d0 4c 8d a6 88 16 00 00 4c 89 e7 e8 f3 c7 7d e1 48
<4> [505.276562] RSP: 0018:ffffc90002f7ba48 EFLAGS: 00010086
<4> [505.276572] RAX: 000000000000000d RBX: ffff880268b12158 RCX: 0000000000000000
<4> [505.276582] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff880276d98ff8
<4> [505.276593] RBP: ffffc90002f7bab0 R08: 0000000000154618 R09: ffff88027666a000
<4> [505.276604] R10: ffffc90002f7ba38 R11: ffff880276d98ff8 R12: ffff88026504704c
<4> [505.276615] R13: 0000000000000001 R14: ffff880265047048 R15: ffff880265047040
<4> [505.276626] FS: 00007fd29b4b7980(0000) GS:ffff880277e80000(0000) knlGS:0000000000000000
<4> [505.276638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [505.276647] CR2: 00005602f1dbe490 CR3: 0000000249f9e000 CR4: 0000000000340ee0
<4> [505.276657] Call Trace:
<4> [505.276716] execlists_reset_prepare+0x54/0x150 [i915]
<4> [505.276772] i915_gem_reset_prepare_engine+0x20/0x40 [i915]
<4> [505.276826] i915_gem_reset_prepare+0x2c/0x70 [i915]
<4> [505.276876] i915_reset+0x117/0x280 [i915]
<4> [505.276925] i915_reset_device+0x1fb/0x290 [i915]
<4> [505.276976] ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
<4> [505.276991] ? work_on_cpu_safe+0x50/0x50
<4> [505.277041] i915_handle_error+0x219/0x350 [i915]
<4> [505.277097] ? reset_all_global_seqno.part.5+0x3c/0x260 [i915]
<4> [505.277109] ? mark_held_locks+0x50/0x80
<4> [505.277159] ? i915_drop_caches_set+0x16e/0x260 [i915]
<4> [505.277171] ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [505.277182] ? __mutex_unlock_slowpath+0x46/0x2b0
<4> [505.277234] i915_drop_caches_set+0x1c6/0x260 [i915]
<4> [505.277246] simple_attr_write+0xb0/0xd0
<4> [505.277256] full_proxy_write+0x51/0x80
<4> [505.277267] __vfs_write+0x31/0x180
<4> [505.277275] ? rcu_lockdep_current_cpu_online+0x8f/0xd0
<4> [505.277286] ? rcu_read_lock_sched_held+0x6f/0x80
<4> [505.277295] ? rcu_sync_lockdep_assert+0x29/0x50
<4> [505.277305] ? __sb_start_write+0x152/0x1f0
<4> [505.277313] ? __sb_start_write+0x168/0x1f0
<4> [505.277322] vfs_write+0xbd/0x1b0
<4> [505.277331] ksys_write+0x50/0xc0
<4> [505.277340] do_syscall_64+0x55/0x190
<4> [505.277349] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [505.277358] RIP: 0033:0x7fd29aa312b7
<4> [505.277365] Code: 44 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 5b fd ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 94 fd ff ff 48
<4> [505.277391] RSP: 002b:00007ffc240d9280 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4> [505.277404] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fd29aa312b7
<4> [505.277415] RDX: 0000000000000005 RSI: 00007ffc240d9330 RDI: 0000000000000009
<4> [505.277426] RBP: 00007ffc240d9330 R08: 0000000000000000 R09: 0000000000000000
<4> [505.277436] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000005
<4> [505.277447] R13: 0000000000000003 R14: 00007fd29aa1f628 R15: 00007fd29aa1bd80
Should hopefully be fixed by https://patchwork.freedesktop.org/patch/249316/
Might want to summon Petri here. This is an interesting case: the result was reported as a WARN because the failure happened after the test itself had finished, but we BUGed out. That should be a much more severe error; previously it was an incomplete, as the machine rebooted.
commit 8db601f09127eb974e6fcf7fb30c70344d5727f6 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 14 09:00:17 2018 +0100

    drm/i915/execlists: Reset CSB pointers on canceling requests (wedging)

    The prior assumption was that we did not need to reset the CSB on wedging
    when cancelling the outstanding requests as it would be cleaned up in the
    subsequent reset prior to restarting the GPU. However, what was not
    accounted for was that in preparing for the reset, we would try to process
    the outstanding CSB entries. If the GPU happened to complete a CS event
    just as we were performing the cancellation of requests, that event would
    be kept in the CSB until the reset -- but our bookkeeping was cleared,
    causing confusion when trying to complete the CS event.

    v2: Use a sanitize on unwedge to avoid interfering with eio suspend
    (where we intentionally disable GPU reset).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-3-chris@chris-wilson.co.uk
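For anyone reading this later, the race described in the commit message can be pictured with a toy model. This is only a sketch under loud assumptions, not i915 code: `struct toy_engine`, `CSB_ENTRIES`, `hw_post_event()`, `submit()` and `cancel_requests()` are hypothetical names, the `process_csb()` here is a stand-in that merely borrows its name from the trace, and a `false` return marks the point where the real driver would trip the GEM_BUG_ON. It also does not model the v2 change (doing the sanitize on unwedge); it only shows why leaving a stale event in the CSB after the bookkeeping is cleared causes trouble.

```c
#include <assert.h>
#include <stdbool.h>

#define CSB_ENTRIES 6 /* hypothetical ring size, chosen for this toy only */

struct toy_engine {
	unsigned int csb_head; /* next entry the driver will read */
	unsigned int csb_tail; /* next entry the "hardware" will write */
	bool active;           /* driver bookkeeping: a request is in flight */
};

/* Hardware side: post one context-switch event into the ring. */
static void hw_post_event(struct toy_engine *e)
{
	unsigned int next = (e->csb_tail + 1) % CSB_ENTRIES;

	assert(next != e->csb_head); /* the toy ring must not overflow */
	e->csb_tail = next;
}

/* Driver side: mark a request as submitted to the hardware. */
static void submit(struct toy_engine *e)
{
	e->active = true;
}

/*
 * Driver side: consume pending events. Returns false at the point where
 * an event arrives while the bookkeeping says nothing is active.
 */
static bool process_csb(struct toy_engine *e)
{
	while (e->csb_head != e->csb_tail) {
		if (!e->active)
			return false; /* stale event, bookkeeping already cleared */
		e->csb_head = (e->csb_head + 1) % CSB_ENTRIES;
		e->active = false; /* the event completes the request */
	}
	return true;
}

/*
 * Wedging: cancel outstanding requests. With reset_csb == false the
 * pending event stays in the ring; with true it is discarded as well.
 */
static void cancel_requests(struct toy_engine *e, bool reset_csb)
{
	e->active = false;
	if (reset_csb)
		e->csb_head = e->csb_tail;
}
```

In this model, calling `submit()`, then `hw_post_event()`, then `cancel_requests(e, false)` leaves one unread event behind, so the following `process_csb()` returns false. The same sequence with `cancel_requests(e, true)` leaves nothing to process and `process_csb()` returns true.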
This issue occurred only once with CI_DRM_4817_full (1 month / 400 runs ago).
(In reply to Lakshmi from comment #4)
> This issue occurred only once with CI_DRM_4817_full (1 month / 400 runs ago).

In these cases, feel free to close and archive the issue in CI Bug log ;)