https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6576/shard-glk7/igt@prime_busy@wait-hang-vebox.html

<4> [163.610318] ------------[ cut here ]------------
<2> [163.610336] kernel BUG at ./drivers/gpu/drm/i915/intel_wakeref.h:110!
<4> [163.610405] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [163.610415] CPU: 1 PID: 30 Comm: kworker/u8:1 Tainted: G U 5.3.0-rc2-CI-CI_DRM_6576+ #1
<4> [163.610429] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
<4> [163.610531] Workqueue: i915 retire_work_handler [i915]
<4> [163.610595] RIP: 0010:intel_engine_pm_put+0x44/0x50 [i915]
<4> [163.610604] Code: 00 00 00 48 89 df e8 db cd 00 e1 85 c0 75 03 5b 5d c3 48 8d bd c8 bb 00 00 48 89 de 48 c7 c2 50 47 10 a0 5b 5d e9 6c 9b fe ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 48 81 c7 b8 00 00 00 48 c7 c6
<4> [163.610629] RSP: 0018:ffffc9000013bdb8 EFLAGS: 00010246
<4> [163.610638] RAX: 0000000000000000 RBX: ffff88825d8bf700 RCX: 0000000000000001
<4> [163.610648] RDX: 00000000000015b3 RSI: ffff888276083018 RDI: ffff888261bb42a8
<4> [163.610659] RBP: ffff888261ba0000 R08: ffff888276083018 R09: 0000000000000000
<4> [163.610669] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88825d8bf998
<4> [163.610680] R13: ffff88825d8bf990 R14: ffff88825d8bf760 R15: 0000000000000000
<4> [163.610690] FS: 0000000000000000(0000) GS:ffff888277e80000(0000) knlGS:0000000000000000
<4> [163.610703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [163.610712] CR2: 00007f88a23e6000 CR3: 0000000272460000 CR4: 0000000000340ee0
<4> [163.610722] Call Trace:
<4> [163.610788] i915_request_retire+0x35e/0x840 [i915]
<4> [163.610859] ring_retire_requests+0x47/0x50 [i915]
<4> [163.610927] i915_retire_requests+0x57/0xc0 [i915]
<4> [163.610993] retire_work_handler+0x27/0x60 [i915]
<4> [163.611006] process_one_work+0x245/0x610
<4> [163.611016] worker_thread+0x1d0/0x380
<4> [163.611025] ? process_one_work+0x610/0x610
<4> [163.611034] kthread+0x119/0x130
<4> [163.611042] ? kthread_park+0xa0/0xa0
<4> [163.611053] ret_from_fork+0x24/0x50
<4> [163.611064] Modules linked in: vgem mei_hdcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl btbcm btintel x86_pkg_temp_thermal coretemp bluetooth crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ecdh_generic ecc r8169 i915 realtek i2c_hid snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei pinctrl_geminilake prime_numbers pinctrl_intel
<0> [163.611125] Dumping ftrace buffer:
<0> [163.611132] ---------------------------------
<0> [163.611206] CPU:1 [LOST 573578 EVENTS]
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* GLK: igt@prime_busy@wait-hang-vebox - dmesg-warn - kernel BUG at ./drivers/gpu/drm/i915/intel_wakeref.h:110! - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6576/shard-glk7/igt@prime_busy@wait-hang-vebox.html
Looks very innocuous. There's at least one change pending to i915_active that will affect this, so watch this space?
Setting priority to low based on Chris' comment.
A CI Bug Log filter associated with this bug has been updated:

{- GLK: igt@prime_busy@wait-hang-vebox - dmesg-warn - kernel BUG at ./drivers/gpu/drm/i915/intel_wakeref.h:110! -}
{+ SNB GLK: igt@prime_busy@wait-hang-vebox - dmesg-warn - kernel BUG at ./drivers/gpu/drm/i915/intel_wakeref.h:110! +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6601/shard-snb6/igt@gem_eio@in-flight-suspend.html
A CI Bug Log filter associated with this bug has been updated:

{- SNB GLK: igt@prime_busy@wait-hang-vebox - dmesg-warn - kernel BUG at ./drivers/gpu/drm/i915/intel_wakeref.h:110! -}
{+ SNB GLK: igt@prime_busy@wait-hang-vebox|igt@gem_eio@in-flight-suspend - dmesg-warn - kernel BUG at ./drivers/gpu/drm/i915/intel_wakeref.h:110! +}

No new failures caught with the new filter.
No causal link has been established yet, but

commit d8af05ff38ae7a42819b285ffef314942414ef8b (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 2 11:00:15 2019 +0100

    drm/i915: Allow sharing the idle-barrier from other kernel requests

    By placing our idle-barriers in the i915_active fence tree, we expose those for reuse by other components that are issuing requests along the kernel_context. Reusing the proto-barrier active_node is perfectly fine as the new request implies a context-switch, and so an opportune point to run the idle-barrier. However, the proto-barrier is not equivalent to a normal active_node and care must be taken to avoid dereferencing the ERR_PTR used as its request marker.

is likely related. Watch this space.
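The commit message above notes that the proto-barrier uses an ERR_PTR as its request marker, so anything walking the fence tree must recognise the marker before dereferencing the request. A minimal userspace sketch of that guard follows, assuming heavily simplified types: `err_ptr()`, `is_err()` and `ptr_err()` re-create the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() encoding, while `struct active_node`, `struct request`, `PROTO_BARRIER_ERR`, `mark_proto_barrier()` and `node_seqno()` are hypothetical and are not the i915 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer encoding: the top
 * MAX_ERRNO addresses never hold valid objects, so they can carry -errno. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified request and fence-tree node. */
struct request {
	unsigned int seqno;
};

struct active_node {
	struct request *request; /* NULL, a real request, or the marker */
};

/* Hypothetical marker errno (-EAGAIN on Linux); the value is arbitrary. */
#define PROTO_BARRIER_ERR (-11L)

static void mark_proto_barrier(struct active_node *node)
{
	node->request = err_ptr(PROTO_BARRIER_ERR);
}

static bool is_proto_barrier(const struct active_node *node)
{
	return is_err(node->request);
}

/* Returns the tracked request's seqno, or 0 when the node holds no real
 * request. The ERR_PTR marker is checked first and never dereferenced. */
static unsigned int node_seqno(const struct active_node *node)
{
	if (!node->request || is_proto_barrier(node))
		return 0;
	return node->request->seqno;
}
```

The important ordering is that `node_seqno()` tests `is_proto_barrier()` before touching `node->request->seqno`; dereferencing the marker itself would fault, because it points into the top page of the address space.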
commit c7302f204490f3eb4ef839bec228315bcd3ba43f (drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 8 21:27:58 2019 +0100

    drm/i915: Defer final intel_wakeref_put to process context

    As we need to acquire a mutex to serialise the final intel_wakeref_put, we need to ensure that we are in process context at that time. However, we want to allow operation on the intel_wakeref from inside timer and other hardirq context, which means that we need to defer that final put to a workqueue.

    Inside the final wakeref puts, we are safe to operate in any context, as we are simply marking up the HW and state tracking for the potential sleep. It's only the serialisation with the potential sleeping getting that requires careful wait avoidance. This allows us to retain the immediate processing as before (we only need to sleep over the same races as the current mutex_lock).

    v2: Add a selftest to ensure we exercise the code while lockdep watches.
    v3: That test was extremely loud and complained about many things!
    v4: Not a whale!

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111245
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111256
    Fixes: 18398904ca9e ("drm/i915: Only recover active engines")
    Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk

Probably.
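The commit describes a reference count whose intermediate puts are cheap atomic decrements, while the final put, which must serialise on a mutex, is handed to a workqueue when the caller may be in timer or hardirq context. A minimal single-threaded userspace model of that idea is sketched below, assuming C11 atomics. `struct wakeref`, `wakeref_get()`, `wakeref_put()`, the `atomic_ctx` flag and `run_pending_work()` are hypothetical names; no real mutex or workqueue is used, and this is not the i915 `intel_wakeref` implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a wakeref; single-threaded, no real locking. */
struct wakeref {
	atomic_int count;  /* outstanding references */
	bool work_pending; /* stands in for a queued work item */
	bool awake;        /* state that only the final put may tear down */
};

/* Decrement *v unless it currently equals 'unless'; returns true if it
 * decremented. Same semantics as the kernel's atomic_add_unless(v, -1, unless). */
static bool dec_unless(atomic_int *v, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

static void wakeref_get(struct wakeref *wf)
{
	if (atomic_fetch_add(&wf->count, 1) == 0)
		wf->awake = true; /* first reference wakes things up */
}

/* The final put. In the driver this path serialises on a mutex, which is
 * why it must run in process context; the sketch omits the lock. */
static void wakeref_put_last(struct wakeref *wf)
{
	/* A get may have raced in while this was deferred; only the
	 * decrement that reaches zero tears the state down. */
	if (atomic_fetch_sub(&wf->count, 1) == 1)
		wf->awake = false;
}

static void wakeref_put(struct wakeref *wf, bool atomic_ctx)
{
	assert(atomic_load(&wf->count) > 0); /* put on a zero count */

	if (dec_unless(&wf->count, 1))
		return; /* not the last reference: a plain atomic decrement */

	if (atomic_ctx) {
		/* Cannot sleep here: leave count at 1 and defer the final put. */
		wf->work_pending = true;
		return;
	}
	wakeref_put_last(wf); /* process context: do it immediately */
}

/* Simulates the workqueue later running the deferred final put. */
static void run_pending_work(struct wakeref *wf)
{
	if (wf->work_pending) {
		wf->work_pending = false;
		wakeref_put_last(wf);
	}
}
```

Leaving the count at 1 until the deferred work runs means that a get racing in before the worker simply bumps the count, and the worker's later decrement then leaves the state awake instead of tearing it down.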
The reproduction rate of this issue is once every 8.3 runs. It was last seen in CI_DRM_6601_full (3 months, 4 weeks ago), and the current run is 7435. Archiving this bug.
The CI Bug Log issue associated with this bug has been archived. New failures matching the above filters will no longer be associated with this bug.