https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6847/shard-skl5/igt@perf_pmu@busy-idle-no-semaphores-bcs0.html

[IGT] perf_pmu: executing
<6> [1794.462364] [IGT] perf_pmu: starting subtest busy-idle-no-semaphores-bcs0
<4> [1795.080255] ========================================================
<4> [1795.080277] WARNING: possible irq lock inversion dependency detected
<4> [1795.080299] 5.3.0-rc7-CI-CI_DRM_6847+ #1 Tainted: G     U
<4> [1795.080316] --------------------------------------------------------
<4> [1795.080334] kworker/0:0H/3056 just changed the state of lock:
<4> [1795.080352] 0000000062a190ff (&timeline->mutex/2){-...}, at: __engine_park+0x3e/0x320 [i915]
<4> [1795.080638] but this lock took another, HARDIRQ-unsafe lock in the past:
<4> [1795.080655]  (&(&lock->wait_lock)->rlock){+.+.}
<4> [1795.080660] and interrupts could create inverse lock ordering between them.
<4> [1795.080698] other info that might help us debug this:
<4> [1795.080716] Chain exists of: &timeline->mutex/2 --> &(&timelines->lock)->rlock --> &(&lock->wait_lock)->rlock
<4> [1795.080755] Possible interrupt unsafe locking scenario:
<4> [1795.080774]        CPU0                    CPU1
<4> [1795.080788]        ----                    ----
<4> [1795.080802]   lock(&(&lock->wait_lock)->rlock);
<4> [1795.080819]                                local_irq_disable();
<4> [1795.080835]                                lock(&timeline->mutex/2);
<4> [1795.080855]                                lock(&(&timelines->lock)->rlock);
<4> [1795.080876]   <Interrupt>
<4> [1795.080886]     lock(&timeline->mutex/2);
<4> [1795.080904]  *** DEADLOCK ***
The CI Bug Log issue associated to this bug has been updated.

### New filters associated
* SKL: igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6847/shard-skl5/igt@perf_pmu@busy-idle-no-semaphores-bcs0.html
That looks to be a usb lockchain, if I am reading that correctly.
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ SKL: igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6878/shard-skl4/igt@perf_pmu@busy-idle-no-semaphores-vecs0.html
What about intel_engine_pm_put() from i915_pmu.c/engines_sample() (hrtimer)? Do we need to hold an engine-pm reference for the duration of the event being enabled? However, those hooks are also potentially called from irqs-off context...
I was confused by the mutex taken under hardirq, but it looks like that's deliberate, from __timeline_mark_lock (d67739268cf0ee928cdc5e8224090c59efacd653).
Yes, we need a runtime-pm reference, and we use the active engine-pm for that, since it also tells us whether the engine was idle so we can skip the sample. The engine-pm put (wakeref) should be irqsafe for this purpose. I don't know what the lock->wait_lock is -- it doesn't look like one of ours, which makes it more baffling how it got coupled into our wakeref.
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ HSW SKL: igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6896/shard-hsw6/igt@perf_pmu@busy-no-semaphores-bcs0.html
A CI Bug Log filter associated to this bug has been updated:

{- HSW SKL: igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ HSW SKL: igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6898/shard-skl3/igt@perf_pmu@busy-no-semaphores-vcs0.html
A CI Bug Log filter associated to this bug has been updated:

{- HSW SKL: igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW SKL: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_376/fi-byt-j1900/igt@perf_pmu@busy-no-semaphores-bcs0.html
*** Bug 111783 has been marked as a duplicate of this bug. ***
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW SKL: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6977/shard-glk8/igt@perf_pmu@busy-idle-no-semaphores-vcs0.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6987/shard-skl5/igt@perf_pmu@busy-no-semaphores-vecs0.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW APL SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6990/shard-apl7/igt@perf_pmu@busy-idle-no-semaphores-vcs0.html
The CI Bug Log issue associated to this bug has been updated.

### New filters associated
* BYT: igt@perf_pmu@busy-check-all-vcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_385/fi-byt-j1900/igt@perf_pmu@busy-check-all-vcs0.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW APL SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW APL SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7052/shard-apl1/igt@perf_pmu@busy-no-semaphores-rcs0.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_387/fi-byt-n2820/igt@perf_pmu@busy-hang-rcs0.html
*** Bug 111945 has been marked as a duplicate of this bug. ***
The CI Bug Log issue associated to this bug has been updated.

### New filters associated
* SKL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7060/shard-skl1/igt@gem_persistent_relocs@forked-faulting-reloc-thrashing.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW APL SKL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW APL SKL KBL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7084/shard-kbl4/igt@perf_pmu@busy-idle-no-semaphores-vecs0.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW APL SKL KBL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW APL SKL KBL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7103/shard-skl7/igt@perf_pmu@semaphore-busy-vecs0.html
The CI Bug Log issue associated to this bug has been updated.

### New filters associated
* SKL: igt@gem_ctx_shared@q-smoketest-bsd1- dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7120/shard-skl9/igt@gem_ctx_shared@q-smoketest-bsd1.html
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@gem_ctx_shared@q-smoketest-bsd1- dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected -}
{+ SKL KBL: All tests- dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7135/shard-kbl7/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
*** Bug 112153 has been marked as a duplicate of this bug. ***
Sigh. The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a0855d24fc22d49cdc25664fb224caee16998683
Gitweb:        https://git.kernel.org/tip/a0855d24fc22d49cdc25664fb224caee16998683
Author:        Davidlohr Bueso <dave@stgolabs.net>
AuthorDate:    Thu, 24 Oct 2019 20:36:34 -07:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 29 Oct 2019 12:22:52 +01:00

locking/mutex: Complain upon mutex API misuse in IRQ contexts

    Add warning checks if mutex_trylock() or mutex_unlock() are used in
    IRQ contexts, under CONFIG_DEBUG_MUTEXES=y. While the mutex rules and
    semantics are explicitly documented, this allows to expose any abusers
    and robustifies the whole thing. While trylock and unlock are
    non-blocking, calling from IRQ context is still forbidden (lock must
    be within the same context as unlock).

    Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: dave@stgolabs.net
    Link: https://lkml.kernel.org/r/20191025033634.3330-1-dave@stgolabs.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

With the next kernel this warning will get louder.
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW APL SKL KBL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW APL SKL KBL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7214/shard-skl7/igt@perf_pmu@semaphore-busy-bcs0.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT: igt@perf_pmu@busy-check-all-vcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT: igt@perf_pmu@busy-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_395/fi-byt-j1900/igt@perf_pmu@busy-hang-bcs0.html
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected -}
{+ SKL ICL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected, acquire at: __mutex_unlock_slowpath, holding by: intel_gt_retire_requests_timeout +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5265/shard-iclb3/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrashing.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7280/shard-iclb3/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrashing.html
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW APL SKL KBL GLK: igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW APL SKL KBL GLK: igt@perf_pmu@* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7315/shard-skl6/igt@perf_pmu@semaphore-busy-vcs0.html
commit c9ad602feabe4271d2adf1bdae5d8b20c2dc84f1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 14 17:25:35 2019 +0000

    drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree

    As we want to be able to run inside atomic context for retiring the
    i915_active, and we are no longer allowed to abuse mutex_trylock, split
    the tree management portion of i915_active.mutex into an irq-safe
    spinlock.

    References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
    Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191114172535.1116-1-chris@chris-wilson.co.uk

This will get rid of most of the culprits, but we still have the intel_engine_pm_put() from inside i915_pmu.c to worry about. I might just stop pushing for that to be immediate and always schedule the work.
A CI Bug Log filter associated to this bug has been updated:

{- SKL ICL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected, acquire at: __mutex_unlock_slowpath, holding by: intel_gt_retire_requests_timeout -}
{+ BYT SKL ICL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected, acquire at: __mutex_unlock_slowpath, holding by: intel_gt_retire_requests_timeout +}

New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_407/fi-byt-j1900/igt@gem_persistent_relocs@forked-faulting-reloc-thrashing.html
commit 07779a76ee1f93f930cf697b22be73d16e14f50c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 20 12:54:33 2019 +0000

    drm/i915: Mark up the calling context for intel_wakeref_put()

    Previously, we assumed we could use mutex_trylock() within an atomic
    context, falling back to a worker if contended. However, such trickery
    is illegal inside interrupt context, and so we need to always use a
    worker under such circumstances. As we normally are in process context,
    we can typically use a plain mutex, and only defer to a work when we
    know we are being called from an interrupt path.

    Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
    References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk

Good things come to those who wait, or so I am told.