Bug 111626 - [CI][SHARDS] igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
Summary: [CI][SHARDS] igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNIN...
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: not set not set
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
: 111783 111945 112153 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-09-10 06:45 UTC by Lakshmi
Modified: 2019-11-14 23:52 UTC (History)
1 user (show)

See Also:
i915 platform: BXT, GLK, HSW, ICL, KBL, SKL
i915 features: GEM/Other, Perf/PMU


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lakshmi 2019-09-10 06:45:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6847/shard-skl5/igt@perf_pmu@busy-idle-no-semaphores-bcs0.html

[IGT] perf_pmu: executing
<6> [1794.462364] [IGT] perf_pmu: starting subtest busy-idle-no-semaphores-bcs0
<4> [1795.080225] 
<4> [1795.080255] ========================================================
<4> [1795.080277] WARNING: possible irq lock inversion dependency detected
<4> [1795.080299] 5.3.0-rc7-CI-CI_DRM_6847+ #1 Tainted: G     U           
<4> [1795.080316] --------------------------------------------------------
<4> [1795.080334] kworker/0:0H/3056 just changed the state of lock:
<4> [1795.080352] 0000000062a190ff (&timeline->mutex/2){-...}, at: __engine_park+0x3e/0x320 [i915]
<4> [1795.080638] but this lock took another, HARDIRQ-unsafe lock in the past:
<4> [1795.080655]  (&(&lock->wait_lock)->rlock){+.+.}
<4> [1795.080660] 

and interrupts could create inverse lock ordering between them.

<4> [1795.080698] 
other info that might help us debug this:
<4> [1795.080716] Chain exists of:
  &timeline->mutex/2 --> &(&timelines->lock)->rlock --> &(&lock->wait_lock)->rlock

<4> [1795.080755]  Possible interrupt unsafe locking scenario:

<4> [1795.080774]        CPU0                    CPU1
<4> [1795.080788]        ----                    ----
<4> [1795.080802]   lock(&(&lock->wait_lock)->rlock);
<4> [1795.080819]                                local_irq_disable();
<4> [1795.080835]                                lock(&timeline->mutex/2);
<4> [1795.080855]                                lock(&(&timelines->lock)->rlock);
<4> [1795.080876]   <Interrupt>
<4> [1795.080886]     lock(&timeline->mutex/2);
<4> [1795.080904] 
 *** DEADLOCK ***
Comment 1 CI Bug Log 2019-09-10 06:46:50 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6847/shard-skl5/igt@perf_pmu@busy-idle-no-semaphores-bcs0.html
Comment 2 Chris Wilson 2019-09-10 08:29:42 UTC
That looks to be a usb lockchain, if I am reading that correctly.
Comment 3 CI Bug Log 2019-09-13 08:43:58 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6878/shard-skl4/igt@perf_pmu@busy-idle-no-semaphores-vecs0.html
Comment 4 Tvrtko Ursulin 2019-09-13 13:53:48 UTC
What about intel_engine_pm_put from i915_pmu.c/engines_sample (hrtimer)? Do we need to hold an engine pm ref for the duration of the event being enabled? However those hooks are also potentially from irqoff context...
Comment 5 Tvrtko Ursulin 2019-09-13 14:01:26 UTC
I was confused by mutex under hardirq but looks that's deliberate from __timeline_mark_lock (d67739268cf0ee928cdc5e8224090c59efacd653).
Comment 6 Chris Wilson 2019-09-13 14:03:41 UTC
Yes, we need a runtime-pm reference, and we use the active engine-pm as that also tells if it was idle and we can skip. The engine-pm put(wakeref) should be irqsafe for this purpose.

I don't know what the lock->wait_lock is -- it doesn't look like one of ours, which makes it more baffling how it got coupled into our wakeref.
Comment 7 CI Bug Log 2019-09-16 10:10:07 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6896/shard-hsw6/igt@perf_pmu@busy-no-semaphores-bcs0.html
Comment 8 CI Bug Log 2019-09-17 09:20:41 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6898/shard-skl3/igt@perf_pmu@busy-no-semaphores-vcs0.html
Comment 9 CI Bug Log 2019-09-23 13:21:12 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW SKL:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_376/fi-byt-j1900/igt@perf_pmu@busy-no-semaphores-bcs0.html
Comment 10 Chris Wilson 2019-09-23 13:32:53 UTC
*** Bug 111783 has been marked as a duplicate of this bug. ***
Comment 11 CI Bug Log 2019-10-01 10:20:10 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW SKL:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6977/shard-glk8/igt@perf_pmu@busy-idle-no-semaphores-vcs0.html
Comment 12 CI Bug Log 2019-10-03 08:41:20 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6987/shard-skl5/igt@perf_pmu@busy-no-semaphores-vecs0.html
Comment 13 CI Bug Log 2019-10-03 12:51:37 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW  APL SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6990/shard-apl7/igt@perf_pmu@busy-idle-no-semaphores-vcs0.html
Comment 14 CI Bug Log 2019-10-09 17:39:43 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BYT: igt@perf_pmu@busy-check-all-vcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_385/fi-byt-j1900/igt@perf_pmu@busy-check-all-vcs0.html
Comment 15 CI Bug Log 2019-10-11 10:03:38 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW  APL SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW  APL SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7052/shard-apl1/igt@perf_pmu@busy-no-semaphores-rcs0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_387/fi-byt-n2820/igt@perf_pmu@busy-hang-rcs0.html
Comment 16 Chris Wilson 2019-10-11 10:26:44 UTC
*** Bug 111945 has been marked as a duplicate of this bug. ***
Comment 17 CI Bug Log 2019-10-14 12:51:11 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SKL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7060/shard-skl1/igt@gem_persistent_relocs@forked-faulting-reloc-thrashing.html
Comment 18 CI Bug Log 2019-10-17 13:16:03 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW  APL SKL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7084/shard-kbl4/igt@perf_pmu@busy-idle-no-semaphores-vecs0.html
Comment 19 CI Bug Log 2019-10-18 12:56:58 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7103/shard-skl7/igt@perf_pmu@semaphore-busy-vecs0.html
Comment 20 CI Bug Log 2019-10-18 13:47:26 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SKL: igt@gem_ctx_shared@q-smoketest-bsd1- dmesg-warn - WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7120/shard-skl9/igt@gem_ctx_shared@q-smoketest-bsd1.html
Comment 21 CI Bug Log 2019-10-22 10:03:37 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@gem_ctx_shared@q-smoketest-bsd1- dmesg-warn - WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected -}
{+ SKL KBL: All tests- dmesg-warn - WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7135/shard-kbl7/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
Comment 22 Chris Wilson 2019-10-28 14:21:15 UTC
*** Bug 112153 has been marked as a duplicate of this bug. ***
Comment 23 Chris Wilson 2019-10-30 10:08:51 UTC
Sigh,

The following commit has been merged into the locking/core branch of tip:
  
  Commit-ID:     a0855d24fc22d49cdc25664fb224caee16998683
  Gitweb:        https://git.kernel.org/tip/a0855d24fc22d49cdc25664fb224caee16998683
  Author:        Davidlohr Bueso <dave@stgolabs.net>
  AuthorDate:    Thu, 24 Oct 2019 20:36:34 -07:00
  Committer:     Ingo Molnar <mingo@kernel.org>
  CommitterDate: Tue, 29 Oct 2019 12:22:52 +01:00
  
  locking/mutex: Complain upon mutex API misuse in IRQ contexts
  
  Add warning checks if mutex_trylock() or mutex_unlock() are used in
  IRQ contexts, under CONFIG_DEBUG_MUTEXES=y.
  
  While the mutex rules and semantics are explicitly documented, this allows
  to expose any abusers and robustifies the whole thing.
  
  While trylock and unlock are non-blocking, calling from IRQ context
  is still forbidden (lock must be within the same context as unlock).
  
  Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
  Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  Cc: Linus Torvalds <torvalds@linux-foundation.org>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: dave@stgolabs.net
  Link: https://lkml.kernel.org/r/20191025033634.3330-1-dave@stgolabs.net
  Signed-off-by: Ingo Molnar <mingo@kernel.org>

next kernel this warning will get louder.
Comment 24 CI Bug Log 2019-10-30 15:25:14 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7214/shard-skl7/igt@perf_pmu@semaphore-busy-bcs0.html
Comment 25 CI Bug Log 2019-10-31 12:10:33 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT: igt@perf_pmu@busy-check-all-vcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT: igt@perf_pmu@busy-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_395/fi-byt-j1900/igt@perf_pmu@busy-hang-bcs0.html
Comment 26 CI Bug Log 2019-11-11 11:57:32 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected -}
{+ SKL ICL: igt@gem_persistent_relocs@forked-faulting-reloc-thrashing - dmesg-warn - WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected, acquire at: __mutex_unlock_slowpath, holding by: intel_gt_retire_requests_timeout +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5265/shard-iclb3/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrashing.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7280/shard-iclb3/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrashing.html
Comment 27 CI Bug Log 2019-11-13 11:26:29 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@busy-(idle)?-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ BYT HSW  APL SKL KBL GLK:  igt@perf_pmu@* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7315/shard-skl6/igt@perf_pmu@semaphore-busy-vcs0.html
Comment 28 Chris Wilson 2019-11-14 23:52:32 UTC
commit c9ad602feabe4271d2adf1bdae5d8b20c2dc84f1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 14 17:25:35 2019 +0000

    drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree
    
    As we want to be able to run inside atomic context for retiring the
    i915_active, and we are no longer allowed to abuse mutex_trylock, split
    the tree management portion of i915_active.mutex into an irq-safe
    spinlock.
    
    References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
    Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191114172535.1116-1-chris@chris-wilson.co.uk

will get rid of most of the culprits, but we still have the intel_engine_pm_put() from inside i915_pmu.c to worry about. I might just stop pushing for that to be immediate and always schedule the work.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.