Bug 111626 - [CI][SHARDS] igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
Status: RESOLVED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-09-10 06:45 UTC by Lakshmi
Modified: 2019-09-17 09:20 UTC
i915 platform: SKL
i915 features: Perf/PMU


Description Lakshmi 2019-09-10 06:45:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6847/shard-skl5/igt@perf_pmu@busy-idle-no-semaphores-bcs0.html

[IGT] perf_pmu: executing
<6> [1794.462364] [IGT] perf_pmu: starting subtest busy-idle-no-semaphores-bcs0
<4> [1795.080225] 
<4> [1795.080255] ========================================================
<4> [1795.080277] WARNING: possible irq lock inversion dependency detected
<4> [1795.080299] 5.3.0-rc7-CI-CI_DRM_6847+ #1 Tainted: G     U           
<4> [1795.080316] --------------------------------------------------------
<4> [1795.080334] kworker/0:0H/3056 just changed the state of lock:
<4> [1795.080352] 0000000062a190ff (&timeline->mutex/2){-...}, at: __engine_park+0x3e/0x320 [i915]
<4> [1795.080638] but this lock took another, HARDIRQ-unsafe lock in the past:
<4> [1795.080655]  (&(&lock->wait_lock)->rlock){+.+.}
<4> [1795.080660] 

and interrupts could create inverse lock ordering between them.

<4> [1795.080698] 
other info that might help us debug this:
<4> [1795.080716] Chain exists of:
  &timeline->mutex/2 --> &(&timelines->lock)->rlock --> &(&lock->wait_lock)->rlock

<4> [1795.080755]  Possible interrupt unsafe locking scenario:

<4> [1795.080774]        CPU0                    CPU1
<4> [1795.080788]        ----                    ----
<4> [1795.080802]   lock(&(&lock->wait_lock)->rlock);
<4> [1795.080819]                                local_irq_disable();
<4> [1795.080835]                                lock(&timeline->mutex/2);
<4> [1795.080855]                                lock(&(&timelines->lock)->rlock);
<4> [1795.080876]   <Interrupt>
<4> [1795.080886]     lock(&timeline->mutex/2);
<4> [1795.080904] 
 *** DEADLOCK ***
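
For readers unfamiliar with this class of lockdep report, the check can be modelled roughly as follows. This is a toy sketch, not lockdep's actual algorithm: the function name find_irq_inversions and the edge-list representation are hypothetical, chosen only for illustration. The three lock names and the dependency chain are copied from the log above. The idea is that a lock used in hardirq context ("-" in the state string) must not lead, through the recorded "held --> taken" chain, to a lock that is ever acquired with hardirqs enabled ("+" in the state string).

```python
from collections import deque


def find_irq_inversions(edges, hardirq_used, hardirq_unsafe):
    """Toy model of an irq lock inversion check (hypothetical helper).

    edges          -- (held, taken) pairs: 'taken' was acquired while 'held' was held
    hardirq_used   -- locks that have been acquired in hardirq context
    hardirq_unsafe -- locks that have been acquired with hardirqs enabled
    Returns one dependency path per (hardirq-used, hardirq-unsafe) pair found.
    """
    graph = {}
    for held, taken in edges:
        graph.setdefault(held, []).append(taken)

    paths = []
    for start in sorted(hardirq_used):
        # Breadth-first search over the dependency graph from the irq-used lock.
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)

        # Any reachable hardirq-unsafe lock is a potential inversion.
        for lock in sorted(hardirq_unsafe):
            if lock in parent and lock != start:
                path, cur = [], lock
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                paths.append(list(reversed(path)))
    return paths


# Chain and lock states as reported in the dmesg output above.
edges = [
    ("&timeline->mutex/2", "&(&timelines->lock)->rlock"),
    ("&(&timelines->lock)->rlock", "&(&lock->wait_lock)->rlock"),
]
chains = find_irq_inversions(
    edges,
    hardirq_used={"&timeline->mutex/2"},
    hardirq_unsafe={"&(&lock->wait_lock)->rlock"},
)
print(" --> ".join(chains[0]))
# prints: &timeline->mutex/2 --> &(&timelines->lock)->rlock --> &(&lock->wait_lock)->rlock
```

The printed path matches the "Chain exists of:" line in the report. The sketch only mirrors how the warning is derived; it says nothing about which code path recorded the wait_lock dependency in the first place.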
Comment 1 CI Bug Log 2019-09-10 06:46:50 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6847/shard-skl5/igt@perf_pmu@busy-idle-no-semaphores-bcs0.html
Comment 2 Chris Wilson 2019-09-10 08:29:42 UTC
That looks to be a USB lock chain, if I am reading that correctly.
Comment 3 CI Bug Log 2019-09-13 08:43:58 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6878/shard-skl4/igt@perf_pmu@busy-idle-no-semaphores-vecs0.html
Comment 4 Tvrtko Ursulin 2019-09-13 13:53:48 UTC
What about intel_engine_pm_put from i915_pmu.c/engines_sample (hrtimer)? Do we need to hold an engine pm ref for the duration of the event being enabled? However, those hooks are also potentially called from irq-off context...
Comment 5 Tvrtko Ursulin 2019-09-13 14:01:26 UTC
I was confused by the mutex under hardirq, but it looks like that's deliberate, from __timeline_mark_lock (d67739268cf0ee928cdc5e8224090c59efacd653).
Comment 6 Chris Wilson 2019-09-13 14:03:41 UTC
Yes, we need a runtime-pm reference, and we use the active engine-pm because that also tells us whether the engine was idle, in which case we can skip it. The engine-pm put (wakeref) should be irqsafe for this purpose.

I don't know what the lock->wait_lock is -- it doesn't look like one of ours, which makes it more baffling how it got coupled into our wakeref.
Comment 7 CI Bug Log 2019-09-16 10:10:07 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL:  igt@perf_pmu@busy-idle-no-semaphores-bcs0 - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6896/shard-hsw6/igt@perf_pmu@busy-no-semaphores-bcs0.html
Comment 8 CI Bug Log 2019-09-17 09:20:41 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected -}
{+ HSW SKL:  igt@perf_pmu@busy-idle-no-semaphores-* - dmesg-warn - WARNING: possible irq lock inversion dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6898/shard-skl3/igt@perf_pmu@busy-no-semaphores-vcs0.html

