Bug 111945

Summary: [CI][SHARDS] igt@gem_ctx_switch@queue-heavy|igt@gem_exec_flush@basic-wb-prw-default - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
Product: DRI
Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: low
CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: KBL, SKL
i915 features: GEM/Other

Description Lakshmi 2019-10-09 19:09:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7033/shard-kbl7/igt@gem_ctx_switch@queue-heavy.html
<6> [1065.900423] Console: switching to colour dummy device 80x25
<6> [1065.900479] [IGT] gem_ctx_switch: executing
<5> [1065.903630] Setting dangerous option reset - tainting kernel
<6> [1065.912493] [IGT] gem_ctx_switch: starting subtest queue-heavy
<4> [1087.386382] 
<4> [1087.386392] =====================================================
<4> [1087.386403] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
<4> [1087.386416] 5.4.0-rc2-CI-CI_DRM_7033+ #1 Tainted: G     U           
<4> [1087.386429] -----------------------------------------------------
<4> [1087.386444] kworker/2:3/423 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
<4> [1087.386460] ffff88826f4250c8 (&(&lock->wait_lock)->rlock){+.+.}, at: __mutex_unlock_slowpath+0x18e/0x2b0
<4> [1087.386488] 
and this task is already holding:
<4> [1087.386500] ffff88825875c298 (&(&timelines->lock)->rlock){-...}, at: intel_gt_retire_requests_timeout+0x15c/0x520 [i915]
<4> [1087.386655] which would create a new lock dependency:
<4> [1087.386662]  (&(&timelines->lock)->rlock){-...} -> (&(&lock->wait_lock)->rlock){+.+.}
<4> [1087.386677] 
but this new dependency connects a HARDIRQ-irq-safe lock:
<4> [1087.386692]  (&(&timelines->lock)->rlock){-...}
<4> [1087.386695] 
... which became HARDIRQ-irq-safe at:
<4> [1087.386724]   lock_acquire+0xa7/0x1c0
<4> [1087.386739]   _raw_spin_lock_irqsave+0x33/0x50
<4> [1087.386876]   intel_timeline_enter+0x64/0x150 [i915]
<4> [1087.387000]   __engine_park+0x1db/0x400 [i915]
<4> [1087.387120]   ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4> [1087.387234]   i915_sample+0x2de/0x300 [i915]
<4> [1087.387249]   __hrtimer_run_queues+0x121/0x4a0
<4> [1087.387262]   hrtimer_interrupt+0xea/0x250
<4> [1087.387276]   smp_apic_timer_interrupt+0x96/0x280
<4> [1087.387289]   apic_timer_interrupt+0xf/0x20
<4> [1087.387303]   cpuidle_enter_state+0xb2/0x450
<4> [1087.387315]   cpuidle_enter+0x24/0x40
<4> [1087.387326]   do_idle+0x1e7/0x250
<4> [1087.387336]   cpu_startup_entry+0x14/0x20
<4> [1087.387347]   start_kernel+0x4d2/0x4f4
<4> [1087.387357]   secondary_startup_64+0xa4/0xb0
<4> [1087.387368] 
to a HARDIRQ-irq-unsafe lock:
<4> [1087.387381]  (&(&lock->wait_lock)->rlock){+.+.}
<4> [1087.387384] 
... which became HARDIRQ-irq-unsafe at:
<4> [1087.387410] ...
<4> [1087.387416]   lock_acquire+0xa7/0x1c0
<4> [1087.387434]   _raw_spin_lock+0x2a/0x40
<4> [1087.387447]   __mutex_lock+0x198/0x9d0
<4> [1087.387461]   pipe_wait+0x8f/0xc0
<4> [1087.387470]   pipe_read+0x235/0x310
<4> [1087.387480]   new_sync_read+0x10f/0x1a0
<4> [1087.387490]   vfs_read+0x96/0x160
<4> [1087.387497]   ksys_read+0x9f/0xe0
<4> [1087.387509]   do_syscall_64+0x4f/0x210
<4> [1087.387523]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1087.387535] 
other info that might help us debug this:

<4> [1087.387553]  Possible interrupt unsafe locking scenario:

<4> [1087.387568]        CPU0                    CPU1
<4> [1087.387580]        ----                    ----
<4> [1087.387590]   lock(&(&lock->wait_lock)->rlock);
<4> [1087.387601]                                local_irq_disable();
<4> [1087.387610]                                lock(&(&timelines->lock)->rlock);
<4> [1087.387621]                                lock(&(&lock->wait_lock)->rlock);
<4> [1087.387632]   <Interrupt>
<4> [1087.387637]     lock(&(&timelines->lock)->rlock);
<4> [1087.387646] 
 *** DEADLOCK ***

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7039/shard-skl3/igt@gem_exec_flush@basic-wb-prw-default.html
<4> [2139.620735] =====================================================
<4> [2139.620762] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
<4> [2139.620795] 5.4.0-rc2-CI-CI_DRM_7039+ #1 Tainted: G     U           
<4> [2139.620828] -----------------------------------------------------
<4> [2139.620857] kworker/1:0/5423 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
<4> [2139.620882] ffff888176b25e08 (&(&lock->wait_lock)->rlock){+.+.}, at: __mutex_unlock_slowpath+0xa6/0x2b0
<4> [2139.620936] 
and this task is already holding:
<4> [2139.620958] ffff88816d5fc288 (&(&timelines->lock)->rlock){-...}, at: intel_gt_retire_requests_timeout+0x17d/0x540 [i915]
<4> [2139.621273] which would create a new lock dependency:
<4> [2139.621291]  (&(&timelines->lock)->rlock){-...} -> (&(&lock->wait_lock)->rlock){+.+.}
<4> [2139.621328] 
but this new dependency connects a HARDIRQ-irq-safe lock:
<4> [2139.621354]  (&(&timelines->lock)->rlock){-...}
<4> [2139.621360] 
... which became HARDIRQ-irq-safe at:
<4> [2139.621410]   lock_acquire+0xa7/0x1c0
<4> [2139.621436]   _raw_spin_lock_irqsave+0x33/0x50
<4> [2139.621732]   intel_timeline_enter+0x64/0x150 [i915]
<4> [2139.622007]   __engine_park+0x1db/0x400 [i915]
<4> [2139.622258]   ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4> [2139.622513]   i915_sample+0x2de/0x300 [i915]
<4> [2139.622539]   __hrtimer_run_queues+0x121/0x4a0
<4> [2139.622562]   hrtimer_interrupt+0xea/0x250
<4> [2139.622586]   smp_apic_timer_interrupt+0x96/0x280
<4> [2139.622610]   apic_timer_interrupt+0xf/0x20
<4> [2139.622634]   cpuidle_enter_state+0xb2/0x450
<4> [2139.622656]   cpuidle_enter+0x24/0x40
<4> [2139.622682]   do_idle+0x1e7/0x250
<4> [2139.622706]   cpu_startup_entry+0x14/0x20
<4> [2139.622735]   start_kernel+0x4d2/0x4f4
<4> [2139.622757]   secondary_startup_64+0xa4/0xb0
<4> [2139.622775] 
to a HARDIRQ-irq-unsafe lock:
<4> [2139.622806]  (&(&lock->wait_lock)->rlock){+.+.}
<4> [2139.622812] 
... which became HARDIRQ-irq-unsafe at:
<4> [2139.622852] ...
<4> [2139.622864]   lock_acquire+0xa7/0x1c0
<4> [2139.622897]   _raw_spin_lock+0x2a/0x40
<4> [2139.622920]   __mutex_lock+0x198/0x9d0
<4> [2139.622943]   hub_port_init+0x70/0xcd0
<4> [2139.622965]   hub_event+0x797/0x16d0
<4> [2139.622987]   process_one_work+0x26a/0x620
<4> [2139.623009]   worker_thread+0x37/0x380
<4> [2139.623033]   kthread+0x119/0x130
<4> [2139.623056]   ret_from_fork+0x3a/0x50
<4> [2139.623072] 
other info that might help us debug this:

<4> [2139.623102]  Possible interrupt unsafe locking scenario:

<4> [2139.623127]        CPU0                    CPU1
<4> [2139.623145]        ----                    ----
<4> [2139.623163]   lock(&(&lock->wait_lock)->rlock);
<4> [2139.623185]                                local_irq_disable();
<4> [2139.623206]                                lock(&(&timelines->lock)->rlock);
<4> [2139.623234]                                lock(&(&lock->wait_lock)->rlock);
<4> [2139.623261]   <Interrupt>
<4> [2139.623274]     lock(&(&timelines->lock)->rlock);
<4> [2139.623296] 
 *** DEADLOCK ***
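
For readers unfamiliar with the `{-...}` and `{+.+.}` annotations in the two splats above, here is a minimal decoding sketch. It assumes the four-character usage mask layout that lockdep prints on kernels of this era (hardirq, hardirq-read, softirq, softirq-read); the helper names `decode_usage`, `hardirq_safe` and `hardirq_unsafe` are made up for illustration and are not part of any kernel or IGT tooling.

```python
# Toy decoder for lockdep usage masks such as {-...} and {+.+.}.
# Assumption: four characters, in the order hardirq, hardirq-read,
# softirq, softirq-read, as printed by 5.4-era kernels.
STATES = ("hardirq", "hardirq-read", "softirq", "softirq-read")

MEANING = {
    ".": "acquired with irqs disabled, not in irq context",
    "-": "acquired in irq context",
    "+": "acquired with irqs enabled",
    "?": "acquired in irq context with irqs enabled",
}


def decode_usage(mask):
    """Map each state to the meaning of its character in the mask."""
    body = mask.strip("{}")
    if len(body) != len(STATES):
        raise ValueError(f"expected {len(STATES)} characters, got {body!r}")
    return {state: MEANING[ch] for state, ch in zip(STATES, body)}


def hardirq_safe(mask):
    """True if the lock class was ever taken in hardirq context."""
    return mask.strip("{}")[0] in "-?"


def hardirq_unsafe(mask):
    """True if the lock class was ever taken with hardirqs enabled."""
    return mask.strip("{}")[0] in "+?"


# The two lock classes named in the report.
for name, mask in (("timelines->lock", "{-...}"), ("wait_lock", "{+.+.}")):
    print(name, mask, "safe:", hardirq_safe(mask), "unsafe:", hardirq_unsafe(mask))
```

Read this way, `timelines->lock` is the HARDIRQ-safe class and the mutex `wait_lock` is the HARDIRQ-unsafe class, which matches the headline of the warning.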
Comment 1 CI Bug Log 2019-10-09 19:11:49 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* SKL KBL: igt@gem_ctx_switch@queue-heavy|igt@gem_exec_flush@basic-wb-prw-default - dmesg-warn - WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7033/shard-kbl7/igt@gem_ctx_switch@queue-heavy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7039/shard-skl3/igt@gem_exec_flush@basic-wb-prw-default.html
Comment 2 Chris Wilson 2019-10-09 19:29:32 UTC
It's the same annoying problem as before: mutex_unlock() from inside irq-context upsets lockdep if it wakes up a waiter.
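
As a rough illustration of why lockdep objects, the sequence in the splat can be replayed against a toy model. This is only a sketch: the `ToyLockdep` class and the `replay` function are hypothetical, and the model checks direct dependencies at acquire time, whereas real lockdep tracks lock classes across the whole dependency graph.

```python
# Toy model of the HARDIRQ-safe -> HARDIRQ-unsafe check (not real lockdep).
class ToyLockdep:
    def __init__(self):
        self.irq_safe = set()    # classes ever taken in hardirq context
        self.irq_unsafe = set()  # classes ever taken with hardirqs enabled
        self.warnings = []

    def acquire(self, name, in_hardirq=False, irqs_enabled=True, held=()):
        """Record how a lock class is used and check each new dependency."""
        if in_hardirq:
            self.irq_safe.add(name)
        if irqs_enabled:
            self.irq_unsafe.add(name)
        for outer in held:
            # Taking an irq-unsafe lock under an irq-safe one can deadlock
            # if an interrupt arrives while another CPU holds the inner lock.
            if outer in self.irq_safe and name in self.irq_unsafe:
                self.warnings.append(
                    f"HARDIRQ-safe -> HARDIRQ-unsafe: {outer} -> {name}"
                )


def replay():
    ld = ToyLockdep()
    # 1. Process context, irqs enabled: __mutex_lock() takes wait_lock.
    ld.acquire("wait_lock", irqs_enabled=True)
    # 2. hrtimer interrupt (i915_sample): timelines->lock taken in hardirq.
    ld.acquire("timelines->lock", in_hardirq=True, irqs_enabled=False)
    # 3. kworker: holds timelines->lock with irqs off, then the
    #    mutex_unlock() slowpath takes wait_lock to wake a waiter.
    ld.acquire("timelines->lock", irqs_enabled=False)
    ld.acquire("wait_lock", irqs_enabled=False, held=("timelines->lock",))
    return ld


for warning in replay().warnings:
    print(warning)
```

In this model the warning only appears at step 3, which lines up with the comment above: the report needs mutex_unlock() to enter its slowpath, and that happens when there is a waiter to wake.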
Comment 3 Chris Wilson 2019-10-11 10:26:44 UTC

*** This bug has been marked as a duplicate of bug 111626 ***