https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_261/fi-pnv-d510/igt@gem_userptr_blits@coherency-sync.html

<6> [138.626463] Console: switching to colour dummy device 80x25
<6> [138.626818] [IGT] gem_userptr_blits: executing
<6> [138.696844] [IGT] gem_userptr_blits: starting subtest coherency-sync
<6> [138.698192] gem_userptr_bli (2136): drop_caches: 4
<4> [162.324467]
<4> [162.324494] ======================================================
<4> [162.324509] WARNING: possible circular locking dependency detected
<4> [162.324527] 5.1.0-rc5-g9b6a59cae931-drmtip_261+ #1 Tainted: G U
<4> [162.324543] ------------------------------------------------------
<4> [162.324557] kswapd0/50 is trying to acquire lock:
<4> [162.324572] 00000000cdcc63cb (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x173/0x270 [i915]
<4> [162.324779] but task is already holding lock:
<4> [162.324794] 000000005d3bddaa (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4> [162.324822] which lock already depends on the new lock.
<4> [162.324839] the existing dependency chain (in reverse order) is:
<4> [162.324854] -> #2 (&anon_vma->rwsem){++++}:
<4> [162.324875] down_write+0x33/0x60
<4> [162.324888] __vma_adjust+0x390/0x6c0
<4> [162.324904] __split_vma+0x16a/0x180
<4> [162.324918] mprotect_fixup+0x2a5/0x320
<4> [162.324932] do_mprotect_pkey+0x208/0x2e0
<4> [162.324947] __x64_sys_mprotect+0x16/0x20
<4> [162.324962] do_syscall_64+0x55/0x190
<4> [162.324977] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [162.324991] -> #1 (&mapping->i_mmap_rwsem){++++}:
<4> [162.325012] down_write+0x33/0x60
<4> [162.325027] unmap_mapping_pages+0x48/0x130
<4> [162.325215] i915_vma_revoke_mmap+0x7e/0x1c0 [i915]
<4> [162.325237] i915_vma_unbind+0xbb/0x550 [i915]
<4> [162.325237] i915_gem_object_unbind+0xfa/0x190 [i915]
<4> [162.325237] i915_gem_shrink+0x2dc/0x590 [i915]
<4> [162.325237] i915_gem_shrink_all+0x2c/0x50 [i915]
<4> [162.325237] i915_drop_caches_set+0x1b6/0x270 [i915]
<4> [162.325237] simple_attr_write+0xb0/0xd0
<4> [162.325237] full_proxy_write+0x51/0x80
<4> [162.325237] vfs_write+0xbd/0x1b0
<4> [162.325237] ksys_write+0x55/0xe0
<4> [162.325237] do_syscall_64+0x55/0x190
<4> [162.325237] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [162.325237] -> #0 (&dev->struct_mutex/1){+.+.}:
<4> [162.325237] lock_acquire+0xa6/0x1c0
<4> [162.325237] __mutex_lock+0x8c/0x960
<4> [162.325237] userptr_mn_invalidate_range_start+0x173/0x270 [i915]
<4> [162.325237] __mmu_notifier_invalidate_range_start+0x84/0x110
<4> [162.325237] try_to_unmap_one+0x747/0x840
<4> [162.325237] rmap_walk_anon+0x104/0x280
<4> [162.325237] try_to_unmap+0xc0/0xf0
<4> [162.325237] shrink_page_list+0x5ce/0xcb0
<4> [162.325237] shrink_inactive_list+0x331/0x710
<4> [162.325237] shrink_node_memcg+0x37b/0x770
<4> [162.325237] shrink_node+0xc9/0x460
<4> [162.325237] balance_pgdat+0x239/0x580
<4> [162.325237] kswapd+0x186/0x570
<4> [162.325237] kthread+0x119/0x130
<4> [162.325237] ret_from_fork+0x24/0x50
<4> [162.325237] other info that might help us debug this:
<4> [162.325237] Chain exists of: &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem
<4> [162.325237] Possible unsafe locking scenario:
<4> [162.325237]        CPU0                    CPU1
<4> [162.325237]        ----                    ----
<4> [162.325237]   lock(&anon_vma->rwsem);
<4> [162.325237]                                lock(&mapping->i_mmap_rwsem);
<4> [162.325237]                                lock(&anon_vma->rwsem);
<4> [162.325237]   lock(&dev->struct_mutex/1);
<4> [162.325237] *** DEADLOCK ***
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* PNV: igt@gem_userptr_blits@coherency-sync - dmesg-warn - WARNING: possible circular locking dependency detected
  (No new failures associated)
i915_vma_revoke_mmap() removes the user GGTT mmap and so should not call back (and lock) via the userptr mmu-notifier. However, since we have used the same lockclass, lockdep thinks it might. Quick and dirty fix: give userptr its own struct_mutex subclass. I wither at just the thought of Tvrtko's scrutiny over such a hack.
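To illustrate why a separate subclass silences the report, here is a minimal Python sketch of the idea behind lockdep's check. It is a toy model and not the kernel's implementation: locks are tracked per (class, subclass) pair, each "acquire B while holding A" records an edge A -> B, and a warning fires when the new edge would close a cycle. The three edges replayed are the ones named in the splat above. The subclass value 2 for userptr and the names `LockGraph` and `replay` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


class LockGraph:
    """Toy dependency graph keyed by (lock class, subclass) pairs."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _reaches(self, src, dst):
        # Depth-first search: can dst be reached from src via recorded edges?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record the edge held -> new; return True if it closes a cycle."""
        cycle = self._reaches(new, held)
        self.edges[held].add(new)
        return cycle


def replay(userptr_subclass):
    """Replay the three dependencies from the log.

    userptr_subclass is the subclass the userptr notifier uses when it
    takes struct_mutex (assumption: 1 matches the shrinker path in the
    log, any other value stands in for a dedicated subclass).
    """
    graph = LockGraph()
    shrinker = ("struct_mutex", 1)
    userptr = ("struct_mutex", userptr_subclass)
    i_mmap = ("i_mmap_rwsem", 0)
    anon_vma = ("anon_vma_rwsem", 0)

    reports = [
        graph.acquire(i_mmap, anon_vma),   # 2: mprotect -> __vma_adjust
        graph.acquire(shrinker, i_mmap),   # 1: i915_gem_shrink -> revoke_mmap
        graph.acquire(anon_vma, userptr),  # 0: kswapd -> userptr notifier
    ]
    return any(reports)


if __name__ == "__main__":
    print("shared subclass reports a cycle:", replay(1))
    print("separate subclass reports a cycle:", replay(2))
```

With the shared subclass, the last edge completes the chain struct_mutex/1 --> i_mmap_rwsem --> anon_vma->rwsem --> struct_mutex/1. With a separate subclass, the userptr node has no outgoing edges, so the model reports nothing. This only changes what the checker sees; it does not by itself show that the real code paths cannot deadlock.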
Given that the bug was reported 3 months ago, and CI says "no occurrences", I'm closing this. Chris, any chance that the recent changes around struct_mutex have fixed this?
No new failures have been associated with this bug since it was created 3 months ago. Closing this bug as WORKSFORME.
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.