https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_126/fi-pnv-d510/igt@gem_userptr_blits@coherency-sync.html

<6> [387.029820] [IGT] gem_userptr_blits: starting subtest coherency-sync
<4> [419.324985]
<4> [419.325012] ======================================================
<4> [419.325036] WARNING: possible circular locking dependency detected
<4> [419.325065] 4.19.0-rc7-gdec9886eff39-drmtip_126+ #1 Tainted: G U
<4> [419.325090] ------------------------------------------------------
<4> [419.325114] kworker/u8:3/113 is trying to acquire lock:
<4> [419.325141] 000000004f6d1966 (&obj->mm.lock){+.+.}, at: cancel_userptr+0x19/0xf0 [i915]
<4> [419.325542] but task is already holding lock:
<4> [419.325588] 00000000e3332cfd ((work_completion)(&mo->work)){+.+.}, at: process_one_work+0x1bf/0x610
<4> [419.325588] which lock already depends on the new lock.
<4> [419.325588] the existing dependency chain (in reverse order) is:
<4> [419.325588] -> #5 ((work_completion)(&mo->work)){+.+.}:
<4> [419.325762]        worker_thread+0x37/0x380
<4> [419.325762]        kthread+0x119/0x130
<4> [419.325762]        ret_from_fork+0x24/0x50
<4> [419.325762] -> #4 ((wq_completion)"i915-userptr-release"){+.+.}:
<4> [419.325762]        i915_gem_userptr_mn_invalidate_range_start+0x1bb/0x1e0 [i915]
<4> [419.325762]        __mmu_notifier_invalidate_range_start+0xa6/0x120
<4> [419.325762]        try_to_unmap_one+0x706/0x800
<4> [419.325762]        rmap_walk_anon+0x101/0x270
<4> [419.325762]        try_to_unmap+0xbe/0xf0
<4> [419.325762]        shrink_page_list+0x60e/0xcc0
<4> [419.325762]        shrink_inactive_list+0x36d/0x740
<4> [419.325762]        shrink_node_memcg+0x368/0x740
<4> [419.325762]        shrink_node+0xc9/0x450
<4> [419.325762]        kswapd+0x2f8/0x8f0
<4> [419.325762]        kthread+0x119/0x130
<4> [419.325762]        ret_from_fork+0x24/0x50
<4> [419.325762] -> #3 (&anon_vma->rwsem){++++}:
<4> [419.325762]        __vma_adjust+0x390/0x6c0
<4> [419.326467]        __split_vma+0x16a/0x180
<4> [419.326467]        mprotect_fixup+0x2a5/0x320
<4> [419.326467]        do_mprotect_pkey+0x208/0x2e0
<4> [419.326467]        __x64_sys_mprotect+0x16/0x20
<4> [419.326467]        do_syscall_64+0x55/0x190
<4> [419.326467]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [419.326467] -> #2 (&mapping->i_mmap_rwsem){++++}:
<4> [419.326467]        rmap_walk_file+0x20f/0x250
<4> [419.326467]        page_referenced+0x14a/0x170
<4> [419.326467]        shrink_active_list+0x216/0x510
<4> [419.326467]        shrink_node_memcg+0x226/0x740
<4> [419.326467]        shrink_node+0xc9/0x450
<4> [419.326467]        kswapd+0x2f8/0x8f0
<4> [419.326467]        kthread+0x119/0x130
<4> [419.326467]        ret_from_fork+0x24/0x50
<4> [419.326467] -> #1 (fs_reclaim){+.+.}:
<4> [419.326467]        kmem_cache_alloc_trace+0x2a/0x290
<4> [419.326467]        i915_gem_object_get_pages_stolen+0xa2/0x130 [i915]
<4> [419.326467]        ____i915_gem_object_get_pages+0x1d/0xa0 [i915]
<4> [419.326467]        __i915_gem_object_get_pages+0x59/0xb0 [i915]
<4> [419.326467]        _i915_gem_object_create_stolen+0xd6/0x100 [i915]
<4> [419.326467]        i915_gem_object_create_stolen+0x67/0xb0 [i915]
<4> [419.326467]        intel_engine_create_ring+0xdc/0x2b0 [i915]
<4> [419.326467]        intel_init_ring_buffer+0x38/0x130 [i915]
<4> [419.326467]        intel_engines_init+0x4f/0x120 [i915]
<4> [419.326467]        i915_gem_init+0x244/0x850 [i915]
<4> [419.329470]        i915_driver_load+0xc81/0x1520 [i915]
<4> [419.329470]        i915_pci_probe+0x29/0xa0 [i915]
<4> [419.329470]        pci_device_probe+0xa1/0x130
<4> [419.329470]        really_probe+0x25d/0x3c0
<4> [419.329470]        driver_probe_device+0x10a/0x120
<4> [419.329470]        __driver_attach+0xdb/0x100
<4> [419.329470]        bus_for_each_dev+0x74/0xc0
<4> [419.329470]        bus_add_driver+0x15f/0x250
<4> [419.329470]        driver_register+0x56/0xe0
<4> [419.329470]        do_one_initcall+0x58/0x2e0
<4> [419.329470]        do_init_module+0x56/0x1ea
<4> [419.329470]        load_module+0x26f5/0x29d0
<4> [419.329470]        __se_sys_finit_module+0xd3/0xf0
<4> [419.329470]        do_syscall_64+0x55/0x190
<4> [419.329470]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [419.329470] -> #0 (&obj->mm.lock){+.+.}:
<4> [419.329470]        __mutex_lock+0x89/0x970
<4> [419.329470]        cancel_userptr+0x19/0xf0 [i915]
<4> [419.330583]        process_one_work+0x245/0x610
<4> [419.330583]        worker_thread+0x37/0x380
<4> [419.330583]        kthread+0x119/0x130
<4> [419.330583]        ret_from_fork+0x24/0x50
<4> [419.330583] other info that might help us debug this:
<4> [419.330583] Chain exists of:
<4> [419.330583]   &obj->mm.lock --> (wq_completion)"i915-userptr-release" --> (work_completion)(&mo->work)
<4> [419.330583] Possible unsafe locking scenario:
<4> [419.330583]        CPU0                    CPU1
<4> [419.330583]        ----                    ----
<4> [419.330583]   lock((work_completion)(&mo->work));
<4> [419.330583]                                lock((wq_completion)"i915-userptr-release");
<4> [419.330583]                                lock((work_completion)(&mo->work));
<4> [419.330583]   lock(&obj->mm.lock);
<4> [419.330583]  *** DEADLOCK ***
<4> [419.330583] 2 locks held by kworker/u8:3/113:
<4> [419.330583]  #0: 00000000735f844e ((wq_completion)"i915-userptr-release"){+.+.}, at: process_one_work+0x1bf/0x610
<4> [419.330583]  #1: 00000000e3332cfd ((work_completion)(&mo->work)){+.+.}, at: process_one_work+0x1bf/0x610
<4> [419.330583] stack backtrace:
<4> [419.330583] CPU: 3 PID: 113 Comm: kworker/u8:3 Tainted: G U 4.19.0-rc7-gdec9886eff39-drmtip_126+ #1
<4> [419.330583] Hardware name: /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010
<4> [419.330583] Workqueue: i915-userptr-release cancel_userptr [i915]
<4> [419.330583] Call Trace:
<4> [419.330583]  dump_stack+0x67/0x9b
<4> [419.330583]  print_circular_bug.isra.16+0x1c8/0x2b0
<4> [419.330583]  __lock_acquire+0x1897/0x1b50
<4> [419.330583]  ? preempt_count_add+0x44/0x90
<4> [419.330583]  ? mark_held_locks+0x50/0x80
<4> [419.330583]  ? lock_acquire+0xa6/0x1c0
<4> [419.330583]  lock_acquire+0xa6/0x1c0
<4> [419.330583]  ? cancel_userptr+0x19/0xf0 [i915]
<4> [419.330583]  __mutex_lock+0x89/0x970
<4> [419.330583]  ? cancel_userptr+0x19/0xf0 [i915]
<4> [419.330583]  ? cancel_userptr+0x19/0xf0 [i915]
<4> [419.330583]  ? lock_acquire+0xbc/0x1c0
<4> [419.330583]  ? cancel_userptr+0x19/0xf0 [i915]
<4> [419.330583]  cancel_userptr+0x19/0xf0 [i915]
<4> [419.330583]  process_one_work+0x245/0x610
<4> [419.330583]  worker_thread+0x37/0x380
<4> [419.330583]  ? process_one_work+0x610/0x610
<4> [419.330583]  kthread+0x119/0x130
<4> [419.330583]  ? kthread_park+0x80/0x80
<4> [419.330583]  ret_from_fork+0x24/0x50
<6> [425.368078] [IGT] gem_userptr_blits: exiting, ret=0
The claimed dependency link (especially #0 -> #1) is very confusing. FWIW, the key link here is the shrinker, so that path looks worth investigating, in particular its interaction with the flush_work() in invalidate_range.
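To make the reported cycle easier to see, here is a minimal user-space sketch of the same lock-order inversion, with pthread mutexes standing in for the kernel locks (the names, sleeps, and structure are invented for illustration; this is not i915 code). One thread plays the i915-userptr-release worker, which "holds" the work item while cancel_userptr() takes obj->mm.lock; the other plays a task that holds obj->mm.lock across an allocation, drops into reclaim, reaches the mmu notifier, and flushes the workqueue, i.e. waits for that same work item.

/*
 * Illustrative only: pthread mutexes stand in for the kernel locks.
 *   work_lock   ~ (work_completion)(&mo->work), the flushable work item
 *   obj_mm_lock ~ &obj->mm.lock
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t obj_mm_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the worker running cancel_userptr(): work "running", then obj->mm.lock. */
static void *worker(void *arg)
{
	pthread_mutex_lock(&work_lock);
	sleep(1);                           /* widen the race window */
	pthread_mutex_lock(&obj_mm_lock);   /* cancel_userptr() -> obj->mm.lock */
	pthread_mutex_unlock(&obj_mm_lock);
	pthread_mutex_unlock(&work_lock);
	return NULL;
}

/* Plays get_pages under obj->mm.lock -> reclaim -> notifier -> "flush_workqueue". */
static void *reclaimer(void *arg)
{
	pthread_mutex_lock(&obj_mm_lock);   /* allocation happens under obj->mm.lock */
	sleep(1);
	pthread_mutex_lock(&work_lock);     /* "flush": wait for the work item */
	pthread_mutex_unlock(&work_lock);
	pthread_mutex_unlock(&obj_mm_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, worker, NULL);
	pthread_create(&b, NULL, reclaimer, NULL);
	pthread_join(a, NULL);              /* with the sleeps, this never returns */
	pthread_join(b, NULL);
	printf("no deadlock this time\n");
	return 0;
}

Built with gcc -pthread, the two threads end up blocked on each other in exactly the order lockdep complains about: obj->mm.lock is taken from inside the work item, and it is also held around a path that can end up waiting for that work item.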
Raising priority due to a possible connection with Bug 108456.
commit 484d9a844d0d0aeaa4cd3cec20885b7de9986a55 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 15 12:44:42 2019 +0000

    drm/i915/userptr: Avoid struct_mutex recursion for mmu_invalidate_range_start

    Since commit 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu
    notifiers") we have been able to report failure from
    mmu_invalidate_range_start which allows us to use a trylock on the
    struct_mutex to avoid potential recursion and report -EBUSY instead.
    Furthermore, this allows us to pull the work into the main callback and
    avoid the sleight-of-hand in using a workqueue to avoid lockdep.

    However, not all paths to mmu_invalidate_range_start are prepared to
    handle failure, so instead of reporting the recursion, deal with it by
    propagating the failure upwards, who can decide themselves to handle it
    or report it.

    v2: Mark up the recursive lock behaviour and comment on the various
    weak points.

    v3: Follow commit 3824e41975ae ("drm/i915: Use mutex_lock_killable()
    from inside the shrinker") and also use mutex_lock_killable().
    v3.1: No leak on EINTR.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108375
    References: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190115124442.3500-1-chris@chris-wilson.co.uk
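In other words, a minimal sketch of the locking pattern the commit message describes (not the actual patch; the helper name and its parameters are hypothetical, and the lock it is applied to is left generic): on the non-blockable notifier path only a trylock is allowed and the failure is reported upwards as -EBUSY, while on the blockable path the lock is taken killably so -EINTR can be propagated instead of sleeping uninterruptibly.

/*
 * Sketch only, assuming the 4.19-era in-kernel mutex API; this is not the
 * exact code from the commit and the helper name is made up.
 */
#include <linux/mutex.h>
#include <linux/errno.h>
#include <linux/types.h>

static int userptr_lock_for_invalidate(struct mutex *lock, bool blockable)
{
	if (!blockable)
		/* non-blockable (OOM) path: must not sleep, report busy instead */
		return mutex_trylock(lock) ? 0 : -EBUSY;

	/* blockable path: may sleep, but a fatal signal breaks the wait */
	return mutex_lock_killable(lock);   /* 0 on success, -EINTR if killed */
}

Whether the caller then finishes the invalidation or hands -EBUSY/-EINTR back to its own caller is the "propagating the failure upwards" part of the commit message.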
The CI Bug Log issue associated with this bug has been archived. New failures matching the above filters will no longer be associated with this bug.