https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6983/shard-glk7/igt@gem_eio@kms.html

<4> [1429.580580] ======================================================
<4> [1429.580583] WARNING: possible circular locking dependency detected
<4> [1429.580588] 5.4.0-rc1-CI-CI_DRM_6983+ #1 Tainted: G U
<4> [1429.580592] ------------------------------------------------------
<4> [1429.580596] gem_eio/3607 is trying to acquire lock:
<4> [1429.580599] ffff88825cc502d8 (&mapping->i_mmap_rwsem){++++}, at: unmap_mapping_pages+0x48/0x130
<4> [1429.580612] but task is already holding lock:
<4> [1429.580616] ffff88826169c5b0 (&gt->reset.mutex){+.+.}, at: intel_gt_reset+0x5f/0x3c0 [i915]
<4> [1429.580710] which lock already depends on the new lock.
<4> [1429.580714] the existing dependency chain (in reverse order) is:
<4> [1429.580718] -> #3 (&gt->reset.mutex){+.+.}:
<4> [1429.580788]        i915_request_wait+0xf8/0x870 [i915]
<4> [1429.580854]        i915_active_wait+0x13c/0x280 [i915]
<4> [1429.580921]        i915_vma_unbind+0x1c5/0x4a0 [i915]
<4> [1429.580986]        eb_lookup_vmas+0x4ea/0x11a0 [i915]
<4> [1429.581049]        i915_gem_do_execbuffer+0x601/0x2360 [i915]
<4> [1429.581113]        i915_gem_execbuffer2_ioctl+0x11b/0x460 [i915]
<4> [1429.581119]        drm_ioctl_kernel+0xa7/0xf0
<4> [1429.581123]        drm_ioctl+0x2e1/0x390
<4> [1429.581128]        do_vfs_ioctl+0xa0/0x6f0
<4> [1429.581132]        ksys_ioctl+0x35/0x60
<4> [1429.581135]        __x64_sys_ioctl+0x11/0x20
<4> [1429.581140]        do_syscall_64+0x4f/0x210
<4> [1429.581146]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1429.581150] -> #2 (i915_active){+.+.}:
<4> [1429.581216]        i915_active_wait+0x4a/0x280 [i915]
<4> [1429.581283]        i915_vma_unbind+0x1c5/0x4a0 [i915]
<4> [1429.581348]        i915_gem_object_unbind+0x153/0x1c0 [i915]
<4> [1429.581415]        userptr_mn_invalidate_range_start+0x9f/0x200 [i915]
<4> [1429.581421]        __mmu_notifier_invalidate_range_start+0xa3/0x180
<4> [1429.581426]        unmap_vmas+0x143/0x150
<4> [1429.581430]        unmap_region+0xa3/0x100
<4> [1429.581434]        __do_munmap+0x25d/0x490
<4> [1429.581437]        __vm_munmap+0x6e/0xc0
<4> [1429.581441]        __x64_sys_munmap+0x12/0x20
<4> [1429.581445]        do_syscall_64+0x4f/0x210
<4> [1429.581449]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1429.581452] -> #1 (mmu_notifier_invalidate_range_start){+.+.}:
<4> [1429.581459]        page_mkclean_one+0xda/0x210
<4> [1429.581463]        rmap_walk_file+0xff/0x260
<4> [1429.581467]        page_mkclean+0x9f/0xb0
<4> [1429.581471]        clear_page_dirty_for_io+0xa2/0x300
<4> [1429.581477]        mpage_submit_page+0x1a/0x70
<4> [1429.581481]        mpage_process_page_bufs+0xe7/0x110
<4> [1429.581485]        mpage_prepare_extent_to_map+0x1d2/0x2b0
<4> [1429.581490]        ext4_writepages+0x592/0x1230
<4> [1429.581493]        do_writepages+0x46/0xe0
<4> [1429.581498]        __filemap_fdatawrite_range+0xc6/0x100
<4> [1429.581502]        file_write_and_wait_range+0x3c/0x90
<4> [1429.581506]        ext4_sync_file+0x154/0x500
<4> [1429.581511]        do_fsync+0x33/0x60
<4> [1429.581514]        __x64_sys_fsync+0xb/0x10
<4> [1429.581518]        do_syscall_64+0x4f/0x210
<4> [1429.581522]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1429.581526] -> #0 (&mapping->i_mmap_rwsem){++++}:
<4> [1429.581533]        __lock_acquire+0x1328/0x15d0
<4> [1429.581537]        lock_acquire+0xa7/0x1c0
<4> [1429.581541]        down_write+0x33/0x70
<4> [1429.581545]        unmap_mapping_pages+0x48/0x130
<4> [1429.581606]        intel_gt_reset+0x142/0x3c0 [i915]
<4> [1429.581668]        intel_gt_reset_global+0xe0/0x150 [i915]
<4> [1429.581729]        intel_gt_handle_error+0x184/0x3b0 [i915]
<4> [1429.581788]        i915_wedged_set+0x5b/0xc0 [i915]
<4> [1429.581793]        simple_attr_write+0xb0/0xd0
<4> [1429.581799]        full_proxy_write+0x51/0x80
<4> [1429.581803]        vfs_write+0xb9/0x1d0
<4> [1429.581807]        ksys_write+0x9f/0xe0
<4> [1429.581810]        do_syscall_64+0x4f/0x210
<4> [1429.581814]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1429.581818] other info that might help us debug this:
<4> [1429.581823] Chain exists of: &mapping->i_mmap_rwsem --> i915_active --> &gt->reset.mutex
<4> [1429.581830] Possible unsafe locking scenario:
<4> [1429.581834]        CPU0                    CPU1
<4> [1429.581837]        ----                    ----
<4> [1429.581840]   lock(&gt->reset.mutex);
<4> [1429.581843]                                lock(i915_active);
<4> [1429.581847]                                lock(&gt->reset.mutex);
<4> [1429.581851]   lock(&mapping->i_mmap_rwsem);
<4> [1429.581854] *** DEADLOCK ***
<4> [1429.581859] 3 locks held by gem_eio/3607:
<4> [1429.581862] #0: ffff88827652a6b0 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0
<4> [1429.581869] #1: ffff88826168ea58 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
<4> [1429.581877] #2: ffff88826169c5b0 (&gt->reset.mutex){+.+.}, at: intel_gt_reset+0x5f/0x3c0 [i915]
<4> [1429.581941] stack backtrace:
<4> [1429.581946] CPU: 1 PID: 3607 Comm: gem_eio Tainted: G U 5.4.0-rc1-CI-CI_DRM_6983+ #1
<4> [1429.581952] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
<4> [1429.581958] Call Trace:
<4> [1429.581964]  dump_stack+0x67/0x9b
<4> [1429.581969]  check_noncircular+0x19b/0x1c0
<4> [1429.581975]  ? __lock_acquire+0x1328/0x15d0
<4> [1429.581979]  __lock_acquire+0x1328/0x15d0
<4> [1429.581985]  ? trace_vbprintk+0x16a/0x210
<4> [1429.581989]  lock_acquire+0xa7/0x1c0
<4> [1429.581994]  ? unmap_mapping_pages+0x48/0x130
<4> [1429.581999]  down_write+0x33/0x70
<4> [1429.582002]  ? unmap_mapping_pages+0x48/0x130
<4> [1429.582007]  unmap_mapping_pages+0x48/0x130
<4> [1429.582068]  intel_gt_reset+0x142/0x3c0 [i915]
<4> [1429.582129]  intel_gt_reset_global+0xe0/0x150 [i915]
<4> [1429.582191]  ? intel_gt_set_wedged+0x70/0x70 [i915]
<4> [1429.582197]  ? queue_work_node+0x70/0x70
<4> [1429.582258]  intel_gt_handle_error+0x184/0x3b0 [i915]
<4> [1429.582320]  i915_wedged_set+0x5b/0xc0 [i915]
<4> [1429.582326]  ? simple_strtoull+0x3b/0x70
<4> [1429.582331]  simple_attr_write+0xb0/0xd0
<4> [1429.582335]  full_proxy_write+0x51/0x80
<4> [1429.582340]  vfs_write+0xb9/0x1d0
<4> [1429.582343]  ksys_write+0x9f/0xe0
<4> [1429.582347]  do_syscall_64+0x4f/0x210
<4> [1429.582352]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1429.582357] RIP: 0033:0x7fbe9a73c281
<4> [1429.582362] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [1429.582371] RSP: 002b:00007ffe335e7b78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [1429.582377] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe9a73c281
<4> [1429.582381] RDX: 0000000000000002 RSI: 00007fbe9adcf0e2 RDI: 0000000000000009
<4> [1429.582385] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034
<4> [1429.582389] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fbe9adcf0e2
<4> [1429.582394] R13: 0000000000000009 R14: 00007ffe335e7d60 R15: 0000000000000040
<7> [1429.589338] [IGT] Checking that the GPU recovered
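For anyone less used to reading lockdep reports: the splat above does not mean the machine actually deadlocked, it means lockdep proved the locks involved can be taken in incompatible orders. The reset path (i915_wedged_set -> intel_gt_reset -> unmap_mapping_pages) wants &mapping->i_mmap_rwsem while already holding &gt->reset.mutex, but the recorded chain &mapping->i_mmap_rwsem --> i915_active --> &gt->reset.mutex shows the opposite ordering is also reachable (a munmap invalidating a userptr object waits on i915_active, and that wait can in turn take the reset mutex). The little userspace program below is only an illustration of that shape, not i915 code: the intermediate locks are collapsed and the two mutex names are mapped loosely onto the report.

/*
 * Toy illustration of the lock-order inversion lockdep flags above.
 * "reset_mutex" and "mmap_rwsem" stand in, loosely, for &gt->reset.mutex
 * and &mapping->i_mmap_rwsem; the real cycle runs through more locks
 * (i915_active, the mmu notifier), but the shape is the same: two paths
 * take the same locks in opposite orders, so each can block the other.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t reset_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mmap_rwsem  = PTHREAD_MUTEX_INITIALIZER;

/* Reset path: holds "reset_mutex", then wants "mmap_rwsem"
 * (intel_gt_reset() -> unmap_mapping_pages() in the report). */
static void *reset_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&reset_mutex);
	sleep(1);                       /* widen the race window */
	pthread_mutex_lock(&mmap_rwsem);
	puts("reset path completed");
	pthread_mutex_unlock(&mmap_rwsem);
	pthread_mutex_unlock(&reset_mutex);
	return NULL;
}

/* Munmap/writeback path: holds "mmap_rwsem", then (via the mmu notifier
 * and the i915_active wait) ends up waiting on "reset_mutex". */
static void *munmap_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mmap_rwsem);
	sleep(1);
	pthread_mutex_lock(&reset_mutex);
	puts("munmap path completed");
	pthread_mutex_unlock(&reset_mutex);
	pthread_mutex_unlock(&mmap_rwsem);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, reset_path, NULL);
	pthread_create(&b, NULL, munmap_path, NULL);
	pthread_join(a, NULL);          /* if both sleeps are hit, this hangs */
	pthread_join(b, NULL);
	return 0;
}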
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* GLK: igt@eio@kms - dmesg-warn - WARNING: possible circular locking dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14597/shard-glk2/igt@gem_eio@kms.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6980/shard-glk4/igt@gem_eio@kms.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6981/shard-glk7/igt@gem_eio@kms.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14602/shard-glk5/igt@gem_eio@kms.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6983/shard-glk7/igt@gem_eio@kms.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14611/shard-glk9/igt@gem_eio@kms.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14615/shard-glk9/igt@gem_eio@kms.html
The lock inside i915_active_wait() is due for elimination, which will break this cycle. But I'm not yet sure whether we'll simply end up replacing it with another cycle instead.
commit b1e3177bd1d8f41e2a9cc847e56a96cdc0eefe62
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 4 14:40:00 2019 +0100

    drm/i915: Coordinate i915_active with its own mutex

    Forgo the struct_mutex serialisation for i915_active, and interpose its
    own mutex handling for active/retire.

    This is a multi-layered sleight-of-hand. First, we had to ensure that no
    active/retire callbacks accidentally inverted the mutex ordering rules,
    nor assumed that they were themselves serialised by struct_mutex. More
    challenging though, is the rule over updating elements of the active
    rbtree. Instead of the whole i915_active now being serialised by
    struct_mutex, allocations/rotations of the tree are serialised by the
    i915_active.mutex and individual nodes are serialised by the caller
    using the i915_timeline.mutex (we need to use nested spinlocks to
    interact with the dma_fence callback lists).

    The pain point here is that instead of a single mutex around execbuf,
    we now have to take a mutex for active tracker (one for each vma,
    context, etc) and a couple of spinlocks for each fence update. The
    improvement in fine grained locking allowing for multiple concurrent
    clients (eventually!) should be worth it in typical loads.

    v2: Add some comments that barely elucidate anything :(

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
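To make the commit message's locking scheme a little more concrete, here is a rough userspace sketch of the shape it describes, under assumed names (toy_active, toy_node); it is not the real i915 API. The point it illustrates is that the tracker's own mutex only guards allocation and lookup of its per-timeline nodes, while node payload updates are left to the caller's per-timeline serialisation, so the tracker mutex is never held across waits or fence callbacks (which in the real code are handled under nested spinlocks).

/*
 * Userspace sketch (assumed names, not the i915 API) of a tracker that
 * owns a small mutex guarding only insertion/lookup in its node table.
 * The node payload is updated by the caller under its own per-timeline
 * lock, so the tracker mutex is never held while waiting.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_node {
	unsigned long timeline;     /* key: which timeline owns this node   */
	unsigned long last_seqno;   /* payload, caller-serialised           */
	struct toy_node *next;
};

struct toy_active {
	pthread_mutex_t mutex;      /* guards the node list only            */
	struct toy_node *nodes;
};

/* Look up (or create) the node for a timeline.  Only the list walk and
 * insertion happen under active->mutex; the caller updates the payload
 * afterwards under its own per-timeline lock. */
static struct toy_node *toy_active_node(struct toy_active *active,
					unsigned long timeline)
{
	struct toy_node *node;

	pthread_mutex_lock(&active->mutex);
	for (node = active->nodes; node; node = node->next)
		if (node->timeline == timeline)
			break;
	if (!node) {
		node = calloc(1, sizeof(*node));
		node->timeline = timeline;
		node->next = active->nodes;
		active->nodes = node;
	}
	pthread_mutex_unlock(&active->mutex);

	return node;
}

int main(void)
{
	struct toy_active active = { .mutex = PTHREAD_MUTEX_INITIALIZER };
	struct toy_node *node = toy_active_node(&active, 1);

	node->last_seqno = 42;      /* caller-side update; per-timeline lock
				       elided in this toy */
	printf("timeline %lu last_seqno %lu\n", node->timeline, node->last_seqno);
	return 0;
}

The trade-off the commit calls out follows directly from this shape: instead of one big struct_mutex around execbuf, each tracker (per vma, context, etc.) brings its own small mutex, plus spinlocks around each fence update.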
This issue occurred three times in 4 runs and was last seen in CI_DRM_6983_full (3 weeks old); the current run is 7155. Closing and archiving this bug.
The CI Bug Log issue associated with this bug has been archived. New failures matching the above filters will no longer be associated with this bug.