On CI_DRM_2877, the machine fi-gdg-551 hit the following potential deadlock:

[ 20.250373] ============================================
[ 20.255664] WARNING: possible recursive locking detected
[ 20.260956] 4.13.0-rc2-CI-CI_DRM_2877+ #1 Not tainted
[ 20.265986] --------------------------------------------
[ 20.271276] systemd-udevd/192 is trying to acquire lock:
[ 20.276566] (&obj->mm.lock){+.+.+.}, at: [<ffffffffa009d01d>] __i915_gem_object_get_pages+0x1d/0x70 [i915]
[ 20.286342] but task is already holding lock:
[ 20.292152] (&obj->mm.lock){+.+.+.}, at: [<ffffffffa009d0d6>] i915_gem_object_attach_phys+0x66/0x130 [i915]
[ 20.302011] other info that might help us debug this:
[ 20.308513] Possible unsafe locking scenario:
[ 20.314408] CPU0
[ 20.316840] ----
[ 20.319272] lock(&obj->mm.lock);
[ 20.322659] lock(&obj->mm.lock);
[ 20.326047] *** DEADLOCK ***
[ 20.331943] May be due to missing lock nesting notation
[ 20.338706] 4 locks held by systemd-udevd/192:
[ 20.343131] #0: (&dev->mutex){......}, at: [<ffffffff815b8a8a>] __driver_attach+0x5a/0xe0
[ 20.351458] #1: (&dev->mutex){......}, at: [<ffffffff815b8a98>] __driver_attach+0x68/0xe0
[ 20.359783] #2: (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa01027cf>] intel_setup_overlay+0x4f/0x340 [i915]
[ 20.369823] #3: (&obj->mm.lock){+.+.+.}, at: [<ffffffffa009d0d6>] i915_gem_object_attach_phys+0x66/0x130 [i915]
[ 20.380115] stack backtrace:
[ 20.384455] CPU: 0 PID: 192 Comm: systemd-udevd Not tainted 4.13.0-rc2-CI-CI_DRM_2877+ #1
[ 20.392602] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005
[ 20.402565] Call Trace:
[ 20.405003] dump_stack+0x68/0x9f
[ 20.408305] __lock_acquire+0xb80/0x1a10
[ 20.412214] lock_acquire+0xb0/0x200
[ 20.415774] ? lock_acquire+0xb0/0x200
[ 20.419571] ? __i915_gem_object_get_pages+0x1d/0x70 [i915]
[ 20.425124] __mutex_lock+0x81/0x9a0
[ 20.428749] ? __i915_gem_object_get_pages+0x1d/0x70 [i915]
[ 20.434365] ? __i915_gem_object_get_pages+0x1d/0x70 [i915]
[ 20.439917] ? __mutex_lock+0x42e/0x9a0
[ 20.443801] ? i915_gem_object_attach_phys+0x66/0x130 [i915]
[ 20.449501] ? i915_gem_object_wait+0x2fd/0x3a0 [i915]
[ 20.454619] mutex_lock_interruptible_nested+0x16/0x20
[ 20.459736] ? mutex_lock_interruptible_nested+0x16/0x20
[ 20.465092] __i915_gem_object_get_pages+0x1d/0x70 [i915]
[ 20.470533] i915_gem_object_attach_phys+0xca/0x130 [i915]
[ 20.476069] intel_setup_overlay+0xad/0x340 [i915]
[ 20.480914] intel_modeset_gem_init+0x15/0x20 [i915]
[ 20.485915] i915_driver_load+0xa0a/0x16b0 [i915]
[ 20.490658] i915_pci_probe+0x32/0x90 [i915]
[ 20.494911] pci_device_probe+0xa3/0x130
[ 20.498819] driver_probe_device+0x297/0x450
[ 20.503072] __driver_attach+0xde/0xe0
[ 20.506804] ? driver_probe_device+0x450/0x450
[ 20.511229] bus_for_each_dev+0x5d/0x90
[ 20.515049] driver_attach+0x19/0x20
[ 20.518609] bus_add_driver+0x16e/0x270
[ 20.522429] driver_register+0x5b/0xd0
[ 20.526161] __pci_register_driver+0x5b/0x60
[ 20.530474] i915_init+0x6f/0x78 [i915]
[ 20.534294] ? 0xffffffffa0236000
[ 20.537595] do_one_initcall+0x3e/0x170
[ 20.541417] ? rcu_read_lock_sched_held+0x45/0x80
[ 20.546102] ? kmem_cache_alloc_trace+0x25c/0x2c0
[ 20.550787] do_init_module+0x5a/0x1fb
[ 20.554519] load_module+0x2508/0x2d50
[ 20.558252] ? show_coresize+0x30/0x30
[ 20.561986] SyS_finit_module+0xbc/0xf0
[ 20.565805] ? SyS_finit_module+0xbc/0xf0
[ 20.569799] do_syscall_64+0x5e/0x120
[ 20.573446] entry_SYSCALL64_slow_path+0x25/0x25
[ 20.578045] RIP: 0033:0x7f3042a119f9
[ 20.581605] RSP: 002b:00007fff09bc22d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 20.589146] RAX: ffffffffffffffda RBX: 0000004167f71160 RCX: 00007f3042a119f9
[ 20.596254] RDX: 0000000000000000 RSI: 00007f3043334e23 RDI: 000000000000000f
[ 20.603361] RBP: 00007f3043334e23 R08: 0000000000000000 R09: 0000000000000000
[ 20.610469] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
[ 20.617576] R13: 0000004167f34260 R14: 0000000000020000 R15: 00000041664c1e30

This seems to be the cause of multiple issues further down the fastfeedback testlist.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2877/fi-gdg-551/dmesg-before.log
Here is the fix: https://patchwork.freedesktop.org/patch/169193/

The fix was pushed 19 minutes ago and is now waiting for testing!
Problem verified fixed, closing. Thanks a lot to Ickle for his amazing reaction time!
commit 245fef70dd90b1385af80b66552538606e214f5b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 26 19:16:02 2017 +0100

    drm/i915: Call the unlocked version of i915_gem_object_get_pages()

    When we hold for the lock for swapping out the shmem pages for the
    physically contiguous pages, we have to call the unlocked version of
    get_pages!

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934
    Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
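For anyone landing here later, the pattern behind the lockdep splat and the fix can be sketched in userspace. This is not the i915 code: it is a minimal illustration using a pthread error-checking mutex, and every name in it (toy_object, toy_get_pages, toy_attach_phys_buggy, and so on) is hypothetical. It assumes a POSIX system and may need -pthread when building. The buggy path takes the lock and then calls a helper that takes the same lock again; the fixed path calls the variant that expects the caller to already hold it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an object guarded by a per-object lock. */
struct toy_object {
    pthread_mutex_t mm_lock;
    int has_pages;
};

/* Use an error-checking mutex so that relocking from the same thread
 * returns EDEADLK instead of hanging the process. */
static int toy_object_init(struct toy_object *obj)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err)
        return err;
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!err)
        err = pthread_mutex_init(&obj->mm_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    obj->has_pages = 0;
    return err;
}

/* "Unlocked" variant: does the work, caller must already hold mm_lock. */
static int toy_get_pages_unlocked(struct toy_object *obj)
{
    obj->has_pages = 1;
    return 0;
}

/* Locking variant: takes mm_lock itself, then does the work. */
static int toy_get_pages(struct toy_object *obj)
{
    int err = pthread_mutex_lock(&obj->mm_lock);

    if (err)
        return err; /* EDEADLK if this thread already holds mm_lock */
    err = toy_get_pages_unlocked(obj);
    pthread_mutex_unlock(&obj->mm_lock);
    return err;
}

/* Buggy shape: holds mm_lock and calls the variant that locks again. */
static int toy_attach_phys_buggy(struct toy_object *obj)
{
    int err = pthread_mutex_lock(&obj->mm_lock);

    if (err)
        return err;
    err = toy_get_pages(obj); /* recursive acquisition of mm_lock */
    pthread_mutex_unlock(&obj->mm_lock);
    return err;
}

/* Fixed shape: holds mm_lock and calls the unlocked variant. */
static int toy_attach_phys_fixed(struct toy_object *obj)
{
    int err = pthread_mutex_lock(&obj->mm_lock);

    if (err)
        return err;
    err = toy_get_pages_unlocked(obj);
    pthread_mutex_unlock(&obj->mm_lock);
    return err;
}
```

After toy_object_init(), calling toy_attach_phys_buggy() should report EDEADLK from the inner lock attempt and leave has_pages at 0, while toy_attach_phys_fixed() should return 0 and set has_pages. In the kernel the mutex is not error-checking, so the same mistake would simply deadlock; lockdep is what flags it ahead of time.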