101934 – [BAT][GDG] WARNING: possible recursive locking detected in i915_gem_object_attach_phys

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	Other All

Importance:	high critical
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2017-07-26 17:59 UTC by Martin Peres
Modified:	2017-07-26 19:28 UTC
CC List:	1 user

See Also:
i915 platform:	I915GM
i915 features:	GEM/Other

Description Martin Peres 2017-07-26 17:59:47 UTC

On CI_DRM_2877, the machine fi-gdg-551 hit the following potential deadlock:

[   20.250373] ============================================
[   20.255664] WARNING: possible recursive locking detected
[   20.260956] 4.13.0-rc2-CI-CI_DRM_2877+ #1 Not tainted
[   20.265986] --------------------------------------------
[   20.271276] systemd-udevd/192 is trying to acquire lock:
[   20.276566]  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa009d01d>] __i915_gem_object_get_pages+0x1d/0x70 [i915]
[   20.286342] 
               but task is already holding lock:
[   20.292152]  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa009d0d6>] i915_gem_object_attach_phys+0x66/0x130 [i915]
[   20.302011] 
               other info that might help us debug this:
[   20.308513]  Possible unsafe locking scenario:

[   20.314408]        CPU0
[   20.316840]        ----
[   20.319272]   lock(&obj->mm.lock);
[   20.322659]   lock(&obj->mm.lock);
[   20.326047] 
                *** DEADLOCK ***

[   20.331943]  May be due to missing lock nesting notation

[   20.338706] 4 locks held by systemd-udevd/192:
[   20.343131]  #0:  (&dev->mutex){......}, at: [<ffffffff815b8a8a>] __driver_attach+0x5a/0xe0
[   20.351458]  #1:  (&dev->mutex){......}, at: [<ffffffff815b8a98>] __driver_attach+0x68/0xe0
[   20.359783]  #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa01027cf>] intel_setup_overlay+0x4f/0x340 [i915]
[   20.369823]  #3:  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa009d0d6>] i915_gem_object_attach_phys+0x66/0x130 [i915]
[   20.380115] 
               stack backtrace:
[   20.384455] CPU: 0 PID: 192 Comm: systemd-udevd Not tainted 4.13.0-rc2-CI-CI_DRM_2877+ #1
[   20.392602] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
[   20.402565] Call Trace:
[   20.405003]  dump_stack+0x68/0x9f
[   20.408305]  __lock_acquire+0xb80/0x1a10
[   20.412214]  lock_acquire+0xb0/0x200
[   20.415774]  ? lock_acquire+0xb0/0x200
[   20.419571]  ? __i915_gem_object_get_pages+0x1d/0x70 [i915]
[   20.425124]  __mutex_lock+0x81/0x9a0
[   20.428749]  ? __i915_gem_object_get_pages+0x1d/0x70 [i915]
[   20.434365]  ? __i915_gem_object_get_pages+0x1d/0x70 [i915]
[   20.439917]  ? __mutex_lock+0x42e/0x9a0
[   20.443801]  ? i915_gem_object_attach_phys+0x66/0x130 [i915]
[   20.449501]  ? i915_gem_object_wait+0x2fd/0x3a0 [i915]
[   20.454619]  mutex_lock_interruptible_nested+0x16/0x20
[   20.459736]  ? mutex_lock_interruptible_nested+0x16/0x20
[   20.465092]  __i915_gem_object_get_pages+0x1d/0x70 [i915]
[   20.470533]  i915_gem_object_attach_phys+0xca/0x130 [i915]
[   20.476069]  intel_setup_overlay+0xad/0x340 [i915]
[   20.480914]  intel_modeset_gem_init+0x15/0x20 [i915]
[   20.485915]  i915_driver_load+0xa0a/0x16b0 [i915]
[   20.490658]  i915_pci_probe+0x32/0x90 [i915]
[   20.494911]  pci_device_probe+0xa3/0x130
[   20.498819]  driver_probe_device+0x297/0x450
[   20.503072]  __driver_attach+0xde/0xe0
[   20.506804]  ? driver_probe_device+0x450/0x450
[   20.511229]  bus_for_each_dev+0x5d/0x90
[   20.515049]  driver_attach+0x19/0x20
[   20.518609]  bus_add_driver+0x16e/0x270
[   20.522429]  driver_register+0x5b/0xd0
[   20.526161]  __pci_register_driver+0x5b/0x60
[   20.530474]  i915_init+0x6f/0x78 [i915]
[   20.534294]  ? 0xffffffffa0236000
[   20.537595]  do_one_initcall+0x3e/0x170
[   20.541417]  ? rcu_read_lock_sched_held+0x45/0x80
[   20.546102]  ? kmem_cache_alloc_trace+0x25c/0x2c0
[   20.550787]  do_init_module+0x5a/0x1fb
[   20.554519]  load_module+0x2508/0x2d50
[   20.558252]  ? show_coresize+0x30/0x30
[   20.561986]  SyS_finit_module+0xbc/0xf0
[   20.565805]  ? SyS_finit_module+0xbc/0xf0
[   20.569799]  do_syscall_64+0x5e/0x120
[   20.573446]  entry_SYSCALL64_slow_path+0x25/0x25
[   20.578045] RIP: 0033:0x7f3042a119f9
[   20.581605] RSP: 002b:00007fff09bc22d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   20.589146] RAX: ffffffffffffffda RBX: 0000004167f71160 RCX: 00007f3042a119f9
[   20.596254] RDX: 0000000000000000 RSI: 00007f3043334e23 RDI: 000000000000000f
[   20.603361] RBP: 00007f3043334e23 R08: 0000000000000000 R09: 0000000000000000
[   20.610469] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
[   20.617576] R13: 0000004167f34260 R14: 0000000000020000 R15: 00000041664c1e30

This appears to be the cause of multiple failures further down the fastfeedback test list.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2877/fi-gdg-551/dmesg-before.log

Comment 1 Martin Peres 2017-07-26 19:19:24 UTC

Here is the fix: https://patchwork.freedesktop.org/patch/169193/

The fix was pushed 19 minutes ago and is now waiting for testing!

Comment 2 Martin Peres 2017-07-26 19:26:16 UTC

Verified as fixed; closing. Thanks a lot to Ickle for his amazing reaction time!

Comment 3 Chris Wilson 2017-07-26 19:28:12 UTC

commit 245fef70dd90b1385af80b66552538606e214f5b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 26 19:16:02 2017 +0100

    drm/i915: Call the unlocked version of i915_gem_object_get_pages()
    
    When we hold for the lock for swapping out the shmem pages for the
    physically contiguous pages, we have to call the unlocked version of
    get_pages!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934
    Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
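
For readers unfamiliar with the locked/unlocked pairing the commit message refers to, here is a minimal userspace sketch of the same pattern using POSIX threads. It is not the i915 code: struct demo_obj, demo_backing and every demo_* function are hypothetical names invented for illustration. An error-checking mutex stands in for lockdep, so re-locking from the owning thread returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object: one lock guarding a pages pointer. */
struct demo_obj {
	pthread_mutex_t lock;
	void *pages;
};

/* Hypothetical backing storage handed out as "pages". */
static char demo_backing[64];

/* Set up the object with an error-checking mutex, so that a second lock
 * attempt by the owning thread fails with EDEADLK rather than deadlocking. */
static int demo_obj_init(struct demo_obj *obj)
{
	pthread_mutexattr_t attr;
	int err;

	obj->pages = NULL;
	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&obj->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Unlocked variant: the caller must already hold obj->lock. */
static int demo_get_pages_unlocked(struct demo_obj *obj)
{
	assert(obj != NULL);
	if (!obj->pages)
		obj->pages = demo_backing;
	return 0;
}

/* Locked variant: takes obj->lock itself, then does the real work. */
static int demo_get_pages(struct demo_obj *obj)
{
	int err = pthread_mutex_lock(&obj->lock);

	if (err)
		return err; /* EDEADLK if this thread already owns the lock */
	err = demo_get_pages_unlocked(obj);
	pthread_mutex_unlock(&obj->lock);
	return err;
}

/* Buggy shape: calls the locked variant while already holding the lock. */
static int demo_attach_buggy(struct demo_obj *obj)
{
	int err = pthread_mutex_lock(&obj->lock);

	if (err)
		return err;
	err = demo_get_pages(obj); /* second lock attempt on the same mutex */
	pthread_mutex_unlock(&obj->lock);
	return err;
}

/* Fixed shape: calls the unlocked variant under the lock it already holds. */
static int demo_attach_fixed(struct demo_obj *obj)
{
	int err = pthread_mutex_lock(&obj->lock);

	if (err)
		return err;
	err = demo_get_pages_unlocked(obj);
	pthread_mutex_unlock(&obj->lock);
	return err;
}
```

After demo_obj_init(), calling demo_attach_buggy() should report EDEADLK and leave pages unset, while demo_attach_fixed() should succeed. With a default (non-error-checking) mutex the buggy variant would most likely just hang, which is the scenario the lockdep warning above is flagging.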
