Bug 103251

Summary: [CI] igt@gem_userptr_blits@sync-unmap-cycles - dmesg-warn - WARNING: possible circular locking dependency detected
Product: DRI                 Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel         Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED         QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium             CC: chris, intel-gfx-bugs, joonas.lahtinen, tvrtko.ursulin
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: HSW, SNB      i915 features: GEM/Other

Description Marta Löfstedt 2017-10-13 06:31:13 UTC
A new lockdep warning on the CI_DRM_3225 HSW shards:

Note: a fix for bug 102886 was integrated in CI_DRM_3215, and a fix for bug 102939 was integrated in CI_DRM_3202.

So this looks like a new issue, and I am filing a new bug for it. Feel free to reopen the old ones and mark this one as a duplicate.

I have checked dmesg from the surrounding runs where we don't hit this issue, and there is no other splat earlier in those runs either, so this looks more sporadic than the previous ones.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3225/shard-hsw6/igt@gem_userptr_blits@sync-unmap-cycles.html

This was also hit after CI_DRM_3215, but I believe I had not yet archived bug 102939 when it occurred, so cibuglog missed it:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3216/shard-snb1/igt@gem_userptr_blits@map-fixed-invalidate-gup.html

[   20.805315] ======================================================
[   20.805316] WARNING: possible circular locking dependency detected
[   20.805319] 4.14.0-rc4-CI-CI_DRM_3225+ #1 Tainted: G     U         
[   20.805320] ------------------------------------------------------
[   20.805322] kworker/6:1H/1438 is trying to acquire lock:
[   20.805324]  (&mm->mmap_sem){++++}, at: [<ffffffffa01c8e01>] __i915_gem_userptr_get_pages_worker+0x141/0x240 [i915]
[   20.805355] 
               but now in release context of a crosslock acquired at the following:
[   20.805357]  ((complete)&this_flusher.done){+.+.}, at: [<ffffffff8190b06d>] wait_for_completion+0x1d/0x20
[   20.805363] 
               which lock already depends on the new lock.

[   20.805365] 
               the existing dependency chain (in reverse order) is:
[   20.805367] 
               -> #1 ((complete)&this_flusher.done){+.+.}:
[   20.805372]        __lock_acquire+0x1420/0x15e0
[   20.805374]        lock_acquire+0xb0/0x200
[   20.805376]        wait_for_common+0x58/0x210
[   20.805378]        wait_for_completion+0x1d/0x20
[   20.805381]        flush_workqueue+0x1af/0x540
[   20.805400]        i915_gem_userptr_mn_invalidate_range_start+0x13c/0x150 [i915]
[   20.805404]        __mmu_notifier_invalidate_range_start+0x76/0xc0
[   20.805406]        unmap_vmas+0x7d/0xa0
[   20.805408]        unmap_region+0xae/0x110
[   20.805410]        do_munmap+0x276/0x3f0
[   20.805411]        vm_munmap+0x67/0x90
[   20.805413]        SyS_munmap+0xe/0x20
[   20.805415]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[   20.805416] 
               -> #0 (&mm->mmap_sem){++++}:
[   20.805419]        down_read+0x3e/0x70
[   20.805435]        __i915_gem_userptr_get_pages_worker+0x141/0x240 [i915]
[   20.805438]        process_one_work+0x233/0x660
[   20.805440]        worker_thread+0x4e/0x3b0
[   20.805441]        kthread+0x152/0x190
[   20.805442] 
               other info that might help us debug this:

[   20.805445]  Possible unsafe locking scenario by crosslock:

[   20.805447]        CPU0                    CPU1
[   20.805448]        ----                    ----
[   20.805449]   lock(&mm->mmap_sem);
[   20.805451]   lock((complete)&this_flusher.done);
[   20.805453]                                lock(&mm->mmap_sem);
[   20.805455]                                unlock((complete)&this_flusher.done);
[   20.805457] 
                *** DEADLOCK ***

[   20.805460] 2 locks held by kworker/6:1H/1438:
[   20.805461]  #0:  (&(&pool->lock)->rlock){-.-.}, at: [<ffffffff8109c94c>] process_one_work+0x2dc/0x660
[   20.805465]  #1:  (&x->wait#10){....}, at: [<ffffffff810cd69d>] complete+0x1d/0x60
[   20.805469] 
               stack backtrace:
[   20.805472] CPU: 6 PID: 1438 Comm: kworker/6:1H Tainted: G     U          4.14.0-rc4-CI-CI_DRM_3225+ #1
[   20.805474] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[   20.805480] Call Trace:
[   20.805483]  dump_stack+0x68/0x9f
[   20.805486]  print_circular_bug+0x235/0x3c0
[   20.805488]  ? HARDIRQ_verbose+0x10/0x10
[   20.805490]  check_prev_add+0x430/0x840
[   20.805492]  ? ret_from_fork+0x27/0x40
[   20.805494]  lock_commit_crosslock+0x3ee/0x660
[   20.805496]  ? lock_commit_crosslock+0x3ee/0x660
[   20.805498]  complete+0x29/0x60
[   20.805500]  pwq_dec_nr_in_flight+0x9c/0xa0
[   20.805502]  ? _raw_spin_lock_irq+0x40/0x50
[   20.805504]  process_one_work+0x335/0x660
[   20.805506]  worker_thread+0x4e/0x3b0
[   20.805508]  kthread+0x152/0x190
[   20.805509]  ? process_one_work+0x660/0x660
[   20.805511]  ? kthread_create_on_node+0x40/0x40
[   20.805513]  ret_from_fork+0x27/0x40
[   22.417432] random: crng init done
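The "Possible unsafe locking scenario" in the log above can be modeled in userspace. This is a minimal sketch, assuming a plain threading.Lock as a stand-in for the mmap_sem rw-semaphore (munmap holds it for write, which already excludes the worker's down_read) and a threading.Event as a stand-in for the flush_workqueue() completion. The function name model_munmap_vs_userptr_worker is hypothetical, and the timeouts exist only so the demo terminates instead of hanging. It illustrates the shape of the dependency lockdep reports, not whether the real i915 code paths can actually interleave this way.

```python
import threading


def model_munmap_vs_userptr_worker(timeout=0.5):
    """Hypothetical model of the two-CPU scenario printed by lockdep.

    CPU0 (munmap path): holds mmap_sem, then waits for a workqueue flush.
    CPU1 (userptr worker): needs mmap_sem before the flush can complete.
    """
    mmap_sem = threading.Lock()     # stand-in for &mm->mmap_sem (assumption: plain lock)
    flush_done = threading.Event()  # stand-in for (complete)&this_flusher.done
    result = {}

    def worker():
        # CPU1: lock(&mm->mmap_sem), bounded so the demo cannot hang
        got_it = mmap_sem.acquire(timeout=timeout)
        result["worker_got_mmap_sem"] = got_it
        if got_it:
            mmap_sem.release()
        # CPU1: unlock((complete)&this_flusher.done), i.e. complete()
        flush_done.set()

    with mmap_sem:  # CPU0: lock(&mm->mmap_sem)
        t = threading.Thread(target=worker)
        t.start()
        # CPU0: lock((complete)&this_flusher.done), i.e. wait_for_completion()
        result["flush_completed"] = flush_done.wait(timeout=timeout * 4)
    t.join()
    return result


if __name__ == "__main__":
    print(model_munmap_vs_userptr_worker())
```

With the bounded waits, the worker gives up on the lock and the flush then completes, so the call returns {'worker_got_mmap_sem': False, 'flush_completed': True}. If both waits were unbounded, each thread would wait on the other forever, which is the *** DEADLOCK *** case lockdep prints.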
Comment 1 Tvrtko Ursulin 2017-10-16 09:23:49 UTC
This is a false positive caused by an upstream bug.

The cross-release feature was disabled in 4.14.0-rc5:

commit b483cf3bc249d7af706390efa63d6671e80d1c09
Author: Ingo Molnar <mingo@kernel.org>
Date:   Sat Oct 14 09:26:59 2017 +0200

    locking/lockdep: Disable cross-release features for now

I also have a proposal for a fix (https://patchwork.freedesktop.org/series/31937/), but overall it is now a matter of keeping an eye on what happens with the feature upstream and progressing some flavour of a fix at a suitable time.
