Bug 111744 - [CI][DRMTIP] igt@gem_userptr_blits@coherency-sync - dmesg-warn - WARNING: possible circular locking dependency detected
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: not set not set
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2019-09-19 09:32 UTC by Lakshmi
Modified: 2019-10-02 09:43 UTC

i915 platform: PNV
i915 features: GEM/Other



Description Lakshmi 2019-09-19 09:32:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_372/fi-pnv-d510/igt@gem_userptr_blits@coherency-sync.html

<6> [211.865924] Console: switching to colour dummy device 80x25
<6> [211.866296] [IGT] gem_userptr_blits: executing
<6> [211.942018] [IGT] gem_userptr_blits: starting subtest coherency-sync
<6> [211.943453] gem_userptr_bli (1252): drop_caches: 4
<6> [217.143491] perf: interrupt took too long (3996 > 3986), lowering kernel.perf_event_max_sample_rate to 50000
<4> [246.793940] 
<4> [246.793958] ======================================================
<4> [246.793972] WARNING: possible circular locking dependency detected
<4> [246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G     U           
<4> [246.794003] ------------------------------------------------------
<4> [246.794017] kswapd0/145 is trying to acquire lock:
<4> [246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4> [246.794250] 
but task is already holding lock:
<4> [246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4> [246.794291] 
which lock already depends on the new lock.

<4> [246.794307] 
the existing dependency chain (in reverse order) is:
<4> [246.794322] 
-> #3 (&anon_vma->rwsem){++++}:
<4> [246.794344]        down_write+0x33/0x70
<4> [246.794357]        __vma_adjust+0x3d9/0x7b0
<4> [246.794370]        __split_vma+0x16a/0x180
<4> [246.794385]        mprotect_fixup+0x2a5/0x320
<4> [246.794399]        do_mprotect_pkey+0x208/0x2e0
<4> [246.794413]        __x64_sys_mprotect+0x16/0x20
<4> [246.794429]        do_syscall_64+0x55/0x1c0
<4> [246.794443]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [246.794456] 
-> #2 (&mapping->i_mmap_rwsem){++++}:
<4> [246.794478]        down_write+0x33/0x70
<4> [246.794493]        unmap_mapping_pages+0x48/0x130
<4> [246.794519]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4> [246.794519]        i915_vma_unbind+0x11d/0x4a0 [i915]
<4> [246.794519]        i915_vma_destroy+0x31/0x300 [i915]
<4> [246.794519]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4> [246.794519]        drm_file_free.part.0+0x1e6/0x290
<4> [246.794519]        drm_release+0xa6/0xe0
<4> [246.794519]        __fput+0xc2/0x250
<4> [246.794519]        task_work_run+0x82/0xb0
<4> [246.794519]        do_exit+0x35b/0xdb0
<4> [246.794519]        do_group_exit+0x34/0xb0
<4> [246.794519]        __x64_sys_exit_group+0xf/0x10
<4> [246.794519]        do_syscall_64+0x55/0x1c0
<4> [246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [246.794519] 
-> #1 (&vm->mutex){+.+.}:
<4> [246.794519]        i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4> [246.794519]        i915_address_space_init+0x9f/0x160 [i915]
<4> [246.794519]        i915_ggtt_init_hw+0x55/0x170 [i915]
<4> [246.794519]        i915_driver_probe+0xc9f/0x1620 [i915]
<4> [246.794519]        i915_pci_probe+0x43/0x1b0 [i915]
<4> [246.794519]        pci_device_probe+0x9e/0x120
<4> [246.794519]        really_probe+0xea/0x3d0
<4> [246.794519]        driver_probe_device+0x10b/0x120
<4> [246.794519]        device_driver_attach+0x4a/0x50
<4> [246.794519]        __driver_attach+0x97/0x130
<4> [246.794519]        bus_for_each_dev+0x74/0xc0
<4> [246.794519]        bus_add_driver+0x13f/0x210
<4> [246.794519]        driver_register+0x56/0xe0
<4> [246.794519]        do_one_initcall+0x58/0x300
<4> [246.794519]        do_init_module+0x56/0x1f6
<4> [246.794519]        load_module+0x25bd/0x2a40
<4> [246.794519]        __se_sys_finit_module+0xd3/0xf0
<4> [246.794519]        do_syscall_64+0x55/0x1c0
<4> [246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [246.794519] 
-> #0 (&dev->struct_mutex/1){+.+.}:
<4> [246.794519]        __lock_acquire+0x15d8/0x1e90
<4> [246.794519]        lock_acquire+0xa6/0x1c0
<4> [246.794519]        __mutex_lock+0x9d/0x9b0
<4> [246.794519]        userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4> [246.794519]        __mmu_notifier_invalidate_range_start+0x85/0x110
<4> [246.794519]        try_to_unmap_one+0x76b/0x860
<4> [246.794519]        rmap_walk_anon+0x104/0x280
<4> [246.794519]        try_to_unmap+0xc0/0xf0
<4> [246.794519]        shrink_page_list+0x561/0xc10
<4> [246.794519]        shrink_inactive_list+0x220/0x440
<4> [246.794519]        shrink_node_memcg+0x36e/0x740
<4> [246.794519]        shrink_node+0xcb/0x490
<4> [246.794519]        balance_pgdat+0x241/0x580
<4> [246.794519]        kswapd+0x16c/0x530
<4> [246.794519]        kthread+0x119/0x130
<4> [246.794519]        ret_from_fork+0x24/0x50
<4> [246.794519] 
other info that might help us debug this:

<4> [246.794519] Chain exists of:
  &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4> [246.794519]  Possible unsafe locking scenario:

<4> [246.794519]        CPU0                    CPU1
<4> [246.794519]        ----                    ----
<4> [246.794519]   lock(&anon_vma->rwsem);
<4> [246.794519]                                lock(&mapping->i_mmap_rwsem);
<4> [246.794519]                                lock(&anon_vma->rwsem);
<4> [246.794519]   lock(&dev->struct_mutex/1);
<4> [246.794519] 
 *** DEADLOCK ***
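
The report above boils down to a cycle in the lock-ordering graph. The following is a minimal sketch of that idea, not lockdep's actual implementation. The lock names are shortened, and the edge directions are an assumption read off the numbered chain (#0 to #3) and the "Chain exists of" summary: each pair means "the second lock was taken while the first was held".

```python
from collections import defaultdict

def find_cycle(edges):
    """Return one lock-order cycle as a list of lock names, or None."""
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = defaultdict(int)
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == GREY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in list(graph):
        if state[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Orderings already recorded before the warning (assumed directions).
EXISTING = [
    ("struct_mutex", "vm->mutex"),
    ("vm->mutex", "i_mmap_rwsem"),
    ("i_mmap_rwsem", "anon_vma->rwsem"),
]
# kswapd0 holds anon_vma->rwsem and wants struct_mutex.
NEW = ("anon_vma->rwsem", "struct_mutex")

print(find_cycle(EXISTING))
print(find_cycle(EXISTING + [NEW]))
```

Without the new edge the graph is acyclic; adding the acquisition made from the mmu-notifier path closes the loop, which is what the "possible circular locking dependency" warning reports.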
Comment 1 CI Bug Log 2019-09-19 09:33:30 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* PNV: igt@gem_userptr_blits@coherency-sync - dmesg-warn - WARNING: possible circular locking dependency detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_372/fi-pnv-d510/igt@gem_userptr_blits@coherency-sync.html
Comment 2 Chris Wilson 2019-09-19 09:59:51 UTC
Hmm, I think this is the potential deadlock that was speculated upon, and which Daniel added more lockdep tracking to catch in ordinary usage.
Comment 3 Chris Wilson 2019-10-02 09:43:28 UTC
commit a4311745bba9763e3c965643d4531bd5765b0513 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Sep 28 09:25:46 2019 +0100

    drm/i915/userptr: Never allow userptr into the mappable GGTT
    
    Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
    invalidate userptr objects which also happen to be pulled into GGTT
    mmaps. That is when we unbind the userptr object (on mmu invalidation),
    we revoke all CPU mmaps, which may then recurse into mmu invalidation.
    
    We looked for ways of breaking the cycle, but the revocation on
    invalidation is required and cannot be avoided. The only solution we
    could see was to not allow such GGTT bindings of userptr objects in the
    first place. In practice, no one really wants to use a GGTT mmapping of
    a CPU pointer...
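
The approach described in the commit message, refusing the binding up front rather than trying to break the cycle later, can be modelled roughly as below. This is a toy sketch, not the i915 code: `GemObject`, `bind_mappable_ggtt` and the `ENODEV` error code are all hypothetical names and choices made for illustration.

```python
import errno
from dataclasses import dataclass

@dataclass
class GemObject:
    """Hypothetical stand-in for a GEM buffer object."""
    name: str
    is_userptr: bool = False
    ggtt_mapped: bool = False

def bind_mappable_ggtt(obj):
    """Bind obj into the mappable GGTT unless it is backed by a user pointer."""
    if obj.is_userptr:
        # Rejecting here means an mmu invalidation never has a GGTT mmap
        # of a userptr object to revoke, so the recursion cannot start.
        raise OSError(errno.ENODEV,
                      "userptr objects may not be bound into the mappable GGTT")
    obj.ggtt_mapped = True
    return obj

normal = bind_mappable_ggtt(GemObject("bo"))
print(normal.ggtt_mapped)
```

The point of the design is that the check sits at bind time, so the revoke-on-invalidate path can stay as it is.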

