Bug 111891 - [CI][SHARDS] aliasing-ppgtt vs userptr - dmesg-warn - WARNING: possible circular locking dependency detected
Summary: [CI][SHARDS] aliasing-ppgtt vs userptr - dmesg-warn - WARNING: possible circular locking dependency detected
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other
OS: All
Priority: medium
Severity: major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 111892
Depends on:
Blocks:
 
Reported: 2019-10-03 11:13 UTC by Lakshmi
Modified: 2019-10-10 06:12 UTC
CC List: 1 user

See Also:
i915 platform: SNB
i915 features: GEM/Other


Description Lakshmi 2019-10-03 11:13:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6988/shard-snb6/igt@gem_exec_basic@gtt-rcs0.html
<4> [45.725639] ======================================================
<4> [45.725642] WARNING: possible circular locking dependency detected
<4> [45.725647] 5.4.0-rc1-CI-CI_DRM_6988+ #1 Tainted: G     U           
<4> [45.725652] ------------------------------------------------------
<4> [45.725657] kworker/u16:6/200 is trying to acquire lock:
<4> [45.725669] ffff888205bd7958 (&mapping->i_mmap_rwsem){++++}, at: unmap_mapping_pages+0x48/0x130
<4> [45.725680] 
but task is already holding lock:
<4> [45.725685] ffff88820d2d93a0 (&vm->mutex){+.+.}, at: i915_vma_unbind+0xe6/0x4a0 [i915]
<4> [45.725764] 
which lock already depends on the new lock.

<4> [45.725769] 
the existing dependency chain (in reverse order) is:
<4> [45.725774] 
-> #2 (&vm->mutex){+.+.}:
<4> [45.725782]        __mutex_lock+0x9a/0x9d0
<4> [45.725843]        i915_vma_remove+0x53/0x250 [i915]
<4> [45.725904]        i915_vma_unbind+0x19c/0x4a0 [i915]
<4> [45.725965]        i915_gem_object_unbind+0x153/0x1c0 [i915]
<4> [45.726025]        userptr_mn_invalidate_range_start+0x9f/0x200 [i915]
<4> [45.726033]        __mmu_notifier_invalidate_range_start+0xa3/0x180
<4> [45.726039]        unmap_vmas+0x143/0x150
<4> [45.726044]        unmap_region+0xa3/0x100
<4> [45.726049]        __do_munmap+0x25d/0x490
<4> [45.726053]        __vm_munmap+0x6e/0xc0
<4> [45.726058]        __x64_sys_munmap+0x12/0x20
<4> [45.726063]        do_syscall_64+0x4f/0x210
<4> [45.726069]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [45.726073] 
-> #1 (mmu_notifier_invalidate_range_start){+.+.}:
<4> [45.726082]        page_mkclean_one+0xda/0x210
<4> [45.726087]        rmap_walk_file+0xff/0x260
<4> [45.726092]        page_mkclean+0x9f/0xb0
<4> [45.726097]        clear_page_dirty_for_io+0xa2/0x300
<4> [45.726103]        mpage_submit_page+0x1a/0x70
<4> [45.726108]        mpage_process_page_bufs+0xe7/0x110
<4> [45.726113]        mpage_prepare_extent_to_map+0x1d2/0x2b0
<4> [45.726119]        ext4_writepages+0x592/0x1230
<4> [45.726124]        do_writepages+0x46/0xe0
<4> [45.726130]        __filemap_fdatawrite_range+0xc6/0x100
<4> [45.726135]        file_write_and_wait_range+0x3c/0x90
<4> [45.726140]        ext4_sync_file+0x154/0x500
<4> [45.726146]        do_fsync+0x33/0x60
<4> [45.726150]        __x64_sys_fsync+0xb/0x10
<4> [45.726155]        do_syscall_64+0x4f/0x210
<4> [45.726160]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [45.726164] 
-> #0 (&mapping->i_mmap_rwsem){++++}:
<4> [45.726173]        __lock_acquire+0x1328/0x15d0
<4> [45.726178]        lock_acquire+0xa7/0x1c0
<4> [45.726183]        down_write+0x33/0x70
<4> [45.726188]        unmap_mapping_pages+0x48/0x130
<4> [45.726250]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4> [45.726312]        i915_vma_unbind+0xee/0x4a0 [i915]
<4> [45.726374]        i915_vma_destroy+0x31/0x2f0 [i915]
<4> [45.726431]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4> [45.726438]        process_one_work+0x26a/0x620
<4> [45.726442]        worker_thread+0x37/0x380
<4> [45.726448]        kthread+0x119/0x130
<4> [45.726452]        ret_from_fork+0x3a/0x50
<4> [45.726456] 
other info that might help us debug this:

<4> [45.726463] Chain exists of:
  &mapping->i_mmap_rwsem --> mmu_notifier_invalidate_range_start --> &vm->mutex

<4> [45.726474]  Possible unsafe locking scenario:

<4> [45.726479]        CPU0                    CPU1
<4> [45.726483]        ----                    ----
<4> [45.726487]   lock(&vm->mutex);
<4> [45.726498]                                lock(mmu_notifier_invalidate_range_start);
<4> [45.726505]                                lock(&vm->mutex);
<4> [45.726510]   lock(&mapping->i_mmap_rwsem);
<4> [45.726514] 
 *** DEADLOCK ***

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6992/shard-snb1/igt@gem_mmap_gtt@basic-small-copy.html
<4> [33.386129] ======================================================
<4> [33.386132] WARNING: possible circular locking dependency detected
<4> [33.386135] 5.4.0-rc1-CI-CI_DRM_6992+ #1 Tainted: G     U           
<4> [33.386138] ------------------------------------------------------
<4> [33.386141] kworker/u16:3/197 is trying to acquire lock:
<4> [33.386143] ffff8882034802d8 (&mapping->i_mmap_rwsem){++++}, at: unmap_mapping_pages+0x48/0x130
<4> [33.386153] 
but task is already holding lock:
<4> [33.386155] ffff8882155793a0 (&vm->mutex){+.+.}, at: i915_vma_unbind+0xe6/0x4a0 [i915]
<4> [33.386214] 
which lock already depends on the new lock.

<4> [33.386217] 
the existing dependency chain (in reverse order) is:
<4> [33.386220] 
-> #2 (&vm->mutex){+.+.}:
<4> [33.386225]        __mutex_lock+0x9a/0x9d0
<4> [33.386266]        i915_vma_remove+0x53/0x250 [i915]
<4> [33.386306]        i915_vma_unbind+0x19c/0x4a0 [i915]
<4> [33.386346]        i915_gem_object_unbind+0x153/0x1c0 [i915]
<4> [33.386383]        userptr_mn_invalidate_range_start+0x9f/0x200 [i915]
<4> [33.386388]        __mmu_notifier_invalidate_range_start+0xa3/0x180
<4> [33.386391]        unmap_vmas+0x143/0x150
<4> [33.386394]        unmap_region+0xa3/0x100
<4> [33.386397]        __do_munmap+0x25d/0x490
<4> [33.386399]        __vm_munmap+0x6e/0xc0
<4> [33.386402]        __x64_sys_munmap+0x12/0x20
<4> [33.386405]        do_syscall_64+0x4f/0x210
<4> [33.386409]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [33.386411] 
-> #1 (mmu_notifier_invalidate_range_start){+.+.}:
<4> [33.386416]        page_mkclean_one+0xda/0x210
<4> [33.386419]        rmap_walk_file+0xff/0x260
<4> [33.386422]        page_mkclean+0x9f/0xb0
<4> [33.386425]        clear_page_dirty_for_io+0xa2/0x300
<4> [33.386429]        mpage_submit_page+0x1a/0x70
<4> [33.386432]        mpage_process_page_bufs+0xe7/0x110
<4> [33.386435]        mpage_prepare_extent_to_map+0x1d2/0x2b0
<4> [33.386438]        ext4_writepages+0x592/0x1230
<4> [33.386441]        do_writepages+0x46/0xe0
<4> [33.386444]        __filemap_fdatawrite_range+0xc6/0x100
<4> [33.386448]        file_write_and_wait_range+0x3c/0x90
<4> [33.386450]        ext4_sync_file+0x154/0x500
<4> [33.386454]        do_fsync+0x33/0x60
<4> [33.386457]        __x64_sys_fsync+0xb/0x10
<4> [33.386459]        do_syscall_64+0x4f/0x210
<4> [33.386462]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [33.386465] 
-> #0 (&mapping->i_mmap_rwsem){++++}:
<4> [33.386470]        __lock_acquire+0x1328/0x15d0
<4> [33.386473]        lock_acquire+0xa7/0x1c0
<4> [33.386476]        down_write+0x33/0x70
<4> [33.386479]        unmap_mapping_pages+0x48/0x130
<4> [33.386518]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4> [33.386558]        i915_vma_unbind+0xee/0x4a0 [i915]
<4> [33.386597]        i915_vma_destroy+0x31/0x2f0 [i915]
<4> [33.386633]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4> [33.386637]        process_one_work+0x26a/0x620
<4> [33.386639]        worker_thread+0x37/0x380
<4> [33.386642]        kthread+0x119/0x130
<4> [33.386645]        ret_from_fork+0x3a/0x50
<4> [33.386647] 
other info that might help us debug this:

<4> [33.386651] Chain exists of:
  &mapping->i_mmap_rwsem --> mmu_notifier_invalidate_range_start --> &vm->mutex

<4> [33.386657]  Possible unsafe locking scenario:

<4> [33.386660]        CPU0                    CPU1
<4> [33.386662]        ----                    ----
<4> [33.386664]   lock(&vm->mutex);
<4> [33.386666]                                lock(mmu_notifier_invalidate_range_start);
<4> [33.386671]                                lock(&vm->mutex);
<4> [33.386674]   lock(&mapping->i_mmap_rwsem);
<4> [33.386676] 
 *** DEADLOCK ***
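
For readers less used to lockdep output, the report above is the classic three-lock inversion: the object-free worker takes vm->mutex in i915_vma_unbind() and then needs i_mmap_rwsem via i915_vma_revoke_mmap() -> unmap_mapping_pages(), while ext4 writeback takes i_mmap_rwsem (rmap walk in page_mkclean), fires the mmu notifier, and the userptr notifier then takes vm->mutex to unbind. Below is a minimal userspace sketch of that shape, with pthread mutexes standing in for the kernel locks (the names and paths are placeholders taken from the report; this is not i915 code, and the mmu notifier is only a lockdep annotation in the kernel, modelled here as a plain mutex):

    /*
     * Userspace analogue of the three-lock cycle in the lockdep report above.
     * NOT i915 code; it only illustrates why the two call paths can deadlock.
     * Build with: gcc -pthread demo.c
     */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t vm_mutex     = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t mmu_notifier = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t i_mmap_rwsem = PTHREAD_MUTEX_INITIALIZER;

    /*
     * Path #0 of the report: the free worker calls i915_vma_unbind() under
     * vm->mutex and then i915_vma_revoke_mmap() -> unmap_mapping_pages(),
     * which needs i_mmap_rwsem.
     */
    static void *free_worker_path(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&vm_mutex);
        pthread_mutex_lock(&i_mmap_rwsem);   /* vm->mutex --> i_mmap_rwsem */
        pthread_mutex_unlock(&i_mmap_rwsem);
        pthread_mutex_unlock(&vm_mutex);
        return NULL;
    }

    /*
     * Paths #1 and #2 of the report: writeback holds i_mmap_rwsem (rmap walk
     * in page_mkclean), fires the mmu notifier, and the userptr notifier then
     * takes vm->mutex to unbind -- closing the cycle.
     */
    static void *writeback_path(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&i_mmap_rwsem);
        pthread_mutex_lock(&mmu_notifier);
        pthread_mutex_lock(&vm_mutex);       /* i_mmap_rwsem --> ... --> vm->mutex */
        pthread_mutex_unlock(&vm_mutex);
        pthread_mutex_unlock(&mmu_notifier);
        pthread_mutex_unlock(&i_mmap_rwsem);
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;

        pthread_create(&t0, NULL, free_worker_path, NULL);
        pthread_create(&t1, NULL, writeback_path, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("done (hangs here if the unlucky interleaving is hit)\n");
        return 0;
    }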
Comment 2 Chris Wilson 2019-10-03 11:16:17 UTC
This one is particularly nasty here due to the inclusion of the ggtt->mutex from using the aliasing-ppgtt. That means we cannot simply break the cycle by removing struct_mutex and then only using full-ppgtt vmas for userptr.
Comment 3 Chris Wilson 2019-10-03 11:19:51 UTC
*** Bug 111892 has been marked as a duplicate of this bug. ***
Comment 4 CI Bug Log 2019-10-04 07:11:53 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* HSW: igt@kms_flip@2x-flip-vs-panning - dmesg-warn - WARNING: possible circular locking dependency detected
  (No new failures associated)
Comment 5 CI Bug Log 2019-10-04 07:28:29 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW: igt@kms_flip@2x-flip-vs-panning - dmesg-warn - WARNING: possible circular locking dependency detected -}
{+ HSW: igt@kms_flip@2x-flip-vs-panning|igt@gem_mmap_gtt@basic-small-copy - dmesg-warn - WARNING: possible circular locking dependency detected +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6996/shard-hsw6/igt@gem_mmap_gtt@basic-small-copy.html
Comment 6 Chris Wilson 2019-10-04 16:10:00 UTC
Note that while the full-ppgtt lockdep splat was fixed by

commit 2850748ef8763ab46958e43a4d1c445f29eeb37d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 4 14:39:58 2019 +0100

    drm/i915: Pull i915_vma_pin under the vm->mutex

here, where the aliasing-ppgtt is conflated with the ggtt->mutex, it is not so simple.
Comment 7 Francesco Balestrieri 2019-10-10 06:12:14 UTC
This happened in 20% of the runs; it looks like a major issue.

