Bug 100099

Summary: [BAT] [KBL] nvme driver is using a mutex inside an rcu callback causing igt@gem_exec_suspend@basic-s4-devices to fail with dmesg-warn
Product: DRI                      Reporter: Jari Tahvanainen <jari.tahvanainen>
Component: DRM/Intel              Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED              QA Contact: Jairo Miramontes <jairo.daniel.miramontes.caton>
Severity: critical
Priority: highest                 CC: intel-gfx-bugs
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: KBL                i915 features: GEM/Other

Description Jari Tahvanainen 2017-03-07 12:02:21 UTC
Starting from CI_DRM_2252, igt@gem_exec_suspend@basic-s4-devices has had dmesg-warn as its end state.
See https://intel-gfx-ci.01.org/CI/CI_DRM_2296/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html for the latest output and links to the dmesg files. Some of the data is copied below:

Command:
/opt/igt/tests/gem_exec_suspend --run-subtest basic-S4-devices

Dmesg:
[  245.192104] Suspending console(s) (use no_console_suspend to debug)

[  245.294768] ======================================================
[  245.294769] [ INFO: possible circular locking dependency detected ]
[  245.294771] 4.11.0-rc1-CI-CI_DRM_2296+ #1 Not tainted
[  245.294772] -------------------------------------------------------
[  245.294773] kworker/u8:1/72 is trying to acquire lock:
[  245.294774]  (rcu_preempt_state.barrier_mutex){+.+...}, at: [<ffffffff81100731>] _rcu_barrier+0x31/0x160
[  245.294782] 
               but task is already holding lock:
[  245.294783]  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa00bbc48>] i915_gem_freeze+0x18/0x30 [i915]
[  245.294827] 
               which lock already depends on the new lock.

[  245.294828] 
               the existing dependency chain (in reverse order) is:
[  245.294829] 
               -> #5 (&dev->struct_mutex){+.+.+.}:
[  245.294836]        lock_acquire+0xc9/0x220
[  245.294842]        drm_gem_object_put_unlocked+0x40/0xc0
[  245.294845]        drm_gem_mmap+0xd1/0x100
[  245.294849]        mmap_region+0x39e/0x600
[  245.294851]        do_mmap+0x44c/0x500
[  245.294853]        vm_mmap_pgoff+0x9d/0xd0
[  245.294856]        SyS_mmap_pgoff+0x182/0x220
[  245.294858]        SyS_mmap+0x16/0x20
[  245.294862]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[  245.294863] 
               -> #4 (&mm->mmap_sem){++++++}:
[  245.294868]        lock_acquire+0xc9/0x220
[  245.294871]        __might_fault+0x6b/0x90
[  245.294873]        filldir+0xb2/0x140
[  245.294876]        dcache_readdir+0xf4/0x170
[  245.294877]        iterate_dir+0x7a/0x170
[  245.294879]        SyS_getdents+0xa2/0x150
[  245.294881]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[  245.294882] 
               -> #3 (&sb->s_type->i_mutex_key#3){+++++.}:
[  245.294888]        lock_acquire+0xc9/0x220
[  245.294890]        down_write+0x3f/0x70
[  245.294894]        start_creating+0x5a/0x100
[  245.294897]        debugfs_create_dir+0xc/0xd0
[  245.294900]        blk_mq_debugfs_register+0x1c/0x70
[  245.294904]        blk_mq_register_dev+0xbd/0x120
[  245.294908]        blk_register_queue+0x4c/0x150
[  245.294910]        device_add_disk+0x23b/0x4f0
[  245.294914]        nvme_validate_ns+0x1fa/0x260
[  245.294917]        nvme_scan_work+0x188/0x300
[  245.294921]        process_one_work+0x1f4/0x6d0
[  245.294924]        worker_thread+0x49/0x4a0
[  245.294927]        kthread+0x107/0x140
[  245.294929]        ret_from_fork+0x2e/0x40
[  245.294929] 
               -> #2 (all_q_mutex){+.+.+.}:
[  245.294935]        lock_acquire+0xc9/0x220
[  245.294938]        __mutex_lock+0x6e/0x990
[  245.294941]        mutex_lock_nested+0x16/0x20
[  245.294944]        blk_mq_queue_reinit_work+0x13/0x110
[  245.294947]        blk_mq_queue_reinit_dead+0x17/0x20
[  245.294950]        cpuhp_invoke_callback+0x9c/0x8a0
[  245.294953]        cpuhp_down_callbacks+0x3d/0x80
[  245.294956]        _cpu_down+0xad/0xe0
[  245.294959]        freeze_secondary_cpus+0xb1/0x3d0
[  245.294962]        suspend_devices_and_enter+0x4a0/0xca0
[  245.294964]        pm_suspend+0x59e/0xa50
[  245.294966]        state_store+0x7d/0xf0
[  245.294969]        kobj_attr_store+0xf/0x20
[  245.294972]        sysfs_kf_write+0x40/0x50
[  245.294974]        kernfs_fop_write+0x130/0x1b0
[  245.294977]        __vfs_write+0x23/0x120
[  245.294979]        vfs_write+0xc6/0x1f0
[  245.294982]        SyS_write+0x44/0xb0
[  245.294984]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[  245.294985] 
               -> #1 (cpu_hotplug.lock){+.+.+.}:
[  245.294990]        lock_acquire+0xc9/0x220
[  245.294993]        __mutex_lock+0x6e/0x990
[  245.294996]        mutex_lock_nested+0x16/0x20
[  245.294999]        get_online_cpus+0x61/0x80
[  245.295002]        _rcu_barrier+0x9f/0x160
[  245.295004]        rcu_barrier+0x10/0x20
[  245.295008]        netdev_run_todo+0x5f/0x310
[  245.295011]        rtnl_unlock+0x9/0x10
[  245.295013]        default_device_exit_batch+0x133/0x150
[  245.295017]        ops_exit_list.isra.0+0x4d/0x60
[  245.295020]        cleanup_net+0x1d8/0x2c0
[  245.295023]        process_one_work+0x1f4/0x6d0
[  245.295026]        worker_thread+0x49/0x4a0
[  245.295028]        kthread+0x107/0x140
[  245.295030]        ret_from_fork+0x2e/0x40
[  245.295030] 
               -> #0 (rcu_preempt_state.barrier_mutex){+.+...}:
[  245.295036]        __lock_acquire+0x1b44/0x1b50
[  245.295039]        lock_acquire+0xc9/0x220
[  245.295042]        __mutex_lock+0x6e/0x990
[  245.295045]        mutex_lock_nested+0x16/0x20
[  245.295047]        _rcu_barrier+0x31/0x160
[  245.295049]        rcu_barrier+0x10/0x20
[  245.295089]        i915_gem_shrink_all+0x33/0x40 [i915]
[  245.295126]        i915_gem_freeze+0x20/0x30 [i915]
[  245.295152]        i915_pm_freeze+0x1d/0x20 [i915]
[  245.295155]        pci_pm_freeze+0x59/0xe0
[  245.295157]        dpm_run_callback+0x6f/0x330
[  245.295159]        __device_suspend+0xea/0x330
[  245.295161]        async_suspend+0x1a/0x90
[  245.295165]        async_run_entry_fn+0x34/0x160
[  245.295168]        process_one_work+0x1f4/0x6d0
[  245.295171]        worker_thread+0x49/0x4a0
[  245.295173]        kthread+0x107/0x140
[  245.295175]        ret_from_fork+0x2e/0x40
[  245.295176] 
               other info that might help us debug this:

[  245.295177] Chain exists of:
                 rcu_preempt_state.barrier_mutex --> &mm->mmap_sem --> &dev->struct_mutex

[  245.295181]  Possible unsafe locking scenario:

[  245.295182]        CPU0                    CPU1
[  245.295183]        ----                    ----
[  245.295183]   lock(&dev->struct_mutex);
[  245.295185]                                lock(&mm->mmap_sem);
[  245.295187]                                lock(&dev->struct_mutex);
[  245.295189]   lock(rcu_preempt_state.barrier_mutex);
[  245.295191] 
                *** DEADLOCK ***

[  245.295192] 4 locks held by kworker/u8:1/72:
[  245.295193]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109dcae>] process_one_work+0x16e/0x6d0
[  245.295199]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8109dcae>] process_one_work+0x16e/0x6d0
[  245.295204]  #2:  (&dev->mutex){......}, at: [<ffffffff815d5e34>] __device_suspend+0xb4/0x330
[  245.295208]  #3:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa00bbc48>] i915_gem_freeze+0x18/0x30 [i915]
[  245.295247] 
               stack backtrace:
[  245.295250] CPU: 2 PID: 72 Comm: kworker/u8:1 Not tainted 4.11.0-rc1-CI-CI_DRM_2296+ #1
[  245.295251] Hardware name: GIGABYTE GB-BKi7A-7500/MFLP7AP-00, BIOS F1 07/27/2016
[  245.295256] Workqueue: events_unbound async_run_entry_fn
[  245.295258] Call Trace:
[  245.295262]  dump_stack+0x67/0x92
[  245.295266]  print_circular_bug+0x1e0/0x2e0
[  245.295269]  __lock_acquire+0x1b44/0x1b50
[  245.295273]  ? trace_hardirqs_on+0xd/0x10
[  245.295277]  lock_acquire+0xc9/0x220
[  245.295280]  ? _rcu_barrier+0x31/0x160
[  245.295282]  ? _rcu_barrier+0x31/0x160
[  245.295286]  __mutex_lock+0x6e/0x990
[  245.295288]  ? _rcu_barrier+0x31/0x160
[  245.295290]  ? _rcu_barrier+0x31/0x160
[  245.295293]  ? synchronize_rcu_expedited+0x35/0xb0
[  245.295296]  ? _raw_spin_unlock_irqrestore+0x52/0x60
[  245.295300]  mutex_lock_nested+0x16/0x20
[  245.295302]  _rcu_barrier+0x31/0x160
[  245.295305]  rcu_barrier+0x10/0x20
[  245.295343]  i915_gem_shrink_all+0x33/0x40 [i915]
[  245.295378]  i915_gem_freeze+0x20/0x30 [i915]
[  245.295404]  i915_pm_freeze+0x1d/0x20 [i915]
[  245.295406]  pci_pm_freeze+0x59/0xe0
[  245.295409]  dpm_run_callback+0x6f/0x330
[  245.295411]  ? pci_pm_poweroff+0xf0/0xf0
[  245.295413]  __device_suspend+0xea/0x330
[  245.295415]  async_suspend+0x1a/0x90
[  245.295419]  async_run_entry_fn+0x34/0x160
[  245.295422]  process_one_work+0x1f4/0x6d0
[  245.295425]  ? process_one_work+0x16e/0x6d0
[  245.295429]  worker_thread+0x49/0x4a0
[  245.295432]  kthread+0x107/0x140
[  245.295435]  ? process_one_work+0x6d0/0x6d0
[  245.295437]  ? kthread_create_on_node+0x40/0x40
[  245.295440]  ret_from_fork+0x2e/0x40
[  250.434263] usb usb1: root hub lost power or was reset
[  250.434265] usb usb2: root hub lost power or was reset
[  250.434719] usb usb3: root hub lost power or was reset
[  250.434720] usb usb4: root hub lost power or was reset
Comment 1 Chris Wilson 2017-03-07 12:08:30 UTC
A quick way to break this cycle (that may or may not be liked much by upstream) would be to add a unique lockdep class for each mm's mmap_sem.
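
For illustration, a rough sketch of what that re-keying could look like. This is only a sketch, not upstream code: the per-mm mmap_sem_key field below is hypothetical and does not exist in struct mm_struct. The point is that lockdep_set_class() can give each mm's mmap_sem its own lock class, so lockdep stops chaining mmap_sem acquisitions from unrelated processes into one dependency graph:

/*
 * Hypothetical sketch: give each mm's mmap_sem a private lockdep class.
 * Assumes a new mmap_sem_key field added to struct mm_struct (not upstream).
 */
static void mm_init_mmap_sem(struct mm_struct *mm)
{
	init_rwsem(&mm->mmap_sem);
#ifdef CONFIG_LOCKDEP
	/* Re-key the rwsem so lockdep tracks every mm independently. */
	lockdep_set_class(&mm->mmap_sem, &mm->mmap_sem_key);
#endif
}
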
Comment 2 Chris Wilson 2017-03-07 12:16:17 UTC
Also

[  245.294829] 
               -> #5 (&dev->struct_mutex){+.+.+.}:
[  245.294836]        lock_acquire+0xc9/0x220
[  245.294842]        drm_gem_object_put_unlocked+0x40/0xc0
[  245.294845]        drm_gem_mmap+0xd1/0x100
[  245.294849]        mmap_region+0x39e/0x600
[  245.294851]        do_mmap+0x44c/0x500
[  245.294853]        vm_mmap_pgoff+0x9d/0xd0
[  245.294856]        SyS_mmap_pgoff+0x182/0x220
[  245.294858]        SyS_mmap+0x16/0x20
[  245.294862]        entry_SYSCALL_64_fastpath+0x1c/0xb1

is a false chain. We still do take the struct_mutex underneath mm->mmap_sem via i915_gem_fault() though.
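
For context, a simplified sketch of that fault-path ordering; names and details are approximate, and the real i915_gem_fault() does much more. The page-fault core enters the handler with mmap_sem already held, and the handler then takes struct_mutex, which is the genuine mmap_sem -> struct_mutex edge:

/* Approximate sketch only; error translation and object pinning elided. */
static int gem_fault_sketch(struct vm_fault *vmf)
{
	struct drm_i915_gem_object *obj = to_intel_bo(vmf->vma->vm_private_data);
	struct drm_device *dev = obj->base.dev;

	/* mmap_sem is already held by the page-fault core at this point. */
	if (mutex_lock_interruptible(&dev->struct_mutex))	/* mmap_sem -> struct_mutex */
		return VM_FAULT_SIGBUS;

	/* ... bind the object and insert the PTE ... */

	mutex_unlock(&dev->struct_mutex);
	return VM_FAULT_NOPAGE;
}
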
Comment 3 Chris Wilson 2017-03-29 22:12:15 UTC
The code has been rearranged (for another reason) so that we no longer call rcu_barrier() from the shrinker context, which is enough to break this cycle. Though I still consider the dependency chain within nvme not to be our bug.
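
The shape of that rearrangement, sketched below; this is not the actual patch, and i915_gem_shrink_all_locked is a hypothetical name. The idea is to wait for RCU-deferred object frees before struct_mutex is taken, so rcu_barrier() (and its barrier_mutex) never nests inside struct_mutex:

/*
 * Sketch only, not the upstream change: flush RCU-deferred frees while
 * no i915 locks are held, then take struct_mutex to shrink.
 */
static int i915_gem_freeze_sketch(struct drm_i915_private *i915)
{
	/* Flush pending RCU-deferred object frees with no locks held. */
	rcu_barrier();

	mutex_lock(&i915->drm.struct_mutex);
	i915_gem_shrink_all_locked(i915);	/* hypothetical: shrinks without calling rcu_barrier() */
	mutex_unlock(&i915->drm.struct_mutex);

	return 0;
}
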
