Bug 94644

Summary: [BSW BAT] cpu hotplug lockdep splat
Product: DRI
Reporter: Daniel Vetter <daniel>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: highest
CC: intel-gfx-bugs, marius.c.vlad
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BSW/CHT
i915 features:

Description Daniel Vetter 2016-03-21 08:46:10 UTC
Seems to be rather sporadic, but does happen occasionally on various tests on bsw-nuc-2. See http://benchsrv.fi.intel.com/archive/results/CI_IGT_test/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html


Might be related to bug #93294, but that one is fixed.

[  424.350873] ======================================================
[  424.350874] [ INFO: possible circular locking dependency detected ]
[  424.350878] 4.5.0-rc6-gfxbench+ #1 Tainted: G     U         
[  424.350879] -------------------------------------------------------
[  424.350881] rtcwake/7413 is trying to acquire lock:
[  424.350895]  (s_active#43){++++.+}, at: [<ffffffff8124ec70>] kernfs_remove_by_name_ns+0x40/0xa0
[  424.350896] 
but task is already holding lock:
[  424.350903]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81078c4d>] cpu_hotplug_begin+0x6d/0xc0
[  424.350904] 
which lock already depends on the new lock.

[  424.350905] 
the existing dependency chain (in reverse order) is:
[  424.350908] 
-> #3 (cpu_hotplug.lock){+.+.+.}:
[  424.350913]        [<ffffffff810cd09b>] lock_acquire+0xdb/0x1f0
[  424.350918]        [<ffffffff817be562>] mutex_lock_nested+0x62/0x3b0
[  424.350920]        [<ffffffff81078911>] get_online_cpus+0x61/0x80
[  424.350924]        [<ffffffff81117f1b>] stop_machine+0x1b/0xe0
[  424.350929]        [<ffffffffa016079d>] 0xffffffffa016079d
[  424.350931]        [<ffffffffa0164d86>] 0xffffffffa0164d86
[  424.350933]        [<ffffffffa0166589>] 0xffffffffa0166589
[  424.350935]        [<ffffffffa016de47>] 0xffffffffa016de47
[  424.350937]        [<ffffffffa016e0d8>] 0xffffffffa016e0d8
[  424.350939]        [<ffffffffa0181d9e>] 0xffffffffa0181d9e
[  424.350942]        [<ffffffffa017edc2>] 0xffffffffa017edc2
[  424.350944]        [<ffffffffa016eb73>] 0xffffffffa016eb73
[  424.350946]        [<ffffffffa01f2a21>] 0xffffffffa01f2a21
[  424.350951]        [<ffffffff81515ea4>] drm_dev_register+0xa4/0xb0
[  424.350954]        [<ffffffff815180ae>] drm_get_pci_dev+0xce/0x1e0
[  424.350956]        [<ffffffffa012e2ff>] 0xffffffffa012e2ff
[  424.350961]        [<ffffffff814430f7>] pci_device_probe+0x87/0xf0
[  424.350965]        [<ffffffff81539999>] driver_probe_device+0x229/0x450
[  424.350968]        [<ffffffff81539c43>] __driver_attach+0x83/0x90
[  424.350971]        [<ffffffff81537671>] bus_for_each_dev+0x61/0xa0
[  424.350974]        [<ffffffff81539289>] driver_attach+0x19/0x20
[  424.350977]        [<ffffffff81538d6f>] bus_add_driver+0x1ef/0x290
[  424.350980]        [<ffffffff8153a90b>] driver_register+0x5b/0xe0
[  424.350983]        [<ffffffff8144202b>] __pci_register_driver+0x5b/0x60
[  424.350986]        [<ffffffff81518296>] drm_pci_init+0xd6/0x100
[  424.350988]        [<ffffffffa0265094>] 0xffffffffa0265094
[  424.350993]        [<ffffffff810003de>] do_one_initcall+0xae/0x1d0
[  424.350996]        [<ffffffff8115a5a5>] do_init_module+0x5b/0x1c6
[  424.351000]        [<ffffffff81106f80>] load_module+0x1c20/0x2490
[  424.351003]        [<ffffffff811079de>] SyS_finit_module+0x7e/0xa0
[  424.351006]        [<ffffffff817c2e1b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  424.351010] 
-> #2 (&dev->struct_mutex){+.+.+.}:
[  424.351013]        [<ffffffff810cd09b>] lock_acquire+0xdb/0x1f0
[  424.351016]        [<ffffffff815113e7>] drm_gem_mmap+0x1c7/0x270
[  424.351020]        [<ffffffff81197ec4>] mmap_region+0x334/0x580
[  424.351023]        [<ffffffff81198474>] do_mmap+0x364/0x410
[  424.351027]        [<ffffffff8117c65d>] vm_mmap_pgoff+0x6d/0xa0
[  424.351030]        [<ffffffff811965a4>] SyS_mmap_pgoff+0x184/0x220
[  424.351033]        [<ffffffff8100a1ed>] SyS_mmap+0x1d/0x20
[  424.351037]        [<ffffffff817c2e1b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  424.351040] 
-> #1 (&mm->mmap_sem){++++++}:
[  424.351043]        [<ffffffff810cd09b>] lock_acquire+0xdb/0x1f0
[  424.351046]        [<ffffffff8118d0e5>] __might_fault+0x75/0xa0
[  424.351049]        [<ffffffff8124f67a>] kernfs_fop_write+0x8a/0x180
[  424.351052]        [<ffffffff811d2653>] __vfs_write+0x23/0xe0
[  424.351055]        [<ffffffff811d33b4>] vfs_write+0xa4/0x190
[  424.351058]        [<ffffffff811d4254>] SyS_write+0x44/0xb0
[  424.351061]        [<ffffffff817c2e1b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  424.351065] 
-> #0 (s_active#43){++++.+}:
[  424.351068]        [<ffffffff810cc659>] __lock_acquire+0x1fc9/0x20f0
[  424.351070]        [<ffffffff810cd09b>] lock_acquire+0xdb/0x1f0
[  424.351073]        [<ffffffff8124dca0>] __kernfs_remove+0x210/0x2f0
[  424.351076]        [<ffffffff8124ec70>] kernfs_remove_by_name_ns+0x40/0xa0
[  424.351079]        [<ffffffff8125101d>] sysfs_unmerge_group+0x3d/0x60
[  424.351083]        [<ffffffff81541704>] dpm_sysfs_remove+0x34/0x60
[  424.351086]        [<ffffffff81535204>] device_del+0x44/0x250
[  424.351088]        [<ffffffff81535429>] device_unregister+0x19/0x60
[  424.351091]        [<ffffffff8153fac1>] cpu_cache_sysfs_exit+0x51/0xb0
[  424.351094]        [<ffffffff81540098>] cacheinfo_cpu_callback+0x38/0x70
[  424.351098]        [<ffffffff8109b499>] notifier_call_chain+0x39/0xa0
[  424.351100]        [<ffffffff8109b509>] __raw_notifier_call_chain+0x9/0x10
[  424.351103]        [<ffffffff81078b2e>] cpu_notify+0x1e/0x40
[  424.351106]        [<ffffffff81078bc9>] cpu_notify_nofail+0x9/0x20
[  424.351109]        [<ffffffff81078f13>] _cpu_down+0x233/0x340
[  424.351112]        [<ffffffff81079469>] disable_nonboot_cpus+0xc9/0x380
[  424.351116]        [<ffffffff810d36fe>] suspend_devices_and_enter+0x58e/0xbb0
[  424.351118]        [<ffffffff810d42ec>] pm_suspend+0x5cc/0x970
[  424.351121]        [<ffffffff810d2447>] state_store+0x77/0xe0
[  424.351125]        [<ffffffff813fdc1f>] kobj_attr_store+0xf/0x20
[  424.351128]        [<ffffffff81250370>] sysfs_kf_write+0x40/0x50
[  424.351131]        [<ffffffff8124f72c>] kernfs_fop_write+0x13c/0x180
[  424.351133]        [<ffffffff811d2653>] __vfs_write+0x23/0xe0
[  424.351136]        [<ffffffff811d33b4>] vfs_write+0xa4/0x190
[  424.351139]        [<ffffffff811d4254>] SyS_write+0x44/0xb0
[  424.351142]        [<ffffffff817c2e1b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  424.351143] 
other info that might help us debug this:

[  424.351148] Chain exists of:
  s_active#43 --> &dev->struct_mutex --> cpu_hotplug.lock

[  424.351149]  Possible unsafe locking scenario:

[  424.351150]        CPU0                    CPU1
[  424.351151]        ----                    ----
[  424.351153]   lock(cpu_hotplug.lock);
[  424.351155]                                lock(&dev->struct_mutex);
[  424.351157]                                lock(cpu_hotplug.lock);
[  424.351159]   lock(s_active#43);
[  424.351160] 
 *** DEADLOCK ***

[  424.351162] 8 locks held by rtcwake/7413:
[  424.351168]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff811d69e4>] __sb_start_write+0xd4/0xf0
[  424.351174]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8124f651>] kernfs_fop_write+0x61/0x180
[  424.351180]  #2:  (s_active#227){.+.+.+}, at: [<ffffffff8124f659>] kernfs_fop_write+0x69/0x180
[  424.351185]  #3:  (pm_mutex){+.+...}, at: [<ffffffff810d3fbe>] pm_suspend+0x29e/0x970
[  424.351192]  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff814762fb>] acpi_scan_lock_acquire+0x12/0x14
[  424.351198]  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810793c4>] disable_nonboot_cpus+0x24/0x380
[  424.351203]  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff81078be0>] cpu_hotplug_begin+0x0/0xc0
[  424.351208]  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81078c4d>] cpu_hotplug_begin+0x6d/0xc0
[  424.351209] 
stack backtrace:
[  424.351212] CPU: 0 PID: 7413 Comm: rtcwake Tainted: G     U          4.5.0-rc6-gfxbench+ #1
[  424.351213] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0040.2015.0814.1353 08/14/2015
[  424.351219]  0000000000000000 ffff88003668b820 ffffffff813fba95 ffffffff825f5670
[  424.351222]  ffffffff825a1220 ffff88003668b860 ffffffff810c8cac ffff88003668b8c0
[  424.351226]  ffff88017b3c2ec0 ffff88017b3c2580 0000000000000008 ffff88017b3c2ee8
[  424.351227] Call Trace:
[  424.351231]  [<ffffffff813fba95>] dump_stack+0x67/0x92
[  424.351233]  [<ffffffff810c8cac>] print_circular_bug+0x1fc/0x310
[  424.351235]  [<ffffffff810cc659>] __lock_acquire+0x1fc9/0x20f0
[  424.351238]  [<ffffffff810cd09b>] lock_acquire+0xdb/0x1f0
[  424.351241]  [<ffffffff8124ec70>] ? kernfs_remove_by_name_ns+0x40/0xa0
[  424.351243]  [<ffffffff8124dca0>] __kernfs_remove+0x210/0x2f0
[  424.351246]  [<ffffffff8124ec70>] ? kernfs_remove_by_name_ns+0x40/0xa0
[  424.351248]  [<ffffffff8124deb8>] ? kernfs_find_ns+0x78/0x130
[  424.351250]  [<ffffffff8124ec70>] kernfs_remove_by_name_ns+0x40/0xa0
[  424.351253]  [<ffffffff8125101d>] sysfs_unmerge_group+0x3d/0x60
[  424.351255]  [<ffffffff81541704>] dpm_sysfs_remove+0x34/0x60
[  424.351258]  [<ffffffff81535204>] device_del+0x44/0x250
[  424.351260]  [<ffffffff81535429>] device_unregister+0x19/0x60
[  424.351262]  [<ffffffff8153fac1>] cpu_cache_sysfs_exit+0x51/0xb0
[  424.351265]  [<ffffffff81540098>] cacheinfo_cpu_callback+0x38/0x70
[  424.351267]  [<ffffffff8109b499>] notifier_call_chain+0x39/0xa0
[  424.351270]  [<ffffffff8109b509>] __raw_notifier_call_chain+0x9/0x10
[  424.351272]  [<ffffffff81078b2e>] cpu_notify+0x1e/0x40
[  424.351274]  [<ffffffff81078bc9>] cpu_notify_nofail+0x9/0x20
[  424.351276]  [<ffffffff81078f13>] _cpu_down+0x233/0x340
[  424.351279]  [<ffffffff810e4940>] ? __call_rcu.constprop.61+0x2f0/0x2f0
[  424.351281]  [<ffffffff810e49a0>] ? call_rcu_bh+0x20/0x20
[  424.351285]  [<ffffffff810e0430>] ? trace_raw_output_rcu_utilization+0x60/0x60
[  424.351288]  [<ffffffff810e0430>] ? trace_raw_output_rcu_utilization+0x60/0x60
[  424.351291]  [<ffffffff81079469>] disable_nonboot_cpus+0xc9/0x380
[  424.351293]  [<ffffffff810d36fe>] suspend_devices_and_enter+0x58e/0xbb0
[  424.351296]  [<ffffffff810c7939>] ? __lock_is_held+0x49/0x70
[  424.351299]  [<ffffffff810d42ec>] pm_suspend+0x5cc/0x970
[  424.351301]  [<ffffffff810d2447>] state_store+0x77/0xe0
[  424.351304]  [<ffffffff813fdc1f>] kobj_attr_store+0xf/0x20
[  424.351306]  [<ffffffff81250370>] sysfs_kf_write+0x40/0x50
[  424.351309]  [<ffffffff8124f72c>] kernfs_fop_write+0x13c/0x180
[  424.351311]  [<ffffffff811d2653>] __vfs_write+0x23/0xe0
[  424.351314]  [<ffffffff810c6332>] ? percpu_down_read+0x52/0x90
[  424.351316]  [<ffffffff811d69e4>] ? __sb_start_write+0xd4/0xf0
[  424.351318]  [<ffffffff811d69e4>] ? __sb_start_write+0xd4/0xf0
[  424.351320]  [<ffffffff811d33b4>] vfs_write+0xa4/0x190
[  424.351324]  [<ffffffff811f1b2a>] ? __fget_light+0x6a/0x90
[  424.351326]  [<ffffffff811d4254>] SyS_write+0x44/0xb0
[  424.351329]  [<ffffffff817c2e1b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  424.360008]  cache: parent cpu1 should not be sleeping
[  425.501876] sd 0:0:0:0: [sda] Starting disk
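
For readers less used to lockdep output, the report above can be read as a small directed graph of "held, then acquired" lock pairs. The following is only an illustrative sketch, not lockdep's actual algorithm: the edges are transcribed by hand from chain entries #0 to #3, and the helper names (`find_cycle`, `with_edge`, `KNOWN_ORDER`) are made up for this example. It shows that the order rtcwake attempts (hold `cpu_hotplug.lock`, then take `s_active#43`) closes a cycle.

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one lock-order cycle as a list of lock names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / finished
    colour: Dict[str, int] = {}
    stack: List[str] = []          # locks on the current DFS path

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        stack.append(node)
        for nxt in edges.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the path: the path from nxt back to nxt is a cycle
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for start in list(edges):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# "held -> acquired next" edges, hand-copied from chain entries #0..#3 above.
KNOWN_ORDER = {
    "s_active#43": ["&mm->mmap_sem"],            # kernfs_fop_write may fault
    "&mm->mmap_sem": ["&dev->struct_mutex"],     # drm_gem_mmap
    "&dev->struct_mutex": ["cpu_hotplug.lock"],  # stop_machine -> get_online_cpus
}


def with_edge(edges: Dict[str, List[str]], held: str, wanted: str) -> Dict[str, List[str]]:
    """Return a copy of the graph with one extra held -> wanted edge."""
    merged = {k: list(v) for k, v in edges.items()}
    merged.setdefault(held, []).append(wanted)
    return merged


if __name__ == "__main__":
    # rtcwake holds cpu_hotplug.lock and wants s_active#43.
    attempt = with_edge(KNOWN_ORDER, "cpu_hotplug.lock", "s_active#43")
    print(find_cycle(attempt))
```

Without the extra edge the graph has no cycle; with it, every lock in the chain can end up waiting on the next one, which is the scenario the "Possible unsafe locking scenario" table abbreviates.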
Comment 1 Chris Wilson 2016-03-21 09:12:32 UTC

*** This bug has been marked as a duplicate of bug 94350 ***
Comment 2 Chris Wilson 2016-03-21 09:13:15 UTC
Note that CI *should* be catching another lockdep chain not involving kernfs. It is a real worry that it is not.
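
One way to tell the two kinds of splat apart when triaging logs is to pull the lock names out of the "-> #N (lock){...}:" chain entries and check whether any of them is an `s_active` (kernfs) class. This is a hypothetical helper, not something CI is known to run; the function names and the regular expression are assumptions based only on the line format visible in the report above.

```python
import re
from typing import List

# Matches chain entries such as "-> #2 (&dev->struct_mutex){+.+.+.}:".
# Left unanchored so a leading dmesg timestamp, if present, does not matter.
CHAIN_ENTRY = re.compile(r"-> #(\d+) \((.+?)\)\{[^}]*\}:")


def chain_locks(report: str) -> List[str]:
    """Return the lock names of the dependency chain, ordered from #0 upwards."""
    entries = [(int(num), name) for num, name in CHAIN_ENTRY.findall(report)]
    return [name for _, name in sorted(entries)]


def involves_kernfs(report: str) -> bool:
    """True if any lock in the chain is a kernfs s_active class."""
    return any(name.startswith("s_active") for name in chain_locks(report))


# Chain headers copied from the report in the description.
SAMPLE = """\
-> #3 (cpu_hotplug.lock){+.+.+.}:
-> #2 (&dev->struct_mutex){+.+.+.}:
-> #1 (&mm->mmap_sem){++++++}:
-> #0 (s_active#43){++++.+}:
"""

if __name__ == "__main__":
    print(chain_locks(SAMPLE))
    print(involves_kernfs(SAMPLE))
```

A splat for which `involves_kernfs` returns False would be a candidate for the other chain mentioned here.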
Comment 3 Jari Tahvanainen 2016-10-07 09:08:07 UTC
Closing as duplicate of closed+fixed.
