Bug 102993

Summary: [CI][SNB,HSW,APL,KBL] igt@kms_cursor_crc@cursor-size-change - dmesg-warn - lockdep
Product: DRI
Component: DRM/Intel
Status: CLOSED DUPLICATE
Severity: normal
Priority: high
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BXT, HSW, KBL, SNB
i915 features: GEM/Other
Reporter: Marta Löfstedt <marta.lofstedt>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs

Description Marta Löfstedt 2017-09-26 07:30:58 UTC
On CI_DRM_3132, igt@kms_cursor_crc@cursor-size-change triggered a lockdep warning on all shards:

[   38.188901] ======================================================
[   38.188905] WARNING: possible circular locking dependency detected
[   38.188910] 4.14.0-rc2-CI-CI_DRM_3132+ #1 Tainted: G     U         
[   38.188914] ------------------------------------------------------
[   38.188918] kms_cursor_crc/1396 is trying to acquire lock:
[   38.188921]  (&dev->struct_mutex){+.+.}, at: [<ffffffffa01af651>] i915_mutex_lock_interruptible+0x51/0x130 [i915]
[   38.189013] 
               but task is already holding lock:
[   38.189017]  (&mm->mmap_sem){++++}, at: [<ffffffff8104ab7d>] __do_page_fault+0x10d/0x570
[   38.189029] 
               which lock already depends on the new lock.

[   38.189034] 
               the existing dependency chain (in reverse order) is:
[   38.189039] 
               -> #6 (&mm->mmap_sem){++++}:
[   38.189055]        __lock_acquire+0x1420/0x15e0
[   38.189061]        lock_acquire+0xb0/0x200
[   38.189069]        __might_fault+0x68/0x90
[   38.189077]        _copy_to_user+0x23/0x70
[   38.189084]        filldir+0xa5/0x120
[   38.189092]        dcache_readdir+0xf9/0x170
[   38.189099]        iterate_dir+0x69/0x1a0
[   38.189106]        SyS_getdents+0xa5/0x140
[   38.189114]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[   38.189117] 
               -> #5 (&sb->s_type->i_mutex_key#5){++++}:
[   38.189133]        down_write+0x3b/0x70
[   38.189139]        handle_create+0xcb/0x1e0
[   38.189145]        devtmpfsd+0x139/0x180
[   38.189150]        kthread+0x152/0x190
[   38.189156]        ret_from_fork+0x27/0x40
[   38.189159] 
               -> #4 ((complete)&req.done){+.+.}:
[   38.189173]        __lock_acquire+0x1420/0x15e0
[   38.189180]        lock_acquire+0xb0/0x200
[   38.189185]        wait_for_common+0x58/0x210
[   38.189191]        wait_for_completion+0x1d/0x20
[   38.189196]        devtmpfs_create_node+0x13d/0x160
[   38.189204]        device_add+0x5eb/0x620
[   38.189212]        device_create_groups_vargs+0xe0/0xf0
[   38.189219]        device_create+0x3a/0x40
[   38.189224]        msr_device_create+0x2b/0x40
[   38.189230]        cpuhp_invoke_callback+0xa3/0x840
[   38.189235]        cpuhp_thread_fun+0x7a/0x150
[   38.189242]        smpboot_thread_fn+0x18a/0x280
[   38.189246]        kthread+0x152/0x190
[   38.189253]        ret_from_fork+0x27/0x40
[   38.189256] 
               -> #3 (cpuhp_state){+.+.}:
[   38.189268]        __lock_acquire+0x1420/0x15e0
[   38.189274]        lock_acquire+0xb0/0x200
[   38.189280]        cpuhp_issue_call+0x10b/0x170
[   38.189286]        __cpuhp_setup_state_cpuslocked+0x134/0x2a0
[   38.189292]        __cpuhp_setup_state+0x46/0x60
[   38.189298]        page_writeback_init+0x43/0x67
[   38.189306]        pagecache_init+0x3d/0x42
[   38.189313]        start_kernel+0x3a8/0x3fc
[   38.189320]        x86_64_start_reservations+0x2a/0x2c
[   38.189327]        x86_64_start_kernel+0x6d/0x70
[   38.189335]        verify_cpu+0x0/0xfb
[   38.189338] 
               -> #2 (cpuhp_state_mutex){+.+.}:
[   38.189351]        __lock_acquire+0x1420/0x15e0
[   38.189357]        lock_acquire+0xb0/0x200
[   38.189363]        __mutex_lock+0x86/0x9b0
[   38.189368]        mutex_lock_nested+0x1b/0x20
[   38.189374]        __cpuhp_setup_state_cpuslocked+0x52/0x2a0
[   38.189380]        __cpuhp_setup_state+0x46/0x60
[   38.189384]        page_alloc_init+0x28/0x30
[   38.189391]        start_kernel+0x145/0x3fc
[   38.189398]        x86_64_start_reservations+0x2a/0x2c
[   38.189404]        x86_64_start_kernel+0x6d/0x70
[   38.189411]        verify_cpu+0x0/0xfb
[   38.189414] 
               -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[   38.189426]        __lock_acquire+0x1420/0x15e0
[   38.189432]        lock_acquire+0xb0/0x200
[   38.189437]        cpus_read_lock+0x3d/0xb0
[   38.189446]        stop_machine+0x1c/0x40
[   38.189520]        i915_gem_set_wedged+0x1a/0x20 [i915]
[   38.189573]        i915_reset+0xb9/0x230 [i915]
[   38.189625]        i915_reset_device+0x1f6/0x260 [i915]
[   38.189677]        i915_handle_error+0x2d8/0x430 [i915]
[   38.189755]        hangcheck_declare_hang+0xd3/0xf0 [i915]
[   38.189828]        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
[   38.189836]        process_one_work+0x233/0x660
[   38.189842]        worker_thread+0x4e/0x3b0
[   38.189846]        kthread+0x152/0x190
[   38.189853]        ret_from_fork+0x27/0x40
[   38.189856] 
               -> #0 (&dev->struct_mutex){+.+.}:
[   38.189869]        check_prev_add+0x430/0x840
[   38.189875]        __lock_acquire+0x1420/0x15e0
[   38.189881]        lock_acquire+0xb0/0x200
[   38.189886]        __mutex_lock+0x86/0x9b0
[   38.189892]        mutex_lock_interruptible_nested+0x1b/0x20
[   38.189957]        i915_mutex_lock_interruptible+0x51/0x130 [i915]
[   38.190022]        i915_gem_fault+0x209/0x650 [i915]
[   38.190030]        __do_fault+0x1e/0x80
[   38.190038]        __handle_mm_fault+0x81f/0xed0
[   38.190045]        handle_mm_fault+0x156/0x300
[   38.190050]        __do_page_fault+0x27c/0x570
[   38.190055]        do_page_fault+0x28/0x250
[   38.190062]        page_fault+0x22/0x30
[   38.190065] 
               other info that might help us debug this:

[   38.190071] Chain exists of:
                 &dev->struct_mutex --> &sb->s_type->i_mutex_key#5 --> &mm->mmap_sem

[   38.190086]  Possible unsafe locking scenario:

[   38.190090]        CPU0                    CPU1
[   38.190093]        ----                    ----
[   38.190095]   lock(&mm->mmap_sem);
[   38.190101]                                lock(&sb->s_type->i_mutex_key#5);
[   38.190108]                                lock(&mm->mmap_sem);
[   38.190114]   lock(&dev->struct_mutex);
[   38.190119] 
                *** DEADLOCK ***

[   38.190126] 1 lock held by kms_cursor_crc/1396:
[   38.190129]  #0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104ab7d>] __do_page_fault+0x10d/0x570
[   38.190140] 
               stack backtrace:
[   38.190148] CPU: 3 PID: 1396 Comm: kms_cursor_crc Tainted: G     U          4.14.0-rc2-CI-CI_DRM_3132+ #1
[   38.190153] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
[   38.190156] Call Trace:
[   38.190165]  dump_stack+0x68/0x9f
[   38.190174]  print_circular_bug+0x235/0x3c0
[   38.190182]  ? lockdep_init_map_crosslock+0x20/0x20
[   38.190188]  check_prev_add+0x430/0x840
[   38.190198]  ? rcu_read_lock_sched_held+0x28/0x90
[   38.190263]  ? i915_gem_fault+0x201/0x650 [i915]
[   38.190272]  __lock_acquire+0x1420/0x15e0
[   38.190278]  ? __lock_acquire+0x1420/0x15e0
[   38.190285]  ? lockdep_init_map_crosslock+0x20/0x20
[   38.190294]  lock_acquire+0xb0/0x200
[   38.190355]  ? i915_mutex_lock_interruptible+0x51/0x130 [i915]
[   38.190363]  __mutex_lock+0x86/0x9b0
[   38.190422]  ? i915_mutex_lock_interruptible+0x51/0x130 [i915]
[   38.190478]  ? i915_mutex_lock_interruptible+0x51/0x130 [i915]
[   38.190489]  mutex_lock_interruptible_nested+0x1b/0x20
[   38.190495]  ? mutex_lock_interruptible_nested+0x1b/0x20
[   38.190550]  i915_mutex_lock_interruptible+0x51/0x130 [i915]
[   38.190559]  ? __pm_runtime_resume+0x5b/0x90
[   38.190620]  i915_gem_fault+0x209/0x650 [i915]
[   38.190631]  __do_fault+0x1e/0x80
[   38.190639]  __handle_mm_fault+0x81f/0xed0
[   38.190651]  handle_mm_fault+0x156/0x300
[   38.190657]  __do_page_fault+0x27c/0x570
[   38.190664]  do_page_fault+0x28/0x250
[   38.190672]  page_fault+0x22/0x30
[   38.190678] RIP: 0033:0x7ff31db65080
[   38.190682] RSP: 002b:00007ffee18cdb70 EFLAGS: 00010206
[   38.190688] RAX: 00007ff31ad1e000 RBX: 000000000000001f RCX: 00007ff31ad1f000
[   38.190693] RDX: 0000000000000f80 RSI: 00000000ff000000 RDI: 00000000ff000000
[   38.190697] RBP: 0000000000000300 R08: 00007ff31ad1e000 R09: 0000000000001000
[   38.190701] R10: 00000000000002ff R11: 0000000000001000 R12: 0000000000000400
[   38.190705] R13: 00000000ff000000 R14: 0000000000000400 R15: 00007ff31ad1e000

Full data:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3132/shard-snb1/igt@kms_cursor_crc@cursor-size-change.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3132/shard-hsw3/igt@kms_cursor_crc@cursor-size-change.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3132/shard-apl6/igt@kms_cursor_crc@cursor-size-change.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3132/shard-kbl1/igt@kms_cursor_crc@cursor-size-change.html
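
For readers unfamiliar with lockdep output, the dependency chain in the log above can be modelled as a directed graph in which an edge from A to B means "B was acquired while A was held". The sketch below is only a simplified illustration of that idea, not the kernel's implementation: the lock names are copied from the report, while the helper names (build_graph, find_path, would_deadlock) are hypothetical. It shows that taking struct_mutex under mmap_sem in the page-fault path closes a cycle, because struct_mutex already reaches mmap_sem through the cpuhp locks.

```python
# Simplified model of the lockdep report above (not kernel code).
# Lock names are copied from the log; helper names are hypothetical.

# Existing chain, read from #0 up to #6: each lock was taken while the
# previous one in the list was held.
CHAIN = [
    "&dev->struct_mutex",
    "cpu_hotplug_lock.rw_sem",
    "cpuhp_state_mutex",
    "cpuhp_state",
    "(complete)&req.done",
    "&sb->s_type->i_mutex_key#5",
    "&mm->mmap_sem",
]


def build_graph(chain):
    """Return {lock: set of locks acquired while it was held}."""
    graph = {name: set() for name in chain}
    for held, acquired in zip(chain, chain[1:]):
        graph[held].add(acquired)
    return graph


def find_path(graph, start, goal):
    """Depth-first search; return a path from start to goal, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            stack.append((nxt, path + [nxt]))
    return None


def would_deadlock(graph, held, wanted):
    """Acquiring `wanted` while holding `held` adds the edge held -> wanted.

    That edge closes a cycle if `wanted` can already reach `held`.
    """
    return find_path(graph, wanted, held) is not None


graph = build_graph(CHAIN)
# The page-fault path holds mmap_sem and asks for struct_mutex.
print(would_deadlock(graph, "&mm->mmap_sem", "&dev->struct_mutex"))
```

The reverse order (holding struct_mutex and then taking mmap_sem) follows the existing chain and is not flagged by this model; only the new edge added by i915_gem_fault is.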
Comment 1 Chris Wilson 2017-09-26 08:51:29 UTC
Same cpuhp issue as the others.
Comment 2 Marta Löfstedt 2017-09-26 09:18:16 UTC
(In reply to Chris Wilson from comment #1)
> Same cpuhp issue as the others.

Agreed, I will duplicate it.
Comment 3 Marta Löfstedt 2017-09-26 09:18:29 UTC

*** This bug has been marked as a duplicate of bug 102886 ***
