Bug 103950

Summary: [CI] igt@* - dmesg-warn - WARNING: possible circular locking dependency detected
Product: DRI Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BXT, HSW, SNB i915 features:

Description Marta Löfstedt 2017-11-28 07:10:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-hsw2/igt@gem_exec_capture@capture-render.html

	

[   85.062523] Setting dangerous option reset - tainting kernel
[   85.068934] i915 0000:00:02.0: Resetting chip after gpu hang

[   85.069408] ======================================================
[   85.069410] WARNING: possible circular locking dependency detected
[   85.069413] 4.15.0-rc1-CI-CI_DRM_3397+ #1 Tainted: G     U          
[   85.069415] ------------------------------------------------------
[   85.069417] gem_exec_captur/2810 is trying to acquire lock:
[   85.069419]  ((completion)&self->parked){+.+.}, at: [<ffffffff8109d69d>] kthread_park+0x3d/0x50
[   85.069426] 
               but task is already holding lock:
[   85.069428]  (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
[   85.069448] 
               which lock already depends on the new lock.

[   85.069451] 
               the existing dependency chain (in reverse order) is:
[   85.069454] 
               -> #3 (&dev->struct_mutex){+.+.}:
[   85.069460]        __mutex_lock+0x81/0x9b0
[   85.069481]        i915_mutex_lock_interruptible+0x47/0x130 [i915]
[   85.069502]        i915_gem_fault+0x201/0x760 [i915]
[   85.069507]        __do_fault+0x15/0x70
[   85.069509]        __handle_mm_fault+0x7bf/0xda0
[   85.069512]        handle_mm_fault+0x14f/0x2f0
[   85.069515]        __do_page_fault+0x2d1/0x560
[   85.069518]        page_fault+0x22/0x30
[   85.069520] 
               -> #2 (&mm->mmap_sem){++++}:
[   85.069525]        __might_fault+0x63/0x90
[   85.069529]        _copy_to_user+0x1e/0x70
[   85.069532]        perf_read+0x21d/0x290
[   85.069534]        __vfs_read+0x1e/0x120
[   85.069536]        vfs_read+0xa1/0x150
[   85.069539]        SyS_read+0x40/0xa0
[   85.069541]        entry_SYSCALL_64_fastpath+0x1c/0x89
[   85.069543] 
               -> #1 (&cpuctx_mutex){+.+.}:
[   85.069547]        perf_event_ctx_lock_nested+0xbc/0x1d0
[   85.069549] 
               -> #0 ((completion)&self->parked){+.+.}:
[   85.069555]        lock_acquire+0xaf/0x200
[   85.069558]        wait_for_common+0x54/0x210
[   85.069560]        kthread_park+0x3d/0x50
[   85.069579]        i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
[   85.069600]        i915_gem_reset_prepare+0x2c/0x60 [i915]
[   85.069617]        i915_reset+0x66/0x230 [i915]
[   85.069635]        i915_reset_device+0x1cb/0x230 [i915]
[   85.069651]        i915_handle_error+0x2d3/0x430 [i915]
[   85.069670]        i915_wedged_set+0x79/0xc0 [i915]
[   85.069673]        simple_attr_write+0xab/0xc0
[   85.069677]        full_proxy_write+0x4b/0x70
[   85.069679]        __vfs_write+0x1e/0x130
[   85.069682]        vfs_write+0xc0/0x1b0
[   85.069684]        SyS_write+0x40/0xa0
[   85.069686]        entry_SYSCALL_64_fastpath+0x1c/0x89
[   85.069688] 
               other info that might help us debug this:

[   85.069692] Chain exists of:
                 (completion)&self->parked --> &mm->mmap_sem --> &dev->struct_mutex

[   85.069698]  Possible unsafe locking scenario:

[   85.069701]        CPU0                    CPU1
[   85.069703]        ----                    ----
[   85.069705]   lock(&dev->struct_mutex);
[   85.069707]                                lock(&mm->mmap_sem);
[   85.069710]                                lock(&dev->struct_mutex);
[   85.069713]   lock((completion)&self->parked);
[   85.069715] 
                *** DEADLOCK ***

[   85.069718] 3 locks held by gem_exec_captur/2810:
[   85.069722]  #0:  (sb_writers#10){.+.+}, at: [<ffffffff811ff05e>] vfs_write+0x15e/0x1b0
[   85.069727]  #1:  (&attr->mutex){+.+.}, at: [<ffffffff8122a866>] simple_attr_write+0x36/0xc0
[   85.069732]  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
[   85.069751] 
               stack backtrace:
[   85.069754] CPU: 2 PID: 2810 Comm: gem_exec_captur Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3397+ #1
[   85.069758] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[   85.069760] Call Trace:
[   85.069765]  dump_stack+0x5f/0x86
[   85.069769]  print_circular_bug+0x230/0x3b0
[   85.069772]  check_prev_add+0x439/0x7b0
[   85.069775]  ? lockdep_init_map_crosslock+0x20/0x20
[   85.069778]  ? _raw_spin_unlock_irq+0x24/0x50
[   85.069782]  ? __lock_acquire+0x1385/0x15a0
[   85.069784]  __lock_acquire+0x1385/0x15a0
[   85.069788]  lock_acquire+0xaf/0x200
[   85.069790]  ? kthread_park+0x3d/0x50
[   85.069793]  wait_for_common+0x54/0x210
[   85.069797]  ? kthread_park+0x3d/0x50
[   85.069799]  ? _raw_spin_unlock_irqrestore+0x55/0x60
[   85.069802]  ? try_to_wake_up+0x2a2/0x600
[   85.069804]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   85.069807]  kthread_park+0x3d/0x50
[   85.069827]  i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
[   85.069849]  i915_gem_reset_prepare+0x2c/0x60 [i915]
[   85.069865]  i915_reset+0x66/0x230 [i915]
[   85.069881]  i915_reset_device+0x1cb/0x230 [i915]
[   85.069898]  ? gen8_gt_irq_ack+0x160/0x160 [i915]
[   85.069902]  ? work_on_cpu_safe+0x50/0x50
[   85.069919]  i915_handle_error+0x2d3/0x430 [i915]
[   85.069923]  ? __mutex_lock+0x81/0x9b0
[   85.069926]  ? __mutex_lock+0x432/0x9b0
[   85.069929]  ? __might_fault+0x39/0x90
[   85.069933]  ? __might_fault+0x39/0x90
[   85.069951]  i915_wedged_set+0x79/0xc0 [i915]
[   85.069955]  simple_attr_write+0xab/0xc0
[   85.069959]  full_proxy_write+0x4b/0x70
[   85.069961]  __vfs_write+0x1e/0x130
[   85.069965]  ? rcu_read_lock_sched_held+0x6f/0x80
[   85.069968]  ? rcu_sync_lockdep_assert+0x25/0x50
[   85.069971]  ? __sb_start_write+0xd5/0x1f0
[   85.069973]  ? __sb_start_write+0xef/0x1f0
[   85.069976]  vfs_write+0xc0/0x1b0
[   85.069979]  SyS_write+0x40/0xa0
[   85.069982]  entry_SYSCALL_64_fastpath+0x1c/0x89
[   85.069984] RIP: 0033:0x7f22dd739670
[   85.069986] RSP: 002b:00007ffe3f6bc7d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   85.069990] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f22dd739670
[   85.069994] RDX: 0000000000000002 RSI: 000055f260b5c376 RDI: 0000000000000007
[   85.069996] RBP: 0000000000000001 R08: 000055f2614ebed0 R09: 0000000000000000
[   85.069999] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f22df019000
[   85.070002] R13: 0000000000000007 R14: 00007f22df018000 R15: 0000000000000003
Comment 1 Marta Löfstedt 2017-11-28 07:12:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-hsw3/igt@kms_cursor_crc@cursor-256x256-suspend.html

[  282.801181] Suspending console(s) (use no_console_suspend to debug)
[  282.809347] sd 4:0:0:0: [sda] Synchronizing SCSI cache
[  282.809672] sd 4:0:0:0: [sda] Stopping disk

[  282.921866] ======================================================
[  282.921867] WARNING: possible circular locking dependency detected
[  282.921868] 4.15.0-rc1-CI-CI_DRM_3397+ #1 Tainted: G     U          
[  282.921869] ------------------------------------------------------
[  282.921870] watchdog/1/15 is trying to acquire lock:
[  282.921870]  (&cpuctx_mutex){+.+.}, at: [<ffffffff8116071c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[  282.921875] 
               but now in release context of a crosslock acquired at the following:
[  282.921876]  ((completion)&self->parked){+.+.}, at: [<ffffffff8109d69d>] kthread_park+0x3d/0x50
[  282.921879] 
               which lock already depends on the new lock.

[  282.921879] 
               the existing dependency chain (in reverse order) is:
[  282.921879] 
               -> #3 ((completion)&self->parked){+.+.}:
[  282.921883]        wait_for_common+0x54/0x210
[  282.921884]        kthread_park+0x3d/0x50
[  282.921917]        i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
[  282.921940]        i915_gem_reset_prepare+0x2c/0x60 [i915]
[  282.921957]        i915_reset+0x66/0x230 [i915]
[  282.921974]        i915_reset_device+0x1cb/0x230 [i915]
[  282.921991]        i915_handle_error+0x2d3/0x430 [i915]
[  282.922011]        i915_wedged_set+0x79/0xc0 [i915]
[  282.922013]        simple_attr_write+0xab/0xc0
[  282.922016]        full_proxy_write+0x4b/0x70
[  282.922017]        __vfs_write+0x1e/0x130
[  282.922017]        vfs_write+0xc0/0x1b0
[  282.922018]        SyS_write+0x40/0xa0
[  282.922020]        entry_SYSCALL_64_fastpath+0x1c/0x89
[  282.922020] 
               -> #2 (&dev->struct_mutex){+.+.}:
[  282.922023]        __mutex_lock+0x81/0x9b0
[  282.922044]        i915_mutex_lock_interruptible+0x47/0x130 [i915]
[  282.922065]        i915_gem_fault+0x201/0x760 [i915]
[  282.922067]        __do_fault+0x15/0x70
[  282.922069]        __handle_mm_fault+0x7bf/0xda0
[  282.922071]        handle_mm_fault+0x14f/0x2f0
[  282.922072]        __do_page_fault+0x2d1/0x560
[  282.922073]        page_fault+0x22/0x30
[  282.922073] 
               -> #1 (&mm->mmap_sem){++++}:
[  282.922076]        __might_fault+0x63/0x90
[  282.922078]        _copy_to_user+0x1e/0x70
[  282.922079]        perf_read+0x1aa/0x290
[  282.922080]        __vfs_read+0x1e/0x120
[  282.922081]        vfs_read+0xa1/0x150
[  282.922082]        SyS_read+0x40/0xa0
[  282.922083]        entry_SYSCALL_64_fastpath+0x1c/0x89
[  282.922083] 
               -> #0 (&cpuctx_mutex){+.+.}:
[  282.922085]        perf_event_ctx_lock_nested+0xbc/0x1d0
[  282.922086] 
               other info that might help us debug this:

[  282.922086] Chain exists of:
                 &cpuctx_mutex --> &dev->struct_mutex --> (completion)&self->parked

[  282.922088]  Possible unsafe locking scenario by crosslock:

[  282.922089]        CPU0                    CPU1
[  282.922089]        ----                    ----
[  282.922089]   lock(&dev->struct_mutex);
[  282.922090]   lock((completion)&self->parked);
[  282.922091]                                lock(&cpuctx_mutex);
[  282.922092]                                unlock((completion)&self->parked);
[  282.922093] 
                *** DEADLOCK ***

[  282.922094] 1 lock held by watchdog/1/15:
[  282.922094]  #0:  (&x->wait){....}, at: [<ffffffff810c7b03>] complete+0x13/0x50
[  282.922098] 
               stack backtrace:
[  282.922100] CPU: 1 PID: 15 Comm: watchdog/1 Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3397+ #1
[  282.922100] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[  282.922101] Call Trace:
[  282.922105]  dump_stack+0x5f/0x86
[  282.922108]  print_circular_bug+0x230/0x3b0
[  282.922110]  check_prev_add+0x439/0x7b0
[  282.922112]  ? HARDIRQ_verbose+0x10/0x10
[  282.922115]  ? lock_commit_crosslock+0x3e8/0x640
[  282.922116]  lock_commit_crosslock+0x3e8/0x640
[  282.922118]  ? __kthread_parkme+0x39/0x90
[  282.922120]  ? smpboot_thread_fn+0x184/0x270
[  282.922122]  complete+0x1f/0x50
[  282.922123]  ? __kthread_parkme+0x39/0x90
[  282.922124]  __kthread_parkme+0x63/0x90
[  282.922126]  ? smpboot_thread_fn+0x2c/0x270
[  282.922127]  smpboot_thread_fn+0xaa/0x270
[  282.922128]  kthread+0x139/0x170
[  282.922130]  ? sort_range+0x20/0x20
[  282.922131]  ? kthread_create_on_node+0x40/0x40
[  282.922133]  ret_from_fork+0x24/0x30
Comment 2 Marta Löfstedt 2017-11-28 07:12:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-hsw1/igt@pm_rpm@system-suspend-modeset.html

[   82.002286] Suspending console(s) (use no_console_suspend to debug)
[   82.012503] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[   82.021134] sd 1:0:0:0: [sda] Stopping disk

[   82.162690] ======================================================
[   82.162690] WARNING: possible circular locking dependency detected
[   82.162692] 4.15.0-rc1-CI-CI_DRM_3397+ #1 Tainted: G     U          
[   82.162692] ------------------------------------------------------
[   82.162693] watchdog/1/15 is trying to acquire lock:
[   82.162693]  (&cpuctx_mutex){+.+.}, at: [<ffffffff8116071c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[   82.162698] 
               but now in release context of a crosslock acquired at the following:
[   82.162699]  ((completion)&self->parked){+.+.}, at: [<ffffffff8109d69d>] kthread_park+0x3d/0x50
[   82.162702] 
               which lock already depends on the new lock.

[   82.162702] 
               the existing dependency chain (in reverse order) is:
[   82.162702] 
               -> #3 ((completion)&self->parked){+.+.}:
[   82.162707]        wait_for_common+0x54/0x210
[   82.162707]        kthread_park+0x3d/0x50
[   82.162742]        i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
[   82.162764]        i915_gem_reset_prepare+0x2c/0x60 [i915]
[   82.162780]        i915_reset+0x66/0x230 [i915]
[   82.162796]        i915_reset_device+0x1cb/0x230 [i915]
[   82.162812]        i915_handle_error+0x2d3/0x430 [i915]
[   82.162835]        hangcheck_declare_hang+0xc0/0xd0 [i915]
[   82.162857]        i915_hangcheck_elapsed+0x275/0x2f0 [i915]
[   82.162859]        process_one_work+0x227/0x650
[   82.162860]        worker_thread+0x48/0x3a0
[   82.162861]        kthread+0x139/0x170
[   82.162862]        ret_from_fork+0x24/0x30
[   82.162863] 
               -> #2 (&dev->struct_mutex){+.+.}:
[   82.162866]        __mutex_lock+0x81/0x9b0
[   82.162886]        i915_mutex_lock_interruptible+0x47/0x130 [i915]
[   82.162907]        i915_gem_fault+0x201/0x760 [i915]
[   82.162909]        __do_fault+0x15/0x70
[   82.162910]        __handle_mm_fault+0x9ab/0xda0
[   82.162912]        handle_mm_fault+0x14f/0x2f0
[   82.162913]        __do_page_fault+0x2d1/0x560
[   82.162914]        page_fault+0x22/0x30
[   82.162914] 
               -> #1 (&mm->mmap_sem){++++}:
[   82.162917]        __might_fault+0x63/0x90
[   82.162919]        _copy_to_user+0x1e/0x70
[   82.162920]        perf_read+0x1aa/0x290
[   82.162921]        __vfs_read+0x1e/0x120
[   82.162922]        vfs_read+0xa1/0x150
[   82.162923]        SyS_read+0x40/0xa0
[   82.162924]        entry_SYSCALL_64_fastpath+0x1c/0x89
[   82.162924] 
               -> #0 (&cpuctx_mutex){+.+.}:
[   82.162926]        perf_event_ctx_lock_nested+0xbc/0x1d0
[   82.162926] 
               other info that might help us debug this:

[   82.162927] Chain exists of:
                 &cpuctx_mutex --> &dev->struct_mutex --> (completion)&self->parked

[   82.162929]  Possible unsafe locking scenario by crosslock:

[   82.162929]        CPU0                    CPU1
[   82.162930]        ----                    ----
[   82.162930]   lock(&dev->struct_mutex);
[   82.162931]   lock((completion)&self->parked);
[   82.162932]                                lock(&cpuctx_mutex);
[   82.162932]                                unlock((completion)&self->parked);
[   82.162933] 
                *** DEADLOCK ***

[   82.162934] 1 lock held by watchdog/1/15:
[   82.162934]  #0:  (&x->wait){....}, at: [<ffffffff810c7b03>] complete+0x13/0x50
[   82.162938] 
               stack backtrace:
[   82.162939] CPU: 1 PID: 15 Comm: watchdog/1 Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3397+ #1
[   82.162940] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[   82.162940] Call Trace:
[   82.162944]  dump_stack+0x5f/0x86
[   82.162947]  print_circular_bug+0x230/0x3b0
[   82.162949]  check_prev_add+0x439/0x7b0
[   82.162951]  ? HARDIRQ_verbose+0x10/0x10
[   82.162954]  ? lock_commit_crosslock+0x3e8/0x640
[   82.162955]  lock_commit_crosslock+0x3e8/0x640
[   82.162956]  ? __kthread_parkme+0x39/0x90
[   82.162958]  ? smpboot_thread_fn+0x184/0x270
[   82.162960]  complete+0x1f/0x50
[   82.162961]  ? __kthread_parkme+0x39/0x90
[   82.162962]  __kthread_parkme+0x63/0x90
[   82.162964]  ? smpboot_thread_fn+0x2c/0x270
[   82.162965]  smpboot_thread_fn+0xaa/0x270
[   82.162966]  kthread+0x139/0x170
[   82.162968]  ? sort_range+0x20/0x20
[   82.162969]  ? kthread_create_on_node+0x40/0x40
[   82.162970]  ret_from_fork+0x24/0x30
[   82.196456] IRQ 31: no longer affine to CPU3
Comment 21 Marta Löfstedt 2017-11-30 07:22:57 UTC
a fix was integrated to CI_DRM_3411

commit 8f873adc152c899d0e1b9ccaf9caa468955c1fdd
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Nov 29 17:29:54 2017 +0100

    drm-tip: 2017y-11m-29d-16h-28m-40s UTC integration manifest

commit aff78eb7ce76b42d4c94d2f5ce768c1c2c8fb94d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Nov 29 12:29:26 2017 +0100

    lockdep: finer-grained completion key for kthread
    
    Ideally we'd create the key through a macro at the real callers and
    pass it all the way down. This would give us better coverage for cases
    where a bunch of kthreads are created for the same thing.
    But this gets the job done meanwhile and unblocks our CI. Refining
    later on is always possible.
    
    v2:
    - make it compile
    - use the right map (Tvrtko)
    
    v3:
    
    lockdep insist on a static key, so the cheap way didn't work. Wire
    2 keys through all the callers.
    
    I didn't extend this up to alloc_workqueue because the
    lockdep_invariant_state() call should separate the work functions from
    the workqueue kthread logic and prevent cross-release state from
    leaking between unrelated work queues that happen to reuse the same
    kthreads.
    
    v4: CI found more compile fail :-/
    
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Marta Lofstedt <marta.lofstedt@intel.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=103950
    Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171129112926.28704-1-daniel.vetter@ffwll.ch

commit 281f8d96fa83efe67198608fe0d1792518c91925
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Nov 29 10:46:36 2017 +0100

    lockdep: Up MAX_LOCKDEP_CHAINS
    
    cross-release ftl
    
    From Chris:
    
    "Fwiw, this isn't cross-release but us reloading the module many times,
    creating a whole host of new lockclasses. Even more fun is when the
    module gets a slightly different address and the new lock address hashes
    into an old lock...
    
    "I did think about a module-hook to revoke the stale lockclasses, but
    that still leaves all the hashed chains.
    
    "This particular nuisance was temporarily pushed back by teaching igt not
    to reload i915.ko on a whim."
    
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Marta Lofstedt <marta.lofstedt@intel.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=103707
    Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171129094636.18638-2-daniel.vetter@ffwll.ch

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.