Bug 102939 - [CI][SNB,HSW,APL,KBL] igt@prime_mmap@test_userptr - possible circular locking dependency detected
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-22 07:08 UTC by Marta Löfstedt
Modified: 2017-10-11 09:27 UTC

See Also:
i915 platform: BXT, HSW, KBL, SNB
i915 features: GEM/Other


Description Marta Löfstedt 2017-09-22 07:08:45 UTC
From CI_DRM_3118 on SNB-, HSW-, APL- and KBL-shards:
new test: igt@prime_mmap@test_userptr: 

[   29.531667] ======================================================
[   29.531670] WARNING: possible circular locking dependency detected
[   29.531673] 4.14.0-rc1-CI-CI_DRM_3119+ #1 Tainted: G     U         
[   29.531676] ------------------------------------------------------
[   29.531679] prime_mmap/1521 is trying to acquire lock:
[   29.531681]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
[   29.531691] 
               but task is already holding lock:
[   29.531694]  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa0151b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
[   29.531742] 
               which lock already depends on the new lock.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3118/shard-snb4/igt@prime_mmap@test_userptr.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3118/shard-hsw6/igt@prime_mmap@test_userptr.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3118/shard-apl2/igt@prime_mmap@test_userptr.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3118/shard-kbl3/igt@prime_mmap@test_userptr.html
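For triage it can help to pull the two lock names out of a splat header like the one above. The following is only a rough sketch, not part of cibuglog or any CI tooling: `parse_splat`, `TIMESTAMP`, `LOCK_LINE` and `SAMPLE` are hypothetical names, and the regular expressions assume the 4.14-era header layout shown in this report.

```python
import re

# dmesg timestamp prefix, e.g. "[   29.531681] " (layout assumed from this report).
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
# Lock line: "(<lock>){<flags>}, at: [<addr>] <function>+<offset>".
LOCK_LINE = re.compile(
    r"\((?P<lock>[^)]+)\)\{[^}]*\}, at: \[<[0-9a-f]+>\] (?P<func>[^\s+]+)\+"
)

SAMPLE = """\
[   29.531679] prime_mmap/1521 is trying to acquire lock:
[   29.531681]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
[   29.531691] 
               but task is already holding lock:
[   29.531694]  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa0151b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
"""


def parse_splat(text):
    """Return the task, the wanted lock and the held lock from a splat header."""
    result = {}
    section = None  # which lock the next lock line describes
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        if line.endswith("is trying to acquire lock:"):
            result["task"] = line.split()[0]
            section = "wanted"
        elif line.startswith("but task is already holding lock:"):
            section = "held"
        else:
            match = LOCK_LINE.search(line)
            if match and section:
                result[section] = (match.group("lock"), match.group("func"))
                section = None
    return result


info = parse_splat(SAMPLE)
print(info["task"], "wants", info["wanted"][0], "while holding", info["held"][0])
```

Each lock entry is a `(lock name, function)` pair, so two splats can be compared on the function they go through as well as on the locks involved.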
Comment 1 krisman 2017-10-02 05:30:11 UTC

*** This bug has been marked as a duplicate of bug 102886 ***
Comment 2 Marta Löfstedt 2017-10-04 11:46:51 UTC
From https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3170/shard-snb4/igt@prime_mmap@test_userptr.html

the lockdep splat is no longer reproduced by igt@prime_mmap@test_userptr. However, it is still reproduced by the gem_eio tests in BUG 102886.
Comment 3 Marta Löfstedt 2017-10-05 06:50:38 UTC
I was too quick to close this one:
The issue has started again on CI_DRM_3172 and onward.

[   85.488531] ======================================================
[   85.488534] WARNING: possible circular locking dependency detected
[   85.488537] 4.14.0-rc3-CI-CI_DRM_3172+ #1 Tainted: G     U         
[   85.488539] ------------------------------------------------------
[   85.488542] prime_mmap/1588 is trying to acquire lock:
[   85.488544]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109e5a7>] apply_workqueue_attrs+0x17/0x50
[   85.488552] 
               but task is already holding lock:
[   85.488555]  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01b2dfa>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
[   85.488589] 
               which lock already depends on the new lock.

It is very weird that this lockdep wasn't caught on CI_DRM_3170 and CI_DRM_3171.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3172/shard-snb5/igt@prime_mmap@test_userptr.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3172/shard-hsw5/igt@prime_mmap@test_userptr.html
Comment 4 Marta Löfstedt 2017-10-05 06:53:48 UTC
I suggest we keep BUG 102886 and this one un-duplicated, since this one is flip-flopping on catching the lockdep while BUG 102886 seems solid.
Comment 5 Marta Löfstedt 2017-10-05 12:26:38 UTC
I just realized that lockdep only warns once. So if another test that triggers a lockdep splat is scheduled earlier in the same shard, we will miss the next one. This explains why we didn't hit this on CI_DRM_3170 and CI_DRM_3171.

I will mark this as a duplicate again.
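The masking effect can be illustrated with a toy model. This is a sketch only and not how the kernel implements lockdep: `LockOrderValidator` and the lock and test names are made up, and the one assumption carried over from the comment above is that the checker stops reporting after its first warning.

```python
class LockOrderValidator:
    """Toy lock-order checker that reports only the first inversion it sees."""

    def __init__(self):
        self.edges = {}      # lock -> set of locks taken while holding it
        self.enabled = True  # cleared after the first report
        self.reports = []    # names of the tests that produced a report

    def _reaches(self, start, target):
        # Depth-first search: is `target` reachable from `start`?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, held, wanted, test):
        """Record that `wanted` is taken while `held` is held, during `test`."""
        if not self.enabled:
            return  # warn-once: everything after the first report is ignored
        if self._reaches(wanted, held):
            # wanted -> ... -> held already exists, so held -> wanted closes a cycle.
            self.reports.append(test)
            self.enabled = False
            return
        self.edges.setdefault(held, set()).add(wanted)


validator = LockOrderValidator()
validator.acquire("A", "B", "setup")        # establishes A -> B
validator.acquire("B", "A", "first_test")   # inversion: reported
validator.acquire("C", "D", "setup")        # ignored, checker is already off
validator.acquire("D", "C", "second_test")  # real inversion, but never reported
print(validator.reports)  # ['first_test']
```

In this model whichever inverting test runs first in the shard gets the report, and the other one looks clean.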
Comment 6 Marta Löfstedt 2017-10-05 12:26:58 UTC

*** This bug has been marked as a duplicate of bug 102886 ***
Comment 7 Marta Löfstedt 2017-10-06 07:40:05 UTC
The lockdep splat in igt@prime_mmap@test_userptr needs to be taken out of this bug here in cibuglog; it is a different kind of lockdep splat. That one goes through i915_gem_userptr_init__mmu_notifier as the critical call, but not through i915_gem_set_wedged.

Marta, can you pls update cibuglog to create a separate entry and new bugzilla? I need that for my patch :-)
Comment 8 Chris Wilson 2017-10-06 08:34:13 UTC
It's the same cause. Overzealous lock coupling via /dev/cpu/*/msr.

*** This bug has been marked as a duplicate of bug 102886 ***
Comment 9 Marta Löfstedt 2017-10-11 05:55:14 UTC
"commit 7741b547b6e000b08e20667bb3bef22e1a362661
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Oct 9 18:44:00 2017 +0200

    drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock"

was integrated in CI_DRM_3202. Since then the lockdep in igt@prime_mmap@test_userptr has not been reproduced. Note that this is not due to it being hidden by another lockdep. For example, in:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3202/shard-apl3/igt@prime_mmap@test_userptr.html
there is no lockdep in either the boot log or the runtime dmesg.
The same is true for: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3202/shard-kbl5/igt@gem_userptr_blits@process-exit-gtt.html. From this I conclude that all the gem_userptr_blits lockdeps are fixed by the above commit.
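Going only by the commit title and the splat (workqueue setup taking cpu_hotplug_lock while mm_lock is held), the idea of the fix can be sketched as moving the allocation in front of the lock. This is an illustration, not the i915 code: `take`, `alloc_workqueue`, `init_notifier_before` and `init_notifier_after` are hypothetical stand-ins, and the "locks" only record ordering.

```python
from contextlib import contextmanager

held = []             # locks currently held (single-threaded model)
dependencies = set()  # (outer, inner) pairs observed so far


@contextmanager
def take(name):
    """Pretend to take lock `name`, recording it against every lock already held."""
    for outer in held:
        dependencies.add((outer, name))
    held.append(name)
    try:
        yield
    finally:
        held.pop()


def alloc_workqueue():
    # Stand-in for workqueue allocation, assumed to take the hotplug lock internally.
    with take("cpu_hotplug_lock"):
        return object()


def init_notifier_before():
    # Old ordering: allocate while holding mm_lock.
    with take("mm_lock"):
        return alloc_workqueue()  # records mm_lock -> cpu_hotplug_lock


def init_notifier_after():
    # New ordering: preallocate first, take mm_lock only to publish the result.
    workqueue = alloc_workqueue()
    with take("mm_lock"):
        return workqueue


init_notifier_before()
print(dependencies)  # {('mm_lock', 'cpu_hotplug_lock')}
dependencies.clear()
init_notifier_after()
print(dependencies)  # set()
```

With the preallocated variant the mm_lock to cpu_hotplug_lock edge never gets recorded, which matches the splat disappearing for the userptr tests.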

However, the gem_eio lockdep is still present, see for example: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3202/shard-kbl7/igt@gem_eio@throttle.html. Also, this is still present: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3202/shard-hsw2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render.html

From this I (again) draw the conclusion that this is not a duplicate of BUG 102886.
Comment 10 Marta Löfstedt 2017-10-11 06:05:26 UTC
I consider all igt@gem_userptr_blits lockdeps to be handled by this bug; see:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3201/shard-apl2/igt@gem_userptr_blits@map-fixed-invalidate-overlap.html

These have not been reproduced since CI_DRM_3202.

