Bug 109665

Summary:	[CI][DRMTIP]igt@gem_mmap_gtt@forked-* - dmesg-warn - WARNING: possible recursive locking detected
Product:	DRI
Component:	DRM/Intel
Status:	CLOSED FIXED
Severity:	normal
Priority:	medium
Version:	DRI git
Hardware:	Other
OS:	All
Whiteboard:	Triaged, ReadyForDev
i915 platform:	I965G, PNV
i915 features:	GEM/Other
Reporter:	Lakshmi <lakshminarayana.vudum>
Assignee:	Chris Wilson <chris>
QA Contact:	Intel GFX Bugs mailing list <intel-gfx-bugs>
CC:	intel-gfx-bugs

Description Lakshmi 2019-02-18 15:33:24 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-bwr-2160/igt@gem_mmap_gtt@forked-medium-copy-odd.html

<6> [41.274834] Console: switching to colour dummy device 80x25
<6> [41.276010] [IGT] gem_mmap_gtt: executing
<6> [41.290041] [IGT] gem_mmap_gtt: starting subtest forked-medium-copy-odd
<6> [41.312553] gem_mmap_gtt (954): drop_caches: 4
<4> [43.184051] 
<4> [43.184062] ============================================
<4> [43.184068] WARNING: possible recursive locking detected
<4> [43.184075] 5.0.0-rc6-gbf979a24473a-drmtip_221+ #1 Not tainted
<4> [43.184081] --------------------------------------------
<4> [43.184087] gem_mmap_gtt/955 is trying to acquire lock:
<4> [43.184094] 0000000033218f01 (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x0/0x310 [i915]
<4> [43.184257] 
but task is already holding lock:
<4> [43.184263] 0000000033218f01 (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]
<4> [43.184327] 
other info that might help us debug this:
<4> [43.184334]  Possible unsafe locking scenario:

<4> [43.184339]        CPU0
<4> [43.184343]        ----
<4> [43.184346]   lock(&dev_priv->gpu_error.reset_backoff_srcu);
<4> [43.184353]   lock(&dev_priv->gpu_error.reset_backoff_srcu);
<4> [43.184359] 
 *** DEADLOCK ***

<4> [43.184367]  May be due to missing lock nesting notation

<4> [43.184375] 5 locks held by gem_mmap_gtt/955:
<4> [43.184381]  #0: 00000000e57e5973 (&mm->mmap_sem){++++}, at: __do_page_fault+0x133/0x500
<4> [43.184396]  #1: 00000000bbf04b65 (&dev->struct_mutex){+.+.}, at: i915_gem_fault+0x1f6/0x860 [i915]
<4> [43.184470]  #2: 0000000033218f01 (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]
<4> [43.184536]  #3: 00000000be208d84 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.25+0x0/0x30
<4> [43.184549]  #4: 00000000075d88fb (shrinker_rwsem){++++}, at: shrink_slab+0x1cb/0x2c0
<4> [43.184561] 
stack backtrace:
<4> [43.184569] CPU: 1 PID: 955 Comm: gem_mmap_gtt Not tainted 5.0.0-rc6-gbf979a24473a-drmtip_221+ #1
<4> [43.184578] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [43.184588] Call Trace:
<4> [43.184597]  dump_stack+0x67/0x9b
<4> [43.184606]  __lock_acquire+0xc75/0x1b00
<4> [43.184615]  ? arch_tlb_finish_mmu+0x2a/0xa0
<4> [43.184622]  ? tlb_finish_mmu+0x1a/0x30
<4> [43.184629]  ? zap_page_range_single+0xe2/0x130
<4> [43.184637]  ? lock_acquire+0xa6/0x1c0
<4> [43.184643]  lock_acquire+0xa6/0x1c0
<4> [43.184701]  ? i915_clear_error_registers+0x280/0x280 [i915]
<4> [43.184762]  i915_reset_trylock+0x44/0x310 [i915]
<4> [43.184823]  ? i915_clear_error_registers+0x280/0x280 [i915]
<4> [43.184831]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [43.184840]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [43.184906]  fence_update+0x218/0x470 [i915]
<4> [43.184979]  i915_vma_unbind+0xa6/0x550 [i915]
<4> [43.185048]  i915_gem_object_unbind+0xfa/0x190 [i915]
<4> [43.185121]  i915_gem_shrink+0x2dc/0x590 [i915]
<4> [43.185193]  ? i915_gem_shrinker_count+0xd6/0x140 [i915]
<4> [43.185264]  ? i915_gem_shrinker_scan+0xc9/0x130 [i915]
<4> [43.185333]  i915_gem_shrinker_scan+0xc9/0x130 [i915]
<4> [43.185342]  do_shrink_slab+0x143/0x3f0
<4> [43.185350]  shrink_slab+0x228/0x2c0
<4> [43.185358]  shrink_node+0x167/0x450
<4> [43.185366]  do_try_to_free_pages+0xc4/0x340
<4> [43.185373]  try_to_free_pages+0xdc/0x2e0
<4> [43.185382]  __alloc_pages_nodemask+0x662/0x1110
<4> [43.185393]  ? reacquire_held_locks+0xb5/0x1b0
<4> [43.185400]  ? reacquire_held_locks+0xb5/0x1b0
<4> [43.185458]  ? i915_reset_trylock+0x192/0x310 [i915]
<4> [43.185517]  ? i915_memcpy_init_early+0x30/0x30 [i915]
<4> [43.185526]  pte_alloc_one+0x12/0x70
<4> [43.185532]  __pte_alloc+0x11/0xf0
<4> [43.185538]  apply_to_page_range+0x37e/0x440
<4> [43.185599]  remap_io_mapping+0x6c/0x100 [i915]
<4> [43.185668]  i915_gem_fault+0x5a9/0x860 [i915]
<4> [43.185676]  ? ptlock_alloc+0x15/0x30
<4> [43.185684]  __do_fault+0x2c/0xb0
<4> [43.185690]  __handle_mm_fault+0x8ee/0xfa0
<4> [43.185699]  handle_mm_fault+0x196/0x3a0
<4> [43.185707]  __do_page_fault+0x246/0x500
<4> [43.185715]  ? page_fault+0x8/0x30
<4> [43.185721]  page_fault+0x1e/0x30
<4> [43.185728] RIP: 0033:0x55e32d37ee12
<4> [43.185736] Code: b0 df ff ff 89 c2 8b 85 70 df ff ff 01 c2 8b 85 70 df ff ff 48 98 48 8d 0c 85 00 00 00 00 48 8b 85 e0 df ff ff 48 01 c8 f7 d2 <89> 10 83 85 70 df ff ff 01 81 bd 70 df ff ff ff 03 00 00 7e be 48
<4> [43.185752] RSP: 002b:00007fffd2ee9da0 EFLAGS: 00010206
<4> [43.185759] RAX: 00007f48a78d9000 RBX: 0000000000000000 RCX: 0000000000000000
<4> [43.185767] RDX: 00000000ffffffff RSI: 0000000000005401 RDI: 0000000000000002
<4> [43.185775] RBP: 00007fffd2eebe60 R08: 00007fffd2ee9c10 R09: 000000000000001b
<4> [43.185782] R10: 7165722074736554 R11: 0000000000000246 R12: 000055e32d37ca80
<4> [43.185790] R13: 00007fffd2eec250 R14: 0000000000000000 R15: 0000000000000000
<4> [45.851997] ------------[ cut here ]------------
<4> [45.852016] downgrading a read lock
<4> [45.852037] WARNING: CPU: 1 PID: 956 at kernel/locking/lockdep.c:3553 lock_downgrade+0x158/0x1e0
<4> [45.852050] Modules linked in: i915 snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm i2c_i801 tg3 lpc_ich prime_numbers
<4> [45.852075] CPU: 1 PID: 956 Comm: gem_mmap_gtt Not tainted 5.0.0-rc6-gbf979a24473a-drmtip_221+ #1
<4> [45.852085] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [45.852097] RIP: 0010:lock_downgrade+0x158/0x1e0
<4> [45.852104] Code: ff e9 23 ff ff ff 4c 89 ea 4c 89 f6 48 89 df e8 2e bd ff ff 85 c0 74 aa eb 9a 48 c7 c7 b2 fb 06 83 48 89 04 24 e8 b8 bf f9 ff <0f> 0b 8b 54 24 0c 48 8b 04 24 e9 46 ff ff ff e8 44 01 3a 00 85 c0
<4> [45.852120] RSP: 0018:ffff9d950027fe38 EFLAGS: 00010082
<4> [45.852127] RAX: 0000000000000000 RBX: ffff9492edb70040 RCX: 0000000000000000
<4> [45.852135] RDX: ffffffff821287fb RSI: 0000000000000001 RDI: ffffffff82128810
<4> [45.852143] RBP: 0000000000000002 R08: 646172676e776f64 R09: 0000000000000000
<4> [45.852151] R10: 0000000000000000 R11: 6b636f6c20646165 R12: 0000000000000246
<4> [45.852158] R13: ffffffff82210e03 R14: ffff9492f83af548 R15: ffff9492f9a4f4d8
<4> [45.852167] FS:  00007f48c3de6980(0000) GS:ffff9492fe040000(0000) knlGS:0000000000000000
<4> [45.852175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [45.852182] CR2: 00007f48b7853ff8 CR3: 000000002ec82000 CR4: 00000000000006e0
<4> [45.852190] Call Trace:
<4> [45.852199]  downgrade_write+0x12/0x80
<4> [45.852207]  __do_munmap+0x393/0x400
<4> [45.852215]  __vm_munmap+0x6e/0xc0
<4> [45.852222]  __x64_sys_munmap+0x12/0x20
<4> [45.852230]  do_syscall_64+0x55/0x190
<4> [45.852239]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [45.852247] RIP: 0033:0x7f48c348cab7
<4> [45.852254] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 c9 f3 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 f3 2c 00 f7 d8 64 89 01 48
<4> [45.852270] RSP: 002b:00007fffd2ee9d98 EFLAGS: 00000246 ORIG_RAX: 000000000000000b
<4> [45.852279] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f48c348cab7
<4> [45.852287] RDX: 0000000008000000 RSI: 0000000008000000 RDI: 00007f48af8d9000
<4> [45.852294] RBP: 00007fffd2eebe60 R08: 00007fffd2ee9c10 R09: 00007fe4ffff801c
<4> [45.852301] R10: 00007fe6ffff801a R11: 0000000000000246 R12: 000055e32d37ca80
<4> [45.852309] R13: 00007fffd2eec250 R14: 0000000000000000 R15: 0000000000000000
<4> [45.852319] irq event stamp: 565894
<4> [45.852326] hardirqs last  enabled at (565893): [<ffffffff829a1e0c>] _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [45.852337] hardirqs last disabled at (565894): [<ffffffff8299aa5a>] __schedule+0xaa/0xb40
<4> [45.852347] softirqs last  enabled at (565544): [<ffffffff82c0033a>] __do_softirq+0x33a/0x4b9
<4> [45.852358] softirqs last disabled at (565537): [<ffffffff820b9a91>] irq_exit+0xd1/0xe0
<4> [45.852369] WARNING: CPU: 1 PID: 956 at kernel/locking/lockdep.c:3553 lock_downgrade+0x158/0x1e0
<4> [45.852378] ---[ end trace 5334b9b1c50b712d ]---
<4> [45.881027] ------------[ cut here ]------------
<4> [45.881044] downgrading a read lock
<4> [45.881063] WARNING: CPU: 0 PID: 955 at kernel/locking/lockdep.c:3553 lock_downgrade+0x158/0x1e0
<4> [45.881076] Modules linked in: i915 snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm i2c_i801 tg3 lpc_ich prime_numbers
<4> [45.881103] CPU: 0 PID: 955 Comm: gem_mmap_gtt Tainted: G        W         5.0.0-rc6-gbf979a24473a-drmtip_221+ #1
<4> [45.881114] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [45.881126] RIP: 0010:lock_downgrade+0x158/0x1e0
<4> [45.881133] Code: ff e9 23 ff ff ff 4c 89 ea 4c 89 f6 48 89 df e8 2e bd ff ff 85 c0 74 aa eb 9a 48 c7 c7 b2 fb 06 83 48 89 04 24 e8 b8 bf f9 ff <0f> 0b 8b 54 24 0c 48 8b 04 24 e9 46 ff ff ff e8 44 01 3a 00 85 c0
<4> [45.881149] RSP: 0018:ffff9d9500273e38 EFLAGS: 00010082
<4> [45.881157] RAX: 0000000000000000 RBX: ffff9492ee0bccc0 RCX: 0000000000000000
<4> [45.881165] RDX: ffffffff821287fb RSI: 0000000000000001 RDI: ffffffff82128810
<4> [45.881172] RBP: 0000000000000005 R08: 646172676e776f64 R09: 0000000000000000
<4> [45.881180] R10: 0000000000000000 R11: 6b636f6c20646165 R12: 0000000000000246
<4> [45.881188] R13: ffffffff82210e03 R14: ffff9492f83a8fc8 R15: ffff9492edae2cb8
<4> [45.881196] FS:  00007f48c3de6980(0000) GS:ffff9492fe000000(0000) knlGS:0000000000000000
<4> [45.881205] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [45.881212] CR2: 00007f48b7853ff8 CR3: 000000003621c000 CR4: 00000000000006f0
<4> [45.881219] Call Trace:
<4> [45.881228]  downgrade_write+0x12/0x80
<4> [45.881238]  __do_munmap+0x393/0x400
<4> [45.881245]  __vm_munmap+0x6e/0xc0
<4> [45.881253]  __x64_sys_munmap+0x12/0x20
<4> [45.881260]  do_syscall_64+0x55/0x190
<4> [45.881271]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [45.881279] RIP: 0033:0x7f48c348cab7
<4> [45.881285] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 c9 f3 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 f3 2c 00 f7 d8 64 89 01 48
<4> [45.881301] RSP: 002b:00007fffd2ee9d98 EFLAGS: 00000246 ORIG_RAX: 000000000000000b
<4> [45.881310] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f48c348cab7
<4> [45.881318] RDX: 0000000008000000 RSI: 0000000008000000 RDI: 00007f48af8d9000
<4> [45.881325] RBP: 00007fffd2eebe60 R08: 00007fffd2ee9c10 R09: 00007fe4ffff801c
<4> [45.881333] R10: 00007fe6ffff801a R11: 0000000000000246 R12: 000055e32d37ca80
<4> [45.881340] R13: 00007fffd2eec250 R14: 0000000000000000 R15: 0000000000000000
<4> [45.881350] irq event stamp: 580823
<4> [45.881357] hardirqs last  enabled at (580823): [<ffffffff829a1e0c>] _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [45.881368] hardirqs last disabled at (580822): [<ffffffff829a1c7d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [45.881379] softirqs last  enabled at (580820): [<ffffffff82c0033a>] __do_softirq+0x33a/0x4b9
<4> [45.881390] softirqs last disabled at (580813): [<ffffffff820b9a91>] irq_exit+0xd1/0xe0
<4> [45.881401] WARNING: CPU: 0 PID: 955 at kernel/locking/lockdep.c:3553 lock_downgrade+0x158/0x1e0
<4> [45.881410] ---[ end trace 5334b9b1c50b712e ]---
<6> [46.893263] gem_mmap_gtt (956) used greatest stack depth: 11992 bytes left
<6> [46.893624] [IGT] gem_mmap_gtt: exiting, ret=0
<6> [47.049826] Console: switching to colour frame buffer device 240x67
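
The recursion in the splat above can be checked mechanically: the lock the task is trying to acquire has the same address and class as held lock #2. The following is a minimal sketch, assuming the dmesg line format shown in this report. The helper names `held_locks` and `recursive_acquire` are hypothetical and are not part of IGT or the CI tooling.

```python
import re

# Held-lock entries look like:
#   " #2: 0000000033218f01 (&dev_priv->...reset_backoff_srcu){+.+.}, at: ..."
HELD_RE = re.compile(r"#(\d+): ([0-9a-f]{16}) \((.+?)\)\{")
# The line after "is trying to acquire lock:" names the wanted lock.
LOCK_RE = re.compile(r"\] ([0-9a-f]{16}) \((.+?)\)\{")


def held_locks(lines):
    """Return [(index, address, lock_class)] for each '#N:' held-lock line."""
    out = []
    for line in lines:
        m = HELD_RE.search(line)
        if m:
            out.append((int(m.group(1)), m.group(2), m.group(3)))
    return out


def recursive_acquire(lines):
    """Return the held-lock entry that matches the lock being acquired, or None."""
    wanted = None
    for i, line in enumerate(lines):
        if "is trying to acquire lock:" in line and i + 1 < len(lines):
            m = LOCK_RE.search(lines[i + 1])
            if m:
                wanted = (m.group(1), m.group(2))
            break
    if wanted is None:
        return None
    for idx, addr, cls in held_locks(lines):
        if (addr, cls) == wanted:
            return (idx, addr, cls)
    return None


# Lines copied from the log in this report.
SAMPLE = [
    "<4> [43.184087] gem_mmap_gtt/955 is trying to acquire lock:",
    "<4> [43.184094] 0000000033218f01 (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x0/0x310 [i915]",
    "<4> [43.184375] 5 locks held by gem_mmap_gtt/955:",
    "<4> [43.184381]  #0: 00000000e57e5973 (&mm->mmap_sem){++++}, at: __do_page_fault+0x133/0x500",
    "<4> [43.184396]  #1: 00000000bbf04b65 (&dev->struct_mutex){+.+.}, at: i915_gem_fault+0x1f6/0x860 [i915]",
    "<4> [43.184470]  #2: 0000000033218f01 (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]",
    "<4> [43.184536]  #3: 00000000be208d84 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.25+0x0/0x30",
    "<4> [43.184549]  #4: 00000000075d88fb (shrinker_rwsem){++++}, at: shrink_slab+0x1cb/0x2c0",
]

if __name__ == "__main__":
    print(recursive_acquire(SAMPLE))
```

Running it on the sample reports held lock #2, the `reset_backoff_srcu` lock already taken in `i915_reset_trylock`, as the one being acquired again.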

Comment 1 CI Bug Log 2019-02-18 15:34:44 UTC

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* PNV BWR: igt@gem_mmap_gtt@forked-* - dmesg-warn - WARNING: possible recursive locking detected
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-bwr-2160/igt@gem_mmap_gtt@forked-medium-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-pnv-d510/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_223/fi-bwr-2160/igt@gem_mmap_gtt@forked-medium-copy-xy.html

Comment 2 Chris Wilson 2019-02-18 15:36:07 UTC

https://patchwork.freedesktop.org/patch/286712/

Comment 3 CI Bug Log 2019-02-18 15:38:56 UTC

A CI Bug Log filter associated to this bug has been updated:

{- PNV BWR: igt@gem_mmap_gtt@forked-* - dmesg-warn - WARNING: possible recursive locking detected -}
{+ PNV BWR: igt@gem_mmap_gtt@forked-* - dmesg-warn - WARNING: possible recursive locking detected +}

No new failures caught with the new filter

Comment 4 CI Bug Log 2019-02-18 15:39:33 UTC

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BWR: igt@runner@aborted - fail - Previous test: gem_mmap_gtt (forked-medium-copy-odd)
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-bwr-2160/igt@runner@aborted.html

Comment 5 Chris Wilson 2019-02-20 18:48:23 UTC

commit c1d1746f6d4b37518fe3dc4aba99db1f7a155bdb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 19 12:21:54 2019 +0000

    drm/i915: Avoid reset lock in writing fence registers
    
    The idea of taking the reset lock around writing the fence register was
    to serialise the mmio write we also perform during the reset where those
    registers get clobbered. However, the lock is overkill as write tearing
    between reset and fence_update() is harmless; the final value of the
    fence register is the same. A race between revoke_fences() and
    fence_update() is also harmless at this point as on the fault path where
    this is necessary, we acquire the reset lock to coordinate ourselves in
    the upper layer.
    
    The danger of acquiring the reset lock again in fence_update() is that
    we may recurse from the shrinker along the i915_gem_fault() path.
    
    <4> [125.739646] ============================================
    <4> [125.739652] WARNING: possible recursive locking detected
    <4> [125.739659] 5.0.0-rc6-ga6e4cbf00557-drmtip_223+ #1 Tainted: G     U
    <4> [125.739666] --------------------------------------------
    <4> [125.739672] gem_mmap_gtt/1017 is trying to acquire lock:
    <4> [125.739679] 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x0/0x310 [i915]
    <4> [125.739848]
    but task is already holding lock:
    <4> [125.739854] 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]
    <4> [125.739918]
    other info that might help us debug this:
    <4> [125.739925]  Possible unsafe locking scenario:
    
    <4> [125.739930]        CPU0
    <4> [125.739934]        ----
    <4> [125.739937]   lock(&dev_priv->gpu_error.reset_backoff_srcu);
    <4> [125.739944]   lock(&dev_priv->gpu_error.reset_backoff_srcu);
    <4> [125.739950]
     *** DEADLOCK ***
    
    <4> [125.739958]  May be due to missing lock nesting notation
    
    <4> [125.739966] 5 locks held by gem_mmap_gtt/1017:
    <4> [125.739972]  #0: 00000000471f682c (&mm->mmap_sem){++++}, at: __do_page_fault+0x133/0x500
    <4> [125.739987]  #1: 0000000026542685 (&dev->struct_mutex){+.+.}, at: i915_gem_fault+0x1f6/0x860 [i915]
    <4> [125.740061]  #2: 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]
    <4> [125.740126]  #3: 00000000c828eb4f (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.25+0x0/0x30
    <4> [125.740140]  #4: 000000002d360d65 (shrinker_rwsem){++++}, at: shrink_slab+0x1cb/0x2c0
    <4> [125.740151]
    stack backtrace:
    <4> [125.740159] CPU: 1 PID: 1017 Comm: gem_mmap_gtt Tainted: G     U            5.0.0-rc6-ga6e4cbf00557-drmtip_223+ #1
    <4> [125.740170] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
    <4> [125.740180] Call Trace:
    <4> [125.740189]  dump_stack+0x67/0x9b
    <4> [125.740199]  __lock_acquire+0xc75/0x1b00
    <4> [125.740209]  ? arch_tlb_finish_mmu+0x2a/0xa0
    <4> [125.740216]  ? tlb_finish_mmu+0x1a/0x30
    <4> [125.740222]  ? zap_page_range_single+0xe2/0x130
    <4> [125.740230]  ? lock_acquire+0xa6/0x1c0
    <4> [125.740237]  lock_acquire+0xa6/0x1c0
    <4> [125.740296]  ? i915_clear_error_registers+0x280/0x280 [i915]
    <4> [125.740357]  i915_reset_trylock+0x44/0x310 [i915]
    <4> [125.740417]  ? i915_clear_error_registers+0x280/0x280 [i915]
    <4> [125.740426]  ? lockdep_hardirqs_on+0xe0/0x1b0
    <4> [125.740434]  ? _raw_spin_unlock_irqrestore+0x39/0x60
    <4> [125.740499]  fence_update+0x218/0x470 [i915]
    <4> [125.740571]  i915_vma_unbind+0xa6/0x550 [i915]
    <4> [125.740640]  i915_gem_object_unbind+0xfa/0x190 [i915]
    <4> [125.740711]  i915_gem_shrink+0x2dc/0x590 [i915]
    <4> [125.740722]  ? ___preempt_schedule+0x16/0x18
    <4> [125.740792]  ? i915_gem_shrinker_scan+0xc9/0x130 [i915]
    <4> [125.740861]  i915_gem_shrinker_scan+0xc9/0x130 [i915]
    <4> [125.740870]  do_shrink_slab+0x143/0x3f0
    <4> [125.740878]  shrink_slab+0x228/0x2c0
    <4> [125.740886]  shrink_node+0x167/0x450
    <4> [125.740894]  do_try_to_free_pages+0xc4/0x340
    <4> [125.740902]  try_to_free_pages+0xdc/0x2e0
    <4> [125.740911]  __alloc_pages_nodemask+0x662/0x1110
    <4> [125.740921]  ? reacquire_held_locks+0xb5/0x1b0
    <4> [125.740928]  ? reacquire_held_locks+0xb5/0x1b0
    <4> [125.740986]  ? i915_reset_trylock+0x192/0x310 [i915]
    <4> [125.741045]  ? i915_memcpy_init_early+0x30/0x30 [i915]
    <4> [125.741054]  pte_alloc_one+0x12/0x70
    <4> [125.741060]  __pte_alloc+0x11/0xf0
    <4> [125.741067]  apply_to_page_range+0x37e/0x440
    <4> [125.741127]  remap_io_mapping+0x6c/0x100 [i915]
    <4> [125.741196]  i915_gem_fault+0x5a9/0x860 [i915]
    <4> [125.741204]  ? ptlock_alloc+0x15/0x30
    <4> [125.741212]  __do_fault+0x2c/0xb0
    <4> [125.741218]  __handle_mm_fault+0x8ee/0xfa0
    <4> [125.741227]  handle_mm_fault+0x196/0x3a0
    <4> [125.741235]  __do_page_fault+0x246/0x500
    <4> [125.741243]  ? page_fault+0x8/0x30
    <4> [125.741250]  page_fault+0x1e/0x30
    <4> [125.741256] RIP: 0033:0x55d0cc456e12
    <4> [125.741264] Code: b0 df ff ff 89 c2 8b 85 70 df ff ff 01 c2 8b 85 70 df ff ff 48 98 48 8d 0c 85 00 00 00 00 48 8b 85 e0 df ff ff 48 01 c8 f7 d2 <89> 10 83 85 70 df ff ff 01 81 bd 70 df ff ff ff 03 00 00 7e be 48
    <4> [125.741280] RSP: 002b:00007ffc1bab7ab0 EFLAGS: 00010206
    <4> [125.741287] RAX: 00007fc787cb6000 RBX: 0000000000000000 RCX: 0000000000000000
    <4> [125.741295] RDX: 00000000ffffffff RSI: 0000000000005401 RDI: 0000000000000002
    <4> [125.741303] RBP: 00007ffc1bab9b70 R08: 00007ffc1bab7920 R09: 000000000000001b
    <4> [125.741310] R10: 7165722074736554 R11: 0000000000000246 R12: 000055d0cc454a80
    <4> [125.741318] R13: 00007ffc1bab9f60 R14: 0000000000000000 R15: 0000000000000000
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109665
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-4-chris@chris-wilson.co.uk
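
The call path described in the commit message can be modelled outside the kernel. The sketch below is a toy under stated assumptions, not the i915 code or the kernel's lockdep: `HeldLocks` is a hypothetical stand-in that treats any re-acquisition of the same lock class as an error, and the `take_reset_lock` flag is an illustrative switch between the behaviour before and after the fix. The function names mirror the frames in the backtrace only for readability.

```python
class RecursiveLockError(RuntimeError):
    """Raised when a task re-acquires a lock class it already holds."""


class HeldLocks:
    """Toy per-task held-lock list; a simplification, not kernel lockdep."""

    def __init__(self):
        self.held = []

    def acquire(self, lock_class):
        if lock_class in self.held:
            raise RecursiveLockError(f"already holding {lock_class}")
        self.held.append(lock_class)

    def release(self, lock_class):
        self.held.remove(lock_class)


RESET_LOCK = "reset_backoff_srcu"


def fence_update(task, take_reset_lock):
    # Before the fix the fence write was wrapped in the reset lock;
    # after the fix it is written without taking the lock.
    if take_reset_lock:
        task.acquire(RESET_LOCK)
        task.release(RESET_LOCK)


def shrinker(task, take_reset_lock):
    # Models i915_gem_shrink -> ... -> i915_vma_unbind -> fence_update.
    fence_update(task, take_reset_lock)


def gem_fault(task, take_reset_lock):
    # The fault path takes the reset lock in the upper layer, then a
    # page-table allocation may enter reclaim and call the shrinker.
    task.acquire(RESET_LOCK)
    try:
        shrinker(task, take_reset_lock)
    finally:
        task.release(RESET_LOCK)


if __name__ == "__main__":
    gem_fault(HeldLocks(), take_reset_lock=False)  # completes without error
    try:
        gem_fault(HeldLocks(), take_reset_lock=True)
    except RecursiveLockError as err:
        print("recursion detected:", err)
```

With `take_reset_lock=True` the model raises on the second acquisition, which corresponds to the two `lock(&dev_priv->gpu_error.reset_backoff_srcu)` entries in the "Possible unsafe locking scenario" above. With it set to `False` the fault path holds the lock only once.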

Comment 6 CI Bug Log 2019-02-21 11:12:04 UTC

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* PNV: igt@runner@aborted - fail - Previous test: gem_mmap_gtt (forked-basic-small-copy-odd)
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-pnv-d510/igt@runner@aborted.html

Comment 7 CI Bug Log 2019-02-21 11:16:21 UTC

A CI Bug Log filter associated to this bug has been updated:

{- PNV: igt@runner@aborted - fail - Previous test: gem_mmap_gtt (forked-basic-small-copy-odd) -}
{+ PNV: igt@runner@aborted - fail - Previous test: gem_mmap_gtt (forked-basic-small-copy-odd|xy) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_224/fi-pnv-d510/igt@runner@aborted.html

Comment 8 CI Bug Log 2019-02-21 14:28:07 UTC

A CI Bug Log filter associated to this bug has been updated:

{- BWR: igt@runner@aborted - fail - Previous test: gem_mmap_gtt (forked-medium-copy-odd) -}
{+ BWR: igt@runner@aborted - fail - Previous test: gem_mmap_gtt +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_223/fi-bwr-2160/igt@runner@aborted.html

Comment 9 Martin Peres 2019-03-06 15:21:04 UTC

(In reply to Chris Wilson from comment #5)
> commit c1d1746f6d4b37518fe3dc4aba99db1f7a155bdb
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Feb 19 12:21:54 2019 +0000
> 
>     drm/i915: Avoid reset lock in writing fence registers

Looks good! Thanks!

Comment 10 CI Bug Log 2019-03-06 15:21:16 UTC

The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
