Bug 112068

Summary: [CI][BAT]igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma))
Product: DRI
Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: chris, intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: ALL
i915 features: GEM/Other

Description Lakshmi 2019-10-18 18:11:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7115/shard-iclb3/igt@gem_persistent_relocs@forked-thrashing.html
<0> [416.075782] gem_pers-1394    1.... 408908675us : __i915_vma_unbind.part.39: __i915_vma_unbind:1150 GEM_BUG_ON(i915_vma_is_active(vma))
<0> [416.075791] ---------------------------------
<4> [416.076288] ---[ end trace f43cacb32dbb9b2b ]---
<3> [416.083258] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
<3> [416.083269] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1394, name: gem_persistent_
<4> [416.083279] INFO: lockdep is turned off.
<3> [416.083284] Preemption disabled at:
<4> [416.083286] [<0000000000000000>] 0x0
<4> [416.083297] CPU: 1 PID: 1394 Comm: gem_persistent_ Tainted: G     UD           5.4.0-rc3-CI-CI_DRM_7115+ #1
<4> [416.083307] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [416.083320] Call Trace:
<4> [416.083328]  dump_stack+0x67/0x9b
<4> [416.083336]  ___might_sleep+0x178/0x260
<4> [416.083345]  wait_for_completion+0x37/0x1a0
<4> [416.083355]  virt_efi_query_variable_info+0x161/0x1b0
<4> [416.083365]  efi_query_variable_store+0xb3/0x1a0
<4> [416.083374]  ? efivar_entry_set_safe+0x19c/0x220
<4> [416.083380]  efivar_entry_set_safe+0x19c/0x220
<4> [416.083512]  ? efi_pstore_write+0x10b/0x150
<4> [416.083518]  efi_pstore_write+0x10b/0x150
<4> [416.083530]  pstore_dump+0x127/0x340
<4> [416.083540]  kmsg_dump+0x87/0x1c0
<4> [416.083547]  oops_end+0x3e/0x90
<4> [416.083554]  do_trap+0x80/0x100
<4> [416.083612]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083622]  do_invalid_op+0x23/0x30
<4> [416.083672]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083680]  invalid_op+0x23/0x30
<4> [416.083726] RIP: 0010:__i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083734] Code: 68 16 df e0 48 8b 35 20 6a 1d 00 49 c7 c0 da 6c 47 a0 b9 7e 04 00 00 48 c7 c2 80 8c 41 a0 48 c7 c7 32 99 33 a0 e8 39 12 e6 e0 <0f> 0b 48 c7 c1 da 6c 47 a0 ba 84 04 00 00 48 c7 c6 80 8c 41 a0 48
<4> [416.083753] RSP: 0018:ffffc900005d7c48 EFLAGS: 00010286
<4> [416.083760] RAX: 000000000000000c RBX: ffff888499575188 RCX: 0000000000000000
<4> [416.083768] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88849d5b0400
<4> [416.083775] RBP: ffff8882f9001a80 R08: 000000000003195b R09: ffff888476292000
<4> [416.083783] R10: 0000000000000000 R11: ffff88849d5b0400 R12: ffff8882ff85cd48
<4> [416.083790] R13: ffffc900005d7ca0 R14: ffff8882ff85cb40 R15: ffffc900005d7ca0
<4> [416.083842]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083888]  i915_vma_unbind+0x2d/0x50 [i915]
<4> [416.083936]  i915_gem_object_unbind+0x11c/0x250 [i915]
<4> [416.083984]  i915_gem_shrink+0x297/0x5f0 [i915]
<4> [416.083997]  ? lockdep_hardirqs_on+0xe3/0x1c0
<4> [416.084043]  ? i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.084089]  i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.084134]  i915_drop_caches_set+0xf3/0x250 [i915]
<4> [416.084145]  simple_attr_write+0xb0/0xd0
<4> [416.084153]  full_proxy_write+0x51/0x80
<4> [416.084161]  vfs_write+0xb9/0x1d0
<4> [416.084167]  ksys_write+0x9f/0xe0
<4> [416.084174]  do_syscall_64+0x4f/0x210
<4> [416.084181]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [416.084188] RIP: 0033:0x7f6805a9c281
<4> [416.084194] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [416.084214] RSP: 002b:00007ffe3f60adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [416.084224] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6805a9c281
<4> [416.084233] RDX: 0000000000000004 RSI: 00007ffe3f60ae10 RDI: 0000000000000008
<4> [416.084241] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
<4> [416.084250] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3f60ae10
<4> [416.084258] R13: 0000000000000008 R14: 00007ffe3f60ae10 R15: 0000000000000000
<4> [416.084335] ------------[ cut here ]------------
<4> [416.084343] WARNING: CPU: 1 PID: 1394 at kernel/rcu/tree_plugin.h:293 rcu_note_context_switch+0x7e/0x650
<4> [416.084354] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp mei_hdcp crct10dif_pclmul crc32_pclmul snd_hda_intel cdc_ether snd_intel_nhlt e1000e usbnet mii snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core ptp pps_core snd_pcm mei_me mei prime_numbers thunderbolt
<4> [416.095444] CPU: 1 PID: 1394 Comm: gem_persistent_ Tainted: G     UD W         5.4.0-rc3-CI-CI_DRM_7115+ #1
<4> [416.095463] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [416.095498] RIP: 0010:rcu_note_context_switch+0x7e/0x650
<4> [416.095504] Code: 74 17 65 48 8b 04 25 00 5f 01 00 8b 88 8c 08 00 00 85 c9 0f 84 96 03 00 00 45 84 ed 41 8b 84 24 80 03 00 00 75 69 85 c0 7e 11 <0f> 0b 41 80 bc 24 84 03 00 00 00 0f 84 a2 01 00 00 4c 89 e7 e8 89
<4> [416.095520] RSP: 0018:ffffc900005d76b8 EFLAGS: 00010002
<4> [416.095527] RAX: 0000000000000001 RBX: ffff88849fcb9ec0 RCX: 0000000000000000
<4> [416.095536] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
<4> [416.095544] RBP: ffffc900005d7750 R08: 0000000000000000 R09: 0000000000000000
<4> [416.095552] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888303eb0040
<4> [416.095559] R13: 0000000000000000 R14: ffff88849fcb9198 R15: ffff88849864c000
<4> [416.095568] FS:  00007f6806541300(0000) GS:ffff88849fc80000(0000) knlGS:0000000000000000
<4> [416.095577] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [416.095585] CR2: 00007f6806566000 CR3: 00000002ed790003 CR4: 0000000000760ee0
<4> [416.095593] PKRU: 55555554
<4> [416.095597] Call Trace:
<4> [416.095605]  __schedule+0xd0/0x7f0
<4> [416.095613]  ? wait_for_completion+0x108/0x1a0
<4> [416.095621]  schedule+0x34/0xc0
<4> [416.095627]  schedule_timeout+0x225/0x3f0
<4> [416.095634]  ? wait_for_completion+0x3f/0x1a0
<4> [416.095642]  ? wait_for_completion+0x108/0x1a0
<4> [416.095648]  wait_for_completion+0x130/0x1a0
<4> [416.095656]  ? wake_up_q+0x70/0x70
<4> [416.095664]  virt_efi_set_variable+0x151/0x1a0
<4> [416.095672]  efivar_entry_set_safe+0x115/0x220
<4> [416.095681]  ? efi_pstore_write+0x10b/0x150
<4> [416.095688]  efi_pstore_write+0x10b/0x150
<4> [416.095701]  pstore_dump+0x127/0x340
<4> [416.095712]  kmsg_dump+0x87/0x1c0
<4> [416.095720]  oops_end+0x3e/0x90
<4> [416.095726]  do_trap+0x80/0x100
<4> [416.095783]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.095794]  do_invalid_op+0x23/0x30
<4> [416.095841]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.095850]  invalid_op+0x23/0x30
<4> [416.095895] RIP: 0010:__i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.095903] Code: 68 16 df e0 48 8b 35 20 6a 1d 00 49 c7 c0 da 6c 47 a0 b9 7e 04 00 00 48 c7 c2 80 8c 41 a0 48 c7 c7 32 99 33 a0 e8 39 12 e6 e0 <0f> 0b 48 c7 c1 da 6c 47 a0 ba 84 04 00 00 48 c7 c6 80 8c 41 a0 48
<4> [416.095923] RSP: 0018:ffffc900005d7c48 EFLAGS: 00010286
<4> [416.095932] RAX: 000000000000000c RBX: ffff888499575188 RCX: 0000000000000000
<4> [416.095941] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88849d5b0400
<4> [416.095948] RBP: ffff8882f9001a80 R08: 000000000003195b R09: ffff888476292000
<4> [416.095956] R10: 0000000000000000 R11: ffff88849d5b0400 R12: ffff8882ff85cd48
<4> [416.095964] R13: ffffc900005d7ca0 R14: ffff8882ff85cb40 R15: ffffc900005d7ca0
<4> [416.096016]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.096063]  i915_vma_unbind+0x2d/0x50 [i915]
<4> [416.096110]  i915_gem_object_unbind+0x11c/0x250 [i915]
<4> [416.096160]  i915_gem_shrink+0x297/0x5f0 [i915]
<4> [416.096172]  ? lockdep_hardirqs_on+0xe3/0x1c0
<4> [416.096219]  ? i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.096264]  i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.096308]  i915_drop_caches_set+0xf3/0x250 [i915]
<4> [416.096319]  simple_attr_write+0xb0/0xd0
<4> [416.096328]  full_proxy_write+0x51/0x80
<4> [416.096336]  vfs_write+0xb9/0x1d0
<4> [416.096342]  ksys_write+0x9f/0xe0
<4> [416.096349]  do_syscall_64+0x4f/0x210
<4> [416.096356]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [416.096363] RIP: 0033:0x7f6805a9c281
<4> [416.096369] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [416.096389] RSP: 002b:00007ffe3f60adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [416.096399] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6805a9c281
<4> [416.096407] RDX: 0000000000000004 RSI: 00007ffe3f60ae10 RDI: 0000000000000008
<4> [416.096416] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
<4> [416.096424] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3f60ae10
<4> [416.096432] R13: 0000000000000008 R14: 00007ffe3f60ae10 R15: 0000000000000000
<4> [416.096444] irq event stamp: 1718099
<4> [416.096452] hardirqs last  enabled at (1718099): [<ffffffff8101dc75>] do_error_trap+0xa5/0x100
<4> [416.096464] hardirqs last disabled at (1718098): [<ffffffff81001bba>] trace_hardirqs_off_thunk+0x1a/0x20
<4> [416.096475] softirqs last  enabled at (1718094): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [416.096486] softirqs last disabled at (1718085): [<ffffffff810b7e9a>] irq_exit+0xba/0xc0
<4> [416.096496] ---[ end trace f43cacb32dbb9b2c ]---

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7116/shard-iclb8/igt@gem_persistent_relocs@forked-faulting-reloc-thrashing.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7087/shard-snb7/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7095/shard-hsw4/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html
Comment 1 CI Bug Log 2019-10-18 18:12:38 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma))
  (No new failures associated)
Comment 2 CI Bug Log 2019-10-18 19:54:47 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7127/shard-iclb8/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
Comment 3 CI Bug Log 2019-10-19 06:50:18 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7132/shard-tglb8/igt@gem_persistent_relocs@forked-thrash-inactive.html
Comment 4 CI Bug Log 2019-10-22 09:38:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB APL HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7134/shard-apl6/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html
Comment 5 prathap.kumar.valsan 2019-10-23 05:17:21 UTC
Based on reading the code, it looks to me like there is a bug in the driver where a single VMA is added to the bound list more than once:

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index e90c4d0af8fd..dd930a3de013 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -921,7 +921,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
        /* There should only be at most 2 active bindings (user, global) */
        GEM_BUG_ON(bound + I915_VMA_PAGES_ACTIVE < bound);
        atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count);
-       list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+       list_del(&vma->vm_link);

        __i915_vma_pin(vma);
        GEM_BUG_ON(!i915_vma_is_pinned(vma));
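The diff above swaps `list_move_tail()` for `list_del()`, so it helps to be precise about what each call does. The sketch below is a small userspace model of the `include/linux/list.h` helpers, written for illustration only; it is a simplified re-implementation, not the kernel source, and `list_del_model()` and `list_count()` are hypothetical helpers. In the kernel, `list_move_tail()` unlinks the entry before appending it, so, assuming `vm_link` is either on a list or self-initialized, calling it repeatedly moves the entry rather than inserting it twice. `list_del()`, by contrast, takes the entry off `bound_list` entirely. This only illustrates the semantics of the two calls in the diff; it does not rule out a double insertion through some other path.

```c
/* Minimal userspace model of the intrusive doubly linked list from
 * include/linux/list.h. Illustrative re-implementation, not kernel code. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Splice n between two adjacent entries prev and next. */
static void __list_add(struct list_head *n, struct list_head *prev,
                       struct list_head *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    __list_add(n, head->prev, head);
}

/* Unlink entry from whatever list it is on (no-op if self-linked). */
static void __list_del_entry(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* As in the kernel: unlink first, then append at the tail, so an entry
 * already on the list is moved rather than duplicated. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
    __list_del_entry(entry);
    list_add_tail(entry, head);
}

/* Hypothetical helper. The kernel's list_del() poisons next/prev; this
 * model re-initializes the entry instead (closer to list_del_init()) so
 * the entry can be inspected safely afterwards. */
static void list_del_model(struct list_head *entry)
{
    __list_del_entry(entry);
    INIT_LIST_HEAD(entry);
}

/* Hypothetical helper: count the entries on a list. */
static int list_count(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

For example, calling `list_move_tail(&vm_link, &bound_list)` twice leaves exactly one entry on `bound_list`. After `list_del_model(&vm_link)`, the list is empty.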
Comment 6 CI Bug Log 2019-10-23 06:25:17 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB APL HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_388/fi-cml-s/igt@gem_persistent_relocs@forked-thrashing.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_389/fi-icl-u4/igt@gem_persistent_relocs@forked-interruptible-thrashing.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_389/fi-whl-u/igt@gem_persistent_relocs@forked-thrashing.html
Comment 7 CI Bug Log 2019-10-25 15:07:28 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout/incomplete - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7168/shard-tglb2/igt@gem_persistent_relocs@forked-thrashing.html
Comment 8 CI Bug Log 2019-10-28 13:47:57 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout/incomplete - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ All machines: igt@gem_persistent_relocs@forked-* - timeout/incomplete - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_392/fi-bwr-2160/igt@gem_persistent_relocs@forked-thrash-inactive.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_393/fi-byt-j1900/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_394/fi-byt-j1900/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html
Comment 9 CI Bug Log 2019-10-31 09:12:42 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* KBL: igt@aborted-runner - fail - Previous test: gem_persistent_relocs (forked-thrashing)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7219/shard-kbl2/igt@runner@aborted.html
Comment 10 Francesco Balestrieri 2019-11-11 10:34:03 UTC
BAT, all platforms, non-negligible reproduction rate.
Comment 11 CI Bug Log 2019-11-12 08:47:08 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL: igt@aborted-runner - fail - Previous test: gem_persistent_relocs (forked-thrashing) -}
{+ KBL: igt@aborted-runner - fail - Previous test: gem_persistent_relocs (forked-thrashing) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7295/shard-kbl6/igt@runner@aborted.html
Comment 12 Martin Peres 2019-11-29 19:42:17 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/530.
