Bug 112068 - [CI][BAT]igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma))
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-18 18:11 UTC by Lakshmi
Modified: 2019-11-12 08:47 UTC

See Also:
i915 platform: ALL
i915 features: GEM/Other


Description Lakshmi 2019-10-18 18:11:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7115/shard-iclb3/igt@gem_persistent_relocs@forked-thrashing.html
<0> [416.075782] gem_pers-1394    1.... 408908675us : __i915_vma_unbind.part.39: __i915_vma_unbind:1150 GEM_BUG_ON(i915_vma_is_active(vma))
<0> [416.075791] ---------------------------------
<4> [416.076288] ---[ end trace f43cacb32dbb9b2b ]---
<3> [416.083258] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
<3> [416.083269] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1394, name: gem_persistent_
<4> [416.083279] INFO: lockdep is turned off.
<3> [416.083284] Preemption disabled at:
<4> [416.083286] [<0000000000000000>] 0x0
<4> [416.083297] CPU: 1 PID: 1394 Comm: gem_persistent_ Tainted: G     UD           5.4.0-rc3-CI-CI_DRM_7115+ #1
<4> [416.083307] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [416.083320] Call Trace:
<4> [416.083328]  dump_stack+0x67/0x9b
<4> [416.083336]  ___might_sleep+0x178/0x260
<4> [416.083345]  wait_for_completion+0x37/0x1a0
<4> [416.083355]  virt_efi_query_variable_info+0x161/0x1b0
<4> [416.083365]  efi_query_variable_store+0xb3/0x1a0
<4> [416.083374]  ? efivar_entry_set_safe+0x19c/0x220
<4> [416.083380]  efivar_entry_set_safe+0x19c/0x220
<4> [416.083512]  ? efi_pstore_write+0x10b/0x150
<4> [416.083518]  efi_pstore_write+0x10b/0x150
<4> [416.083530]  pstore_dump+0x127/0x340
<4> [416.083540]  kmsg_dump+0x87/0x1c0
<4> [416.083547]  oops_end+0x3e/0x90
<4> [416.083554]  do_trap+0x80/0x100
<4> [416.083612]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083622]  do_invalid_op+0x23/0x30
<4> [416.083672]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083680]  invalid_op+0x23/0x30
<4> [416.083726] RIP: 0010:__i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083734] Code: 68 16 df e0 48 8b 35 20 6a 1d 00 49 c7 c0 da 6c 47 a0 b9 7e 04 00 00 48 c7 c2 80 8c 41 a0 48 c7 c7 32 99 33 a0 e8 39 12 e6 e0 <0f> 0b 48 c7 c1 da 6c 47 a0 ba 84 04 00 00 48 c7 c6 80 8c 41 a0 48
<4> [416.083753] RSP: 0018:ffffc900005d7c48 EFLAGS: 00010286
<4> [416.083760] RAX: 000000000000000c RBX: ffff888499575188 RCX: 0000000000000000
<4> [416.083768] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88849d5b0400
<4> [416.083775] RBP: ffff8882f9001a80 R08: 000000000003195b R09: ffff888476292000
<4> [416.083783] R10: 0000000000000000 R11: ffff88849d5b0400 R12: ffff8882ff85cd48
<4> [416.083790] R13: ffffc900005d7ca0 R14: ffff8882ff85cb40 R15: ffffc900005d7ca0
<4> [416.083842]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.083888]  i915_vma_unbind+0x2d/0x50 [i915]
<4> [416.083936]  i915_gem_object_unbind+0x11c/0x250 [i915]
<4> [416.083984]  i915_gem_shrink+0x297/0x5f0 [i915]
<4> [416.083997]  ? lockdep_hardirqs_on+0xe3/0x1c0
<4> [416.084043]  ? i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.084089]  i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.084134]  i915_drop_caches_set+0xf3/0x250 [i915]
<4> [416.084145]  simple_attr_write+0xb0/0xd0
<4> [416.084153]  full_proxy_write+0x51/0x80
<4> [416.084161]  vfs_write+0xb9/0x1d0
<4> [416.084167]  ksys_write+0x9f/0xe0
<4> [416.084174]  do_syscall_64+0x4f/0x210
<4> [416.084181]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [416.084188] RIP: 0033:0x7f6805a9c281
<4> [416.084194] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [416.084214] RSP: 002b:00007ffe3f60adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [416.084224] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6805a9c281
<4> [416.084233] RDX: 0000000000000004 RSI: 00007ffe3f60ae10 RDI: 0000000000000008
<4> [416.084241] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
<4> [416.084250] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3f60ae10
<4> [416.084258] R13: 0000000000000008 R14: 00007ffe3f60ae10 R15: 0000000000000000
<4> [416.084335] ------------[ cut here ]------------
<4> [416.084343] WARNING: CPU: 1 PID: 1394 at kernel/rcu/tree_plugin.h:293 rcu_note_context_switch+0x7e/0x650
<4> [416.084354] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp mei_hdcp crct10dif_pclmul crc32_pclmul snd_hda_intel cdc_ether snd_intel_nhlt e1000e usbnet mii snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core ptp pps_core snd_pcm mei_me mei prime_numbers thunderbolt
<4> [416.095444] CPU: 1 PID: 1394 Comm: gem_persistent_ Tainted: G     UD W         5.4.0-rc3-CI-CI_DRM_7115+ #1
<4> [416.095463] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [416.095498] RIP: 0010:rcu_note_context_switch+0x7e/0x650
<4> [416.095504] Code: 74 17 65 48 8b 04 25 00 5f 01 00 8b 88 8c 08 00 00 85 c9 0f 84 96 03 00 00 45 84 ed 41 8b 84 24 80 03 00 00 75 69 85 c0 7e 11 <0f> 0b 41 80 bc 24 84 03 00 00 00 0f 84 a2 01 00 00 4c 89 e7 e8 89
<4> [416.095520] RSP: 0018:ffffc900005d76b8 EFLAGS: 00010002
<4> [416.095527] RAX: 0000000000000001 RBX: ffff88849fcb9ec0 RCX: 0000000000000000
<4> [416.095536] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
<4> [416.095544] RBP: ffffc900005d7750 R08: 0000000000000000 R09: 0000000000000000
<4> [416.095552] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888303eb0040
<4> [416.095559] R13: 0000000000000000 R14: ffff88849fcb9198 R15: ffff88849864c000
<4> [416.095568] FS:  00007f6806541300(0000) GS:ffff88849fc80000(0000) knlGS:0000000000000000
<4> [416.095577] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [416.095585] CR2: 00007f6806566000 CR3: 00000002ed790003 CR4: 0000000000760ee0
<4> [416.095593] PKRU: 55555554
<4> [416.095597] Call Trace:
<4> [416.095605]  __schedule+0xd0/0x7f0
<4> [416.095613]  ? wait_for_completion+0x108/0x1a0
<4> [416.095621]  schedule+0x34/0xc0
<4> [416.095627]  schedule_timeout+0x225/0x3f0
<4> [416.095634]  ? wait_for_completion+0x3f/0x1a0
<4> [416.095642]  ? wait_for_completion+0x108/0x1a0
<4> [416.095648]  wait_for_completion+0x130/0x1a0
<4> [416.095656]  ? wake_up_q+0x70/0x70
<4> [416.095664]  virt_efi_set_variable+0x151/0x1a0
<4> [416.095672]  efivar_entry_set_safe+0x115/0x220
<4> [416.095681]  ? efi_pstore_write+0x10b/0x150
<4> [416.095688]  efi_pstore_write+0x10b/0x150
<4> [416.095701]  pstore_dump+0x127/0x340
<4> [416.095712]  kmsg_dump+0x87/0x1c0
<4> [416.095720]  oops_end+0x3e/0x90
<4> [416.095726]  do_trap+0x80/0x100
<4> [416.095783]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.095794]  do_invalid_op+0x23/0x30
<4> [416.095841]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.095850]  invalid_op+0x23/0x30
<4> [416.095895] RIP: 0010:__i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.095903] Code: 68 16 df e0 48 8b 35 20 6a 1d 00 49 c7 c0 da 6c 47 a0 b9 7e 04 00 00 48 c7 c2 80 8c 41 a0 48 c7 c7 32 99 33 a0 e8 39 12 e6 e0 <0f> 0b 48 c7 c1 da 6c 47 a0 ba 84 04 00 00 48 c7 c6 80 8c 41 a0 48
<4> [416.095923] RSP: 0018:ffffc900005d7c48 EFLAGS: 00010286
<4> [416.095932] RAX: 000000000000000c RBX: ffff888499575188 RCX: 0000000000000000
<4> [416.095941] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88849d5b0400
<4> [416.095948] RBP: ffff8882f9001a80 R08: 000000000003195b R09: ffff888476292000
<4> [416.095956] R10: 0000000000000000 R11: ffff88849d5b0400 R12: ffff8882ff85cd48
<4> [416.095964] R13: ffffc900005d7ca0 R14: ffff8882ff85cb40 R15: ffffc900005d7ca0
<4> [416.096016]  ? __i915_vma_unbind.part.39+0x207/0x460 [i915]
<4> [416.096063]  i915_vma_unbind+0x2d/0x50 [i915]
<4> [416.096110]  i915_gem_object_unbind+0x11c/0x250 [i915]
<4> [416.096160]  i915_gem_shrink+0x297/0x5f0 [i915]
<4> [416.096172]  ? lockdep_hardirqs_on+0xe3/0x1c0
<4> [416.096219]  ? i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.096264]  i915_gem_shrink_all+0x38/0x60 [i915]
<4> [416.096308]  i915_drop_caches_set+0xf3/0x250 [i915]
<4> [416.096319]  simple_attr_write+0xb0/0xd0
<4> [416.096328]  full_proxy_write+0x51/0x80
<4> [416.096336]  vfs_write+0xb9/0x1d0
<4> [416.096342]  ksys_write+0x9f/0xe0
<4> [416.096349]  do_syscall_64+0x4f/0x210
<4> [416.096356]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [416.096363] RIP: 0033:0x7f6805a9c281
<4> [416.096369] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [416.096389] RSP: 002b:00007ffe3f60adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [416.096399] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6805a9c281
<4> [416.096407] RDX: 0000000000000004 RSI: 00007ffe3f60ae10 RDI: 0000000000000008
<4> [416.096416] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
<4> [416.096424] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3f60ae10
<4> [416.096432] R13: 0000000000000008 R14: 00007ffe3f60ae10 R15: 0000000000000000
<4> [416.096444] irq event stamp: 1718099
<4> [416.096452] hardirqs last  enabled at (1718099): [<ffffffff8101dc75>] do_error_trap+0xa5/0x100
<4> [416.096464] hardirqs last disabled at (1718098): [<ffffffff81001bba>] trace_hardirqs_off_thunk+0x1a/0x20
<4> [416.096475] softirqs last  enabled at (1718094): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [416.096486] softirqs last disabled at (1718085): [<ffffffff810b7e9a>] irq_exit+0xba/0xc0
<4> [416.096496] ---[ end trace f43cacb32dbb9b2c ]---

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7116/shard-iclb8/igt@gem_persistent_relocs@forked-faulting-reloc-thrashing.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7087/shard-snb7/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7095/shard-hsw4/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html
Comment 1 CI Bug Log 2019-10-18 18:12:38 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma))
  (No new failures associated)
Comment 2 CI Bug Log 2019-10-18 19:54:47 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7127/shard-iclb8/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
Comment 3 CI Bug Log 2019-10-19 06:50:18 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB HSW KBL ICL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7132/shard-tglb8/igt@gem_persistent_relocs@forked-thrash-inactive.html
Comment 4 CI Bug Log 2019-10-22 09:38:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB APL HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7134/shard-apl6/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html
Comment 5 prathap.kumar.valsan 2019-10-23 05:17:21 UTC
Based on reading the code, it looks to me like there is a bug in the driver where a single VMA is added to the bound list more than once.

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index e90c4d0af8fd..dd930a3de013 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -921,7 +921,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
        /* There should only be at most 2 active bindings (user, global) */
        GEM_BUG_ON(bound + I915_VMA_PAGES_ACTIVE < bound);
        atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count);
-       list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+       list_del(&vma->vm_link);

        __i915_vma_pin(vma);
        GEM_BUG_ON(!i915_vma_is_pinned(vma));
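
For reference when evaluating the hypothesis above, here is a minimal user-space sketch of what list_move_tail() does to a node that is already on the target list. The list helpers are simplified reimplementations for illustration, not the kernel's actual headers, and bound_len_after_moves() is a hypothetical helper that exists only for this example. Under those assumptions, moving the same node repeatedly leaves exactly one entry on the list, so this call on its own may not be enough to explain a duplicate entry.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Append entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry from its current list and leave it self-linked. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Same shape as the kernel helper: unlink first, then append at the tail. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	list_del_init(entry);
	list_add_tail(entry, head);
}

/* Count the entries on a list, excluding the head itself. */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;
	const struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/*
 * Hypothetical helper for this sketch: move one node (standing in for
 * vma->vm_link) onto a list (standing in for vm->bound_list) 'times'
 * times and report how many entries the list ends up with.
 */
static size_t bound_len_after_moves(int times)
{
	struct list_head bound, vm_link;
	int i;

	assert(times >= 0);
	init_list_head(&bound);
	init_list_head(&vm_link);
	for (i = 0; i < times; i++)
		list_move_tail(&vm_link, &bound);
	return list_count(&bound);
}
```

Calling bound_len_after_moves() with 1 or with 3 gives a list length of 1 in both cases. The sketch is single-threaded and does not model the locking around the bound list in the driver, so it says nothing about concurrent updates.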
Comment 6 CI Bug Log 2019-10-23 06:25:17 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB APL HSW KBL ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_388/fi-cml-s/igt@gem_persistent_relocs@forked-thrashing.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_389/fi-icl-u4/igt@gem_persistent_relocs@forked-interruptible-thrashing.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_389/fi-whl-u/igt@gem_persistent_relocs@forked-thrashing.html
Comment 7 CI Bug Log 2019-10-25 15:07:28 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout/incomplete - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7168/shard-tglb2/igt@gem_persistent_relocs@forked-thrashing.html
Comment 8 CI Bug Log 2019-10-28 13:47:57 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SNB APL HSW KBL CFL WHL CML ICL TGL: igt@gem_persistent_relocs@forked-* - timeout/incomplete - GEM_BUG_ON(i915_vma_is_active(vma)) -}
{+ All machines: igt@gem_persistent_relocs@forked-* - timeout/incomplete - GEM_BUG_ON(i915_vma_is_active(vma)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_392/fi-bwr-2160/igt@gem_persistent_relocs@forked-thrash-inactive.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_393/fi-byt-j1900/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_394/fi-byt-j1900/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrash-inactive.html
Comment 9 CI Bug Log 2019-10-31 09:12:42 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* KBL: igt@aborted-runner - fail - Previous test: gem_persistent_relocs (forked-thrashing)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7219/shard-kbl2/igt@runner@aborted.html
Comment 10 Francesco Balestrieri 2019-11-11 10:34:03 UTC
BAT, all platforms, non-negligible reproduction rate.
Comment 11 CI Bug Log 2019-11-12 08:47:08 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL: igt@aborted-runner - fail - Previous test: gem_persistent_relocs (forked-thrashing) -}
{+ KBL: igt@aborted-runner - fail - Previous test: gem_persistent_relocs (forked-thrashing) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7295/shard-kbl6/igt@runner@aborted.html