Bug 110343

Summary: [CI][SHARDS] igt@* - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
Product: DRI Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED INVALID QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: ALL i915 features: GEM/Other

Description Lakshmi 2019-04-05 19:58:08 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5875/shard-iclb6/igt@gem_flink_race@flink_close.html

<6> [2420.394363] Console: switching to colour dummy device 80x25
<6> [2420.394412] [IGT] gem_flink_race: executing
<6> [2420.399327] [IGT] gem_flink_race: starting subtest flink_close
<4> [2425.300723] ------------[ cut here ]------------
<4> [2425.300734] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
<4> [2425.300742] WARNING: CPU: 5 PID: 7676 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [2425.300759] Modules linked in: i2c_dev vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 coretemp mei_hdcp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec cdc_ether snd_hwdep ghash_clmulni_intel usbnet snd_hda_core mii e1000e i2c_i801 snd_pcm ptp pps_core mei_me mei prime_numbers
<4> [2425.300803] CPU: 5 PID: 7676 Comm: kworker/u16:41 Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5875+ #1
<4> [2425.300815] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [2425.300880] Workqueue: i915 __i915_gem_free_work [i915]
<4> [2425.300889] RIP: 0010:mutex_destroy+0x49/0x50
<4> [2425.300896] Code: 00 00 5b c3 e8 48 26 3b 00 85 c0 74 ed 8b 05 a2 5c 23 01 85 c0 75 e3 48 c7 c6 60 01 08 82 48 c7 c7 18 df 06 82 e8 d7 17 fa ff <0f> 0b eb cc 0f 1f 00 48 b8 11 11 11 11 11 11 11 11 48 89 76 20 48
<4> [2425.300916] RSP: 0018:ffffc90000adbdd8 EFLAGS: 00010282
<4> [2425.300924] RAX: 0000000000000000 RBX: ffff88831a25bf10 RCX: 0000000000000000
<4> [2425.300933] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffff81126564
<4> [2425.300941] RBP: ffff8884936b9208 R08: 0000000000000000 R09: 0000000000000000
<4> [2425.300950] R10: 0000000000000000 R11: 0000000000001dfc R12: ffff8884936b0000
<4> [2425.300958] R13: ffff8884936b0068 R14: ffffffff8233ebf0 R15: 0000000000000000
<4> [2425.300967] FS:  0000000000000000(0000) GS:ffff88849ff40000(0000) knlGS:0000000000000000
<4> [2425.300977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [2425.300985] CR2: 00007fb44af089a8 CR3: 000000040cf06001 CR4: 0000000000760ee0
<4> [2425.300993] PKRU: 55555554
<4> [2425.300997] Call Trace:
<4> [2425.301045]  __i915_gem_free_objects+0x39e/0x720 [i915]
<4> [2425.301092]  __i915_gem_free_work+0x5a/0x90 [i915]
<4> [2425.301101]  process_one_work+0x245/0x610
<4> [2425.301109]  worker_thread+0x37/0x380
<4> [2425.301117]  ? process_one_work+0x610/0x610
<4> [2425.301124]  kthread+0x119/0x130
<4> [2425.301130]  ? kthread_park+0x80/0x80
<4> [2425.301139]  ret_from_fork+0x24/0x50
<4> [2425.301148] irq event stamp: 1855377
<4> [2425.301155] hardirqs last  enabled at (1855377): [<ffffffff8122ea58>] __slab_free+0x3e8/0x4f0
<4> [2425.301166] hardirqs last disabled at (1855376): [<ffffffff8122e7a9>] __slab_free+0x139/0x4f0
<4> [2425.301178] softirqs last  enabled at (1849702): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [2425.301190] softirqs last disabled at (1849695): [<ffffffff810b5159>] irq_exit+0xa9/0xc0
<4> [2425.301201] WARNING: CPU: 5 PID: 7676 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [2425.301211] ---[ end trace cfc39dc2547ed83f ]---
<6> [2425.593015] [IGT] gem_flink_race: exiting, ret=0
<5> [2425.593215] Setting dangerous option reset - tainting kernel
<6> [2425.630793] Console: switching to colour frame buffer device 240x67
Comment 1 CI Bug Log 2019-04-05 20:01:26 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_flink_race@flink_close - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5875/shard-iclb6/igt@gem_flink_race@flink_close.html

* ICL: igt@runner@aborted - fail - Previous test: gem_flink_race (flink_close)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5875/shard-iclb6/igt@runner@aborted.html
Comment 2 Chris Wilson 2019-04-05 20:28:39 UTC
icl and memory coruption? It's complaining about reservation_object being locked and that is execbuf, and this is just create/flink/close -- nothing using the reservation object lock. The ongoing memory corruption seems more likely.
Comment 3 Francesco Balestrieri 2019-04-08 04:25:12 UTC
Happened once so far, let's keep watching. Is the "ongoing memory corruption" on a specific machine?
Comment 4 Francesco Balestrieri 2019-04-08 04:45:10 UTC
> Is the "ongoing memory corruption" on a specific machine?

No, it's suspected on at least another one (icl-u2)
Comment 5 Francesco Balestrieri 2019-04-11 08:15:16 UTC
Happened once so far. Keep watching.
Comment 6 Jani Saarinen 2019-04-22 15:40:04 UTC
Same still. Lower priority? 
u2/u3 and shards have now updated BIOS
Comment 7 Francesco Balestrieri 2019-06-03 05:28:12 UTC
Still nothing. Closing.
Comment 8 CI Bug Log 2019-06-12 10:17:10 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@gem_flink_race@flink_close - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) -}
{+ All machine: all tests - dmesg-warn / incomplete - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) +}


  No new failures caught with the new filter
Comment 9 Martin Peres 2019-06-12 10:18:49 UTC
(In reply to Francesco Balestrieri from comment #7)
> Still nothing. Closing.

Sorry to disappoint: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13248/fi-skl-gvtdvm/igt@i915_module_load@reload-with-fault-injection.html

<4> [285.220850] ------------[ cut here ]------------
<4> [285.220851] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
<4> [285.220862] WARNING: CPU: 0 PID: 7 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [285.220862] Modules linked in: i915(-) snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 i2c_piix4 prime_numbers [last unloaded: i915]
<4> [285.220868] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G     U            5.2.0-rc4-CI-Patchwork_13248+ #1
<4> [285.220869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [285.220909] Workqueue: i915 __i915_gem_free_work [i915]
<4> [285.220911] RIP: 0010:mutex_destroy+0x49/0x50
<4> [285.220912] Code: 00 00 5b c3 e8 58 a4 3b 00 85 c0 74 ed 8b 05 3e 69 23 01 85 c0 75 e3 48 c7 c6 e0 d3 08 82 48 c7 c7 88 ae 07 82 e8 e7 06 fa ff <0f> 0b eb cc 0f 1f 00 48 b8 11 11 11 11 11 11 11 11 48 89 76 20 48
<4> [285.220913] RSP: 0018:ffffc90000043de0 EFLAGS: 00010286
<4> [285.220914] RAX: 0000000000000000 RBX: ffff88803e6dbf08 RCX: 0000000000000000
<4> [285.220915] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffffff8112867c
<4> [285.220915] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000039600
<4> [285.220916] R10: 0000000000000000 R11: 0000000000000007 R12: ffff888037cf0000
<4> [285.220917] R13: ffff888037cf0068 R14: ffffffff82341c28 R15: 0000000000000000
<4> [285.220918] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
<4> [285.220920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [285.220921] CR2: 000055af2b30d258 CR3: 0000000066e88001 CR4: 00000000003606f0
<4> [285.220922] Call Trace:
<4> [285.220955]  __i915_gem_free_objects+0x36e/0x700 [i915]
<4> [285.220987]  __i915_gem_free_work+0x5a/0x90 [i915]
<4> [285.220990]  process_one_work+0x245/0x610
<4> [285.220993]  worker_thread+0x37/0x380
<4> [285.220995]  ? process_one_work+0x610/0x610
<4> [285.220997]  kthread+0x119/0x130
<4> [285.220998]  ? kthread_park+0x80/0x80
<4> [285.221001]  ret_from_fork+0x3a/0x50
<4> [285.221004] irq event stamp: 490337
<4> [285.221007] hardirqs last  enabled at (490337): [<ffffffff812149c8>] free_unref_page+0x48/0x60
<4> [285.221008] hardirqs last disabled at (490336): [<ffffffff812149b3>] free_unref_page+0x33/0x60
<4> [285.221010] softirqs last  enabled at (489738): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [285.221012] softirqs last disabled at (489731): [<ffffffff810b6399>] irq_exit+0xa9/0xc0
<4> [285.221014] WARNING: CPU: 0 PID: 7 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [285.221014] ---[ end trace 77cae7935b558ee6 ]---
<4> [285.221027] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [285.221029] CPU: 0 PID: 3053 Comm: i915_module_loa Tainted: G     U  W         5.2.0-rc4-CI-Patchwork_13248+ #1
<4> [285.221029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [285.221062] RIP: 0010:i915_gem_object_wait+0x50/0x440 [i915]
<4> [285.221063] Code: 48 89 55 b0 31 d2 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 20 e4 ca e0 4d 85 f6 0f 88 d6 02 00 00 4c 8b a3 88 04 00 00 <45> 8b b4 24 90 00 00 00 41 f6 c6 01 0f 85 7b 01 00 00 41 f6 c5 08
<4> [285.221064] RSP: 0018:ffffc900001bfd10 EFLAGS: 00010206
<4> [285.221065] RAX: 0000000000000246 RBX: ffff88803e6dba40 RCX: 0000000000000001
<4> [285.221065] RDX: ffff888079048040 RSI: 00000000ffffffff RDI: 0000000000000246
<4> [285.221066] RBP: ffffc900001bfd60 R08: 0000000000000000 R09: 0000000000000001
<4> [285.221067] R10: ffffc900001bfd98 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
<4> [285.221067] R13: 0000000000000001 R14: 7fffffffffffffff R15: ffffffffa055b348
<4> [285.221068] FS:  00007fd2ea773e40(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
<4> [285.221069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [285.221070] CR2: 000055af2b30d258 CR3: 0000000066e88001 CR4: 00000000003606f0
<4> [285.221070] Call Trace:
<4> [285.221102]  i915_gem_object_set_to_gtt_domain+0x3b/0x270 [i915]
<4> [285.221133]  i915_gem_suspend_late+0x88/0x120 [i915]
<4> [285.221168]  i915_gem_fini_hw+0x35/0x110 [i915]
<4> [285.221190]  i915_driver_unload+0xdd/0x100 [i915]
<4> [285.221212]  i915_pci_remove+0x19/0x30 [i915]
<4> [285.221215]  pci_device_remove+0x36/0xb0
<4> [285.221217]  device_release_driver_internal+0xd3/0x1b0
<4> [285.221219]  driver_detach+0x3f/0x80
<4> [285.221221]  bus_remove_driver+0x53/0xd0
<4> [285.221223]  pci_unregister_driver+0x25/0xa0
<4> [285.221283]  i915_exit+0x16/0x1c [i915]
<4> [285.221285]  __se_sys_delete_module+0x162/0x210
<4> [285.221287]  ? trace_hardirqs_off_thunk+0x1a/0x1c
<4> [285.221289]  ? do_syscall_64+0xd/0x1c0
<4> [285.221290]  do_syscall_64+0x55/0x1c0
<4> [285.221292]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [285.221293] RIP: 0033:0x7fd2e9e1a1b7
<4> [285.221295] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
<4> [285.221296] RSP: 002b:00007ffee793b428 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
<4> [285.221297] RAX: ffffffffffffffda RBX: 00007ffee793b5a0 RCX: 00007fd2e9e1a1b7
<4> [285.221298] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 000055bc83493b08
<4> [285.221298] RBP: 000055bc83493aa0 R08: 000055bc83493b0c R09: 00007ffee793b450
<4> [285.221299] R10: 00007ffee793a424 R11: 0000000000000206 R12: 0000000000000000
<4> [285.221300] R13: 00007ffee793bb10 R14: 0000000000000000 R15: 0000000000000000
<4> [285.221303] Modules linked in: i915(-) snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 i2c_piix4 prime_numbers [last unloaded: i915]
Comment 10 Chris Wilson 2019-06-12 10:22:21 UTC
(In reply to Martin Peres from comment #9)
> (In reply to Francesco Balestrieri from comment #7)
> > Still nothing. Closing.
> 
> Sorry to disappoint:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13248/fi-skl-gvtdvm/
> igt@i915_module_load@reload-with-fault-injection.html
> 
> <4> [285.220850] ------------[ cut here ]------------
> <4> [285.220851] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
> <4> [285.220862] WARNING: CPU: 0 PID: 7 at kernel/locking/mutex-debug.c:103
> mutex_destroy+0x49/0x50
> <4> [285.220862] Modules linked in: i915(-) snd_hda_codec snd_hwdep
> snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> e1000 i2c_piix4 prime_numbers [last unloaded: i915]
> <4> [285.220868] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G     U          
> 5.2.0-rc4-CI-Patchwork_13248+ #1
> <4> [285.220869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
> <4> [285.220909] Workqueue: i915 __i915_gem_free_work [i915]
> <4> [285.220911] RIP: 0010:mutex_destroy+0x49/0x50
> <4> [285.220912] Code: 00 00 5b c3 e8 58 a4 3b 00 85 c0 74 ed 8b 05 3e 69 23
> 01 85 c0 75 e3 48 c7 c6 e0 d3 08 82 48 c7 c7 88 ae 07 82 e8 e7 06 fa ff <0f>
> 0b eb cc 0f 1f 00 48 b8 11 11 11 11 11 11 11 11 48 89 76 20 48
> <4> [285.220913] RSP: 0018:ffffc90000043de0 EFLAGS: 00010286
> <4> [285.220914] RAX: 0000000000000000 RBX: ffff88803e6dbf08 RCX:
> 0000000000000000
> <4> [285.220915] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
> ffffffff8112867c
> <4> [285.220915] RBP: 0000000000000202 R08: 0000000000000000 R09:
> 0000000000039600
> <4> [285.220916] R10: 0000000000000000 R11: 0000000000000007 R12:
> ffff888037cf0000
> <4> [285.220917] R13: ffff888037cf0068 R14: ffffffff82341c28 R15:
> 0000000000000000
> <4> [285.220918] FS:  0000000000000000(0000) GS:ffff88807da00000(0000)
> knlGS:0000000000000000
> <4> [285.220920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [285.220921] CR2: 000055af2b30d258 CR3: 0000000066e88001 CR4:
> 00000000003606f0
> <4> [285.220922] Call Trace:
> <4> [285.220955]  __i915_gem_free_objects+0x36e/0x700 [i915]
...
> <4> [285.221027] general protection fault: 0000 [#1] PREEMPT SMP PTI
> <4> [285.221029] CPU: 0 PID: 3053 Comm: i915_module_loa Tainted: G     U  W 
> 5.2.0-rc4-CI-Patchwork_13248+ #1
> <4> [285.221029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
> <4> [285.221062] RIP: 0010:i915_gem_object_wait+0x50/0x440 [i915]
> <4> [285.221070] Call Trace:
> <4> [285.221102]  i915_gem_object_set_to_gtt_domain+0x3b/0x270 [i915]
> <4> [285.221133]  i915_gem_suspend_late+0x88/0x120 [i915]

Except that quite clearly is a very different bug and a regression.
Comment 11 CI Bug Log 2019-11-15 12:37:21 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
Comment 12 Lakshmi 2019-11-15 12:38:48 UTC
Archiving this issue. Last seen CI_DRM_5875_full (7 months, 1 week old).
Other issues seen on this bug is reported under 112298.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.