Bug 110343 - [CI][SHARDS] igt@* - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
Summary: [CI][SHARDS] igt@* - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
Status: RESOLVED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-05 19:58 UTC by Lakshmi
Modified: 2019-06-12 10:23 UTC (History)
1 user (show)

See Also:
i915 platform: ALL
i915 features: GEM/Other


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lakshmi 2019-04-05 19:58:08 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5875/shard-iclb6/igt@gem_flink_race@flink_close.html

<6> [2420.394363] Console: switching to colour dummy device 80x25
<6> [2420.394412] [IGT] gem_flink_race: executing
<6> [2420.399327] [IGT] gem_flink_race: starting subtest flink_close
<4> [2425.300723] ------------[ cut here ]------------
<4> [2425.300734] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
<4> [2425.300742] WARNING: CPU: 5 PID: 7676 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [2425.300759] Modules linked in: i2c_dev vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 coretemp mei_hdcp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec cdc_ether snd_hwdep ghash_clmulni_intel usbnet snd_hda_core mii e1000e i2c_i801 snd_pcm ptp pps_core mei_me mei prime_numbers
<4> [2425.300803] CPU: 5 PID: 7676 Comm: kworker/u16:41 Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5875+ #1
<4> [2425.300815] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [2425.300880] Workqueue: i915 __i915_gem_free_work [i915]
<4> [2425.300889] RIP: 0010:mutex_destroy+0x49/0x50
<4> [2425.300896] Code: 00 00 5b c3 e8 48 26 3b 00 85 c0 74 ed 8b 05 a2 5c 23 01 85 c0 75 e3 48 c7 c6 60 01 08 82 48 c7 c7 18 df 06 82 e8 d7 17 fa ff <0f> 0b eb cc 0f 1f 00 48 b8 11 11 11 11 11 11 11 11 48 89 76 20 48
<4> [2425.300916] RSP: 0018:ffffc90000adbdd8 EFLAGS: 00010282
<4> [2425.300924] RAX: 0000000000000000 RBX: ffff88831a25bf10 RCX: 0000000000000000
<4> [2425.300933] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffff81126564
<4> [2425.300941] RBP: ffff8884936b9208 R08: 0000000000000000 R09: 0000000000000000
<4> [2425.300950] R10: 0000000000000000 R11: 0000000000001dfc R12: ffff8884936b0000
<4> [2425.300958] R13: ffff8884936b0068 R14: ffffffff8233ebf0 R15: 0000000000000000
<4> [2425.300967] FS:  0000000000000000(0000) GS:ffff88849ff40000(0000) knlGS:0000000000000000
<4> [2425.300977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [2425.300985] CR2: 00007fb44af089a8 CR3: 000000040cf06001 CR4: 0000000000760ee0
<4> [2425.300993] PKRU: 55555554
<4> [2425.300997] Call Trace:
<4> [2425.301045]  __i915_gem_free_objects+0x39e/0x720 [i915]
<4> [2425.301092]  __i915_gem_free_work+0x5a/0x90 [i915]
<4> [2425.301101]  process_one_work+0x245/0x610
<4> [2425.301109]  worker_thread+0x37/0x380
<4> [2425.301117]  ? process_one_work+0x610/0x610
<4> [2425.301124]  kthread+0x119/0x130
<4> [2425.301130]  ? kthread_park+0x80/0x80
<4> [2425.301139]  ret_from_fork+0x24/0x50
<4> [2425.301148] irq event stamp: 1855377
<4> [2425.301155] hardirqs last  enabled at (1855377): [<ffffffff8122ea58>] __slab_free+0x3e8/0x4f0
<4> [2425.301166] hardirqs last disabled at (1855376): [<ffffffff8122e7a9>] __slab_free+0x139/0x4f0
<4> [2425.301178] softirqs last  enabled at (1849702): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [2425.301190] softirqs last disabled at (1849695): [<ffffffff810b5159>] irq_exit+0xa9/0xc0
<4> [2425.301201] WARNING: CPU: 5 PID: 7676 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [2425.301211] ---[ end trace cfc39dc2547ed83f ]---
<6> [2425.593015] [IGT] gem_flink_race: exiting, ret=0
<5> [2425.593215] Setting dangerous option reset - tainting kernel
<6> [2425.630793] Console: switching to colour frame buffer device 240x67
Comment 1 CI Bug Log 2019-04-05 20:01:26 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_flink_race@flink_close - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5875/shard-iclb6/igt@gem_flink_race@flink_close.html

* ICL: igt@runner@aborted - fail - Previous test: gem_flink_race (flink_close)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5875/shard-iclb6/igt@runner@aborted.html
Comment 2 Chris Wilson 2019-04-05 20:28:39 UTC
icl and memory coruption? It's complaining about reservation_object being locked and that is execbuf, and this is just create/flink/close -- nothing using the reservation object lock. The ongoing memory corruption seems more likely.
Comment 3 Francesco Balestrieri 2019-04-08 04:25:12 UTC
Happened once so far, let's keep watching. Is the "ongoing memory corruption" on a specific machine?
Comment 4 Francesco Balestrieri 2019-04-08 04:45:10 UTC
> Is the "ongoing memory corruption" on a specific machine?

No, it's suspected on at least another one (icl-u2)
Comment 5 Francesco Balestrieri 2019-04-11 08:15:16 UTC
Happened once so far. Keep watching.
Comment 6 Jani Saarinen 2019-04-22 15:40:04 UTC
Same still. Lower priority? 
u2/u3 and shards have now updated BIOS
Comment 7 Francesco Balestrieri 2019-06-03 05:28:12 UTC
Still nothing. Closing.
Comment 8 CI Bug Log 2019-06-12 10:17:10 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@gem_flink_race@flink_close - dmesg-warn - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) -}
{+ All machine: all tests - dmesg-warn / incomplete - DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) +}


  No new failures caught with the new filter
Comment 9 Martin Peres 2019-06-12 10:18:49 UTC
(In reply to Francesco Balestrieri from comment #7)
> Still nothing. Closing.

Sorry to disappoint: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13248/fi-skl-gvtdvm/igt@i915_module_load@reload-with-fault-injection.html

<4> [285.220850] ------------[ cut here ]------------
<4> [285.220851] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
<4> [285.220862] WARNING: CPU: 0 PID: 7 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [285.220862] Modules linked in: i915(-) snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 i2c_piix4 prime_numbers [last unloaded: i915]
<4> [285.220868] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G     U            5.2.0-rc4-CI-Patchwork_13248+ #1
<4> [285.220869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [285.220909] Workqueue: i915 __i915_gem_free_work [i915]
<4> [285.220911] RIP: 0010:mutex_destroy+0x49/0x50
<4> [285.220912] Code: 00 00 5b c3 e8 58 a4 3b 00 85 c0 74 ed 8b 05 3e 69 23 01 85 c0 75 e3 48 c7 c6 e0 d3 08 82 48 c7 c7 88 ae 07 82 e8 e7 06 fa ff <0f> 0b eb cc 0f 1f 00 48 b8 11 11 11 11 11 11 11 11 48 89 76 20 48
<4> [285.220913] RSP: 0018:ffffc90000043de0 EFLAGS: 00010286
<4> [285.220914] RAX: 0000000000000000 RBX: ffff88803e6dbf08 RCX: 0000000000000000
<4> [285.220915] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffffff8112867c
<4> [285.220915] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000039600
<4> [285.220916] R10: 0000000000000000 R11: 0000000000000007 R12: ffff888037cf0000
<4> [285.220917] R13: ffff888037cf0068 R14: ffffffff82341c28 R15: 0000000000000000
<4> [285.220918] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
<4> [285.220920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [285.220921] CR2: 000055af2b30d258 CR3: 0000000066e88001 CR4: 00000000003606f0
<4> [285.220922] Call Trace:
<4> [285.220955]  __i915_gem_free_objects+0x36e/0x700 [i915]
<4> [285.220987]  __i915_gem_free_work+0x5a/0x90 [i915]
<4> [285.220990]  process_one_work+0x245/0x610
<4> [285.220993]  worker_thread+0x37/0x380
<4> [285.220995]  ? process_one_work+0x610/0x610
<4> [285.220997]  kthread+0x119/0x130
<4> [285.220998]  ? kthread_park+0x80/0x80
<4> [285.221001]  ret_from_fork+0x3a/0x50
<4> [285.221004] irq event stamp: 490337
<4> [285.221007] hardirqs last  enabled at (490337): [<ffffffff812149c8>] free_unref_page+0x48/0x60
<4> [285.221008] hardirqs last disabled at (490336): [<ffffffff812149b3>] free_unref_page+0x33/0x60
<4> [285.221010] softirqs last  enabled at (489738): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [285.221012] softirqs last disabled at (489731): [<ffffffff810b6399>] irq_exit+0xa9/0xc0
<4> [285.221014] WARNING: CPU: 0 PID: 7 at kernel/locking/mutex-debug.c:103 mutex_destroy+0x49/0x50
<4> [285.221014] ---[ end trace 77cae7935b558ee6 ]---
<4> [285.221027] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [285.221029] CPU: 0 PID: 3053 Comm: i915_module_loa Tainted: G     U  W         5.2.0-rc4-CI-Patchwork_13248+ #1
<4> [285.221029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [285.221062] RIP: 0010:i915_gem_object_wait+0x50/0x440 [i915]
<4> [285.221063] Code: 48 89 55 b0 31 d2 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 20 e4 ca e0 4d 85 f6 0f 88 d6 02 00 00 4c 8b a3 88 04 00 00 <45> 8b b4 24 90 00 00 00 41 f6 c6 01 0f 85 7b 01 00 00 41 f6 c5 08
<4> [285.221064] RSP: 0018:ffffc900001bfd10 EFLAGS: 00010206
<4> [285.221065] RAX: 0000000000000246 RBX: ffff88803e6dba40 RCX: 0000000000000001
<4> [285.221065] RDX: ffff888079048040 RSI: 00000000ffffffff RDI: 0000000000000246
<4> [285.221066] RBP: ffffc900001bfd60 R08: 0000000000000000 R09: 0000000000000001
<4> [285.221067] R10: ffffc900001bfd98 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
<4> [285.221067] R13: 0000000000000001 R14: 7fffffffffffffff R15: ffffffffa055b348
<4> [285.221068] FS:  00007fd2ea773e40(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
<4> [285.221069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [285.221070] CR2: 000055af2b30d258 CR3: 0000000066e88001 CR4: 00000000003606f0
<4> [285.221070] Call Trace:
<4> [285.221102]  i915_gem_object_set_to_gtt_domain+0x3b/0x270 [i915]
<4> [285.221133]  i915_gem_suspend_late+0x88/0x120 [i915]
<4> [285.221168]  i915_gem_fini_hw+0x35/0x110 [i915]
<4> [285.221190]  i915_driver_unload+0xdd/0x100 [i915]
<4> [285.221212]  i915_pci_remove+0x19/0x30 [i915]
<4> [285.221215]  pci_device_remove+0x36/0xb0
<4> [285.221217]  device_release_driver_internal+0xd3/0x1b0
<4> [285.221219]  driver_detach+0x3f/0x80
<4> [285.221221]  bus_remove_driver+0x53/0xd0
<4> [285.221223]  pci_unregister_driver+0x25/0xa0
<4> [285.221283]  i915_exit+0x16/0x1c [i915]
<4> [285.221285]  __se_sys_delete_module+0x162/0x210
<4> [285.221287]  ? trace_hardirqs_off_thunk+0x1a/0x1c
<4> [285.221289]  ? do_syscall_64+0xd/0x1c0
<4> [285.221290]  do_syscall_64+0x55/0x1c0
<4> [285.221292]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [285.221293] RIP: 0033:0x7fd2e9e1a1b7
<4> [285.221295] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
<4> [285.221296] RSP: 002b:00007ffee793b428 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
<4> [285.221297] RAX: ffffffffffffffda RBX: 00007ffee793b5a0 RCX: 00007fd2e9e1a1b7
<4> [285.221298] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 000055bc83493b08
<4> [285.221298] RBP: 000055bc83493aa0 R08: 000055bc83493b0c R09: 00007ffee793b450
<4> [285.221299] R10: 00007ffee793a424 R11: 0000000000000206 R12: 0000000000000000
<4> [285.221300] R13: 00007ffee793bb10 R14: 0000000000000000 R15: 0000000000000000
<4> [285.221303] Modules linked in: i915(-) snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 i2c_piix4 prime_numbers [last unloaded: i915]
Comment 10 Chris Wilson 2019-06-12 10:22:21 UTC
(In reply to Martin Peres from comment #9)
> (In reply to Francesco Balestrieri from comment #7)
> > Still nothing. Closing.
> 
> Sorry to disappoint:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13248/fi-skl-gvtdvm/
> igt@i915_module_load@reload-with-fault-injection.html
> 
> <4> [285.220850] ------------[ cut here ]------------
> <4> [285.220851] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
> <4> [285.220862] WARNING: CPU: 0 PID: 7 at kernel/locking/mutex-debug.c:103
> mutex_destroy+0x49/0x50
> <4> [285.220862] Modules linked in: i915(-) snd_hda_codec snd_hwdep
> snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> e1000 i2c_piix4 prime_numbers [last unloaded: i915]
> <4> [285.220868] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G     U          
> 5.2.0-rc4-CI-Patchwork_13248+ #1
> <4> [285.220869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
> <4> [285.220909] Workqueue: i915 __i915_gem_free_work [i915]
> <4> [285.220911] RIP: 0010:mutex_destroy+0x49/0x50
> <4> [285.220912] Code: 00 00 5b c3 e8 58 a4 3b 00 85 c0 74 ed 8b 05 3e 69 23
> 01 85 c0 75 e3 48 c7 c6 e0 d3 08 82 48 c7 c7 88 ae 07 82 e8 e7 06 fa ff <0f>
> 0b eb cc 0f 1f 00 48 b8 11 11 11 11 11 11 11 11 48 89 76 20 48
> <4> [285.220913] RSP: 0018:ffffc90000043de0 EFLAGS: 00010286
> <4> [285.220914] RAX: 0000000000000000 RBX: ffff88803e6dbf08 RCX:
> 0000000000000000
> <4> [285.220915] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
> ffffffff8112867c
> <4> [285.220915] RBP: 0000000000000202 R08: 0000000000000000 R09:
> 0000000000039600
> <4> [285.220916] R10: 0000000000000000 R11: 0000000000000007 R12:
> ffff888037cf0000
> <4> [285.220917] R13: ffff888037cf0068 R14: ffffffff82341c28 R15:
> 0000000000000000
> <4> [285.220918] FS:  0000000000000000(0000) GS:ffff88807da00000(0000)
> knlGS:0000000000000000
> <4> [285.220920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [285.220921] CR2: 000055af2b30d258 CR3: 0000000066e88001 CR4:
> 00000000003606f0
> <4> [285.220922] Call Trace:
> <4> [285.220955]  __i915_gem_free_objects+0x36e/0x700 [i915]
...
> <4> [285.221027] general protection fault: 0000 [#1] PREEMPT SMP PTI
> <4> [285.221029] CPU: 0 PID: 3053 Comm: i915_module_loa Tainted: G     U  W 
> 5.2.0-rc4-CI-Patchwork_13248+ #1
> <4> [285.221029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
> <4> [285.221062] RIP: 0010:i915_gem_object_wait+0x50/0x440 [i915]
> <4> [285.221070] Call Trace:
> <4> [285.221102]  i915_gem_object_set_to_gtt_domain+0x3b/0x270 [i915]
> <4> [285.221133]  i915_gem_suspend_late+0x88/0x120 [i915]

Except that quite clearly is a very different bug and a regression.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.