Bug 106725 - [CI] igt@drv_module_reload@* - dmesg-warn - WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup
Summary: [CI] igt@drv_module_reload@* - dmesg-warn - WARNING: CPU: 1 PID: 1654 at driv...
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Aditya Swarup
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-30 13:14 UTC by Martin Peres
Modified: 2019-03-11 18:04 UTC

See Also:
i915 platform: GLK
i915 features: display/Other



Description Martin Peres 2018-05-30 13:14:46 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_53/fi-glk-j4005/igt@drv_module_reload@basic-no-display.html

[  237.755510] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x27a/0x2d0
[  237.755516] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb i915(-) btrtl btbcm btintel x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bluetooth ecdh_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [last unloaded: snd_hda_intel]
[  237.755595] CPU: 1 PID: 1654 Comm: drv_module_relo Tainted: G     U            4.17.0-rc6-gf460c1f3512a-drmtip_53+ #1
[  237.755598] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
[  237.755602] RIP: 0010:drm_mode_config_cleanup+0x27a/0x2d0
[  237.755605] RSP: 0018:ffffb33000243d80 EFLAGS: 00010287
[  237.755609] RAX: ffff8b9be3c86fe0 RBX: ffff8b9bde790838 RCX: 0000000000000000
[  237.755611] RDX: ffff8b9bde7907e0 RSI: ffffffff860fb831 RDI: 00000000ffffffff
[  237.755614] RBP: ffff8b9bde790000 R08: 00000000f5ad9eef R09: 0000000000000000
[  237.755616] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b9bde790840
[  237.755619] R13: ffffffffc04f1870 R14: ffff8b9bf5d74ae8 R15: ffffffffc04f18f0
[  237.755622] FS:  00007f5edcb85980(0000) GS:ffff8b9bffc80000(0000) knlGS:0000000000000000
[  237.755624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  237.755627] CR2: 0000561d7a4db238 CR3: 0000000267e3a000 CR4: 0000000000340ee0
[  237.755629] Call Trace:
[  237.755715]  intel_modeset_cleanup+0xb3/0x130 [i915]
[  237.755759]  i915_driver_unload+0x98/0x110 [i915]
[  237.755802]  i915_pci_remove+0x10/0x20 [i915]
[  237.755810]  pci_device_remove+0x36/0xb0
[  237.755817]  device_release_driver_internal+0x15d/0x220
[  237.755823]  driver_detach+0x35/0x70
[  237.755827]  bus_remove_driver+0x53/0xd0
[  237.755831]  pci_unregister_driver+0x25/0xa0
[  237.755839]  __se_sys_delete_module+0x162/0x210
[  237.755845]  ? do_syscall_64+0xd/0x190
[  237.755851]  do_syscall_64+0x55/0x190
[  237.755857]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  237.755860] RIP: 0033:0x7f5edc23d1b7
[  237.755862] RSP: 002b:00007ffe3252a878 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  237.755867] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5edc23d1b7
[  237.755870] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000056013b392a38
[  237.755872] RBP: 000056013b3929d0 R08: 000056013b392a3c R09: 00007f5edc289b40
[  237.755874] R10: 00007ffe32529874 R11: 0000000000000206 R12: 000056013b1622f0
[  237.755877] R13: 00007ffe3252aa90 R14: 0000000000000000 R15: 0000000000000000
[  237.755909] Code: eb 31 00 48 8b 45 00 48 39 c5 75 62 48 8b 44 24 28 65 48 33 04 25 28 00 00 00 75 56 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 89 e6 48 89 ef 48 c7 c3 62 0e 0d 86 e8 02 95 ff ff eb 
[  237.756048] irq event stamp: 153884
[  237.756052] hardirqs last  enabled at (153883): [<ffffffff859488cc>] _raw_spin_unlock_irqrestore+0x4c/0x60
[  237.756057] hardirqs last disabled at (153884): [<ffffffff85a0111c>] error_entry+0x7c/0x100
[  237.756060] softirqs last  enabled at (153872): [<ffffffff85c0032b>] __do_softirq+0x32b/0x4e1
[  237.756065] softirqs last disabled at (153865): [<ffffffff8508f7f4>] irq_exit+0xa4/0xb0
[  237.756068] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x27a/0x2d0
[  237.756071] ---[ end trace 4a33f25fc0d17512 ]---
[  237.756311] [drm:drm_mode_config_cleanup] *ERROR* connector HDMI-A-1 leaked!
[  237.758594] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:473 drm_mode_config_cleanup+0x2b6/0x2d0
[  237.758597] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb i915(-) btrtl btbcm btintel x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bluetooth ecdh_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [last unloaded: snd_hda_intel]
[  237.758667] CPU: 1 PID: 1654 Comm: drv_module_relo Tainted: G     U  W         4.17.0-rc6-gf460c1f3512a-drmtip_53+ #1
[  237.758669] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
[  237.758673] RIP: 0010:drm_mode_config_cleanup+0x2b6/0x2d0
[  237.758675] RSP: 0018:ffffb33000243d80 EFLAGS: 00010287
[  237.758680] RAX: ffff8b9befb83860 RBX: ffff8b9bde7909a8 RCX: 0000000000000000
[  237.758682] RDX: ffff8b9bde7909d0 RSI: 0000000000000000 RDI: ffff8b9bde7909a8
[  237.758685] RBP: ffff8b9bde790000 R08: 0000000000000000 R09: 0000000000000000
[  237.758687] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b9bde790748
[  237.758689] R13: ffffffffc04f1870 R14: ffff8b9bf5d74ae8 R15: ffffffffc04f18f0
[  237.758692] FS:  00007f5edcb85980(0000) GS:ffff8b9bffc80000(0000) knlGS:0000000000000000
[  237.758695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  237.758697] CR2: 0000561d7a4db238 CR3: 0000000267e3a000 CR4: 0000000000340ee0
[  237.758699] Call Trace:
[  237.758764]  intel_modeset_cleanup+0xb3/0x130 [i915]
[  237.758808]  i915_driver_unload+0x98/0x110 [i915]
[  237.758850]  i915_pci_remove+0x10/0x20 [i915]
[  237.758856]  pci_device_remove+0x36/0xb0
[  237.758862]  device_release_driver_internal+0x15d/0x220
[  237.758868]  driver_detach+0x35/0x70
[  237.758872]  bus_remove_driver+0x53/0xd0
[  237.758876]  pci_unregister_driver+0x25/0xa0
[  237.758904]  __se_sys_delete_module+0x162/0x210
[  237.758909]  ? do_syscall_64+0xd/0x190
[  237.758914]  do_syscall_64+0x55/0x190
[  237.758920]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  237.758923] RIP: 0033:0x7f5edc23d1b7
[  237.758925] RSP: 002b:00007ffe3252a878 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  237.758930] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5edc23d1b7
[  237.758933] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000056013b392a38
[  237.758935] RBP: 000056013b3929d0 R08: 000056013b392a3c R09: 00007f5edc289b40
[  237.758938] R10: 00007ffe32529874 R11: 0000000000000206 R12: 000056013b1622f0
[  237.758940] R13: 00007ffe3252aa90 R14: 0000000000000000 R15: 0000000000000000
[  237.758953] Code: 95 ff ff eb 0c 48 8b 70 48 48 89 df e8 a4 f4 ff ff 48 89 e7 e8 9c 95 ff ff 48 85 c0 75 e7 48 89 e7 e8 5f 96 ff ff e9 f8 fd ff ff <0f> 0b e9 ef fe ff ff 0f 0b eb 9a e8 4a 78 a7 ff 66 2e 0f 1f 84 
[  237.759109] irq event stamp: 157456
[  237.759116] hardirqs last  enabled at (157455): [<ffffffff851fb2f2>] __slab_free+0x472/0x580
[  237.759119] hardirqs last disabled at (157456): [<ffffffff85a0111c>] error_entry+0x7c/0x100
[  237.759122] softirqs last  enabled at (156642): [<ffffffff85c0032b>] __do_softirq+0x32b/0x4e1
[  237.759126] softirqs last disabled at (156635): [<ffffffff8508f7f4>] irq_exit+0xa4/0xb0
[  237.759130] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:473 drm_mode_config_cleanup+0x2b6/0x2d0
[  237.759132] ---[ end trace 4a33f25fc0d17513 ]---
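
For triaging logs like the one above, the distinct warning sites and any leaked connectors can be pulled out mechanically. The following is a minimal sketch, not part of the CI tooling: the function name `summarize_dmesg` and both regular expressions are assumptions written against the line formats seen in this particular log, and other kernels may format these lines differently.

```python
import re

# Assumed pattern for kernel WARNING headers as they appear in this log, e.g.
# "WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x27a/0x2d0"
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)

# Assumed pattern for the DRM "connector leaked" error line in this log.
LEAK_RE = re.compile(r"\*ERROR\* connector (?P<name>\S+) leaked!")


def summarize_dmesg(text):
    """Return (unique warning sites in first-seen order, leaked connector names)."""
    sites = []
    leaked = []
    for raw in text.splitlines():
        match = WARNING_RE.search(raw)
        if match:
            site = (match.group("file"), int(match.group("line")), match.group("func"))
            # The same WARNING header is repeated in the log, so deduplicate.
            if site not in sites:
                sites.append(site)
        match = LEAK_RE.search(raw)
        if match:
            leaked.append(match.group("name"))
    return sites, leaked


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[  237.755510] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x27a/0x2d0
[  237.756068] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x27a/0x2d0
[  237.756311] [drm:drm_mode_config_cleanup] *ERROR* connector HDMI-A-1 leaked!
[  237.758594] WARNING: CPU: 1 PID: 1654 at drivers/gpu/drm/drm_mode_config.c:473 drm_mode_config_cleanup+0x2b6/0x2d0
"""

if __name__ == "__main__":
    warning_sites, leaked_connectors = summarize_dmesg(SAMPLE)
    for file_name, line_no, func in warning_sites:
        print(f"{file_name}:{line_no} in {func}")
    print("leaked connectors:", leaked_connectors)
```

Run over this report's log, the sketch would group the splats into two warning sites in `drm_mode_config_cleanup` (lines 439 and 473 of `drm_mode_config.c`) plus the leaked `HDMI-A-1` connector.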
Comment 1 Martin Peres 2018-05-30 13:15:30 UTC
This seems to be a regression introduced in drmtip_53, bumping the priority. See also https://bugs.freedesktop.org/show_bug.cgi?id=106723 and https://bugs.freedesktop.org/show_bug.cgi?id=106724
Comment 2 Francesco Balestrieri 2018-06-01 07:13:34 UTC
Proposing this as highest since it is a CI regression. Is it seen in BAT?
Comment 3 Francesco Balestrieri 2018-06-12 12:25:43 UTC
Last seen 3 hours ago. 264 / 324 runs (81.5%)
Comment 4 Clinton Taylor 2018-06-15 20:30:58 UTC
Attempting to reproduce outside the CI system now.
Comment 5 Clinton Taylor 2018-06-21 16:25:51 UTC
Changing development environment to support driver load/unload
Comment 6 Clinton Taylor 2018-06-21 18:15:42 UTC
Unable to reproduce with current drm-tip

Issue apparently is fixed based on CI logs starting at CI_DRM_4351.
Comment 7 Martin Peres 2018-09-07 16:30:14 UTC
(In reply to Clinton Taylor from comment #6)
> Unable to reproduce with current drm-tip
> 
> Issue apparently is fixed based on CI logs starting at CI_DRM_4351.

This has been happening again regularly since CI_DRM_4523: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4523/fi-glk-j4005/igt@drv_module_reload@basic-reload-inject.html

Last one: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4780/fi-glk-j4005/igt@drv_module_reload@basic-reload.html

Sorry!
Comment 8 Lakshmi 2018-10-18 06:50:12 UTC
Clint, any update here?
Comment 9 Lakshmi 2018-10-24 12:00:23 UTC
Latest logs from CI_DRM_5024, dmesg-warn
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5024/fi-glk-j4005/igt@drv_module_reload@basic-reload.html
Comment 10 Clinton Taylor 2018-11-15 00:14:37 UTC
No update yet. Customer high priority issues.
Comment 11 Lakshmi 2019-02-20 07:36:19 UTC
Updating the priority; this issue was last seen on CI_DRM_5090_188 (3 months / 1930 runs ago).

Clint/Aditya any idea if the bug has been fixed? If so, I can close this bug.
Comment 12 Aditya Swarup 2019-02-21 19:42:29 UTC
@Lakshmi

I just started working on this issue and I am working on some priority issue as well. This will need some time for investigation. I will keep posting updates.
Comment 13 James Ausmus 2019-02-27 19:54:15 UTC
Current plan is for Aditya to set up 1000 runs of this test on his local RVP. If he also cannot reproduce it, then we should close this.
Comment 14 Martin Peres 2019-03-08 16:16:48 UTC
Used to happen every week, then nothing for over 4 months. Closing!
Comment 15 CI Bug Log 2019-03-08 16:24:43 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
Comment 16 Aditya Swarup 2019-03-11 18:04:28 UTC
So, I ran the test for over 3000 iterations on glkrvp and didn't come across any failures for the basic-no-display sub-test with i915_module_load. However, the subtest reload-with-fault-injection fails on every single run. Since the bug is filed for the no-display subtest, it should be closed.

