Bug 102707

Summary: [BAT][CI] igt@*- dmesg-warn - WARNING: CPU: 2 PID: 8245 at drivers/gpu/drm/drm_mode_config.c:468 drm_mode_config_cleanup
Product: DRI
Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel
Assignee: Maarten Lankhorst <bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: intel-gfx-bugs, martin.peres
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: GLK, HSW, KBL, SNB
i915 features: display/Other

Description Marta Löfstedt 2017-09-13 13:41:26 UTC
CI_DRM_3082 HSW- and SNB-shards

[   60.712670] ------------[ cut here ]------------
[   60.712677] WARNING: CPU: 0 PID: 1597 at drivers/gpu/drm/drm_mode_config.c:465 drm_mode_config_cleanup+0x250/0x270
[   60.712679] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp broadcom crct10dif_pclmul bcm_phy_lib snd_hda_codec crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel tg3 ptp snd_pcm pps_core mei_me lpc_ich prime_numbers mei [last unloaded: snd_hda_intel]
[   60.712727] CPU: 0 PID: 1597 Comm: drv_module_relo Tainted: G     U          4.13.0-CI-CI_DRM_3082+ #1
[   60.712729] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
[   60.712731] task: ffff88021ff8a800 task.stack: ffffc90000a94000
[   60.712734] RIP: 0010:drm_mode_config_cleanup+0x250/0x270
[   60.712736] RSP: 0018:ffffc90000a97d80 EFLAGS: 00010283
[   60.712740] RAX: ffff88021f50f0b0 RBX: ffff880212440890 RCX: 0000000000000001
[   60.712742] RDX: ffff8802124408b8 RSI: 0000000000000000 RDI: ffff880212440890
[   60.712744] RBP: ffffc90000a97da8 R08: 0000000000000000 R09: 0000000000000000
[   60.712746] R10: ffffc90000a97d00 R11: 0000000000000000 R12: ffff880212440000
[   60.712748] R13: ffff8802124406a8 R14: ffffffffa02d3d88 R15: ffff88022585b848
[   60.712750] FS:  00007f6ed98888c0(0000) GS:ffff88022fa00000(0000) knlGS:0000000000000000
[   60.712752] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.712755] CR2: 000055c9bd6a2ec0 CR3: 0000000219a1d000 CR4: 00000000000406f0
[   60.712756] Call Trace:
[   60.712786]  intel_modeset_cleanup+0x66/0xa0 [i915]
[   60.712804]  i915_driver_unload+0x92/0x180 [i915]
[   60.712822]  i915_pci_remove+0x19/0x30 [i915]
[   60.712826]  pci_device_remove+0x39/0xb0
[   60.712831]  device_release_driver_internal+0x15d/0x220
[   60.712835]  driver_detach+0x40/0x80
[   60.712839]  bus_remove_driver+0x58/0xd0
[   60.712842]  driver_unregister+0x2c/0x40
[   60.712845]  pci_unregister_driver+0x36/0xb0
[   60.712872]  i915_exit+0x1a/0x8b [i915]
[   60.712876]  SyS_delete_module+0x18c/0x1e0
[   60.712882]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[   60.712884] RIP: 0033:0x7f6ed7da2287
[   60.712886] RSP: 002b:00007ffeb58e4858 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[   60.712890] RAX: ffffffffffffffda RBX: ffffffff8148a713 RCX: 00007f6ed7da2287
[   60.712892] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000563e66b4c9e8
[   60.712894] RBP: ffffc90000a97f88 R08: 0000000000000000 R09: 0000000000000080
[   60.712896] R10: 00007f6ed98888c0 R11: 0000000000000246 R12: 0000000000000000
[   60.712898] R13: 00007ffeb58e4a40 R14: 0000000000000000 R15: 0000000000000000
[   60.712903]  ? __this_cpu_preempt_check+0x13/0x20
[   60.712908] Code: da 31 f6 48 c7 c7 e3 d9 c7 81 e8 5c 61 fe ff 48 8d 7d d8 e8 03 a0 ff ff 48 85 c0 75 dd 48 8d 7d d8 e8 a5 a0 ff ff e9 45 fe ff ff <0f> ff e9 3a ff ff ff 0f ff 48 83 c4 10 5b 41 5c 41 5d 5d c3 66 
[   60.713037] ---[ end trace 8f77174a31f4f41e ]---

See the comment in drivers/gpu/drm/drm_mode_config.c:
* Also, if there are any framebuffers left, that's a driver leak now,
* so politely WARN about this.
*/
WARN_ON(!list_empty(&dev->mode_config.fb_list));
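
As a rough illustration of what this check catches, here is a toy userspace model in Python. It is not kernel code: the `Framebuffer` class, the plain list standing in for `fb_list`, and `mode_config_cleanup()` are all hypothetical stand-ins. The only behaviour it assumes is that a framebuffer stays on the list until its last reference is dropped.

```python
class Framebuffer:
    """Toy refcounted object that registers itself on a shared list."""

    def __init__(self, registry, fb_id):
        self.fb_id = fb_id
        self.refcount = 1          # the creator holds the first reference
        self._registry = registry
        registry.append(self)

    def get(self):
        """Take an extra reference."""
        self.refcount += 1

    def put(self):
        """Drop a reference; unregister once nobody holds one."""
        self.refcount -= 1
        if self.refcount == 0:
            self._registry.remove(self)


def mode_config_cleanup(fb_list):
    """Warn about anything still registered, then empty the list."""
    leaked = list(fb_list)
    if leaked:
        print("WARNING: %d framebuffer(s) still registered" % len(leaked))
    fb_list.clear()
    return leaked


if __name__ == "__main__":
    fb_list = []
    balanced = Framebuffer(fb_list, 1)
    balanced.put()                 # every get/put is paired, so it goes away
    leaky = Framebuffer(fb_list, 2)
    leaky.get()
    leaky.put()                    # one reference is never dropped
    print([fb.fb_id for fb in mode_config_cleanup(fb_list)])
```

In this model the unbalanced reference goes unnoticed until cleanup runs, which is why the warning appears at module unload rather than where the reference was taken.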


Full data:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3082/shard-snb5/igt@drv_module_reload@basic-reload-inject.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3082/shard-hsw2/igt@drv_module_reload@basic-reload-inject.html
Comment 1 Ville Syrjala 2017-09-13 13:47:43 UTC
This is also visible on GDG. Not sure why GDG hits it every time while these machines apparently hit it only in the shard runs. Some kind of fb leak somewhere...
Comment 2 Marta Löfstedt 2017-09-22 06:36:11 UTC
*** Bug 102938 has been marked as a duplicate of this bug. ***
Comment 5 Marta Löfstedt 2017-09-22 07:21:45 UTC
*** Bug 102576 has been marked as a duplicate of this bug. ***
Comment 6 Daniel Vetter 2017-10-11 12:49:16 UTC
Yes, this just indicates an fb leak in one of the previously run tests. module-reload is just the canary.

So step 1: take note of all the previous tests run in the same shard, step 2: figure out which one it actually is.

Yes this is one of the more painful ones.
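
The two steps above could be automated roughly as follows. This is only a sketch under strong assumptions: exactly one test in the shard leaks, the leak reproduces deterministically, and a hypothetical callback `leaks_after(subset)` exists that runs `subset` on a clean boot, then runs the module-reload test and reports whether the drm_mode_config_cleanup WARNING appeared. None of these names come from IGT or the CI system.

```python
from typing import Callable, Sequence


def find_leaking_test(
    tests: Sequence[str],
    leaks_after: Callable[[Sequence[str]], bool],
) -> str:
    """Bisect an ordered test list down to the single test that leaks.

    `leaks_after(subset)` is a hypothetical callback: run `subset` in order
    on a clean machine, reload the module, and return True if the
    drm_mode_config_cleanup WARNING shows up in dmesg.
    """
    if not tests:
        raise ValueError("empty test list")
    if not leaks_after(tests):
        raise ValueError("the full list does not reproduce the warning")

    lo, hi = 0, len(tests)          # the culprit is somewhere in tests[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if leaks_after(tests[lo:mid]):
            hi = mid                # first half reproduces on its own
        else:
            lo = mid                # otherwise assume it is in the second half
    return tests[lo]


if __name__ == "__main__":
    # Made-up test names; the lambda stands in for a real CI run.
    shard = ["test_a", "test_b", "test_c", "test_d", "test_e"]
    print(find_leaking_test(shard, lambda subset: "test_d" in subset))
```

With N tests in the shard this needs about log2(N) reload runs, but it breaks down if the leak only happens for a particular combination of tests.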
Comment 8 Marta Löfstedt 2017-10-30 12:37:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/shard-glkb1/igt@drv_module_reload@basic-no-display.html

[ 4141.102395] ------------[ cut here ]------------
[ 4141.102416] WARNING: CPU: 1 PID: 18487 at drivers/gpu/drm/drm_mode_config.c:468 drm_mode_config_cleanup+0x250/0x270
[ 4141.102421] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers mei_me r8169 mei mii i2c_hid pinctrl_geminilake pinctrl_intel [last unloaded: snd_hda_intel]
[ 4141.102535] CPU: 1 PID: 18487 Comm: drv_module_relo Tainted: G     U  W       4.14.0-rc6-CI-CI_DRM_3293+ #1
[ 4141.102540] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 4141.102546] task: ffff8801737fcec0 task.stack: ffffc90000c14000
[ 4141.102553] RIP: 0010:drm_mode_config_cleanup+0x250/0x270
[ 4141.102558] RSP: 0018:ffffc90000c17d70 EFLAGS: 00010202
[ 4141.102567] RAX: ffff8801457e90a0 RBX: ffff880169300890 RCX: 0000000000000001
[ 4141.102572] RDX: ffff8801693008b8 RSI: 0000000000000000 RDI: ffff880169300890
[ 4141.102577] RBP: ffffc90000c17d98 R08: 0000000000000000 R09: 0000000000000000
[ 4141.102581] R10: ffffc90000c17cf0 R11: 0000000000000000 R12: ffff880169300000
[ 4141.102586] R13: ffff8801693006a8 R14: ffffffffa028bd90 R15: ffff880179c24ae8
[ 4141.102592] FS:  00007f75b68398c0(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[ 4141.102596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4141.102601] CR2: 00005571597e5920 CR3: 000000011cde0000 CR4: 00000000003406e0
[ 4141.102606] Call Trace:
[ 4141.102693]  intel_modeset_cleanup+0x9d/0xf0 [i915]
[ 4141.102747]  i915_driver_unload+0x92/0x180 [i915]
[ 4141.102921]  i915_pci_remove+0x19/0x30 [i915]
[ 4141.102935]  pci_device_remove+0x39/0xb0
[ 4141.102949]  device_release_driver_internal+0x15d/0x220
[ 4141.102963]  driver_detach+0x40/0x80
[ 4141.102975]  bus_remove_driver+0x58/0xd0
[ 4141.102984]  driver_unregister+0x2c/0x40
[ 4141.102994]  pci_unregister_driver+0x36/0xb0
[ 4141.103078]  i915_exit+0x1a/0x8b [i915]
[ 4141.103090]  SyS_delete_module+0x18c/0x1e0
[ 4141.103103]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 4141.103108] RIP: 0033:0x7f75b4d51287
[ 4141.103113] RSP: 002b:00007fffca2f2a28 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 4141.103122] RAX: ffffffffffffffda RBX: ffffffff81491ef3 RCX: 00007f75b4d51287
[ 4141.103127] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 000055a94a578e58
[ 4141.103131] RBP: ffffc90000c17f88 R08: 0000000000000000 R09: 00007f75b4d9daa0
[ 4141.103137] R10: 000055a94a572cc0 R11: 0000000000000246 R12: 0000000000000000
[ 4141.103142] R13: 00007fffca2f2c10 R14: 0000000000000000 R15: 0000000000000000
[ 4141.103151]  ? __this_cpu_preempt_check+0x13/0x20
[ 4141.103163] Code: da 31 f6 48 c7 c7 c3 12 c9 81 e8 4c f5 ff ff 48 8d 7d d8 e8 b3 9d ff ff 48 85 c0 75 dd 48 8d 7d d8 e8 55 9e ff ff e9 45 fe ff ff <0f> ff e9 3a ff ff ff 0f ff 48 83 c4 10 5b 41 5c 41 5d 5d c3 66 
[ 4141.103443] ---[ end trace 4036b02fda94457e ]---
Comment 9 Daniel Vetter 2017-11-08 13:34:18 UTC
We don't track machines in the title anymore; that avoids confusion when they don't match up.
Comment 10 Marta Löfstedt 2017-11-24 08:26:42 UTC
*** Bug 103719 has been marked as a duplicate of this bug. ***
Comment 11 Daniel Vetter 2017-12-07 17:45:47 UTC
In the vain hope of finding more clues:

commit 2aa0fcc2c72456a20ba958fce7669be922c6db15
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Dec 7 15:49:25 2017 +0100

    drm: More debug info for fb leaks in mode_config_cleanup
Comment 12 Daniel Vetter 2017-12-07 17:46:08 UTC
I think a new backtrace from CI would be great.
Comment 13 Marta Löfstedt 2017-12-08 07:37:57 UTC
Patch included from CI_DRM_3476.
We hit the issue on:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3478/shard-snb5/igt@drv_module_reload@basic-reload.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3478/shard-hsw6/igt@drv_module_reload@basic-reload.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3478/shard-kbl2/igt@drv_module_reload@basic-reload.html

From KBL:
<7>[  196.940243] [leaked fb] framebuffer[89]:
<7>[  196.940245] [leaked fb] 	refcount=22
<7>[  196.940247] [leaked fb] 	format=XR24 little-endian (0x34325258)
<7>[  196.940249] [leaked fb] 	modifier=0x0
<7>[  196.940250] [leaked fb] 	size=1920x1200
<7>[  196.940252] [leaked fb] 	layers:
<7>[  196.940253] [leaked fb] 		size[0]=1920x1200
<7>[  196.940255] [leaked fb] 		pitch[0]=7680
<7>[  196.940256] [leaked fb] 		offset[0]=0
<7>[  196.940258] [leaked fb] 		obj[0]:(null)
<7>[  197.198401] [drm:i915_driver_load [i915]] Found SunrisePoint LP PCH
<7>[  197.198429] [drm:intel_power_domains_init [i915]] Allowed DC state mask 03
<7>[  197.199792] [drm:intel_device_info_dump [i915]] i915 device info: platform=KABYLAKE gen=9 pciid=0x5926 rev=0x06
<7>[  197.199827] [drm:intel_device_info_dump [i915]] i915 device info: is_mobile: no
<7>[  197.199860] [drm:intel_device_info_dump [i915]] i915 device info: is_lp: no
<7>[  197.199890] [drm:intel_device_info_dump [i915]] i915 device info: is_alpha_support: no
<7>[  197.199919] [drm:intel_device_info_dump [i915]] i915 device info: has_64bit_reloc: yes
<7>[  197.199981] [drm:intel_device_info_dump [i915]] i915 device info: has_aliasing_ppgtt: yes
<7>[  197.200009] [drm:intel_device_info_dump [i915]] i915 device info: has_csr: yes
<7>[  197.200037] [drm:intel_device_info_dump [i915]] i915 device info: has_ddi: yes
<7>[  197.200067] [drm:intel_device_info_dump [i915]] i915 device info: has_dp_mst: yes
<7>[  197.200097] [drm:intel_device_info_dump [i915]] i915 device info: has_reset_engine: yes
<7>[  197.200125] [drm:intel_device_info_dump [i915]] i915 device info: has_fbc: yes
<7>[  197.200152] [drm:intel_device_info_dump [i915]] i915 device info: has_fpga_dbg: yes
<7>[  197.200181] [drm:intel_device_info_dump [i915]] i915 device info: has_full_ppgtt: yes
<7>[  197.200208] [drm:intel_device_info_dump [i915]] i915 device info: has_full_48bit_ppgtt: yes
<7>[  197.200236] [drm:intel_device_info_dump [i915]] i915 device info: has_gmch_display: no
<7>[  197.200264] [drm:intel_device_info_dump [i915]] i915 device info: has_guc: yes
<7>[  197.200291] [drm:intel_device_info_dump [i915]] i915 device info: has_guc_ct: no
<7>[  197.200319] [drm:intel_device_info_dump [i915]] i915 device info: has_hotplug: yes
<7>[  197.200348] [drm:intel_device_info_dump [i915]] i915 device info: has_l3_dpf: no
<7>[  197.200376] [drm:intel_device_info_dump [i915]] i915 device info: has_llc: yes
<7>[  197.200404] [drm:intel_device_info_dump [i915]] i915 device info: has_logical_ring_contexts: yes
<7>[  197.200432] [drm:intel_device_info_dump [i915]] i915 device info: has_logical_ring_preemption: yes
<7>[  197.200460] [drm:intel_device_info_dump [i915]] i915 device info: has_overlay: no
<7>[  197.200488] [drm:intel_device_info_dump [i915]] i915 device info: has_pooled_eu: no
<7>[  197.200519] [drm:intel_device_info_dump [i915]] i915 device info: has_psr: yes
<7>[  197.200549] [drm:intel_device_info_dump [i915]] i915 device info: has_rc6: yes
<7>[  197.200578] [drm:intel_device_info_dump [i915]] i915 device info: has_rc6p: no
<7>[  197.202112] [drm:intel_device_info_dump [i915]] i915 device info: has_resource_streamer: yes
<7>[  197.202143] [drm:intel_device_info_dump [i915]] i915 device info: has_runtime_pm: yes
<7>[  197.202170] [drm:intel_device_info_dump [i915]] i915 device info: has_snoop: no
<7>[  197.202202] [drm:intel_device_info_dump [i915]] i915 device info: unfenced_needs_alignment: no
<7>[  197.202232] [drm:intel_device_info_dump [i915]] i915 device info: cursor_needs_physical: no
<7>[  197.202263] [drm:intel_device_info_dump [i915]] i915 device info: hws_needs_physical: no
<7>[  197.202288] [drm:intel_device_info_dump [i915]] i915 device info: overlay_needs_physical: no
<7>[  197.202317] [drm:intel_device_info_dump [i915]] i915 device info: supports_tv: no
<7>[  197.202348] [drm:i915_driver_load [i915]] i915 device info: has_ipc: yes
<6>[  197.202393] [drm] Found 64MB of eDRAM
<7>[  197.203008] [drm:intel_device_info_runtime_init [i915]] slice mask: 0003
<7>[  197.203037] [drm:intel_device_info_runtime_init [i915]] slice total: 2
<7>[  197.203068] [drm:intel_device_info_runtime_init [i915]] subslice total: 6
<7>[  197.203099] [drm:intel_device_info_runtime_init [i915]] subslice mask 0007
<7>[  197.203129] [drm:intel_device_info_runtime_init [i915]] subslice per slice: 3
<7>[  197.203155] [drm:intel_device_info_runtime_init [i915]] EU total: 48
<7>[  197.203180] [drm:intel_device_info_runtime_init [i915]] EU per subslice: 8
<7>[  197.203210] [drm:intel_device_info_runtime_init [i915]] has slice power gating: y
<7>[  197.203239] [drm:intel_device_info_runtime_init [i915]] has subslice power gating: n
<7>[  197.203267] [drm:intel_device_info_runtime_init [i915]] has EU power gating: y
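
For comparing `[leaked fb]` dumps like the one above across CI runs, a small parser can help. The sketch below is an assumption-laden helper, not part of any existing tool: the function names are made up, and the line format is inferred solely from the KBL excerpt in this comment. It also decodes the fourcc value, which is four ASCII bytes stored least significant byte first.

```python
import re
from typing import Dict, List

# Payload of one "[leaked fb]" dmesg line (format inferred from the excerpt).
LEAK_LINE = re.compile(r"\[leaked fb\]\s*(?P<body>.*)$")

SAMPLE = """\
<7>[  196.940243] [leaked fb] framebuffer[89]:
<7>[  196.940245] [leaked fb] \trefcount=22
<7>[  196.940247] [leaked fb] \tformat=XR24 little-endian (0x34325258)
<7>[  196.940251] [leaked fb] \tsize=1920x1200
<7>[  196.940255] [leaked fb] \t\tpitch[0]=7680
"""


def fourcc_to_str(code: int) -> str:
    """Decode a DRM fourcc: four ASCII bytes, least significant byte first."""
    return code.to_bytes(4, "little").decode("ascii")


def parse_leaked_fbs(dmesg: str) -> List[Dict[str, str]]:
    """Group "[leaked fb]" lines into one dict of key=value fields per fb."""
    fbs: List[Dict[str, str]] = []
    for line in dmesg.splitlines():
        m = LEAK_LINE.search(line)
        if not m:
            continue                      # unrelated dmesg line
        body = m.group("body").strip()
        header = re.fullmatch(r"framebuffer\[(\d+)\]:", body)
        if header:
            fbs.append({"id": header.group(1)})
        elif "=" in body and fbs:
            key, value = body.split("=", 1)
            fbs[-1][key] = value
    return fbs


if __name__ == "__main__":
    fb = parse_leaked_fbs(SAMPLE)[0]
    code = int(re.search(r"\((0x[0-9a-fA-F]+)\)", fb["format"]).group(1), 16)
    width = int(fb["size"].split("x")[0])
    bytes_per_pixel = int(fb["pitch[0]"]) // width
    print(fb["id"], fb["refcount"], fourcc_to_str(code), bytes_per_pixel)
    # prints: 89 22 XR24 4
```

For the KBL dump this confirms that 0x34325258 is `XR24` and that the pitch of 7680 bytes matches 1920 pixels at 4 bytes each.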
Comment 14 Maarten Lankhorst 2017-12-08 11:01:49 UTC
https://patchwork.freedesktop.org/series/35077/
Comment 15 Maarten Lankhorst 2017-12-20 09:37:56 UTC
https://patchwork.freedesktop.org/series/35615/
Comment 16 Maarten Lankhorst 2017-12-21 09:59:57 UTC
Patches merged, bug fixed.

commit 20bdc112bbe4c86e1293852bb11d56f8928a4c6d (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Dec 20 10:35:45 2017 +0100

    drm/i915: Disable all planes for load detection, v2.

commit ce0769e0ea4b3e192466243a1a9fd39acf214f1e (HEAD -> drm-misc-fixes)
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Wed Dec 20 10:35:43 2017 +0100

    drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
Comment 17 Marta Löfstedt 2018-01-02 08:55:35 UTC
Issue no longer reproduced
