Bug 77511 - [ILK/SNB/IVB/BYT/HSW/BDW regression]igt/drv_module_reload causes system hang
Summary: [ILK/SNB/IVB/BYT/HSW/BDW regression]igt/drv_module_reload causes system hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: highest critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-04-16 07:06 UTC by Guo Jinxian
Modified: 2017-02-10 08:01 UTC


Attachments
drv_module_reload hang (5.52 KB, text/plain)
2014-04-16 07:06 UTC, Guo Jinxian
reorder fbcon cleanup (490 bytes, patch)
2014-06-11 12:25 UTC, Daniel Vetter

Description Guo Jinxian 2014-04-16 07:06:13 UTC
Created attachment 97450
drv_module_reload hang

System Environment:
--------------------------
Platform: ILK SNB IVB BYT HSW BDW
kernel:   (drm-intel-nightly)8c7da4ebd7c0aa6f24a558634f6f59204cf65c0b

Bug detailed description:
----------------------------
The system randomly hangs while running drv_module_reload.

The hang rate is about 1 in 3 runs.

output:
module successfully unloaded


Reproduce steps:
---------------------------- 
1.  ./drv_module_reload
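
Because the hang is intermittent, it can help to run the test several times and count the failures. The following is only a sketch: run_repeated and RELOAD_LOG are hypothetical names, not part of intel-gpu-tools, and the helper counts a run as failed only when the command exits non-zero.

```shell
#!/bin/sh
# Hypothetical helper (not part of IGT): run a command N times and count failures.
run_repeated() {
    runs=$1
    shift
    failed=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Log before each run so the last started iteration is on disk
        # even if the machine hangs hard during the run.
        echo "starting run $i of $runs: $*" >> "${RELOAD_LOG:-/tmp/reload_runs.log}"
        sync
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs runs failed"
}

# Example (requires root and a built IGT tree):
#   run_repeated 10 ./drv_module_reload
```

If the machine hangs, the last "starting run" line in the log shows which iteration triggered it. The exit status alone may not be enough to judge a run, so the dmesg output should be checked as well.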
Comment 1 Daniel Vetter 2014-04-16 08:53:06 UTC
Can you please bisect where this regression was introduced?

Please bisect both the module reload hang and the backtrace from

[  104.889887] WARNING: CPU: 0 PID: 3807 at drivers/gpu/drm/drm_crtc.c:4684 drm_mode_config_cleanup+0x114/0x1a0 [drm]()

separately, since they might be different bugs.
Comment 2 Guo Jinxian 2014-04-17 07:12:23 UTC
(In reply to comment #1)
> Can you please bisect where this regression has been introduced?
> 
> Both the module reload hang issue and the backtrace from
> 
> [  104.889887] WARNING: CPU: 0 PID: 3807 at drivers/gpu/drm/drm_crtc.c:4684
> drm_mode_config_cleanup+0x114/0x1a0 [drm]()
> 
> separately since they might be different bugs.

Bisect result for the warning
[  104.889887] WARNING: CPU: 0 PID: 3807 at drivers/gpu/drm/drm_crtc.c:4684 drm_mode_config_cleanup+0x114/0x1a0 [drm]():

4c6baa595f4e8516bb9cf0081765f90856aa2fe3 is the first bad commit
commit 4c6baa595f4e8516bb9cf0081765f90856aa2fe3
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Mar 7 08:57:50 2014 -0800

    drm/i915: get_plane_config support for ILK+ v3

    This should allow BIOS fb inheritance to work on ILK+ machines too.

    v2: handle tiled BIOS fbs (Kristian)
        split out common bits (Jesse)
    v3: alloc fb obj out in _init

    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 be6fd58e34c1ca582a123f0d47baaac24cd6ede1 c6d54592168e65908c1bdc65727150d3df1829ae M      drivers


As for the random system hang, we did not hit it during bisection. We will investigate it further later.
Comment 3 Chris Wilson 2014-04-30 10:47:25 UTC
Hmm, it appears to be a harmless warning about an fb leak. Not sure it is valid with our current plane_config recording the initial state.
Comment 4 Daniel Vetter 2014-05-15 16:01:47 UTC
It's a leak somewhere in the BIOS fb takeover code. Confirmed to exist both with and without FBDEV support, so it shouldn't be too hard to track down.
Comment 5 Daniel Vetter 2014-06-11 12:25:22 UTC
Created attachment 100879
reorder fbcon cleanup

Please test this patch. When testing, make sure that you don't trip over some unrelated WARNING; check only for the one in drm_crtc.c mentioned in this bug report.
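
One way to look only at the warning in question is to filter the kernel log for it. This is a minimal sketch: only_fb_leak_warning is a hypothetical helper name, and the pattern assumes the warning keeps the form quoted in this report (raised from drm_crtc.c in drm_mode_config_cleanup). The line number is left open because it differs between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and keep only the
# WARNING raised from drm_crtc.c in drm_mode_config_cleanup.
only_fb_leak_warning() {
    grep -E 'WARNING: .* at drivers/gpu/drm/drm_crtc\.c:[0-9]+ drm_mode_config_cleanup'
}

# Example:
#   dmesg | only_fb_leak_warning
```

No output means the warning was not logged. Other WARNING lines, for example from i915_dma.c, are ignored by this filter.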
Comment 6 Chris Wilson 2014-07-19 10:52:57 UTC
Ping?
Comment 7 Guo Jinxian 2014-07-21 03:27:26 UTC
I retested 10 times on the latest -nightly (8734408c113bb38234ed03ec51c723b3deff579b) and did not hit this issue. It should be fixed.

[root@x-bdw01 tests]# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
module successfully unloaded
module successfully loaded again
[root@x-bdw01 tests]# echo $?
0
Comment 8 Daniel Vetter 2014-08-06 08:34:59 UTC
Nope, it's not fixed. It only happens on the first module reload on a fresh boot though.
Comment 9 Guo Jinxian 2014-09-22 06:45:51 UTC
The failure still reproduces on the latest -fixes (8c875fca1a8d76665c60fa141c220cee65f44f5e):

[root@x-ivb6 tests]# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device
module successfully unloaded
[root@x-ivb6 tests]# echo $?
0
[root@x-ivb6 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
<4>[ 3353.640262] 3.17.0-rc5_drm-intel-fixes_8c875f_20140922_debug+ #2706 Not tainted
<4>[ 3353.640314]        [<ffffffffa0013a8f>] drm_dev_register+0x84/0xfd [drm]
<4>[ 3353.640320]        [<ffffffffa00162b1>] drm_get_pci_dev+0xfc/0x1b3 [drm]
<4>[ 3353.640343]        [<ffffffffa00163cd>] drm_pci_init+0x65/0xe8 [drm]
<4>[ 3353.640381] CPU: 0 PID: 4395 Comm: drv_module_relo Not tainted 3.17.0-rc5_drm-intel-fixes_8c875f_20140922_debug+ #2706
<4>[ 3353.689963] WARNING: CPU: 3 PID: 4433 at drivers/gpu/drm/drm_crtc.c:5113 drm_mode_config_cleanup+0x152/0x22b [drm]()
<4>[ 3353.689967] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support ppdev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcspkr snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i2c_i801 lpc_ich mfd_core snd_timer snd soundcore battery parport_pc parport wmi acpi_cpufreq i915(-) button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea [last unloaded: snd_hda_intel]
<4>[ 3353.689993] CPU: 3 PID: 4433 Comm: rmmod Not tainted 3.17.0-rc5_drm-intel-fixes_8c875f_20140922_debug+ #2706
<4>[ 3353.690030]  [<ffffffffa001b677>] ? drm_mode_config_cleanup+0x152/0x22b [drm]
<4>[ 3353.690042]  [<ffffffffa001b677>] drm_mode_config_cleanup+0x152/0x22b [drm]
<4>[ 3353.690086]  [<ffffffffa00138a8>] drm_dev_unregister+0x29/0x9a [drm]
<4>[ 3353.690094]  [<ffffffffa001427c>] drm_put_dev+0x51/0x5d [drm]
<4>[ 3353.690168]  [<ffffffffa0015b16>] drm_pci_exit+0x46/0x8e [drm]
<4>[ 3353.691860] WARNING: CPU: 3 PID: 4433 at drivers/gpu/drm/i915/i915_dma.c:1905 i915_driver_unload+0x1b0/0x289 [i915]()
<4>[ 3353.691864] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support ppdev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcspkr snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i2c_i801 lpc_ich mfd_core snd_timer snd soundcore battery parport_pc parport wmi acpi_cpufreq i915(-) button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea [last unloaded: snd_hda_intel]
<4>[ 3353.691889] CPU: 3 PID: 4433 Comm: rmmod Tainted: G        W      3.17.0-rc5_drm-intel-fixes_8c875f_20140922_debug+ #2706
<4>[ 3353.691958]  [<ffffffffa00138a8>] drm_dev_unregister+0x29/0x9a [drm]
<4>[ 3353.691966]  [<ffffffffa001427c>] drm_put_dev+0x51/0x5d [drm]
<4>[ 3353.692037]  [<ffffffffa0015b16>] drm_pci_exit+0x46/0x8e [drm]
<4>[ 3353.692087] WARNING: CPU: 3 PID: 4433 at drivers/gpu/drm/i915/i915_dma.c:1913 i915_driver_unload+0x1f4/0x289 [i915]()
<4>[ 3353.692090] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support ppdev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcspkr snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i2c_i801 lpc_ich mfd_core snd_timer snd soundcore battery parport_pc parport wmi acpi_cpufreq i915(-) button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea [last unloaded: snd_hda_intel]
<4>[ 3353.692114] CPU: 3 PID: 4433 Comm: rmmod Tainted: G        W      3.17.0-rc5_drm-intel-fixes_8c875f_20140922_debug+ #2706
<4>[ 3353.692179]  [<ffffffffa00138a8>] drm_dev_unregister+0x29/0x9a [drm]
<4>[ 3353.692187]  [<ffffffffa001427c>] drm_put_dev+0x51/0x5d [drm]
<4>[ 3353.692256]  [<ffffffffa0015b16>] drm_pci_exit+0x46/0x8e [drm]
Comment 10 Paulo Zanoni 2014-10-14 21:20:20 UTC
Same comment as the one I posted on bug #83484:

Please test https://bugs.freedesktop.org/attachment.cgi?id=107840 , then post the dmesg here.

I tested this on BDW, and module_reload works for me on this machine, with this patch on top of drm-intel-nightly.
Comment 11 Guo Jinxian 2014-10-16 06:21:01 UTC
(In reply to Paulo Zanoni from comment #10)
> Same comment as the one I posted on bug #83484:
> 
> Please test https://bugs.freedesktop.org/attachment.cgi?id=107840 , then
> post the dmesg here.
> 
> I tested this on BDW, and module_reload works for me on this machine, with
> this patch on top of drm-intel-nightly.

The failure still reproduces with this patch on the latest -nightly (2ea23cd593ba60ead60e2f796fae675aa4475b1a):


root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
module successfully unloaded
libkmod: ERROR ../libkmod/libkmod.c:554 kmod_search_moddep: could not open moddep file '/lib/modules/3.17.0_kcloud_43c5c7_20141016+/modules.dep.bin'
Killed
root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# echo $?
137
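
The exit status 137 is consistent with the "Killed" line above: by shell convention, a status above 128 usually means the process was terminated by a signal, with the signal number being the status minus 128. Here that gives 9, which is SIGKILL on Linux. A small sketch of that decoding follows; decode_status is a hypothetical helper name, and a program can in principle also exit with such a status on its own.

```shell
#!/bin/sh
# Hypothetical helper: interpret a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Example:
#   ./drv_module_reload; decode_status $?
```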
Comment 12 Paulo Zanoni 2014-10-17 21:49:17 UTC
(In reply to Guo Jinxian from comment #11)
> (In reply to Paulo Zanoni from comment #10)
> > Same comment as the one I posted on bug #83484:
> > 
> > Please test https://bugs.freedesktop.org/attachment.cgi?id=107840 , then
> > post the dmesg here.
> > 
> > I tested this on BDW, and module_reload works for me on this machine, with
> > this patch on top of drm-intel-nightly.
> 
> The failure still able to reproduce with this patch on latest
> -nightly(2ea23cd593ba60ead60e2f796fae675aa4475b1a)
> 
> 
> root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests#
> ./drv_module_reload
> unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
> module successfully unloaded
> libkmod: ERROR ../libkmod/libkmod.c:554 kmod_search_moddep: could not open
> moddep file '/lib/modules/3.17.0_kcloud_43c5c7_20141016+/modules.dep.bin'
> Killed
> root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# echo $?
> 137

Does this file exist on your machine? Maybe your system setup is wrong? This really looks like a bug in your machine setup, not our driver.

Also, is bug 83484 a duplicate of this one?
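
The question about the moddep file can be checked directly. The sketch below assumes the modules for the running kernel live under /lib/modules/$(uname -r), as in the libkmod error message; check_moddep is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether modules.dep.bin exists in a module
# directory (defaults to the directory of the running kernel).
check_moddep() {
    moddir=${1:-/lib/modules/$(uname -r)}
    if [ -f "$moddir/modules.dep.bin" ]; then
        echo "present: $moddir/modules.dep.bin"
    else
        echo "missing: $moddir/modules.dep.bin"
        return 1
    fi
}

# Example:
#   check_moddep || echo "module dependency index not found"
```

On a typical installation, depmod -a regenerates this file for the running kernel. Whether that applies here depends on how the module directory is provisioned on the test machine.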
Comment 13 Guo Jinxian 2014-10-21 06:32:10 UTC


(In reply to Paulo Zanoni from comment #12)
> (In reply to Guo Jinxian from comment #11)
> > (In reply to Paulo Zanoni from comment #10)
> > > Same comment as the one I posted on bug #83484:
> > > 
> > > Please test https://bugs.freedesktop.org/attachment.cgi?id=107840 , then
> > > post the dmesg here.
> > > 
> > > I tested this on BDW, and module_reload works for me on this machine, with
> > > this patch on top of drm-intel-nightly.
> > 
> > The failure still able to reproduce with this patch on latest
> > -nightly(2ea23cd593ba60ead60e2f796fae675aa4475b1a)
> > 
> > 
> > root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests#
> > ./drv_module_reload
> > unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
> > module successfully unloaded
> > libkmod: ERROR ../libkmod/libkmod.c:554 kmod_search_moddep: could not open
> > moddep file '/lib/modules/3.17.0_kcloud_43c5c7_20141016+/modules.dep.bin'
> > Killed
> > root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# echo $?
> > 137
> 
> Does this file exist on your machine? Maybe your system setup is wrong? This
> really looks like a bug in your machine setup, not our driver.
> 
> Also, bug 83484 is a duplicate of this one?

Yes, some modules are not installed by default on our Ubuntu devices. After running the commands below, the test passes.

apt-get install nfs-common nfs-kernel-server
/etc/init.d/client_module
mount -a 


root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
module successfully unloaded
module successfully loaded again
root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# echo $?
0


[root@x-bsw01 tests]#./drv_module_reload
unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
module successfully unloaded
module successfully loaded again

[root@x-bsw01 tests]# echo $?
0
Comment 14 Jari Tahvanainen 2017-02-10 08:01:30 UTC
Closing old (>2 years) verified+fixed bug.

