Bug 71208

Summary: [regression] module_reload segfault
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: dh.herrmann, intel-gfx-bugs
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  dmesg (flags: none)
  dmesg (flags: none)

Description lu hua 2013-11-04 09:12:26 UTC
Created attachment 88587 [details]
dmesg

System Environment:
--------------------------
Platform: Haswell
Kernel:   (drm-intel-nightly)265a909ea074a346af0c40e41c7da10464046277

Bug detailed description:
-----------------------------
It happens on Haswell with the -nightly and drm-next branches. It does not happen on the -queued and -fixes kernels.

Running the first cycle produces a call trace:
output:
module successfully unloaded
module successfully loaded again

dmesg:
[   31.122255] ------------[ cut here ]------------
[   31.122373] WARNING: CPU: 2 PID: 3913 at fs/sysfs/file.c:498 sysfs_attr_ns+0x25/0x8c()
[   31.122545] sysfs: kobject \xffffffd0\xffffff85\xffffff85\x04\xffffff9e without dirent
[   31.122660] Modules linked in: netconsole configfs ipv6 dm_mod snd_hda_codec_realtek pcspkr snd_hda_codec_hdmi i2c_i801 iTCO_wdt iTCO_vendor_support lpc_ich mfd_core snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915(-) video button drm_kms_helper drm freq_table [last unloaded: snd_hda_intel]
[   31.124140] CPU: 2 PID: 3913 Comm: rmmod Not tainted 3.12.0-rc7_nightlytop_265a90_20131104_+ #2203
[   31.124317] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0131.R03.1307262359 07/26/2013
[   31.124498]  0000000000000000 0000000000000009 ffffffff8170464c ffff880243723cd8
[   31.124827]  ffffffff8103319e ffff880200000000 ffffffff8112f8b7 ffff880243723ca8
[   31.125153]  ffffffff81adc2b0 ffff88009e073610 ffff88009e0c7a00 ffff88009e13db40
[   31.125477] Call Trace:
[   31.125579]  [<ffffffff8170464c>] ? dump_stack+0x41/0x51
[   31.125693]  [<ffffffff8103319e>] ? warn_slowpath_common+0x73/0x8b
[   31.125809]  [<ffffffff8112f8b7>] ? sysfs_attr_ns+0x25/0x8c
[   31.125916]  [<ffffffff8103324e>] ? warn_slowpath_fmt+0x45/0x4a
[   31.126028]  [<ffffffff8112f8b7>] ? sysfs_attr_ns+0x25/0x8c
[   31.126142]  [<ffffffff8112f934>] ? sysfs_remove_file+0x16/0x32
[   31.126255]  [<ffffffff8136ab85>] ? device_del+0x114/0x17a
[   31.126368]  [<ffffffff8136abf4>] ? device_unregister+0x9/0x12
[   31.126487]  [<ffffffffa000befc>] ? drm_sysfs_connector_remove+0x7d/0x89 [drm]
[   31.126672]  [<ffffffffa00905fc>] ? intel_modeset_cleanup+0xb9/0xe2 [i915]
[   31.126796]  [<ffffffffa00628ca>] ? i915_driver_unload+0xb6/0x2a7 [i915]
[   31.126909]  [<ffffffffa000954e>] ? drm_dev_unregister+0x21/0xd0 [drm]
[   31.127025]  [<ffffffffa0009ba6>] ? drm_put_dev+0x48/0x51 [drm]
[   31.127137]  [<ffffffff812eb190>] ? pci_device_remove+0x24/0x48
[   31.127251]  [<ffffffff8136d34f>] ? __device_release_driver+0x68/0xc1
[   31.127364]  [<ffffffff8136da29>] ? driver_detach+0x6e/0x99
[   31.127476]  [<ffffffff8136d1ab>] ? bus_remove_driver+0x78/0xb9
[   31.127590]  [<ffffffff812eb2c5>] ? pci_unregister_driver+0x17/0x75
[   31.127703]  [<ffffffffa000b210>] ? drm_pci_exit+0x3b/0x72 [drm]
[   31.127817]  [<ffffffff81077d1d>] ? SyS_delete_module+0x1a3/0x219
[   31.127929]  [<ffffffff81709ef2>] ? page_fault+0x22/0x30
[   31.128039]  [<ffffffff8170ebe2>] ? system_call_fastpath+0x16/0x1b
[   31.128152] ---[ end trace 67e4404f3dd63a8a ]---


Running the second cycle segfaults:
output:
./module_reload: line 27: 18675 Segmentation fault      rmmod i915
WARNING: i915.ko still loaded!

Reproduce steps:
----------------------------
1. ./module_reload
Comment 1 Daniel Vetter 2013-11-04 09:57:08 UTC
Should be fixed with

commit 9d6104e0174b130ed864571b31811c3fd09fd611
Author: Thierry Reding <thierry.reding@gmail.com>
Date:   Wed Oct 30 11:59:05 2013 +0100

    drm/sysfs: Do not drop device reference twice

Note that this patch is merged into drm-next, hence it's only in drm-intel-nightly but not in any of the other intel branches.
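
For readers unfamiliar with the failure class named in that commit subject, the following is a minimal user-space sketch of what "dropping a reference twice" means. It is not the kernel code touched by the commit: `model_ref`, `model_ref_put` and `model_teardown` are hypothetical names invented for this illustration, and the model records the extra put instead of touching freed memory as a real double drop could.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a reference-counted object;
 * this is NOT the kernel's struct device or struct kref. */
struct model_ref {
    int count;       /* outstanding references */
    bool released;   /* set once the last reference has been dropped */
    int underflows;  /* puts that arrived after the release */
};

static void model_ref_init(struct model_ref *r)
{
    r->count = 1;
    r->released = false;
    r->underflows = 0;
}

/* Drop one reference. A put after release is only counted here;
 * in real code it would operate on memory that is already freed. */
static void model_ref_put(struct model_ref *r)
{
    if (r->released) {
        r->underflows++;
        return;
    }
    assert(r->count > 0);
    if (--r->count == 0)
        r->released = true;
}

/* Teardown that drops the single reference once (correct)
 * or twice (the bug class). Returns the number of extra puts. */
static int model_teardown(bool drop_twice)
{
    struct model_ref r;

    model_ref_init(&r);
    model_ref_put(&r);
    if (drop_twice)
        model_ref_put(&r);
    return r.underflows;
}
```

Calling `model_teardown(false)` reports no extra put, while `model_teardown(true)` reports one. In a real driver the second put would act on an object that may already be gone.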
Comment 2 lu hua 2013-11-11 07:07:05 UTC
Created attachment 88992 [details]
dmesg

Tested on the latest -nightly branch. It still fails.
output:
./drv_module_reload: line 27:  3995 Killed                  rmmod i915
WARNING: i915.ko still loaded!
./drv_module_reload: line 43: /sys/class/vtconsole/vtcon1/bind: No such file or directory

Call Trace:
[   42.883496]  [<ffffffffa000996a>] ? drm_put_minor+0x35/0x40 [drm]
[   42.883587]  [<ffffffffa0009985>] ? drm_dev_free+0x10/0x66 [drm]
[   42.883677]  [<ffffffff812eb020>] ? pci_device_remove+0x24/0x48
[   42.883768]  [<ffffffff8136d1df>] ? __device_release_driver+0x68/0xc1
[   42.883858]  [<ffffffff8136d8b9>] ? driver_detach+0x6e/0x99
[   42.883946]  [<ffffffff8136d03b>] ? bus_remove_driver+0x78/0xb9
[   42.884035]  [<ffffffff812eb155>] ? pci_unregister_driver+0x17/0x75
[   42.884127]  [<ffffffffa000b208>] ? drm_pci_exit+0x3b/0x72 [drm]
[   42.884217]  [<ffffffff81077d1d>] ? SyS_delete_module+0x1a3/0x219
[   42.884307]  [<ffffffff81047783>] ? task_work_run+0x78/0x89
[   42.884396]  [<ffffffff81709f72>] ? page_fault+0x22/0x30
[   42.884484]  [<ffffffff8170ec62>] ? system_call_fastpath+0x16/0x1b
[   42.884573] Code: 18 be 2f 00 00 00 48 c7 c7 a4 83 02 a0 e8 7a 99 02 e1 c6 05 be 36 02 00 01 48 89 d8 5b c3 48 85 ff 53 48 89 fb 74 26 48 8b 47 10 <f6> 40 4c 02 74 1c e8 63 d3 00 00 48 89 df e8 4c 2d 00 00 8b 33
[   42.886706] RIP  [<ffffffffa0009855>] drm_unplug_minor+0xd/0x31 [drm]
[   42.886827]  RSP <ffff8800881f5e18>
[   42.886911] CR2: 000000000000004c
[   42.887000] ---[ end trace 04d77421c25f93bf ]---
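
A note on reading the oops above: on x86-64 the `Code:` bytes `48 8b 47 10` followed by the marked `<f6> 40 4c 02` decode to `mov 0x10(%rdi),%rax` and `testb $0x2,0x4c(%rax)`. Together with `CR2: 000000000000004c`, that is consistent with the pointer loaded from offset 0x10 of the first argument being NULL. The sketch below reproduces only that arithmetic in user space. The struct layouts are hypothetical, padded by hand to match the offsets in the trace, and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts chosen only to reproduce the offsets seen in the
 * oops; they are NOT the kernel's struct drm_minor or struct device. */
struct model_inner {
    unsigned char pad[0x4c];
    unsigned char flags;        /* byte tested against mask 0x2 */
};

struct model_outer {
    unsigned char pad[0x10];
    struct model_inner *inner;  /* pointer loaded from offset 0x10 */
};

static_assert(offsetof(struct model_inner, flags) == 0x4c,
              "flags must sit at offset 0x4c for this illustration");

/* Address that a read of outer->inner->flags would touch,
 * computed without actually dereferencing the inner pointer. */
static uintptr_t model_fault_address(const struct model_outer *outer)
{
    return (uintptr_t)outer->inner + offsetof(struct model_inner, flags);
}
```

With `inner` set to NULL the computed address is 0x4c, the same value the trace reports in CR2.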
Comment 3 lu hua 2013-11-12 03:28:48 UTC
(In reply to comment #2)
> Created attachment 88992 [details]
> dmesg
> 
> Test on latest -nightly branch. It still fails on -nightly branch. 
> output:
> ./drv_module_reload: line 27:  3995 Killed                  rmmod i915
> WARNING: i915.ko still loaded!
> ./drv_module_reload: line 43: /sys/class/vtconsole/vtcon1/bind: No such file
> or directory
> 

This failure happens on all platforms with the -nightly kernel.
Comment 4 Daniel Vetter 2013-11-12 08:34:23 UTC
Can you please bisect where this regression has been introduced?
Comment 5 lu hua 2013-11-13 07:08:48 UTC
(In reply to comment #2)
> Created attachment 88992 [details]
> dmesg
> 
> Test on latest -nightly branch. It still fails on -nightly branch. 
> output:
> ./drv_module_reload: line 27:  3995 Killed                  rmmod i915
> WARNING: i915.ko still loaded!
> ./drv_module_reload: line 43: /sys/class/vtconsole/vtcon1/bind: No such file
> or directory
> 


Bisect shows: 8f6599da8e772fa8de54cdf98e9e03cbaf3946da is the first bad commit.
commit 8f6599da8e772fa8de54cdf98e9e03cbaf3946da
Author:     David Herrmann <dh.herrmann@gmail.com>
AuthorDate: Sun Oct 20 18:55:45 2013 +0200
Commit:     Dave Airlie <airlied@redhat.com>
CommitDate: Wed Nov 6 14:53:25 2013 +1000

    drm: delay minor destruction to drm_dev_free()

    Instead of freeing minors in drm_dev_unregister(), we only unplug them and
    delay the free to drm_dev_free(). Note that if drm_dev_register() has
    never been called, minors are NULL and this has no effect.

    This change is needed to allow early device unregistration. If we want to
    call drm_dev_unregister() on live devices, we need to guarantee that
    minors are still valid (but unplugged). This way, any open file can still
    access file_priv->minor->dev to get the DRM device. However, the minor is
    unplugged so no new users can occur.

    Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
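
The two-phase teardown described in that commit message can be sketched in user space as follows. This is a simplified model under stated assumptions, not the DRM implementation: `model_dev`, `model_minor` and the three functions are hypothetical names, and the model only shows the ordering (unplug first, free later) and the NULL tolerance the message mentions for devices that were never registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; the names do not correspond to
 * the real DRM structures or functions. */
struct model_minor {
    bool plugged;   /* true while new users may still open the node */
};

struct model_dev {
    struct model_minor *minor;  /* NULL until model_register() succeeds */
};

static int model_register(struct model_dev *dev)
{
    assert(dev->minor == NULL); /* guard against double registration */
    dev->minor = malloc(sizeof(*dev->minor));
    if (!dev->minor)
        return -1;
    dev->minor->plugged = true;
    return 0;
}

/* Phase 1: stop new users but keep the object valid for existing ones. */
static void model_unregister(struct model_dev *dev)
{
    if (dev->minor)             /* tolerate a never-registered device */
        dev->minor->plugged = false;
}

/* Phase 2: release the memory once nothing refers to it any more. */
static void model_free(struct model_dev *dev)
{
    model_unregister(dev);      /* harmless if already unplugged or NULL */
    free(dev->minor);           /* free(NULL) is a no-op */
    dev->minor = NULL;
}
```

The design point is that every step on the free path has to cope with state that an earlier step already tore down or never set up. A missing check of that kind would show up as a NULL dereference during unload.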
Comment 6 Daniel Vetter 2013-11-13 10:44:01 UTC
Please test https://patchwork.kernel.org/patch/3177821/
Comment 7 lu hua 2013-11-14 07:51:57 UTC
(In reply to comment #6)
> Please test https://patchwork.kernel.org/patch/3177821/

Tested on the drm-next branch.
This patch fixes the issue.
Comment 8 Daniel Vetter 2013-11-16 12:37:02 UTC
Fix landed in drm-next, which is included in -nightly.
Comment 9 lu hua 2013-11-19 03:29:56 UTC
Verified. Fixed.
Comment 10 lu hua 2013-12-11 02:36:02 UTC
This issue still happens on the -fixes kernel.
output:
./drv_module_reload: line 27:  3861 Killed                  rmmod i915
WARNING: i915.ko still loaded!
./drv_module_reload: line 43: /sys/class/vtconsole/vtcon1/bind: No such file or directory

Call Trace:
[   46.531866]  [<ffffffffa000996a>] ? drm_put_minor+0x35/0x40 [drm]
[   46.531957]  [<ffffffffa0009985>] ? drm_dev_free+0x10/0x66 [drm]
[   46.532048]  [<ffffffff812eb5b1>] ? pci_device_remove+0x38/0x80
[   46.532138]  [<ffffffff8136e0cd>] ? __device_release_driver+0x82/0xdb
[   46.532228]  [<ffffffff8136e7c1>] ? driver_detach+0x6e/0x99
[   46.532317]  [<ffffffff8136df0f>] ? bus_remove_driver+0x78/0xb9
[   46.532407]  [<ffffffff812eb70a>] ? pci_unregister_driver+0x17/0x75
[   46.532498]  [<ffffffffa000b208>] ? drm_pci_exit+0x3b/0x72 [drm]
[   46.532589]  [<ffffffff81077d19>] ? SyS_delete_module+0x1a3/0x219
[   46.532679]  [<ffffffff81047783>] ? task_work_run+0x78/0x89
[   46.532769]  [<ffffffff8170f432>] ? page_fault+0x22/0x30
[   46.532858]  [<ffffffff81714122>] ? system_call_fastpath+0x16/0x1b
[   46.532946] Code: 18 be 2f 00 00 00 48 c7 c7 24 84 02 a0 e8 7a 99 02 e1 c6 05 2e 37 02 00 01 48 89 d8 5b c3 48 85 ff 53 48 89 fb 74 26 48 8b 47 10 <f6> 40 4c 02 74 1c e8 e7 d3 00 00 48 89 df e8 4c 2d 00 00 8b 33
[   46.535086] RIP  [<ffffffffa0009855>] drm_unplug_minor+0xd/0x31 [drm]
[   46.535207]  RSP <ffff8802530bde18>
[   46.535291] CR2: 000000000000004c
[   46.535380] ---[ end trace 3ef6dc145d6160c4 ]---
Comment 11 Daniel Vetter 2013-12-11 11:08:46 UTC
-fixes hasn't rolled forward to the latest drm-fixes yet, which contains the fix. So this is expected.
Comment 12 Elizabeth 2017-10-06 14:42:17 UTC
Closing old verified bug.
