Bug 54101

Summary: [IVB] slub_debug + module_reload causes system hang with calltrace
Product: DRI Reporter: lu hua <huax.lu>
Component: DRM/Intel Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED QA Contact:
Severity: major    
Priority: medium CC: ben, chris, daniel, jbarnes, xunx.fang
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
  Description                    Flags
  dmesg                          none
  dmesg                          none
  Destroy CRTCs after planes.    none

Description lu hua 2012-08-27 03:11:25 UTC
Created attachment 66156
dmesg

System Environment:
--------------------------
Arch:             x86_64
Platform:         Ivybridge(i7-3610QM)
Libdrm:	(master)libdrm-2.4.39-1-g7080bfdfd9b6c5f003daaef37ae9c329f2d46a6c
Mesa:	(master)a3685544e1e88828c4931059686cf3acc199079c
Xserver:(master)xorg-server-1.12.99.905
Xf86_video_intel:(master)2.20.4-58-g454cc8453af1852758c3396dbe303c13c5c1be27
Libva:	(staging)f12f80371fb534e6bbf248586b3c17c298a31f4e
Libva_intel_driver:(staging)82fa52510a37ab645daaa3bb7091ff5096a20d0b
Kernel:	(drm-intel-next-queued) 7788a765205f63abcb8645c16c85a968bd578f4f

Bug detailed description:
-------------------------
Running ./module_reload hangs the system, and a call trace appears in dmesg.
It happens on Ivybridge (i7-3610QM) with the -queued kernel. It doesn't happen on the -fixes kernel.
The last known good commit: 20d5a540e55a29daeef12706f9ee73baf5641c16
The last known bad commit: 7788a765205f63abcb8645c16c85a968bd578f4f


 Call Trace:
[   42.254012]  [<ffffffffa00ccfd6>] sandybridge_update_wm+0x61/0x414 [i915]
[   42.254123]  [<ffffffffa00ce3ea>] intel_update_watermarks+0x19/0x1b [i915]
[   42.254235]  [<ffffffffa00d7e7b>] ivb_disable_plane+0x95/0x9e [i915]
[   42.254344]  [<ffffffffa00d785a>] intel_disable_plane+0x24/0x60 [i915]
[   42.254455]  [<ffffffffa00d78a5>] intel_destroy_plane+0xf/0x24 [i915]
[   42.254564]  [<ffffffffa005188f>] drm_mode_config_cleanup+0x147/0x17c [drm]
[   42.254676]  [<ffffffffa00bf3c2>] intel_modeset_cleanup+0xf7/0x104 [i915]
[   42.254782]  [<ffffffffa009d1e2>] i915_driver_unload+0xec/0x24b [i915]
[   42.254890]  [<ffffffffa004c1a3>] drm_put_dev+0xd2/0x1af [drm]
[   42.254993]  [<ffffffffa00991f6>] i915_pci_remove+0x18/0x1a [i915]
[   42.255093]  [<ffffffff81214858>] pci_device_remove+0x28/0x4c
[   42.255194]  [<ffffffff81292843>] __device_release_driver+0x67/0xba
[   42.255295]  [<ffffffff81292f43>] driver_detach+0x7e/0xa7
[   42.255395]  [<ffffffff812926aa>] bus_remove_driver+0x89/0xab
[   42.255494]  [<ffffffff812934c3>] driver_unregister+0x64/0x6d
[   42.255596]  [<ffffffff81214adf>] pci_unregister_driver+0x3f/0x84
[   42.255702]  [<ffffffffa004e1ee>] drm_pci_exit+0x3f/0x78 [drm]
[   42.255813]  [<ffffffffa00dc39f>] i915_exit+0x17/0x19 [i915]
[   42.255913]  [<ffffffff8107736c>] sys_delete_module+0x1a2/0x200
[   42.256015]  [<ffffffff8108b2a1>] ? __audit_syscall_entry+0x191/0x1bd
[   42.256115]  [<ffffffff813da522>] system_call_fastpath+0x16/0x1b
[   42.256214] Code: 48 8b bc f0 a8 28 00 00 48 8b 47 28 48 85 c0 74 06 80 7f 30 00 75 14 49 8b 40 18 41 89 03 49 8b 42 18 89 03 31 c0 e9 ca 00 00 00 <44> 8b 68 5c be 08 00 00 00 44 8b a7 80 00 00 00 4d 8b 72 20 44
[   42.259296] RIP  [<ffffffffa00cab8f>] g4x_compute_wm0+0x4d/0x122 [i915]
[   42.259449]  RSP <ffff88021a0f9bb0>
[   42.259551] ---[ end trace 391914c4b976e614 ]---
Comment 1 Daniel Vetter 2012-08-28 07:56:49 UTC
Is this still an issue on drm-intel-testing (that combines -fixes and -queued)? I have a feeling this is another case of the hw context stuff blowing up.
Comment 2 lu hua 2012-08-28 08:08:01 UTC
It also happens on (drm-intel-testing)ef6113ad0f406db4fbe2bcf3359dd938a6046d75.
Comment 3 Daniel Vetter 2012-08-28 08:20:15 UTC
In that case it sounds like a normal regression introduced in -queued. Can you please bisect this? Just looking at the dmesg and backtrace, I have no idea what's going wrong here :(

Just to confirm: Does this happen even if you run the module reload test right after boot, i.e. with nothing else having run?
Comment 4 lu hua 2012-08-29 05:28:39 UTC
Running only the module reload test right after boot still triggers this issue.
Comment 5 lu hua 2012-08-29 06:24:32 UTC
I tried to bisect it. Selected good commit: 20d5a540e55a29daeef12706f9ee73baf5641c16 and bad commit: 83358c85866ebd2af1229fc9870b93e126690671.

Bisect shows:
The merge base 6b16351acbd415e66ba16bf7d473ece1574cf0bc is bad.
This means the bug has been fixed between 6b16351acbd415e66ba16bf7d473ece1574cf0bc and [20d5a540e55a29daeef12706f9ee73baf5641c16]
Comment 6 Daniel Vetter 2012-08-29 07:48:41 UTC
Can you please double-check that 6b16351acbd415e66ba16bf7d473ece1574cf0bc is really bad? Also, please attach the bisect log:

$ git bisect log
Comment 7 lu hua 2012-08-30 09:15:02 UTC
Adding slub_debug to the kernel command line (to check Bug 53526) triggers this issue.
Without slub_debug, it doesn't happen.
Comment 8 lu hua 2012-08-31 05:18:04 UTC
It's not a regression; it only shows up with slub_debug.
Comment 9 Chris Wilson 2012-09-13 17:22:37 UTC
Can you please retest now that the other mysterious slub_debug issue has resolved itself?
Comment 10 lu hua 2012-09-17 03:04:38 UTC
It still happens on commit 2fc764a311ca1e51c69ac3ff2872ae49617f9b46 (Merge: 1a9a08f 3d840a1).

[   42.362225] Call Trace:
[   42.362330]  [<ffffffffa01c4e11>] sandybridge_update_wm+0x61/0x414 [i915]
[   42.362434]  [<ffffffff810f8f16>] ? kfree+0xd2/0x12e
[   42.362542]  [<ffffffffa01ac56c>] ? intel_crtc_destroy+0x62/0x6b [i915]
[   42.362652]  [<ffffffffa01c6251>] intel_update_watermarks+0x19/0x1b [i915]
[   42.362766]  [<ffffffffa01d00a5>] ivb_disable_plane+0x95/0x9e [i915]
[   42.362877]  [<ffffffffa01cfa7e>] intel_disable_plane+0x24/0x62 [i915]
[   42.362987]  [<ffffffffa01cfacb>] intel_destroy_plane+0xf/0x24 [i915]
[   42.363097]  [<ffffffffa0053bf3>] drm_mode_config_cleanup+0x147/0x17c [drm]
[   42.363208]  [<ffffffffa01b698b>] intel_modeset_cleanup+0xf9/0x105 [i915]
[   42.363315]  [<ffffffffa0192282>] i915_driver_unload+0xee/0x24f [i915]
[   42.363424]  [<ffffffffa004e413>] drm_put_dev+0xd2/0x1af [drm]
[   42.363529]  [<ffffffffa018e1eb>] i915_pci_remove+0x18/0x1a [i915]
[   42.363631]  [<ffffffff8122a5f9>] pci_device_remove+0x28/0x4c
[   42.363732]  [<ffffffff812a9b23>] __device_release_driver+0x67/0xba
[   42.363834]  [<ffffffff812aa247>] driver_detach+0x8f/0xb8
[   42.363936]  [<ffffffff812a9986>] bus_remove_driver+0x8c/0xae
[   42.364036]  [<ffffffff812aa7cb>] driver_unregister+0x64/0x6d
[   42.364137]  [<ffffffff8122a88d>] pci_unregister_driver+0x3f/0x88
[   42.364245]  [<ffffffffa00504e2>] drm_pci_exit+0x3f/0x78 [drm]
[   42.364356]  [<ffffffffa01d4897>] i915_exit+0x17/0x19 [i915]
[   42.364458]  [<ffffffff81081532>] sys_delete_module+0x1a6/0x204
[   42.364559]  [<ffffffff81079400>] ? trace_hardirqs_on_caller+0x11e/0x155
[   42.364662]  [<ffffffff81095e68>] ? __audit_syscall_entry+0x191/0x1bd
[   42.364766]  [<ffffffff813de062>] system_call_fastpath+0x16/0x1b
[   42.364866] Code: 48 8b bc f0 78 36 00 00 48 8b 47 28 48 85 c0 74 06 80 7f 30 00 75 14 49 8b 40 18 41 89 03 49 8b 42 18 89 03 31 c0 e9 ca 00 00 00 <44> 8b 68 5c be 08 00 00 00 44 8b a7 80 00 00 00 4d 8b 72 20 44
[   42.367978] RIP  [<ffffffffa01c296b>] g4x_compute_wm0+0x4d/0x122 [i915]
[   42.368132]  RSP <ffff880225125bb0>
[   42.368236] ---[ end trace 6eaebf99d70dd4de ]---
Comment 11 lu hua 2012-09-17 06:59:56 UTC
Created attachment 67268
dmesg
Comment 12 Chris Wilson 2012-09-17 09:40:04 UTC
Created attachment 67271
Destroy CRTCs after planes.
Comment 13 lu hua 2012-09-18 02:40:17 UTC
(In reply to comment #12)
> Created attachment 67271
> Destroy CRTCs after planes.

With this patch applied to the -queued kernel (commit cd9a8c405d532bb74c6e243), the issue goes away.
Comment 14 Chris Wilson 2012-10-02 14:48:46 UTC
commit 3184009c36da413724f283e3c7ac9cc60c623bc4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 17 09:38:03 2012 +0000

    drm: Destroy the planes prior to destroying the associated CRTC
    
    As during the plane cleanup, we wish to disable the hardware and
    so may modify state on the associated CRTC, that CRTC must continue to
    exist until we are finished.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54101
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Cc: stable@vger.kernel.org
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Tested-by: lu hua <huax.lu@intel.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
Comment 15 lu hua 2012-10-19 07:26:20 UTC
(In reply to comment #14)
> commit 3184009c36da413724f283e3c7ac9cc60c623bc4

Did you cherry-pick it to the -queued/-fixes branch?
Comment 16 Daniel Vetter 2012-10-19 14:39:23 UTC
The patch has been merged through the drm-next tree and is included in 3.7-rc1. Until I roll the intel trees forward, you can test Linus' upstream tree to confirm the fix.
Comment 17 lu hua 2012-10-23 07:46:40 UTC
Fixed on the upstream tree. Tested on commit 6f0c0580b70:
commit 6f0c0580b70c89094b3422ba81118c7b959c7556
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat Oct 20 12:11:32 2012 -0700
  
  Linux 3.7-rc2
Comment 18 lu hua 2012-11-08 08:48:42 UTC
Verified. Fixed on the -nightly branch, commit b5a833707960154164cf450647c76547be43a167.
Comment 19 Jari Tahvanainen 2017-09-04 10:12:26 UTC
Closing old verified+fixed.
