Bug 65953

Summary: [GM45]igt/module_reload cause <3>[ 159.350832] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
Product: DRI Reporter: lu hua <huax.lu>
Component: DRM/Intel Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: xunx.fang, yangweix.shui
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                                   Flags
dmesg                                         none
don't tear down un-initiaizled stolen drm_mm  none
dmesg                                         none
debug patch                                   none
dmesg with patch/2810001                      none
dmesg with debug patch 81791                  none

Description lu hua 2013-06-20 05:49:53 UTC
System Environment:
--------------------------
Arch:           x86_64
Platform:       GM45
Kernel: (drm-intel-next-queued)cab8b5862acd55019fbeede6940d1a601912d6b8

Bug detailed description:
-----------------------------
Running ./module_reload produces the following error in dmesg: <3>[  159.350832] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
It happens on GM45 with the dinq kernel, and works well on drm-intel-fixes kernel.
I can't find a good commit.

output:
module successfully unloaded
module successfully loaded again


Reproduce steps:
----------------------------
1. ./module_reload
Comment 1 Daniel Vetter 2013-06-24 17:38:20 UTC
Does this work in -nightly?
Comment 2 lu hua 2013-06-25 02:11:17 UTC
Created attachment 81380 [details]
dmesg

It also happens on -nightly.
Comment 3 Daniel Vetter 2013-06-25 09:11:03 UTC
Hm, a few more questions:
- Does this happen even on a clean boot without running any other tests?
- If -fixes works but there's no working baseline it would be interesting to figure out where exactly it has been fixed in fixes. Currently the baseline for -fixes and -next-queued is 3.10-rc2, can you please test that kernel version, too?
Comment 4 lu hua 2013-06-26 02:56:49 UTC
(In reply to comment #3)
> Hm, a few more questions:
> - Does this happen even on a clean boot without running any other tests?
Yes, clean boot, then run ./module_reload

> - If -fixes works but there's no working baseline it would be interesting to
> figure out where exactly it has been fixed in fixes. Currently the baseline
> for -fixes and -next-queued is 3.10-rc2, can you please test that kernel
> version, too?
It also happens on the drm-intel-fixes kernel. I tested it on commits 76c425 and 19b2db.
It also happens on 3.10-rc2.
Comment 5 Daniel Vetter 2013-06-26 09:34:41 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > - If -fixes works but there's no working baseline it would be interesting to
> > figure out where exactly it has been fixed in fixes. Currently the baseline
> > for -fixes and -next-queued is 3.10-rc2, can you please test that kernel
> > version, too?
> It also happens on drm-intel-fixes kernel. I test it on commit:76c425 and
> commit 19b2db.
> It also happens on 3.10-rc2.

Oh, I had understood from the first comment that -fixes works: "and works well on drm-intel-fixes kernel". Sounds like it's broken everywhere. Thanks for the clarification.
Comment 6 Daniel Vetter 2013-06-27 23:04:37 UTC
Created attachment 81590 [details] [review]
don't tear down un-initiaizled stolen drm_mm

Please test the attached patch, thanks.
Comment 7 lu hua 2013-06-28 05:31:54 UTC
(In reply to comment #6)
> Created attachment 81590 [details] [review]
> don't tear down un-initiaizled stolen drm_mm
> 
> Please test the attached patch, thanks.

It still exists.
Comment 8 Daniel Vetter 2013-06-28 10:07:11 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Created attachment 81590 [details] [review]
> > don't tear down un-initiaizled stolen drm_mm
> > 
> > Please test the attached patch, thanks.
> 
> It still exists.

Can you please double-check the patch with the following debug diff applied on top (and then attach dmesg afterwards)?

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index f9d4873..d303e31 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -699,8 +699,8 @@ void drm_mm_takedown(struct drm_mm * mm)
 {
        struct drm_mm_node *entry, *next;
 
-       if (!list_empty(&mm->head_node.node_list)) {
-               DRM_ERROR("Memory manager not clean. Delaying takedown\n");
+       if (WARN(!list_empty(&mm->head_node.node_list),
+                "Memory manager not clean. Delaying takedown\n")) {
                return;
        }
Comment 9 lu hua 2013-07-01 05:45:44 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Created attachment 81590 [details] [review]
> > > don't tear down un-initiaizled stolen drm_mm
> > > 
> > > Please test the attached patch, thanks.
> > 
> > It still exists.
> 
> Can you please double-check the patch with the following debug diff applied
> on top (and then attach dmesg afterwards)?
> 
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index f9d4873..d303e31 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -699,8 +699,8 @@ void drm_mm_takedown(struct drm_mm * mm)
>  {
>         struct drm_mm_node *entry, *next;
>  
> -       if (!list_empty(&mm->head_node.node_list)) {
> -               DRM_ERROR("Memory manager not clean. Delaying takedown\n");
> +       if (WARN(!list_empty(&mm->head_node.node_list),
> +                "Memory manager not clean. Delaying takedown\n")) {
>                 return;
>         }


Tested it. dmesg now shows <3>[  270.095214] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 66285, found 0.
dmesg -r | egrep "<[1-3]>" |grep drm
<3>[  270.095214] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 66285, found 0
<3>[  270.173051] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 66285, found 0
<3>[  270.182479] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 108000, found 0
<3>[  270.281188] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 66285, found 0
<3>[  271.283053] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 69300, found 0
Comment 10 lu hua 2013-07-01 05:46:19 UTC
Created attachment 81778 [details]
dmesg
Comment 11 Daniel Vetter 2013-07-01 08:56:45 UTC
Created attachment 81791 [details] [review]
debug patch

Please apply this patch on top of latest -nightly, reproduce the issue and then attach the dmesg.

The other backtraces are different issues, specifically all the modeset state mismatches. Do we have a bug report for those already?
Comment 12 Daniel Vetter 2013-07-01 20:39:31 UTC
Ok, now hopefully the real patch:

https://patchwork.kernel.org/patch/2810001/
Comment 13 lu hua 2013-07-02 06:19:58 UTC
(In reply to comment #12)
> Ok, now hopefully the real patch:
> 
> https://patchwork.kernel.org/patch/2810001/


Tested with this patch.
./module_reload
output:
module successfully unloaded
module successfully loaded again

# echo $?
0

dmesg -r | egrep "<[1-3]>" |grep drm
<3>[11304.293153] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 108000, found 0)
Comment 14 lu hua 2013-07-02 06:21:22 UTC
Created attachment 81844 [details]
dmesg with patch/2810001
Comment 15 lu hua 2013-07-02 06:58:20 UTC
Created attachment 81845 [details]
dmesg with debug patch 81791

> Created attachment 81791 [details] [review]
> debug patch
> 
> Please apply this patch on top of latest -nightly, reproduce the issue and
> then attach the dmesg.
> 
> The other backtraces are different issues, specifically all the modeset
> state mismatches. Do we have a bug report for those already?


Ran with the debug patch.
output:
module successfully unloaded
module successfully loaded again

dmesg -r | egrep "<[1-3]>" |grep drm
<3>[   97.869226] [drm:intel_pipe_config_compare] *ERROR* mismatch in clock (expected 108000, found 0)
Comment 16 Chris Wilson 2013-07-03 15:46:37 UTC
commit 446f8d81ca2d9cefb614e87f2fabcc996a9e4e7e
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jul 2 10:48:31 2013 +0200

    drm/i915: Don't try to tear down the stolen drm_mm if it's not there
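
For readers following along, here is a minimal user-space sketch of why the "not clean" message can fire on unload, and what a guard like the one named in the commit title does. This is not the kernel code. struct mm_model, mm_init, mm_initialized, mm_takedown and cleanup_stolen are hypothetical stand-ins. The assumption that the stolen drm_mm was never set up on this machine (and so was still zero-filled) is inferred from the patch titles in this thread, not confirmed by it.

```c
/* User-space model only; every name here is a simplified, hypothetical stand-in. */
#include <stdbool.h>
#include <stdio.h>

/* Circular doubly linked list head, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An initialised, empty list points at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Stand-in for the memory manager: only the node list matters here. */
struct mm_model {
	struct list_head node_list;
};

static void mm_init(struct mm_model *mm)
{
	init_list_head(&mm->node_list);
}

/* Assumption: a manager that was never set up is still zero-filled. */
static bool mm_initialized(const struct mm_model *mm)
{
	return mm->node_list.next != NULL;
}

/* Returns true on a clean takedown, false when it bails out. */
static bool mm_takedown(struct mm_model *mm)
{
	if (!list_empty(&mm->node_list)) {
		/* A zero-filled head has next == NULL, so it also lands here. */
		fprintf(stderr, "Memory manager not clean. Delaying takedown\n");
		return false;
	}
	return true;
}

/* Guarded cleanup: skip the takedown when there is nothing to tear down. */
static bool cleanup_stolen(struct mm_model *mm)
{
	if (!mm_initialized(mm))
		return true;
	return mm_takedown(mm);
}
```

In this model, calling mm_takedown() on a zero-filled struct mm_model reports "not clean" even though nothing was ever allocated, while cleanup_stolen() returns quietly. A manager that went through mm_init() passes both paths.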
Comment 17 lu hua 2013-07-05 03:24:51 UTC
Verified. Fixed.
Comment 18 Elizabeth 2017-10-06 14:45:44 UTC
Closing old verified.
