86507 – [HSW/BYT Bisected]igt/drv_module_reload causes WARNING "Memory manager not clean during takedown." + slab not clean

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	high normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-11-21 03:48 UTC by Guo Jinxian
Modified:	2017-10-06 14:33 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
dmesg (119.14 KB, text/plain) 2014-11-21 03:48 UTC, Guo Jinxian

Description Guo Jinxian 2014-11-21 03:48:27 UTC

Created attachment 109787
dmesg

==System Environment==
--------------------------
Regression: Yes
Good commit on -next-queued: 77c1aa84de0096792de673aa1c64c36b38553cf5(2014_11_19)

Non-working platforms: HSW

==kernel==
--------------------------
origin/drm-intel-nightly: 18748be7c96accc27327423c384f86a8fae99c35(fails)
    drm-intel-nightly: 2014y-11m-20d-21h-58m-44s UTC integration manifest
origin/drm-intel-next-queued: 89a35ecdc6aa5a88165313ca5cfd52b8e8e7fbbd(fails)
    drm/i915/g4x: fix g4x infoframe readout
origin/drm-intel-fixes: 0485c9dc24ec0939b42ca5104c0373297506b555(another bug 80517)
    drm/i915: Kick fbdev before vgacon

==Bug detailed description==
igt/drv_module_reload causes "WARNING: CPU: 5 PID: 4025 at drivers/gpu/drm/drm_mm.c:765 i915_global_gtt_cleanup+0x3a/0x80 [i915]()"

Output:
[root@x-hsw24 tests]# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device
module successfully unloaded
module successfully loaded again
[root@x-hsw24 tests]# echo $?
0
[root@x-hsw24 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
<4>[  198.502172] WARNING: CPU: 5 PID: 4025 at drivers/gpu/drm/drm_mm.c:765 i915_global_gtt_cleanup+0x3a/0x80 [i915]()
<4>[  198.502175] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc ipv6 dm_mod iTCO_wdt iTCO_vendor_support dcdbas snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi serio_raw pcspkr i2c_i801 snd_hda_controller snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm shpchp snd_timer snd soundcore battery acpi_cpufreq i915(-) button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea [last unloaded: snd_hda_intel]
<4>[  198.502199] CPU: 5 PID: 4025 Comm: rmmod Not tainted 3.18.0-rc5_drm-intel-nightly_18748b_20141121+ #1774
<4>[  198.502249]  [<ffffffffa0023959>] ? drm_modeset_unlock_all+0x41/0x50 [drm]
<4>[  198.502254]  [<ffffffffa001308d>] ? drm_dev_unregister+0x1e/0x8b [drm]
<4>[  198.502259]  [<ffffffffa001394f>] ? drm_put_dev+0x3e/0x47 [drm]
<4>[  198.502280]  [<ffffffffa00151a1>] ? drm_pci_exit+0x39/0x9c [drm]
<4>[  198.502337] CPU: 5 PID: 4025 Comm: rmmod Tainted: G    B   W      3.18.0-rc5_drm-intel-nightly_18748b_20141121+ #1774
<4>[  198.502381]  [<ffffffffa0023959>] ? drm_modeset_unlock_all+0x41/0x50 [drm]
<4>[  198.502385]  [<ffffffffa001308d>] ? drm_dev_unregister+0x1e/0x8b [drm]
<4>[  198.502390]  [<ffffffffa001394f>] ? drm_put_dev+0x3e/0x47 [drm]
<4>[  198.502408]  [<ffffffffa00151a1>] ? drm_pci_exit+0x39/0x9c [drm]
<4>[  198.502422] CPU: 5 PID: 4025 Comm: rmmod Tainted: G    B   W      3.18.0-rc5_drm-intel-nightly_18748b_20141121+ #1774
<4>[  198.502447]  [<ffffffffa0023959>] ? drm_modeset_unlock_all+0x41/0x50 [drm]
<4>[  198.502452]  [<ffffffffa001308d>] ? drm_dev_unregister+0x1e/0x8b [drm]
<4>[  198.502456]  [<ffffffffa001394f>] ? drm_put_dev+0x3e/0x47 [drm]
<4>[  198.502474]  [<ffffffffa00151a1>] ? drm_pci_exit+0x39/0x9c [drm]


==Reproduce steps==
---------------------------- 
1. ./drv_module_reload

Comment 1 Daniel Vetter 2014-11-21 08:48:18 UTC

Hm, I've noticed similar "Memory manager not clean" issues when reloading i915.ko on my snb just this week. Do you see this on other platforms, too?

Btw for WARNING there's often a 2nd line with some explanation, that should be the bug summary according to bug filing BKM. I've fixed it.

Bisect should definitely help here since it looks like a leak somewhere.

Comment 2 Guo Jinxian 2014-11-24 08:21:34 UTC

dcb4c12a687710ab745c2cdee8298c3e97f6f707 is the first bad commit
Author:     Oscar Mateo <oscar.mateo@intel.com>
AuthorDate: Thu Nov 13 10:28:10 2014 +0000
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Nov 19 19:32:58 2014 +0100


    drm/i915/bdw: Pin the context backing objects to GGTT on-demand

    Up until now, we have pinned every logical ring context backing object
    during creation, and left it pinned until destruction. This made my life
    easier, but it's a harmful thing to do, because we cause fragmentation
    of the GGTT (and, eventually, we would run out of space).

    This patch makes the pinning on-demand: the backing objects of the two
    contexts that are written to the ELSP are pinned right before submission
    and unpinned once the hardware is done with them. The only context that
    is still pinned regardless is the global default one, so that the HWS can
    still be accessed in the same way (ring->status_page).

    v2: In the early version of this patch, we were pinning the context as
    we put it into the ELSP: on the one hand, this is very efficient because
    only a maximum two contexts are pinned at any given time, but on the other
    hand, we cannot really pin in interrupt time :(

    v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
    Do not unpin default context in free_request.

    v4: Break out pin and unpin into functions.  Fix style problems reported
    by checkpatch

    v5: Remove unpin_lock as all pinning and unpinning is done with the struct
    mutex already locked.  Add WARN_ONs to make sure this is the case in future.

    Issue: VIZ-4277
    Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
    Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
    Reviewed-by: Akash Goel <akash.goels@gmail.com>
    Reviewed-by: Deepak S<deepak.s@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 b99fffd2ff94e9c66c4797886726deb6cdf9d502 d3501260bc640ac87a2a95d13ecfc379caaf4d41 M      drivers

On its parent commit (c86ee3a9f8cddcf2e637da19d6e7c05bdea11a96), a different dmesg warning was reproduced:

[root@x-hsw24 tests]# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device
module successfully unloaded
[root@x-hsw24 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
<4>[   48.255113] WARNING: CPU: 5 PID: 3981 at drivers/gpu/drm/i915/intel_pm.c:6207 intel_disable_gt_powersave+0x33/0x37a [i915]()
<4>[   48.255117]  dm_mod snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi dcdbas serio_raw pcspkr i2c_i801 snd_hda_controller snd_hda_codec snd_hwdep lpc_ich shpchp mfd_core snd_pcm snd_timer snd soundcore battery acpi_cpufreq i915(-) button video drm_kms_helper drm [last unloaded: snd_hda_intel]
<4>[   48.255165]  [<ffffffffa0004dd3>] ? vblank_disable_and_save+0x170/0x17f [drm]
<4>[   48.255197]  [<ffffffffa0007527>] ? drm_dev_unregister+0x1e/0x8b [drm]
<4>[   48.255202]  [<ffffffffa0007757>] ? drm_put_dev+0x3e/0x47 [drm]
<4>[   48.255222]  [<ffffffffa000917a>] ? drm_pci_exit+0x38/0x98 [drm]

Comment 3 Guo Jinxian 2014-11-24 08:35:04 UTC

(In reply to Daniel Vetter from comment #1)
> Hm, I've noticed similar "Memory manager not clean" issues when reloading
> i915.ko on my snb just this week. Do you see this on other platforms, too?
> 
> Btw for WARNING there's often a 2nd line with some explanation, that should
> be the bug summary according to bug filing BKM. I've fixed it.
> 
> Bisect should definitely help here since it looks like a leak somewhere.

I checked on ILK, SNB and BYT platforms, and can only reproduce this on BYT.

Comment 4 Daniel Vetter 2014-11-24 14:25:17 UTC

(In reply to Guo Jinxian from comment #3)
> (In reply to Daniel Vetter from comment #1)
> > Hm, I've noticed similar "Memory manager not clean" issues when reloading
> > i915.ko on my snb just this week. Do you see this on other platforms, too?
> > 
> > Btw for WARNING there's often a 2nd line with some explanation, that should
> > be the bug summary according to bug filing BKM. I've fixed it.
> > 
> > Bisect should definitely help here since it looks like a leak somewhere.
> 
> I checked on ILK, SNB and BYT platforms, and can only reproduce this on BYT.

Hm, BYT is likely a different bug since the bisected commit is for gen8+ (bdw/bsw) only. Can you please file a new bug report for BYT? Also please check whether it's a regression and bisect if so.

Comment 5 Daniel Vetter 2014-11-25 13:16:33 UTC

commit 958f8cd96f979b20c45c55cba14bf8d8fbeca64f
Author: Thomas Daniel <thomas.daniel@intel.com>
Date:   Tue Nov 25 10:39:25 2014 +0000

    drm/i915: Fix context object leak for legacy contexts

Comment 6 Guo Jinxian 2014-12-01 01:58:37 UTC

Verified on latest -nightly (0db9cf7742874ee2c09a35b640c1bb04cb379eb6).


[root@x-hsw24 tests]# ./drv_module_reload
unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device
module successfully unloaded
[root@x-hsw24 tests]# dmesg -r|egrep "<[1-4]>"|grep drm

Comment 7 Elizabeth 2017-10-06 14:33:37 UTC

Closing old verified.
