Bug 51018 - kernel >=3.4.0: nouveau triggers kernel BUG in slub.c on GTX 560 Ti
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
 
Reported: 2012-06-12 15:12 UTC by Jiri Dluhos
Modified: 2012-06-28 08:41 UTC


Attachments
Result of "dmesg" (running kernel 3.4.2 with slub_debug=FZ) (70.51 KB, text/plain)
2012-06-12 15:22 UTC, Jiri Dluhos
Dmesg containing the SLUB warning. (74.01 KB, text/plain)
2012-06-17 11:30 UTC, Jiri Dluhos

Description Jiri Dluhos 2012-06-12 15:12:21 UTC
Observed on a workstation running Gentoo, with a Zotac GTX 560 Ti card.

After upgrading the kernel to 3.4.0, the machine freezes during startup, at around the time of udev initialization. Seemingly at random, either the screen goes completely blank and the monitor shuts down, or a kernel BUG report appears, reading approximately as follows:

kernel BUG at mm/slub.c:3474!
invalid opcode: 0000 [#1] SMP
.
.
.
Call trace:
    sysfs_release+0xa1/0xc0
    fput+0xd2/0x240
    filp_close+0x61/0x90
    sys_close+0x7b/0xd0
    system_call_fastpath+0x16/0x1b
RIP: kfree+0xab/0xb0

(The exact location in mm/slub.c is either line 3471 or line 3474, depending on the kernel version.)

I suppose that the same BUG occurs when the screen goes blank, only it is not visible. In all cases the machine becomes unresponsive, except for the Magic SysRq key combination, which works as expected.

The problem also occurs with the new kernel 3.5.0-rc2. I have never observed it with kernel 3.3.7 or older.

The problem disappears if any of these conditions is met:

* the "slub_debug" kernel option is enabled, or
* the SLAB memory manager is used instead of SLUB, or
* the nouveau.ko module is disabled.

When the nouveau.ko module is disabled, the machine boots all the way into text mode. If nouveau is then loaded manually with modprobe, the machine keeps working until X is started, at which point the same crash occurs.

No error messages are written to syslog, either with the "slub_debug" kernel option (where the bug simply disappears) or when the crash is triggered by modprobing the module manually and starting X.
Comment 1 Jiri Dluhos 2012-06-12 15:22:10 UTC
Created attachment 62945
Result of "dmesg" (running kernel 3.4.2 with slub_debug=FZ)

Result of "dmesg" on the machine.
Comment 2 Jiri Dluhos 2012-06-17 11:30:25 UTC
Created attachment 63140
Dmesg containing the SLUB warning.

By chance, after booting the system several times, I observed a SLUB warning (a double-freed slab pointer); it seems to appear at roughly the point where the machine stops booting when slub_debug is not enabled. A copy of dmesg is attached.
Comment 3 WorMzy Tykashi 2012-06-19 16:04:18 UTC
Same problem here, also on a GTX 560 Ti. Can confirm that slub_debug kernel option allows the system to boot, although on my system it then crashes when SLiM is loaded/X is started, and appears to show garbled images from a previous X session (somehow?).

Kernels 3.4+ (all the way up to 3.5.0-rc2, haven't tried rc3 yet) appear to be affected, but the 3.3.x kernels are not.
Comment 4 Vlad K 2012-06-20 22:45:49 UTC
I had similar problems with my GTX 560 as well after switching to 3.4.x. The monitor would go blank, the machine would reboot, or a BUG screen would appear at random (I never bothered to save it). After bisecting, the issue is due to a variable that was removed from the drm code (https://bugzilla.kernel.org/show_bug.cgi?id=43353). Can you guys check whether you are hitting the same issue?

include/drm/drm_fb_helper.h, with the removed crtc_id field restored:

struct drm_fb_helper_crtc {
        uint32_t crtc_id;
        struct drm_mode_set mode_set;
        struct drm_display_mode *desired_mode;
};
Comment 5 Jiri Dluhos 2012-06-23 05:03:24 UTC
I can confirm that after adding crtc_id back to struct drm_fb_helper_crtc, everything started to work again (or at least, it booted into X and has worked without any problem so far).

Excellent work Vlad, thanks a lot!
Comment 6 Jiri Dluhos 2012-06-23 05:05:06 UTC
To be precise, I reverted the second part of this patch:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=4f988d132d2668b4f3b42bfc70daa531115ccca1

That is, I added "uint32_t crtc_id" back at the front of the drm_fb_helper_crtc structure. Reverting the first part of the patch (adding crtc_id back to the enumeration loop) seems unnecessary, so it looks like the 32 bits simply act as padding. :-)
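
The padding effect described above can be sketched in plain C. This is only an illustration: mock_mode_set and mock_display_mode are hypothetical stand-ins, not the kernel's struct drm_mode_set and struct drm_display_mode, and the helper function names are made up for this example. It shows that a leading 32-bit field both enlarges each element and shifts every later member, which is consistent with the field hiding a memory-corruption bug rather than fixing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; the real kernel structures
 * are much larger and are NOT reproduced here. */
struct mock_mode_set { void *fb; void *crtc; };
struct mock_display_mode { int clock; };

/* Layout with the field restored, as quoted in comment 4. */
struct helper_crtc_with_id {
        uint32_t crtc_id;
        struct mock_mode_set mode_set;
        struct mock_display_mode *desired_mode;
};

/* Layout after the field was removed. */
struct helper_crtc_without_id {
        struct mock_mode_set mode_set;
        struct mock_display_mode *desired_mode;
};

/* Without the field, mode_set is the very first member. */
static_assert(offsetof(struct helper_crtc_without_id, mode_set) == 0,
              "mode_set should start the struct once crtc_id is gone");

/* Bytes the restored field (plus alignment padding) adds to each element. */
size_t extra_bytes_per_crtc(void)
{
        return sizeof(struct helper_crtc_with_id) -
               sizeof(struct helper_crtc_without_id);
}

/* How far mode_set moves from the struct start when crtc_id is present. */
size_t mode_set_shift(void)
{
        return offsetof(struct helper_crtc_with_id, mode_set) -
               offsetof(struct helper_crtc_without_id, mode_set);
}
```

Both functions return at least sizeof(uint32_t); the exact value depends on the platform's pointer alignment.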
Comment 7 Vlad K 2012-06-23 23:17:41 UTC
Yeah, the drm_fb_helper.c change doesn't seem to have any effect for me.

Also, I am unable to boot with dual screens (both DVI outputs in use): the screen goes to "No Signal" after the initial refresh/modprobe. As soon as I unplug the second screen, an image appears on the first monitor and I can continue booting the system. Once in the WM, the second monitor can be plugged in and used normally. Are you having the same issue as well?
Comment 8 Ionut Biru 2012-06-25 07:05:52 UTC
With the help of David, I managed to get some interesting facts.

After applying http://pkgbuild.com/~ioni/airlied1.patch, the output is:

"alloced 1 2 4 3"
"conn count 3"

After modifying the nv_two_heads arguments as in http://pkgbuild.com/~ioni/airlied2.patch, I no longer get any crashes. The output is then:

"alloced 2 2 4 3"
"conn count 3"

Drm dmesg: http://pkgbuild.com/~ioni/drm
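
One possible reading of the "alloced" lines, offered purely as an assumption: the first number is how many CRTC slots were allocated and the second is how many CRTCs are then filled in. Neither patch is reproduced here, so this interpretation is a guess. Under that assumption, "1 2" would mean a table sized for one entry receives two, which would fit the SLUB corruption reported above. The sketch below uses a hypothetical crtc_table type (not a kernel structure) with a bounds-checked setter to show the mismatch without actually writing out of bounds.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a per-CRTC table; not the kernel's data structure. */
struct crtc_table {
        size_t count;   /* number of slots actually allocated */
        int *slot;      /* one entry per CRTC */
};

/* Allocate `count` zeroed slots. Returns 0 on success, -1 on failure. */
int crtc_table_init(struct crtc_table *t, size_t count)
{
        t->count = 0;
        t->slot = NULL;
        if (count == 0)
                return -1;
        t->slot = calloc(count, sizeof *t->slot);
        if (!t->slot)
                return -1;
        t->count = count;
        return 0;
}

/* Store a value, refusing any index beyond the allocated slots.
 * An unchecked write at such an index would corrupt the heap. */
int crtc_table_set(struct crtc_table *t, size_t idx, int value)
{
        if (idx >= t->count)
                return -1;
        t->slot[idx] = value;
        return 0;
}

/* Release the table and reset it to an empty state. */
void crtc_table_free(struct crtc_table *t)
{
        free(t->slot);
        t->slot = NULL;
        t->count = 0;
}
```

With a table initialised for one slot, setting index 1 is rejected; with two slots, both indices are accepted, mirroring the change from "alloced 1 2" to "alloced 2 2" under the assumption stated above.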
Comment 9 Marcin Slusarz 2012-06-28 08:41:04 UTC
Fixed by Ben Skeggs: 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=9bd0c15fcfb42f6245447c53347d65ad9e72080b

It will appear in 3.4-stable soon (3.4.5 or 3.4.6).

