Bug 58087 - [-next] nouveau corrupts kernel mm allocator
Summary: [-next] nouveau corrupts kernel mm allocator
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-10 14:55 UTC by Peter Hurley
Modified: 2013-02-02 22:55 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
kernel log showing BUG triggered by nouveau (9.33 KB, text/plain)
2012-12-10 14:55 UTC, Peter Hurley
no flags Details
fix (655 bytes, patch)
2012-12-10 20:29 UTC, Marcin Slusarz
no flags Details | Splinter Review

Description Peter Hurley 2012-12-10 14:55:27 UTC
Created attachment 71269 [details]
kernel log showing BUG triggered by nouveau

If nouveau_vm_new() fails in nouveau_drm_open(), the cleanup triggered corrupts the kernel slab allocator (in this case, SLUB).

Attached is the kernel log showing the page allocation failure and the subsequent BUG in mm/slub.c

A similar corruption had previously occurred which triggered a GP fault in the mm allocator from the same code path. This was reported as kernel bug #51291 here
https://bugzilla.kernel.org/show_bug.cgi?id=51291
Comment 1 Marcin Slusarz 2012-12-10 20:29:37 UTC
Created attachment 71290 [details] [review]
fix
Comment 2 Peter Hurley 2012-12-12 21:14:33 UTC
(In reply to comment #1)
> Created attachment 71290 [details] [review] [review]
> fix

-	vm = *pvm = kzalloc(sizeof(*vm), GFP_KERNEL);
+	vm = kzalloc(sizeof(*vm), GFP_KERNEL);

How/why *not* setting cli->base.vm to NULL fixes this?

Also, this assignment idiom is common in the nouveau driver code. Is the above fix just one of many necessary?

core/subdev/vm/base.c:	vm = *pvm = kzalloc(sizeof(*vm), GFP_KERNEL);
core/core/object.c:	object = *pobject = kzalloc(size, GFP_KERNEL);
core/core/ramht.c:	co = ho = nouveau_ramht_hash(ramht, chid, handle);
core/core/handle.c:	handle = *phandle = kzalloc(sizeof(*handle), GFP_KERNEL);
nouveau_abi16.c:		cli->abi16 = abi16 = kzalloc(sizeof(*abi16), GFP_KERNEL);
nouveau_bo.c:	struct nouveau_channel *chan = chan = drm->channel;  /* COMMENT: THIS ONE IS INTERESTING */
nouveau_chan.c:	chan = *pchan = kzalloc(sizeof(*chan), GFP_KERNEL);
nouveau_display.c:	disp = drm->display = kzalloc(sizeof(*disp), GFP_KERNEL);
nouveau_pm.c:	pm = drm->pm = kzalloc(sizeof(*pm), GFP_KERNEL);
nv04_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nv10_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nv10_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nv50_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nv50_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nv84_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nv84_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nvc0_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nvc0_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
Comment 3 Marcin Slusarz 2012-12-12 22:03:42 UTC
It's not a problem with vm allocation. The next one (vm->pgt) fails, so we free vm, leaving *pvm pointing at freed memory. When we get to nouveau_drm_open, we call nouveau_cli_destroy(cli), which tries to free cli->base.vm again. Oops.

I already checked other places and some of them also have this bug. I'll post fixes in a few days.
Comment 4 Peter Hurley 2012-12-12 22:51:27 UTC
(In reply to comment #3)
> It's not a problem with vm allocation. The next one (vm->pgt) fails, so we
> free vm, leaving *pvm pointing at freed memory. When we get to
> nouveau_drm_open, we call nouveau_cli_destroy(cli), which tries to free
> cli->base.vm again. Oops.

Thanks for the explanation. That makes sense to me now.

FYI, I did also file a bug in the kernel bugzilla for the memory allocation failure itself (kernel bug 51301 here https://bugzilla.kernel.org/show_bug.cgi?id=51301). A 32k allocation on a 10gb machine shouldn't really ever fail.
Comment 5 Marcin Slusarz 2013-02-02 22:55:23 UTC
Fixed in 3.8-rc2.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.