58087 – [-next] nouveau corrupts kernel mm allocator

Bug 58087 - [-next] nouveau corrupts kernel mm allocator

Summary: [-next] nouveau corrupts kernel mm allocator

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/nouveau (show other bugs)
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Nouveau Project
QA Contact:	Xorg Project Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-12-10 14:55 UTC by Peter Hurley
Modified:	2013-02-02 22:55 UTC (History)
CC List:	1 user (show)

See Also:	58984
i915 platform:
i915 features:

Attachments
kernel log showing BUG triggered by nouveau (9.33 KB, text/plain) 2012-12-10 14:55 UTC, Peter Hurley	no flags	Details
fix (655 bytes, patch) 2012-12-10 20:29 UTC, Marcin Slusarz	no flags	Details \| Splinter Review
View All

Description Peter Hurley 2012-12-10 14:55:27 UTC

Created attachment 71269 [details]
kernel log showing BUG triggered by nouveau

If nouveau_vm_new() fails in nouveau_drm_open(), the cleanup triggered corrupts the kernel slab allocator (in this case, SLUB).

Attached is the kernel log showing the page allocation failure and the subsequent BUG in mm/slub.c

A similar corruption had previously occurred which triggered a GP fault in the mm allocator from the same code path. This was reported as kernel bug #51291 here
https://bugzilla.kernel.org/show_bug.cgi?id=51291

Comment 1 Marcin Slusarz 2012-12-10 20:29:37 UTC

Created attachment 71290 [details] [review]
fix

Comment 2 Peter Hurley 2012-12-12 21:14:33 UTC

(In reply to comment #1)
> Created attachment 71290 [details] [review] [review]
> fix

-	vm = *pvm = kzalloc(sizeof(*vm), GFP_KERNEL);
+	vm = kzalloc(sizeof(*vm), GFP_KERNEL);

How/why *not* setting cli->base.vm to NULL fixes this?

Also, this assignment idiom is common in the nouveau driver code. Is the above fix just one of many necessary?

core/subdev/vm/base.c:	vm = *pvm = kzalloc(sizeof(*vm), GFP_KERNEL);
core/core/object.c:	object = *pobject = kzalloc(size, GFP_KERNEL);
core/core/ramht.c:	co = ho = nouveau_ramht_hash(ramht, chid, handle);
core/core/handle.c:	handle = *phandle = kzalloc(sizeof(*handle), GFP_KERNEL);
nouveau_abi16.c:		cli->abi16 = abi16 = kzalloc(sizeof(*abi16), GFP_KERNEL);
nouveau_bo.c:	struct nouveau_channel *chan = chan = drm->channel;  /* COMMENT: THIS ONE IS INTERESTING */
nouveau_chan.c:	chan = *pchan = kzalloc(sizeof(*chan), GFP_KERNEL);
nouveau_display.c:	disp = drm->display = kzalloc(sizeof(*disp), GFP_KERNEL);
nouveau_pm.c:	pm = drm->pm = kzalloc(sizeof(*pm), GFP_KERNEL);
nv04_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nv10_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nv10_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nv50_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nv50_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nv84_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nv84_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
nvc0_fence.c:	fctx = chan->fence = kzalloc(sizeof(*fctx), GFP_KERNEL);
nvc0_fence.c:	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);

Comment 3 Marcin Slusarz 2012-12-12 22:03:42 UTC

It's not a problem with vm allocation. The next one (vm->pgt) fails, so we free vm, leaving *pvm pointing at freed memory. When we get to nouveau_drm_open, we call nouveau_cli_destroy(cli), which tries to free cli->base.vm again. Oops.

I already checked other places and some of them also have this bug. I'll post fixes in a few days.

Comment 4 Peter Hurley 2012-12-12 22:51:27 UTC

(In reply to comment #3)
> It's not a problem with vm allocation. The next one (vm->pgt) fails, so we
> free vm, leaving *pvm pointing at freed memory. When we get to
> nouveau_drm_open, we call nouveau_cli_destroy(cli), which tries to free
> cli->base.vm again. Oops.

Thanks for the explanation. That makes sense to me now.

FYI, I did also file a bug in the kernel bugzilla for the memory allocation failure itself (kernel bug 51301 here https://bugzilla.kernel.org/show_bug.cgi?id=51301). A 32k allocation on a 10gb machine shouldn't really ever fail.

Comment 5 Marcin Slusarz 2013-02-02 22:55:23 UTC

Fixed in 3.8-rc2.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.