Bug 100431

Summary: nv50: memory corruption due to use-after-free of dma_fence
Product: xorg
Reporter: Ard Biesheuvel <ard.biesheuvel>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: emmanuel.pacaud, jeremy.booker
Version: unspecified
Hardware: ARM
OS: Linux (All)

Description Ard Biesheuvel 2017-03-28 09:06:46 UTC
Hello all,

I am trying to debug an elusive memory corruption bug on my arm64
machine which appears to be in the nouveau driver.

I got the following splat from the refcount debugging code:

"""
refcount_t: underflow; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3366 at lib/refcount.c:128 refcount_sub_and_test+0xe8/0x108
Modules linked in: fuse nouveau ttm drm_kms_helper drm ip_tables x_tables ipv6

CPU: 0 PID: 3366 Comm: gnome-shell Not tainted 4.11.0-rc3-00407-g97da3854c526 #1
Hardware name: AMD Seattle/Seattle, BIOS 11:14:27 Mar 20 2017
task: ffffffd6cca5b600 task.stack: ffffffd6c8120000
PC is at refcount_sub_and_test+0xe8/0x108
LR is at refcount_sub_and_test+0xe8/0x108
pc : [<ffffffa1f81af7f0>] lr : [<ffffffa1f81af7f0>] pstate: 20000145
sp : ffffffd6c81237c0
x29: ffffffd6c81237c0 x28: 0000000000000028
x27: ffffffd6cca5b610 x26: ffffffd6b4417080
x25: ffffffd68715a000 x24: ffffffd6cca5b610
x23: ffffffd68b513680 x22: ffffffa1f8473a30
x21: ffffffd6c7cc9e00 x20: 0000000000000001
x19: ffffffd68b513500 x18: 0000000000000020
x17: 0000007fb64d3db0 x16: ffffffa1f8473f18
x15: 000000003040d230 x14: 0000000200000000
x13: 0000000200000010 x12: ffffffffffffffff
x11: 1ffffff43f3e8e1f x10: ffffff843f3e8e1f
x9 : dfffff9000000000 x8 : ffffffa1f9f470fc
x7 : 0000000000000000 x6 : ffffff843f3e8e20
x5 : ffffff843f3e8e20 x4 : 0000000000000000
x3 : 0000003600d52000 x2 : dfffff9000000000
x1 : ffffff8ad90246c6 x0 : 0000000000000026

---[ end trace d188d18d5d3d25db ]---
Call trace:
Exception stack(0xffffffd6c8123590 to 0xffffffd6c81236c0)
3580:                                   ffffffd68b513500 0000008000000000
35a0: ffffffd6c81237c0 ffffffa1f81af7f0 0000000020000145 000000000000003d
35c0: ffffffd68715a000 ffffffa1f7e591b8 0000000041b58ab3 ffffffa1f8fe11b8
35e0: ffffffa1f7cf15b8 ffffffa1f8473a30 ffffffd68b513680 ffffffd6cca5b610
3600: ffffffd68715a000 ffffffd6b4417080 ffffffd6c81237c0 ffffffd6c81237c0
3620: ffffffd6c8123780 00000000ffffffc8 0000000041b58ab3 ffffffa1f8fea138
3640: ffffffa1f7dc0540 ffffffa1f7cfad28 ffffffa1f7ee9ee8 ffffff9001162e10
3660: ffffff900113f084 ffffff9000ed84b8 ffffff900113200c ffffffa1f7f2c910
3680: ffffffa1f7f2d20c ffffffa1f7cf3730 0000000000000026 ffffff8ad90246c6
36a0: dfffff9000000000 0000003600d52000 0000000000000000 ffffff843f3e8e20
[<ffffffa1f81af7f0>] refcount_sub_and_test+0xe8/0x108
[<ffffffa1f81af824>] refcount_dec_and_test+0x14/0x20
[<ffffffa1f847405c>] reservation_object_add_excl_fence+0x144/0x1e0
[<ffffff900113cce0>] nouveau_bo_fence+0x50/0x60 [nouveau]
[<ffffff900113d1dc>] validate_fini_no_ticket+0xc4/0x190 [nouveau]
[<ffffff900113e1fc>] nouveau_gem_ioctl_pushbuf+0x49c/0x1c78 [nouveau]
[<ffffff9000ed84b8>] drm_ioctl+0x280/0x590 [drm]
[<ffffff900113200c>] nouveau_drm_ioctl+0x8c/0x100 [nouveau]
[<ffffffa1f7f2c910>] do_vfs_ioctl+0x130/0x9a0
[<ffffffa1f7f2d20c>] SyS_ioctl+0x8c/0xa0
[<ffffffa1f7cf3730>] el0_svc_naked+0x24/0x28
"""
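The "refcount_t: underflow; use-after-free." warning comes from the checked reference count in lib/refcount.c: a decrement that would take the count below zero is refused and reported instead of silently wrapping around. The following is a minimal single-threaded C model of that check, not the kernel code. The names `checked_ref`, `checked_sub_and_test` and `checked_dec_and_test` are hypothetical, and the real implementation uses atomic compare-and-exchange loops rather than plain loads and stores.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical single-threaded model of a checked reference count.
 * Not the kernel's lib/refcount.c; no atomics are used here.
 */
struct checked_ref {
	unsigned int refs;
};

static unsigned int underflow_warnings; /* number of refused decrements */

/* Subtract i; return true only when this call dropped the count to zero. */
static bool checked_sub_and_test(struct checked_ref *r, unsigned int i)
{
	unsigned int val;

	assert(r != NULL);
	val = r->refs;

	if (val == UINT_MAX)	/* saturated: leak the object rather than free it */
		return false;

	if (val < i) {		/* would underflow: a reference was dropped twice */
		underflow_warnings++;
		fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
		return false;	/* count is left unchanged */
	}

	r->refs = val - i;
	return r->refs == 0;	/* true: caller should release the object */
}

static bool checked_dec_and_test(struct checked_ref *r)
{
	return checked_sub_and_test(r, 1);
}
```

Dropping a count of 1 returns true (the caller would free the object). A second drop on the same object is refused and counted as an underflow instead of wrapping to UINT_MAX, which is the situation the splat above reports.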

Enabling KASAN gives some additional information, namely many reports
similar to this one:

"""
==================================================================
BUG: KASAN: use-after-free in nouveau_fence_sync+0x154/0x398 [nouveau]
                       at addr ffffffd69064f808
Read of size 8 by task gnome-shell/3366
CPU: 4 PID: 3366 Comm: gnome-shell Tainted: G        W
                                   4.11.0-rc3-00407-g97da3854c526 #1
Hardware name: AMD Seattle/Seattle, BIOS 11:14:27 Mar 20 2017
Call trace:
[<ffffffa1f7cfb1f8>] dump_backtrace+0x0/0x300
[<ffffffa1f7cfb50c>] show_stack+0x14/0x20
[<ffffffa1f8188788>] dump_stack+0xa8/0xd0
[<ffffffa1f7eea964>] kasan_object_err+0x24/0x80
[<ffffffa1f7eeabec>] kasan_report_error+0x1cc/0x4f0
[<ffffffa1f7eeb2e8>] kasan_report+0x38/0x40
[<ffffffa1f7ee985c>] __asan_load8+0x84/0x98
[<ffffff9001162a9c>] nouveau_fence_sync+0x154/0x398 [nouveau]
[<ffffff900113e904>] nouveau_gem_ioctl_pushbuf+0xba4/0x1c78 [nouveau]
[<ffffff9000ed84b8>] drm_ioctl+0x280/0x590 [drm]
[<ffffff900113200c>] nouveau_drm_ioctl+0x8c/0x100 [nouveau]
[<ffffffa1f7f2c910>] do_vfs_ioctl+0x130/0x9a0
[<ffffffa1f7f2d20c>] SyS_ioctl+0x8c/0xa0
[<ffffffa1f7cf3730>] el0_svc_naked+0x24/0x28
Object at ffffffd69064f800, in cache kmalloc-256 size: 256
Allocated:
PID = 3366
 save_stack_trace_tsk+0x0/0x220
 save_stack_trace+0x18/0x20
 kasan_kmalloc+0xd8/0x188
 nouveau_fence_new+0xb0/0x150 [nouveau]
 nouveau_gem_ioctl_pushbuf+0x1324/0x1c78 [nouveau]
 drm_ioctl+0x280/0x590 [drm]
 nouveau_drm_ioctl+0x8c/0x100 [nouveau]
 do_vfs_ioctl+0x130/0x9a0
 SyS_ioctl+0x8c/0xa0
 el0_svc_naked+0x24/0x28
Freed:
PID = 0
 save_stack_trace_tsk+0x0/0x220
 save_stack_trace+0x18/0x20
 kasan_slab_free+0x88/0x188
 kfree+0x70/0x1e0
 rcu_process_callbacks+0x290/0x6a8
 __do_softirq+0x1a0/0x328
Memory state around the buggy address:
 ffffffd69064f700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffffffd69064f780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffd69064f800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffffffd69064f880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffffffd69064f900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
"""

I suspect both are the result of a dma_fence object being freed while a
pointer to it is still held in some list.
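One way to end up with both reports at once is an unbalanced get/put pair: two places hold the same fence pointer but only one reference was taken, so the second put underflows, and any later read through the stale pointer (like the one KASAN flagged in nouveau_fence_sync) touches memory the deferred free has already returned. This is an assumption about the class of bug, not a description of what the fix below changed. The sketch is a hypothetical C model (`model_fence`, `fence_get`, `fence_put`, `run_two_owners` are invented names); "freeing" only sets a flag so the model can detect the misuse without itself touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted fence; storage is never really released. */
struct model_fence {
	unsigned int refs;
	bool freed;
};

static unsigned int bad_puts; /* puts that hit an already-freed fence */

static void fence_init(struct model_fence *f)
{
	f->refs = 1;		/* the creator holds the first reference */
	f->freed = false;
}

static struct model_fence *fence_get(struct model_fence *f)
{
	assert(!f->freed);	/* taking a ref on a freed fence is itself a bug */
	f->refs++;
	return f;
}

static void fence_put(struct model_fence *f)
{
	if (f->freed || f->refs == 0) {
		bad_puts++;	/* would be a use-after-free in real code */
		return;
	}
	if (--f->refs == 0)
		f->freed = true;	/* stand-in for the deferred kfree */
}

/*
 * Two owners each store the pointer and later drop "their" reference.
 * With take_ref == false the second owner skips fence_get(), so the two
 * puts consume one reference too many. Returns the number of bad puts.
 */
static unsigned int run_two_owners(bool take_ref)
{
	struct model_fence f;
	struct model_fence *owner_a, *owner_b;

	bad_puts = 0;
	fence_init(&f);
	owner_a = &f;				/* inherits the creator's reference */
	owner_b = take_ref ? fence_get(&f) : &f;	/* buggy path: no get */

	fence_put(owner_a);
	fence_put(owner_b);
	assert(f.freed);			/* freed in both cases */
	return bad_puts;
}
```

With a reference taken per owner, `run_two_owners(true)` reports no bad puts. Without it, `run_two_owners(false)` reports one, which corresponds to the refcount underflow warning above.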
Comment 1 Ard Biesheuvel 2017-03-28 09:09:22 UTC
Additional info:
This happens with v4.10 and later; v4.9 is rock solid on the same hardware.
Comment 2 Ben Skeggs 2017-04-05 08:30:53 UTC
Can you give this commit[1] a try?

Ben.

[1] https://github.com/skeggsb/nouveau/commit/23da66b47d9a7ca6a7c4a7f574a8842d8356fec5
Comment 3 Ard Biesheuvel 2017-04-05 11:30:55 UTC
Thanks Ben, the quoted commit applied onto v4.10.7 fixes the problem for me.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I take it this will be tagged for -stable?

Thanks,
Ard.
Comment 4 Ben Skeggs 2017-04-05 11:57:07 UTC
(In reply to Ard Biesheuvel from comment #3)
> Thanks Ben, the quoted commit applied onto v4.10.7 fixes the problem for me.
> 
> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> I take it this will be tagged for -stable?
> 
> Thanks,
> Ard.

Great!  Thanks for testing.

Yup, I'll be sending Dave a fixes tree for 4.11 (with relevant commits Cc'd stable) tomorrow.
Comment 5 Martin Peres 2019-12-04 09:25:55 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/336.
