Bug 101511

Summary: [BAT][BYT] igt@gem_exec_reloc@basic-cpu-gtt-active generated a general protection fault
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: highest CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BYT i915 features: GEM/Other
Attachments:
Description Flags
boot kernel log
none
kenrel log while running IGT none

Description Martin Peres 2017-06-20 06:46:35 UTC
While executing igt@gem_exec_reloc@basic-cpu-gtt-active on the kernel 4.12.0-rc5-CI-CI_DRM_2737+, fi-byt-j1900 produced the following general protection fault:

[  294.803829] general protection fault: 0000 [#1] PREEMPT SMP
[  294.803865] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm mii lpc_ich prime_numbers i2c_hid i2c_designware_platform i2c_designware_core
[  294.804051] CPU: 1 PID: 2798 Comm: gem_exec_reloc Not tainted 4.12.0-rc5-CI-CI_DRM_2737+ #1
[  294.804090] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
[  294.804126] task: ffff880130bf4bc0 task.stack: ffffc90000648000
[  294.804259] RIP: 0010:i915_gem_request_retire+0x22c/0x5e0 [i915]
[  294.804290] RSP: 0018:ffffc9000064bb00 EFLAGS: 00010287
[  294.804320] RAX: ffff880102364e50 RBX: 6b6b6b6b6b6b6b63 RCX: 6b6b6b6b6b6b6b6b
[  294.804354] RDX: ffff880102364e58 RSI: ffff8801356af0c0 RDI: ffff880102364e50
[  294.804388] RBP: ffffc9000064bb28 R08: ffff880130bf53d0 R09: 0000000000000000
[  294.804421] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801356af0c0
[  294.804454] R13: ffff8801356af330 R14: ffff880130a82158 R15: ffff88012c67ec88
[  294.804489] FS:  00007fcb925428c0(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
[  294.804527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  294.804555] CR2: 00007fa3c064a020 CR3: 0000000137940000 CR4: 00000000001006e0
[  294.804588] Call Trace:
[  294.804702]  i915_gem_request_alloc+0xbe/0x400 [i915]
[  294.804822]  i915_gem_do_execbuffer+0x51b/0x12a0 [i915]
[  294.804869]  ? __slab_alloc.isra.22.constprop.27+0x54/0x80
[  294.804906]  ? __this_cpu_preempt_check+0x13/0x20
[  294.804942]  ? lock_acquire+0xb5/0x210
[  294.805058]  i915_gem_execbuffer2+0xbd/0x1e0 [i915]
[  294.805095]  drm_ioctl+0x202/0x490
[  294.805204]  ? i915_gem_execbuffer+0x2d0/0x2d0 [i915]
[  294.805243]  ? lock_acquire+0xb5/0x210
[  294.805272]  ? __fget+0xeb/0x200
[  294.805300]  do_vfs_ioctl+0x90/0x6d0
[  294.805328]  ? __fget+0x10c/0x200
[  294.805357]  SyS_ioctl+0x3c/0x70
[  294.805388]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  294.805417] RIP: 0033:0x7fcb90d57357
[  294.805442] RSP: 002b:00007ffd7671e948 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  294.805491] RAX: ffffffffffffffda RBX: ffffffff8146ff23 RCX: 00007fcb90d57357
[  294.805528] RDX: 00007ffd7671e9e0 RSI: 0000000040406469 RDI: 0000000000000004
[  294.805566] RBP: ffffc9000064bf88 R08: 0000000000000008 R09: 0000000000000000
[  294.805604] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
[  294.805641] R13: 0000000000000004 R14: 0000000040406469 R15: 0000000000000002
[  294.805683]  ? __this_cpu_preempt_check+0x13/0x20
[  294.805716] Code: 02 00 00 48 8b 0a 4c 39 ea 48 8d 42 f8 48 8d 59 f8 74 2f 0f 0d 0b 48 89 50 08 48 89 50 10 48 89 c7 48 c7 00 00 00 00 00 4c 89 e6 <ff> 50 18 48 8d 53 08 48 8b 4b 08 48 89 d8 49 39 d5 48 8d 59 f8 
[  294.806108] RIP: i915_gem_request_retire+0x22c/0x5e0 [i915] RSP: ffffc9000064bb00
[  294.807218] ---[ end trace b25c9e80ebb6c607 ]---
Comment 1 Martin Peres 2017-06-20 06:48:18 UTC
Created attachment 132075 [details]
boot kernel log
Comment 2 Martin Peres 2017-06-20 06:48:42 UTC
Created attachment 132076 [details]
kenrel log while running IGT
Comment 3 Chris Wilson 2017-06-20 07:48:30 UTC
Use-after-free. I will be surprised if it is 100% reproducible, at least from that test alone. (If it is that makes this much easier!) First avenue is kasan. iirc, adding kasan wasn't adding that much overhead to BAT nowadays, so enabling it by default was an option.
Comment 4 Martin Peres 2017-06-20 09:08:02 UTC
(In reply to Chris Wilson from comment #3)
> Use-after-free. I will be surprised if it is 100% reproducible, at least
> from that test alone. (If it is that makes this much easier!) First avenue
> is kasan. iirc, adding kasan wasn't adding that much overhead to BAT
> nowadays, so enabling it by default was an option.

OK, let's see if one of the KASAN builds of Tomi will catch this!
Comment 5 Chris Wilson 2017-06-20 11:09:09 UTC
https://intel-gfx-ci.01.org/CI/Custom_2759/fi-byt-j1900/igt@gem_exec_reloc@basic-write-gtt-active.html

[  321.263363] ==================================================================
[  321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915]
[  321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868

[  321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1
[  321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
[  321.264208] Call Trace:
[  321.264234]  dump_stack+0x67/0x99
[  321.264260]  print_address_description+0x77/0x290
[  321.264437]  ? i915_gem_request_retire+0x1728/0x1740 [i915]
[  321.264459]  kasan_report+0x269/0x350
[  321.264487]  __asan_report_load8_noabort+0x14/0x20
[  321.264660]  i915_gem_request_retire+0x1728/0x1740 [i915]
[  321.264841]  ? intel_ring_context_pin+0x131/0x690 [i915]
[  321.265021]  i915_gem_request_alloc+0x2c6/0x1220 [i915]
[  321.265044]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[  321.265226]  i915_gem_do_execbuffer+0xac0/0x2a20 [i915]
[  321.265250]  ? __lock_acquire+0xceb/0x5450
[  321.265269]  ? entry_SYSCALL_64_fastpath+0x1c/0xb1
[  321.265291]  ? kvmalloc_node+0x6b/0x80
[  321.265310]  ? kvmalloc_node+0x6b/0x80
[  321.265489]  ? eb_relocate_slow+0xbe0/0xbe0 [i915]
[  321.265520]  ? ___slab_alloc.constprop.28+0x2ab/0x3d0
[  321.265549]  ? debug_check_no_locks_freed+0x280/0x280
[  321.265591]  ? __might_fault+0xc6/0x1b0
[  321.265782]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
[  321.265815]  drm_ioctl+0x4ba/0xaa0
[  321.265986]  ? i915_gem_execbuffer+0xde0/0xde0 [i915]
[  321.266017]  ? drm_getunique+0x270/0x270
[  321.266068]  do_vfs_ioctl+0x17f/0xfa0
[  321.266091]  ? __fget+0x1ba/0x330
[  321.266112]  ? lock_acquire+0x390/0x390
[  321.266133]  ? ioctl_preallocate+0x1d0/0x1d0
[  321.266164]  ? __fget+0x1db/0x330
[  321.266194]  ? __fget_light+0x79/0x1f0
[  321.266219]  SyS_ioctl+0x3c/0x70
[  321.266247]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  321.266265] RIP: 0033:0x7fcede207357
[  321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357
[  321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004
[  321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000
[  321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98
[  321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000
[  321.266385]  ? __this_cpu_preempt_check+0x13/0x20
[  321.266406]  ? trace_hardirqs_off_caller+0x1d6/0x2c0

[  321.266487] Allocated by task 2868:
[  321.266568]  save_stack_trace+0x16/0x20
[  321.266586]  kasan_kmalloc+0xee/0x180
[  321.266602]  kasan_slab_alloc+0x12/0x20
[  321.266620]  kmem_cache_alloc+0xc7/0x2e0
[  321.266795]  i915_vma_instance+0x28c/0x1540 [i915]
[  321.266964]  eb_lookup_vmas+0x5a7/0x2250 [i915]
[  321.267130]  i915_gem_do_execbuffer+0x69a/0x2a20 [i915]
[  321.267296]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
[  321.267315]  drm_ioctl+0x4ba/0xaa0
[  321.267333]  do_vfs_ioctl+0x17f/0xfa0
[  321.267350]  SyS_ioctl+0x3c/0x70
[  321.267369]  entry_SYSCALL_64_fastpath+0x1c/0xb1

[  321.267428] Freed by task 177:
[  321.267502]  save_stack_trace+0x16/0x20
[  321.267521]  kasan_slab_free+0xad/0x180
[  321.267539]  kmem_cache_free+0xc5/0x340
[  321.267710]  i915_vma_unbind+0x666/0x10a0 [i915]
[  321.267880]  i915_vma_close+0x23a/0x2f0 [i915]
[  321.268048]  __i915_gem_free_objects+0x17d/0xc70 [i915]
[  321.268215]  __i915_gem_free_work+0x49/0x70 [i915]
[  321.268234]  process_one_work+0x66f/0x1410
[  321.268252]  worker_thread+0xe1/0xe90
[  321.268269]  kthread+0x304/0x410
[  321.268285]  ret_from_fork+0x27/0x40

[  321.268346] The buggy address belongs to the object at ffff880100fc6640
                which belongs to the cache i915_vma of size 656
[  321.268550] The buggy address is located 408 bytes inside of
                656-byte region [ffff880100fc6640, ffff880100fc68d0)
[  321.268741] The buggy address belongs to the page:
[  321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping:          (null) index:0xffff880100fc5980 compound_mapcount: 0
[  321.269045] flags: 0x8000000000008100(slab|head)
[  321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d
[  321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000
[  321.269484] page dumped because: kasan: bad access detected

[  321.269665] Memory state around the buggy address:
[  321.269778]  ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  321.269949]  ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  321.270279]                                                     ^
[  321.270410]  ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  321.270576]  ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc
[  321.270740] ==================================================================
[  321.270903] Disabling lock debugging due to kernel taint
Comment 6 Chris Wilson 2017-06-20 20:13:19 UTC
commit 25ffaa67459e988e73210543f7e05dfbf3f16163
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jun 20 13:43:20 2017 +0100

    drm/i915: Pass the right flags to i915_vma_move_to_active()
    
    i915_vma_move_to_active() takes the execobject flags and not a boolean!
    Instead of passing EXEC_OBJECT_WRITE we passed true [i.e.
    EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the
    vma->last_fence access and since we forgot to clear that on unbinding,
    we caused a use-after-free.
Comment 7 Humberto Israel Perez Rodriguez 2017-06-27 18:55:38 UTC
after run several times igt@gem_exec_reloc@basic-cpu-gtt-active in the dmesg there is not general protection fault message with latest drm-tip commit

commit e69c796252061299a393499696c38041cbbb106d
Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
AuthorDate: Mon Jun 26 10:26:20 2017 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Mon Jun 26 10:26:20 2017 +0200

    drm-tip: 2017y-06m-26d-08h-25m-46s UTC integration manifest

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.