Bug 103638

Summary: [CI] igt@gem_exec_reuse@contexts - dmesg-warn - BUG: sleeping function called from invalid context at mm/vmalloc.c:1037
Product: DRI
Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BXT
i915 features: GEM/PPGTT

Description Marta Löfstedt 2017-11-09 07:27:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/shard-apl1/igt@gem_exec_reuse@contexts.html

<7>[   64.352524] [IGT] gem_exec_reuse: executing
<4>[   64.373208] Setting dangerous option reset - tainting kernel
<6>[   64.389964] gem_exec_reuse (1484): drop_caches: 4
<7>[   85.732725] [IGT] gem_exec_reuse: starting subtest contexts
<3>[  118.024035] BUG: sleeping function called from invalid context at mm/vmalloc.c:1037
<3>[  118.024167] in_atomic(): 0, irqs_disabled(): 0, pid: 1484, name: gem_exec_reuse
<4>[  118.024207] 2 locks held by gem_exec_reuse/1484:
<4>[  118.024211]  #0:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa01056ec>] i915_gem_context_destroy_ioctl+0xdc/0x1e0 [i915]
<4>[  118.024281]  #1:  (rcu_read_lock){....}, at: [<ffffffffa0102d2c>] context_close+0xac/0x2a0 [i915]
<3>[  118.024335] Preemption disabled at:
<4>[  118.024390] [<ffffffffa0114bd6>] gen8_ppgtt_set_pde.isra.22+0x26/0xa0 [i915]
<4>[  118.024416] CPU: 2 PID: 1484 Comm: gem_exec_reuse Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3321+ #1
<4>[  118.024420] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
<4>[  118.024424] Call Trace:
<4>[  118.024433]  dump_stack+0x68/0x9f
<4>[  118.024441]  ___might_sleep+0x1e5/0x240
<4>[  118.024447]  __might_sleep+0x4a/0x80
<4>[  118.024454]  vm_unmap_aliases+0x43/0x210
<4>[  118.024459]  ? __save_stack_trace+0x83/0xd0
<4>[  118.024467]  change_page_attr_set_clr+0xcd/0x3d0
<4>[  118.024478]  set_pages_array_wb+0x2d/0x80
<4>[  118.024519]  vm_free_pages_release+0xb4/0x110 [i915]
<4>[  118.024560]  cleanup_page_dma.isra.11+0x7d/0x90 [i915]
<4>[  118.024600]  gen8_ppgtt_clear_pd+0x100/0x270 [i915]
<4>[  118.024607]  ? trace_hardirqs_on+0xd/0x10
<4>[  118.024648]  gen8_ppgtt_clear_pdp+0xbe/0x130 [i915]
<4>[  118.024690]  gen8_ppgtt_clear_4lvl+0xbc/0x100 [i915]
<4>[  118.024731]  ppgtt_unbind_vma+0x24/0x30 [i915]
<4>[  118.024773]  i915_vma_unbind+0x232/0x620 [i915]
<4>[  118.024818]  i915_vma_close+0xa8/0xd0 [i915]
<4>[  118.024856]  context_close+0x1f8/0x2a0 [i915]
<4>[  118.024899]  i915_gem_context_destroy_ioctl+0x194/0x1e0 [i915]
<4>[  118.024938]  ? i915_gem_context_create_ioctl+0x130/0x130 [i915]
<4>[  118.024943]  drm_ioctl_kernel+0x69/0xb0
<4>[  118.024950]  drm_ioctl+0x2f9/0x3d0
<4>[  118.024988]  ? i915_gem_context_create_ioctl+0x130/0x130 [i915]
<4>[  118.025000]  ? lock_acquire+0xb0/0x200
<4>[  118.025006]  ? __might_fault+0x3e/0x90
<4>[  118.025013]  do_vfs_ioctl+0x94/0x670
<4>[  118.025019]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
<4>[  118.025025]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  118.025029]  ? trace_hardirqs_on_caller+0xe3/0x1b0
<4>[  118.025036]  SyS_ioctl+0x41/0x70
<4>[  118.025043]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  118.025047] RIP: 0033:0x7f5f6928d587
<4>[  118.025051] RSP: 002b:00007ffecb347ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[  118.025058] RAX: ffffffffffffffda RBX: ffffffff81492003 RCX: 00007f5f6928d587
<4>[  118.025061] RDX: 00007ffecb347d20 RSI: 000000004008646e RDI: 0000000000000003
<4>[  118.025065] RBP: ffffc9000093bf88 R08: 00005610c3eb4f20 R09: 00007f5f69551600
<4>[  118.025069] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000004
<4>[  118.025072] R13: 0000000000000003 R14: 000000004008646e R15: 00007ffecb347d40
<4>[  118.025078]  ? __this_cpu_preempt_check+0x13/0x20
<3>[  119.071167] BUG: sleeping function called from invalid context at mm/vmalloc.c:1037
<3>[  119.071216] in_atomic(): 0, irqs_disabled(): 0, pid: 1484, name: gem_exec_reuse
<4>[  119.071256] 2 locks held by gem_exec_reuse/1484:
<4>[  119.071260]  #0:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa01056ec>] i915_gem_context_destroy_ioctl+0xdc/0x1e0 [i915]
<4>[  119.071331]  #1:  (rcu_read_lock){....}, at: [<ffffffffa0102d2c>] context_close+0xac/0x2a0 [i915]
<3>[  119.071384] Preemption disabled at:
<4>[  119.071425] [<ffffffffa0114bd6>] gen8_ppgtt_set_pde.isra.22+0x26/0xa0 [i915]
<4>[  119.071453] CPU: 2 PID: 1484 Comm: gem_exec_reuse Tainted: G     U  W       4.14.0-rc8-CI-CI_DRM_3321+ #1
<4>[  119.071457] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
<4>[  119.071461] Call Trace:
<4>[  119.071470]  dump_stack+0x68/0x9f
<4>[  119.071477]  ___might_sleep+0x1e5/0x240
<4>[  119.071483]  __might_sleep+0x4a/0x80
<4>[  119.071490]  vm_unmap_aliases+0x43/0x210
<4>[  119.071496]  ? __save_stack_trace+0x83/0xd0
<4>[  119.071504]  change_page_attr_set_clr+0xcd/0x3d0
<4>[  119.071516]  set_pages_array_wb+0x2d/0x80
<4>[  119.071559]  vm_free_pages_release+0xb4/0x110 [i915]
<4>[  119.071601]  cleanup_page_dma.isra.11+0x7d/0x90 [i915]
<4>[  119.071643]  gen8_ppgtt_clear_pd+0x100/0x270 [i915]
<4>[  119.071650]  ? trace_hardirqs_on+0xd/0x10
<4>[  119.071693]  gen8_ppgtt_clear_pdp+0xbe/0x130 [i915]
<4>[  119.071738]  gen8_ppgtt_clear_4lvl+0xbc/0x100 [i915]
<4>[  119.071781]  ppgtt_unbind_vma+0x24/0x30 [i915]
<4>[  119.071825]  i915_vma_unbind+0x232/0x620 [i915]
<4>[  119.071871]  i915_vma_close+0xa8/0xd0 [i915]
<4>[  119.071912]  context_close+0x1f8/0x2a0 [i915]
<4>[  119.071956]  i915_gem_context_destroy_ioctl+0x194/0x1e0 [i915]
<4>[  119.071997]  ? i915_gem_context_create_ioctl+0x130/0x130 [i915]
<4>[  119.072003]  drm_ioctl_kernel+0x69/0xb0
<4>[  119.072009]  drm_ioctl+0x2f9/0x3d0
<4>[  119.072049]  ? i915_gem_context_create_ioctl+0x130/0x130 [i915]
<4>[  119.072062]  ? lock_acquire+0xb0/0x200
<4>[  119.072068]  ? __might_fault+0x3e/0x90
<4>[  119.072075]  do_vfs_ioctl+0x94/0x670
<4>[  119.072081]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
<4>[  119.072087]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  119.072092]  ? trace_hardirqs_on_caller+0xe3/0x1b0
<4>[  119.072099]  SyS_ioctl+0x41/0x70
<4>[  119.072106]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  119.072111] RIP: 0033:0x7f5f6928d587
<4>[  119.072115] RSP: 002b:00007ffecb347ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[  119.072122] RAX: ffffffffffffffda RBX: ffffffff81492003 RCX: 00007f5f6928d587
<4>[  119.072126] RDX: 00007ffecb347d20 RSI: 000000004008646e RDI: 0000000000000003
<4>[  119.072129] RBP: ffffc9000093bf88 R08: 00005610c3eb4f20 R09: 00007f5f69551600
<4>[  119.072133] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000004
<4>[  119.072137] R13: 0000000000000003 R14: 000000004008646e R15: 00007ffecb347d40
<4>[  119.072143]  ? __this_cpu_preempt_check+0x13/0x20
<7>[  119.991223] [IGT] gem_exec_reuse: exiting, ret=0
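
Side note on reading the register dump above: ORIG_RAX is 0x10 (the ioctl syscall on x86_64) and RSI holds the ioctl request number 0x4008646e. A minimal sketch of decoding that number follows, assuming the generic Linux ioctl bit layout (8-bit nr, 8-bit type, 14-bit size, 2-bit direction). The struct and function names are made up for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the four fields packed into an ioctl number. */
struct ioctl_fields {
    unsigned dir;   /* 1 = userspace writes data to the kernel (_IOW) */
    unsigned size;  /* size of the argument struct in bytes */
    unsigned type;  /* subsystem letter, 'd' for DRM */
    unsigned nr;    /* command number within the subsystem */
};

/* Split an ioctl request number, assuming the generic (x86) layout. */
static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;

    f.nr   = cmd & 0xff;
    f.type = (cmd >> 8) & 0xff;
    f.size = (cmd >> 16) & 0x3fff;
    f.dir  = (cmd >> 30) & 0x3;

    /* Sanity check: the fields must reassemble into the original value. */
    assert(((uint32_t)f.dir << 30 | (uint32_t)f.size << 16 |
            (uint32_t)f.type << 8 | f.nr) == cmd);
    return f;
}
```

For 0x4008646e this gives dir 1, size 8, type 'd' and nr 0x6e. Driver-private DRM commands start at 0x40, so nr 0x6e is i915 command 0x2e, which matches the i915_gem_context_destroy_ioctl frame in the call trace.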
Comment 1 Chris Wilson 2017-11-09 13:35:36 UTC
commit 94dec87159af6f3dcc0b78d3f909aefa9e29c01a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 9 08:55:40 2017 +0000

    drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
    
    When we close the VMA, we unbind it from the ppgtt and tear down the
    page directory pointing at it. That may trigger us to return WC pages
    back to the system, requiring conversion back to WB which itself may
    sleep. That makes i915_vma_close() unsuitable for use inside the RCU
    read lock, which we need to hold to iterate the radixtree.
    
    The fix is quite simple, we can close all the VMA as we close the ppgtt,
    we only need to do that instead of closing them during destruction of
    the LUT.
    
    v2: Order between closing the LUT and the ppgtt is important; we use the
    vma inside the LUT as a means of retrieving the object, and so we must
    clear the LUT before freeing the VMA when closing the ppgtt.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103638
    Fixes: 547da76b5777 ("drm/i915: Hold rcu_read_lock when iterating over the radixtree (vma idr)")
    Fixes: d1b48c1e7184 ("drm/i915: Replace execbuf vma ht with an idr")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171109085540.32264-1-chris@chris-wilson.co.uk
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
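
For readers unfamiliar with the pattern, here is a minimal userspace sketch of the reordering the commit message describes. It is not the i915 code: every name below (toy_vma, toy_rcu_read_lock, close_context_old, close_context_new and so on) is hypothetical, the RCU read-side section is modeled by a plain counter, and "may sleep" is modeled by counting calls made while that counter is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a VMA; not the real i915 structure. */
struct toy_vma {
    int closed;
};

static int rcu_depth;         /* models rcu_read_lock() nesting */
static int sleep_violations;  /* counts "sleeping in invalid context" hits */

static void toy_rcu_read_lock(void)
{
    rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* Models might_sleep(): record a violation inside the read-side section. */
static void toy_might_sleep(void)
{
    if (rcu_depth > 0)
        sleep_violations++;
}

/* Models closing a VMA: unbinding may release pages, which may sleep. */
static void toy_vma_close(struct toy_vma *vma)
{
    toy_might_sleep();
    vma->closed = 1;
}

/* Old ordering: VMAs are closed while walking the lookup table under RCU. */
static int close_context_old(struct toy_vma *lut[], struct toy_vma vmas[], int n)
{
    (void)vmas;  /* the old path reaches the VMAs only through the LUT */
    sleep_violations = 0;

    toy_rcu_read_lock();
    for (int i = 0; i < n; i++) {
        if (lut[i]) {
            toy_vma_close(lut[i]);  /* may sleep under RCU: the bug */
            lut[i] = NULL;
        }
    }
    toy_rcu_read_unlock();
    return sleep_violations;
}

/* New ordering: only clear the LUT under RCU, then close the VMAs
 * afterwards (standing in for "as we close the ppgtt"). */
static int close_context_new(struct toy_vma *lut[], struct toy_vma vmas[], int n)
{
    sleep_violations = 0;

    toy_rcu_read_lock();
    for (int i = 0; i < n; i++)
        lut[i] = NULL;              /* no sleeping work in here */
    toy_rcu_read_unlock();

    for (int i = 0; i < n; i++) {
        if (!vmas[i].closed)
            toy_vma_close(&vmas[i]);  /* sleeping is allowed here */
    }
    return sleep_violations;
}
```

With three VMAs in the table, close_context_old() reports three violations and close_context_new() reports none, while both leave every VMA closed. The new path clears the LUT before closing the VMAs, which mirrors the ordering constraint noted in the v2 paragraph.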
Comment 2 Marta Löfstedt 2017-11-10 10:22:39 UTC
The patch is included as of CI_DRM_3327. The issue no longer reproduces.
