Bug 111050 - [CI][BAT] igt@i915_selftest@live_contexts - incomplete - IOMMU and GVT-d SKL platforms
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: highest normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-03 14:23 UTC by Martin Peres
Modified: 2019-08-22 07:18 UTC

See Also:
i915 platform: SKL
i915 features: GEM/Other


Description Martin Peres 2019-07-03 14:23:31 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6403/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html

<3> [414.363795] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/i915_gem_gtt.c:472
<3> [414.364167] in_atomic(): 1, irqs_disabled(): 0, pid: 3905, name: i915_selftest
<4> [414.364406] 3 locks held by i915_selftest/3905:
<4> [414.364408]  #0: 0000000034fe8aa8 (&dev->mutex){....}, at: device_driver_attach+0x18/0x50
<4> [414.364415]  #1: 000000006bd8a560 (&dev->struct_mutex){+.+.}, at: igt_ctx_exec+0xb7/0x410 [i915]
<4> [414.364476]  #2: 000000003dfdc766 (&(&pd->lock)->rlock){+.+.}, at: gen8_ppgtt_alloc_pdp+0x448/0x540 [i915]
<3> [414.364529] Preemption disabled at:
<4> [414.364530] [<0000000000000000>] 0x0
<4> [414.364696] CPU: 0 PID: 3905 Comm: i915_selftest Tainted: G     U            5.2.0-rc7-CI-CI_DRM_6403+ #1
<4> [414.364698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [414.364699] Call Trace:
<4> [414.364704]  dump_stack+0x67/0x9b
<4> [414.364708]  ___might_sleep+0x167/0x250
<4> [414.364777]  vm_free_page+0x24/0xc0 [i915]
<4> [414.364852]  free_pd+0xf/0x20 [i915]
<4> [414.364897]  gen8_ppgtt_alloc_pdp+0x489/0x540 [i915]
<4> [414.364946]  gen8_ppgtt_alloc_4lvl+0x8e/0x2e0 [i915]
<4> [414.364992]  ppgtt_bind_vma+0x2e/0x60 [i915]
<4> [414.365039]  i915_vma_bind+0xe8/0x2c0 [i915]
<4> [414.365088]  __i915_vma_do_pin+0xa1/0xd20 [i915]
<4> [414.365135]  gpu_fill+0x709/0xb60 [i915]
<4> [414.365177]  ? gem_context_register+0xa0/0xf0 [i915]
<4> [414.365220]  igt_ctx_exec+0x148/0x410 [i915]
<4> [414.365278]  __i915_subtests+0xb8/0x210 [i915]
<4> [414.365328]  ? __i915_nop_teardown+0x10/0x10 [i915]
<4> [414.365375]  ? __i915_live_setup+0x10/0x10 [i915]
<4> [414.365424]  __run_selftests+0x112/0x170 [i915]
<4> [414.365476]  i915_live_selftests+0x2c/0x60 [i915]
<4> [414.365515]  i915_pci_probe+0x83/0x1a0 [i915]
<4> [414.365518]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [414.365523]  pci_device_probe+0x9e/0x120
<4> [414.365527]  really_probe+0xea/0x3c0
<4> [414.365531]  driver_probe_device+0x10b/0x120
<4> [414.365534]  device_driver_attach+0x4a/0x50
<4> [414.365537]  __driver_attach+0x97/0x130
<4> [414.365540]  ? device_driver_attach+0x50/0x50
<4> [414.365542]  bus_for_each_dev+0x74/0xc0
<4> [414.365547]  bus_add_driver+0x13f/0x210
<4> [414.365549]  ? 0xffffffffa0087000
<4> [414.365552]  driver_register+0x56/0xe0
<4> [414.365554]  ? 0xffffffffa0087000
<4> [414.365557]  do_one_initcall+0x58/0x300
<4> [414.365559]  ? do_init_module+0x1d/0x1f6
<4> [414.365563]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [414.365565]  ? kmem_cache_alloc_trace+0x261/0x290
<4> [414.365570]  do_init_module+0x56/0x1f6
<4> [414.365573]  load_module+0x24d1/0x2990
<4> [414.365587]  ? __se_sys_finit_module+0xd3/0xf0
<4> [414.365589]  __se_sys_finit_module+0xd3/0xf0
<4> [414.365599]  do_syscall_64+0x55/0x1c0
<4> [414.365602]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [414.365604] RIP: 0033:0x7fe79cc1e839
<4> [414.365609] Code: Bad RIP value.
<4> [414.365611] RSP: 002b:00007ffcc0af4ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [414.365613] RAX: ffffffffffffffda RBX: 00005618391eae60 RCX: 00007fe79cc1e839
<4> [414.365614] RDX: 0000000000000000 RSI: 00005618391ea330 RDI: 0000000000000006
<4> [414.365616] RBP: 00005618391ea330 R08: 0000000000000004 R09: 000056183878fc1b
<4> [414.365617] R10: 00007ffcc0af5120 R11: 0000000000000246 R12: 0000000000000000
<4> [414.365619] R13: 00005618391e9f50 R14: 0000000000000020 R15: 0000000000000047
<6> [414.375626] Purging GPU memory, 0 pages freed, 128 pages still pinned, 1048577 pages left available.
<4> [414.375726] in:imklog invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
<4> [414.375743] CPU: 0 PID: 377 Comm: in:imklog Tainted: G     U  W         5.2.0-rc7-CI-CI_DRM_6403+ #1
<4> [414.375744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [414.375746] Call Trace:
<4> [414.375749]  dump_stack+0x67/0x9b
<4> [414.375753]  dump_header+0x52/0x610
<4> [414.375755]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [414.375758]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [414.375761]  ? lockdep_hardirqs_on+0xe3/0x1b0
<4> [414.375763]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [414.375768]  oom_kill_process+0x176/0x210
<4> [414.375772]  out_of_memory+0x10e/0x380
<4> [414.375776]  __alloc_pages_nodemask+0xd21/0x1130
<4> [414.375782]  ? lock_acquire+0xa6/0x1c0
<4> [414.375791]  __read_swap_cache_async+0x131/0x1d0
<4> [414.375794]  ? __lock_acquire+0x530/0x24c0
<4> [414.375797]  read_swap_cache_async+0x23/0x60
<4> [414.375800]  swap_cluster_readahead+0x200/0x280
<4> [414.375803]  ? lock_acquire+0xa6/0x1c0
<4> [414.375806]  ? find_get_entry+0x12c/0x300
<4> [414.375809]  ? xas_start+0x16d/0x1d0
<4> [414.375814]  ? swapin_readahead+0x15d/0x3f0
<4> [414.375816]  swapin_readahead+0x15d/0x3f0
<4> [414.375821]  ? pagecache_get_page+0x2b/0x220
<4> [414.375826]  ? do_swap_page+0x2f7/0x920
<4> [414.375828]  do_swap_page+0x2f7/0x920
<4> [414.375834]  __handle_mm_fault+0x676/0xfc0
<4> [414.375843]  handle_mm_fault+0x155/0x350
<4> [414.375847]  __do_page_fault+0x248/0x4f0
<4> [414.375853]  page_fault+0x1e/0x30
<4> [414.375856] RIP: 0010:copy_user_enhanced_fast_string+0xe/0x20
<4> [414.375858] Code: 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 01 ca c3 0f 1f 80 00 00 00 00 0f 01 cb 83 fa 40 0f 82 70 ff ff ff 89 d1 <f3> a4 31 c0 0f 01 ca c3 66 2e 0f 1f 84 00 00 00 00 00 89 d1 f3 a4
<4> [414.375860] RSP: 0000:ffffc90000383df8 EFLAGS: 00050206
<4> [414.375862] RAX: 00007ffffffff000 RBX: 0000000000000070 RCX: 0000000000000070
<4> [414.375864] RDX: 0000000000000070 RSI: ffff888030056a88 RDI: 00007f31241d1d00
<4> [414.375865] RBP: 00007f31241d1d00 R08: 0000000013bf810d R09: 0000000000000001
<4> [414.375867] R10: ffffc90000383d78 R11: ffff8880749fd8b8 R12: ffff888030056a88
<4> [414.375869] R13: 00007f31241d1d00 R14: ffff888030056a88 R15: 0000000000000070
<4> [414.375878]  _copy_to_user+0x56/0x70
<4> [414.375882]  do_syslog+0x3ec/0x870
<4> [414.375885]  ? lock_acquire+0xa6/0x1c0
<4> [414.375888]  ? wait_woken+0xa0/0xa0
<4> [414.375894]  kmsg_read+0x39/0x50
<4> [414.375898]  proc_reg_read+0x34/0x60
<4> [414.375901]  vfs_read+0x9e/0x150
<4> [414.375905]  ksys_read+0x8f/0xe0
<4> [414.375909]  do_syscall_64+0x55/0x1c0
<4> [414.375912]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [414.375914] RIP: 0033:0x7f31270a4384
<4> [414.375917] Code: Bad RIP value.
<4> [414.375919] RSP: 002b:00007f31241d14c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
<4> [414.375921] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f31270a4384
<4> [414.375923] RDX: 0000000000001fa0 RSI: 00007f31241d1d00 RDI: 0000000000000005
<4> [414.375924] RBP: 00007f31241d1d00 R08: 0000000000000000 R09: 000055d475f9bc88
<4> [414.375926] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000001fa0
<4> [414.375927] R13: 0000000000001fa0 R14: 0000000000001f9f R15: 00007f31241d1de1
Comment 1 Martin Peres 2019-07-03 14:24:12 UTC
This is likely introduced by https://patchwork.freedesktop.org/series/63042/
Comment 2 CI Bug Log 2019-07-03 14:24:57 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* GVT-d: igt@i915_selftest@live_contexts - incomplete - BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/i915_gem_gtt.c
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6403/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html

* IOMMU: igt@i915_selftest@live_contexts - incomplete - general protection fault: 0000 [#1] PREEMPT SMP PTI
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6403/fi-skl-iommu/igt@i915_selftest@live_contexts.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5081/fi-skl-iommu/igt@i915_selftest@live_contexts.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13498/fi-skl-iommu/igt@i915_selftest@live_contexts.html
Comment 3 Chris Wilson 2019-07-03 14:25:36 UTC
https://patchwork.freedesktop.org/series/63127/
Comment 4 Chris Wilson 2019-07-04 10:26:57 UTC
commit 068610895ebd4bd86f496f01eb7b97e56d7269b2 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 3 18:19:12 2019 +0100

    drm/i915/gtt: Defer the free for alloc error paths
    
    If we hit an error while allocating the page tables, we have to unwind
    the incomplete updates, and wish to free the unused pd. However, we are
    not allowed to be holding the spinlock at that point, and so must use the
    later free to defer it until after we drop the lock.
    
    <3> [414.363795] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/i915_gem_gtt.c:472
    <3> [414.364167] in_atomic(): 1, irqs_disabled(): 0, pid: 3905, name: i915_selftest
    <4> [414.364406] 3 locks held by i915_selftest/3905:
    <4> [414.364408]  #0: 0000000034fe8aa8 (&dev->mutex){....}, at: device_driver_attach+0x18/0x50
    <4> [414.364415]  #1: 000000006bd8a560 (&dev->struct_mutex){+.+.}, at: igt_ctx_exec+0xb7/0x410 [i915]
    <4> [414.364476]  #2: 000000003dfdc766 (&(&pd->lock)->rlock){+.+.}, at: gen8_ppgtt_alloc_pdp+0x448/0x540 [i915]
    <3> [414.364529] Preemption disabled at:
    <4> [414.364530] [<0000000000000000>] 0x0
    <4> [414.364696] CPU: 0 PID: 3905 Comm: i915_selftest Tainted: G     U            5.2.0-rc7-CI-CI_DRM_6403+ #1
    <4> [414.364698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
    <4> [414.364699] Call Trace:
    <4> [414.364704]  dump_stack+0x67/0x9b
    <4> [414.364708]  ___might_sleep+0x167/0x250
    <4> [414.364777]  vm_free_page+0x24/0xc0 [i915]
    <4> [414.364852]  free_pd+0xf/0x20 [i915]
    <4> [414.364897]  gen8_ppgtt_alloc_pdp+0x489/0x540 [i915]
    <4> [414.364946]  gen8_ppgtt_alloc_4lvl+0x8e/0x2e0 [i915]
    <4> [414.364992]  ppgtt_bind_vma+0x2e/0x60 [i915]
    <4> [414.365039]  i915_vma_bind+0xe8/0x2c0 [i915]
    <4> [414.365088]  __i915_vma_do_pin+0xa1/0xd20 [i915]
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111050
    Fixes: 1d1b5490b91c ("drm/i915/gtt: Replace struct_mutex serialisation for allocation")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190703171913.16585-3-chris@chris-wilson.co.uk
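
The pattern the commit message describes (remember the unused allocation while the lock is held, free it only after the lock is dropped) can be illustrated with a minimal userspace sketch. This is not the i915 code: `struct pd_sketch`, `sleepy_free()`, `install_child()` and the flag-based lock are hypothetical stand-ins, and the spinlock is modelled by a plain boolean in a single-threaded program purely so that the "free under lock" condition can be observed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page directory; NOT the i915 structure. */
struct pd_sketch {
    bool locked;      /* models "pd->lock is held" (single-threaded sketch) */
    void *entry[4];   /* child slots */
};

/* Counts frees that happened while the lock was held (should stay 0). */
static int frees_under_lock;

static void lock_pd(struct pd_sketch *pd)   { pd->locked = true; }
static void unlock_pd(struct pd_sketch *pd) { pd->locked = false; }

/*
 * Stand-in for a free routine that may sleep. In the kernel, calling such
 * a routine with a spinlock held is what triggers the might_sleep() splat;
 * here we only record the violation.
 */
static void sleepy_free(struct pd_sketch *pd, void *child)
{
    if (pd->locked)
        frees_under_lock++;
    free(child);
}

/*
 * Try to install a new child at idx. `fail` simulates an error on the
 * allocation path. Returns 0 on success, -1 on error.
 */
static int install_child(struct pd_sketch *pd, int idx, bool fail)
{
    void *alloc = malloc(64);
    void *unused = NULL;   /* allocation to release once the lock is dropped */
    int err = 0;

    if (!alloc)
        return -1;

    lock_pd(pd);
    if (fail) {
        unused = alloc;    /* error: unwind, but do not free under the lock */
        err = -1;
    } else if (pd->entry[idx]) {
        unused = alloc;    /* slot already populated: our copy is surplus */
    } else {
        pd->entry[idx] = alloc;
    }
    unlock_pd(pd);

    /* Deferred free: runs only after the lock has been released. */
    if (unused)
        sleepy_free(pd, unused);

    return err;
}
```

Calling `sleepy_free()` from inside the locked region instead would increment `frees_under_lock`, which is the userspace analogue of the `vm_free_page()` call reached from `gen8_ppgtt_alloc_pdp()` in the trace above.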
Comment 5 Martin Peres 2019-08-22 07:18:25 UTC
(In reply to Chris Wilson from comment #4)

Thanks!
Comment 6 CI Bug Log 2019-08-22 07:18:31 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.