Bug 107129 - [CI][GUC][SKL,KBL] drv_selftests@live_sanity
Summary: [CI][GUC][SKL,KBL] drv_selftests@live_sanity
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-05 14:38 UTC by Tomi Sarvela
Modified: 2018-07-18 01:30 UTC

See Also:
i915 platform: KBL, SKL
i915 features: GEM/Other



Description Tomi Sarvela 2018-07-05 14:38:43 UTC
Intel-GFX-CI hosts are starting to run i915 selftests; this bug is part of the "Initial findings" series.

drv_selftest@live_sanitycheck fails on SKL and KBL GUC-enabled hosts:

[  368.932807] [loop 0] Failed to busy the object
[  368.932825] i915/i915_gem_object_live_selftests: igt_mmap_offset_exhaustion failed with error -5
[  369.026897] i915: probe of 0000:00:02.0 failed with error -5

Full traces:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4432/fi-skl-guc/igt@drv_selftest@live_sanitycheck.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4432/fi-kbl-guc/igt@drv_selftest@live_sanitycheck.html
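
For triage, the failing subtest and its error code can be pulled out of dmesg lines like the excerpt in this description. What follows is a minimal sketch, assuming the `<suite>: <test> failed with error <n>` message shape seen in that excerpt holds for other selftest failures too. The regex and the helper name `parse_selftest_failures` are illustrative and are not part of IGT or the CI tooling.

```python
import errno
import re

# Matches the message shape seen in the excerpt above. That other selftest
# failures use the same shape is an assumption, not something the log proves.
FAILURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<suite>\S+):\s+"
    r"(?P<test>\w+) failed with error (?P<err>-?\d+)"
)


def parse_selftest_failures(dmesg_text):
    """Return (timestamp, suite, test, errno name) for each matching line."""
    failures = []
    for line in dmesg_text.splitlines():
        match = FAILURE_RE.search(line)
        if match is None:
            continue
        # Kernel functions return negative errno values; look up the name.
        code = abs(int(match.group("err")))
        name = errno.errorcode.get(code, "unknown")
        failures.append(
            (float(match.group("ts")), match.group("suite"),
             match.group("test"), name)
        )
    return failures


SAMPLE = """\
[  368.932807] [loop 0] Failed to busy the object
[  368.932825] i915/i915_gem_object_live_selftests: igt_mmap_offset_exhaustion failed with error -5
[  369.026897] i915: probe of 0000:00:02.0 failed with error -5
"""

for failure in parse_selftest_failures(SAMPLE):
    print(failure)
```

On the sample, only the `igt_mmap_offset_exhaustion` line matches, and error -5 resolves to `EIO`. The subsequent probe failure line is deliberately not matched, since it has a different shape and only reports the consequence of the selftest failure.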
Comment 1 Tomi Sarvela 2018-07-05 14:42:12 UTC
Wrong cut/paste there; this probably has more relevant information:

[  369.531090] RIP: 0010:plist_check_prev_next+0x0/0x40
[  369.531094] Code: ff 85 c0 89 c2 7e 17 be 01 00 00 00 48 89 df e8 76 c5 b6 ff ba 01 00 00 00 85 c0 0f 4e d0 89 d0 5b c3 90 90 90 90 90 90 90 90 <48> 8b 42 08 48 39 f0 74 2b 49 89 f0 48 8b 4f 08 50 ff 32 52 48 89 
[  369.531147] RSP: 0018:ffffc9000051fa88 EFLAGS: 00010006
[  369.531151] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[  369.531155] RDX: 0000000000000000 RSI: ffff880212c87820 RDI: ffffffff82243de0
[  369.531159] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[  369.531164] R10: ffffc9000051fa70 R11: 0000000000000000 R12: ffffffff82243de0
[  369.531168] R13: ffffffff82243de0 R14: ffff8802168b7820 R15: 0000000077359400
[  369.531173] FS:  00007f17536e5980(0000) GS:ffff880236c80000(0000) knlGS:0000000000000000
[  369.531178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  369.531182] CR2: 0000000000000008 CR3: 000000014df46004 CR4: 00000000003606e0
[  369.531186] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  369.531190] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  369.531195] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[  369.531200] in_atomic(): 1, irqs_disabled(): 1, pid: 8902, name: drv_selftest
[  369.531205] INFO: lockdep is turned off.
[  369.531207] irq event stamp: 230888
[  369.531211] hardirqs last  enabled at (230887): [<ffffffff8193e0dc>] _raw_spin_unlock_irqrestore+0x4c/0x60
[  369.531217] hardirqs last disabled at (230888): [<ffffffff8193df4d>] _raw_spin_lock_irqsave+0xd/0x50
[  369.531224] softirqs last  enabled at (230732): [<ffffffff81c0034f>] __do_softirq+0x34f/0x505
[  369.531230] softirqs last disabled at (230725): [<ffffffff8108c759>] irq_exit+0xa9/0xc0
[  369.531235] Preemption disabled at:
[  369.531235] [<0000000000000000>]           (null)
[  369.531242] CPU: 2 PID: 8902 Comm: drv_selftest Tainted: G     UD W         4.18.0-rc3-CI-CI_DRM_4432+ #1
[  369.531247] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
[  369.531252] Call Trace:
[  369.531256]  dump_stack+0x67/0x9b
[  369.531260]  ___might_sleep+0x167/0x250
[  369.531264]  exit_signals+0x2b/0x2d0
[  369.531268]  do_exit+0xa3/0xd40
[  369.531273]  rewind_stack_do_exit+0x17/0x20
Comment 2 Chris Wilson 2018-07-09 08:34:18 UTC
Fwiw, there was a KASAN run (https://intel-gfx-ci.01.org/tree/drm-tip/kasan_48/) and it didn't seem to suggest anything. (The closest, most novel splat is a hit for live_objects.)
Comment 3 Chris Wilson 2018-07-10 16:51:57 UTC
commit 7ab87ede5078af1daccf26951096e16ac16e19cb (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 10 15:38:21 2018 +0100

    drm/i915: Unwind HW init after GVT setup failure
    
    Following intel_gvt_init() failure, we missed unwinding our setup
    leaving pointers dangling past the module unload. For our example, the
    pm_qos:
    
    [  441.057615] top: 000000006b3baf1c, n: 0000000054d8ef33, p: 0000000097cdf1a2
                   prev: 0000000054d8ef33, n: 0000000097cdf1a2, p: 000000006b3baf1c
                   next: 0000000097cdf1a2, n: 000000006de8fc8b, p: 0000000081087253
    [  441.057627] WARNING: CPU: 4 PID: 9277 at lib/plist.c:42 plist_check_prev_next+0x2d/0x40
    [  441.057628] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei prime_numbers [last unloaded: i915]
    [  441.057652] CPU: 4 PID: 9277 Comm: drv_selftest Tainted: G     U            4.18.0-rc4-CI-CI_DRM_4464+ #1
    [  441.057653] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 3402 04/26/2017
    [  441.057656] RIP: 0010:plist_check_prev_next+0x2d/0x40
    [  441.057657] Code: 08 48 39 f0 74 2b 49 89 f0 48 8b 4f 08 50 ff 32 52 48 89 fe 41 ff 70 08 48 8b 17 48 c7 c7 d8 ae 14 82 4d 8b 08 e8 63 0e 76 ff <0f> 0b 48 83 c4 20 c3 48 39 10 75 d0 f3 c3 0f 1f 44 00 00 41 54 55
    [  441.057717] RSP: 0018:ffffc900003a3a68 EFLAGS: 00010082
    [  441.057720] RAX: 0000000000000000 RBX: ffff8802193978c0 RCX: 0000000000000002
    [  441.057721] RDX: 0000000080000002 RSI: ffffffff820c65a4 RDI: 00000000ffffffff
    [  441.057722] RBP: ffff8802193978c0 R08: 0000000000000000 R09: 0000000000000001
    [  441.057724] R10: ffffc900003a3a70 R11: 0000000000000000 R12: ffffffff82243de0
    [  441.057725] R13: ffffffff82243de0 R14: ffff88021a6c78c0 R15: 0000000077359400
    [  441.057726] FS:  00007fc23a4a9980(0000) GS:ffff880236d00000(0000) knlGS:0000000000000000
    [  441.057728] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  441.057729] CR2: 0000563e4503d038 CR3: 0000000138f86005 CR4: 00000000003606e0
    [  441.057730] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  441.057731] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  441.057732] Call Trace:
    [  441.057736]  plist_check_list+0x2e/0x40
    [  441.057738]  plist_add+0x23/0x130
    [  441.057743]  pm_qos_update_target+0x1bd/0x2f0
    [  441.057771]  i915_driver_load+0xec4/0x1060 [i915]
    [  441.057775]  ? trace_hardirqs_on_caller+0xe0/0x1b0
    [  441.057800]  i915_pci_probe+0x29/0x90 [i915]
    [  441.057804]  pci_device_probe+0xa1/0x130
    [  441.057807]  driver_probe_device+0x306/0x480
    [  441.057810]  __driver_attach+0xdb/0x100
    [  441.057812]  ? driver_probe_device+0x480/0x480
    [  441.057813]  ? driver_probe_device+0x480/0x480
    [  441.057816]  bus_for_each_dev+0x74/0xc0
    [  441.057819]  bus_add_driver+0x15f/0x250
    [  441.057821]  ? 0xffffffffa0696000
    [  441.057823]  driver_register+0x56/0xe0
    [  441.057825]  ? 0xffffffffa0696000
    [  441.057827]  do_one_initcall+0x58/0x370
    [  441.057830]  ? do_init_module+0x1d/0x1ea
    [  441.057832]  ? rcu_read_lock_sched_held+0x6f/0x80
    [  441.057834]  ? kmem_cache_alloc_trace+0x282/0x2e0
    [  441.057838]  do_init_module+0x56/0x1ea
    [  441.057841]  load_module+0x2435/0x2b20
    [  441.057852]  ? __se_sys_finit_module+0xd3/0xf0
    [  441.057854]  __se_sys_finit_module+0xd3/0xf0
    [  441.057861]  do_syscall_64+0x55/0x190
    [  441.057863]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  441.057865] RIP: 0033:0x7fc239d75839
    [  441.057866] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
    [  441.057927] RSP: 002b:00007fffb7825d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [  441.057930] RAX: ffffffffffffffda RBX: 0000563e45035dd0 RCX: 00007fc239d75839
    [  441.057931] RDX: 0000000000000000 RSI: 0000563e4502f8a0 RDI: 0000000000000004
    [  441.057932] RBP: 0000563e4502f8a0 R08: 0000000000000004 R09: 0000000000000000
    [  441.057933] R10: 00007fffb7825ea0 R11: 0000000000000246 R12: 0000000000000000
    [  441.057934] R13: 0000563e4502f690 R14: 0000000000000000 R15: 000000000000003f
    [  441.057940] irq event stamp: 231338
    [  441.057943] hardirqs last  enabled at (231337): [<ffffffff8193e3fc>] _raw_spin_unlock_irqrestore+0x4c/0x60
    [  441.057944] hardirqs last disabled at (231338): [<ffffffff8193e26d>] _raw_spin_lock_irqsave+0xd/0x50
    [  441.057947] softirqs last  enabled at (231024): [<ffffffff81c0034f>] __do_softirq+0x34f/0x505
    [  441.057949] softirqs last disabled at (231005): [<ffffffff8108c7b9>] irq_exit+0xa9/0xc0
    [  441.057951] WARNING: CPU: 4 PID: 9277 at lib/plist.c:42 plist_check_prev_next+0x2d/0x40
    
    v2: Add a load failure point to intel_gvt_init() so that we always
    exercise this path in future.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107129
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Michał Winiarski <michal.winiarski@intel.com>
    Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180710143821.1889-1-chris@chris-wilson.co.uk
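
The failure mode this commit message describes can be modelled outside the kernel: a setup step registers itself on a long-lived list and is never undone when a later step fails. The following is a simplified Python model and not the i915 code. `QosList`, `load_driver`, `GvtInitError` and the flag names are hypothetical stand-ins, and `contextlib.ExitStack` plays the role that goto-based error unwinding plays in kernel C.

```python
from contextlib import ExitStack


class QosList:
    """Stand-in for a long-lived list that outlives any single driver load."""

    def __init__(self):
        self.requests = []

    def add(self, request):
        self.requests.append(request)

    def remove(self, request):
        self.requests.remove(request)


class GvtInitError(Exception):
    """Raised by the simulated late setup step."""


def load_driver(qos, fail_gvt, unwind_on_failure):
    """Simulated probe: register a request, then run a step that may fail."""
    with ExitStack() as stack:
        request = object()  # in the real driver this lives in driver memory
        qos.add(request)
        if unwind_on_failure:
            # Register the matching teardown right after the setup step.
            stack.callback(qos.remove, request)
        if fail_gvt:
            raise GvtInitError("simulated late init failure")
        # Success: discard the pending teardowns so the request stays
        # registered until a real unload.
        stack.pop_all()
    return request


def stale_entries_after_failed_load(unwind_on_failure):
    """Count the requests left on the list after a failed load."""
    qos = QosList()
    try:
        load_driver(qos, fail_gvt=True, unwind_on_failure=unwind_on_failure)
    except GvtInitError:
        pass
    return len(qos.requests)


print("without unwind:", stale_entries_after_failed_load(False))
print("with unwind:", stale_entries_after_failed_load(True))
```

Without the unwind, the failed load leaves one entry on the list. In the kernel that entry would point into memory that is gone once the module is unloaded, which is consistent with the plist warning quoted in the commit message. The model only shows the entry surviving; it does not reproduce the corruption itself.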
Comment 4 James Ausmus 2018-07-18 01:30:47 UTC
This test has been green since CI_DRM_4465 on fi-skl-guc and fi-kbl-guc; closing.

