Bug 108797 - [CI][BAT] igt@i915_selftest@live_gem - dmesg-warn - WARN_ON(!intel_engines_are_idle(i915))
Summary: [CI][BAT] igt@i915_selftest@live_gem - dmesg-warn - WARN_ON(!intel_engines_ar...
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/iGVT-g (show other bugs)
Version: XOrg git
Hardware: Other All
: highest normal
Assignee: Terrence Xu
QA Contact: Terrence Xu
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-19 16:22 UTC by Martin Peres
Modified: 2018-12-10 13:01 UTC (History)
1 user (show)

See Also:
i915 platform: BDW
i915 features: GEM/Other


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Peres 2018-11-19 16:22:06 UTC
<6> [377.158520] [drm] DRM_I915_DEBUG enabled
<6> [377.158521] [drm] DRM_I915_DEBUG_GEM enabled
<6> [377.158523] [drm] DRM_I915_DEBUG_RUNTIME_PM enabled
<6> [377.158526] i915: Performing live selftests with st_random_seed=0x164b03e4 st_timeout=1000
<7> [377.158964] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [377.165809] [drm:intel_power_well_disable [i915]] disabling always-on
<4> [377.168597] ------------[ cut here ]------------
<4> [377.168600] WARN_ON(!intel_engines_are_idle(i915))
<4> [377.168673] WARNING: CPU: 0 PID: 3594 at drivers/gpu/drm/i915/i915_gem.c:5123 i915_gem_suspend+0x120/0x150 [i915]
<4> [377.168675] Modules linked in: i915(+) amdgpu chash gpu_sched ttm snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 prime_numbers i2c_piix4 [last unloaded: i915]
<4> [377.168689] CPU: 0 PID: 3594 Comm: drv_selftest Tainted: G U 4.20.0-rc1-g756a0fd616c3-repeat_14+ #1
<4> [377.168691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [377.168736] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915]
<4> [377.168738] Code: 02 00 84 c0 74 11 48 89 ef e8 9c 25 fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 20 a1 e5 c0 48 c7 c7 4e 14 e4 c0 e8 a0 06 36 c0 <0f> 0b 48 89 ef e8 86 79 ff ff eb d0 48 c7 c6 0b 18 e4 c0 48 c7 c7
<4> [377.168740] RSP: 0018:ffffa23cc026bac0 EFLAGS: 00010282
<4> [377.168742] RAX: 0000000000000000 RBX: ffff918a22e1b258 RCX: 0000000000000000
<4> [377.168744] RDX: 0000000000000006 RSI: ffffffff820fbc9a RDI: ffffffff820a78ef
<4> [377.168746] RBP: ffff918a22e10000 R08: 0000000052570ee8 R09: 0000000000000000
<4> [377.168748] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [377.168749] R13: ffff918a22e10068 R14: ffff918a4f46dfe8 R15: ffff918a22e10000
<4> [377.168752] FS: 00007f4e2cde9980(0000) GS:ffff918a7da00000(0000) knlGS:0000000000000000
<4> [377.168754] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [377.168755] CR2: 00007fff9a65bf48 CR3: 0000000044842003 CR4: 00000000003606f0
<4> [377.168759] Call Trace:
<4> [377.168805] igt_gem_suspend+0x80/0x130 [i915]
<4> [377.168855] __i915_subtests+0x5e/0xf0 [i915]
<4> [377.168906] __run_selftests+0x10b/0x190 [i915]
<4> [377.168955] i915_live_selftests+0x2c/0x60 [i915]
<4> [377.168994] i915_pci_probe+0x50/0xa0 [i915]
<4> [377.168999] pci_device_probe+0xa1/0x130
<4> [377.169004] really_probe+0xf3/0x3e0
<4> [377.169008] driver_probe_device+0x10a/0x120
<4> [377.169014] __driver_attach+0xdb/0x100
<4> [377.169016] ? driver_probe_device+0x120/0x120
<4> [377.169033] ? driver_probe_device+0x120/0x120
<4> [377.169037] bus_for_each_dev+0x74/0xc0
<4> [377.169041] bus_add_driver+0x15f/0x250
<4> [377.169044] ? 0xffffffffc0714000
<4> [377.169047] driver_register+0x56/0xe0
<4> [377.169049] ? 0xffffffffc0714000
<4> [377.169052] do_one_initcall+0x58/0x2e0
<4> [377.169057] ? do_init_module+0x1d/0x1ea
<4> [377.169059] ? rcu_read_lock_sched_held+0x6f/0x80
<4> [377.169063] ? kmem_cache_alloc_trace+0x264/0x290
<4> [377.169068] do_init_module+0x56/0x1ea
<4> [377.169071] load_module+0x2714/0x29f0
<4> [377.169097] ? __se_sys_finit_module+0xd3/0xf0
<4> [377.169099] __se_sys_finit_module+0xd3/0xf0
<4> [377.169107] do_syscall_64+0x55/0x190
<4> [377.169111] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [377.169113] RIP: 0033:0x7f4e2c6b3839
<4> [377.169116] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [377.169117] RSP: 002b:00007ffeeb5ef568 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [377.169120] RAX: ffffffffffffffda RBX: 000056174bdca4c0 RCX: 00007f4e2c6b3839
<4> [377.169122] RDX: 0000000000000000 RSI: 000056174bdcb170 RDI: 0000000000000006
<4> [377.169123] RBP: 000056174bdcb170 R08: 0000000000000004 R09: 0000000000000000
<4> [377.169125] R10: 00007ffeeb5ef6e0 R11: 0000000000000246 R12: 0000000000000000
<4> [377.169126] R13: 000056174bdc3e30 R14: 0000000000000020 R15: 0000000000000037
<4> [377.169134] irq event stamp: 253786
<4> [377.169138] hardirqs last enabled at (253785): [<ffffffff810ff884>] vprintk_emit+0x124/0x320
<4> [377.169140] hardirqs last disabled at (253786): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [377.169143] softirqs last enabled at (253770): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [377.169146] softirqs last disabled at (253763): [<ffffffff81091cb9>] irq_exit+0xa9/0xc0
<4> [377.169191] WARNING: CPU: 0 PID: 3594 at drivers/gpu/drm/i915/i915_gem.c:5123 i915_gem_suspend+0x120/0x150 [i915]
<4> [377.169193] ---[ end trace cb8da0179d23471c ]---
Comment 1 Joonas Lahtinen 2018-11-20 13:43:45 UTC
Suspend test not passing due to engines not idling when running under GVT-d is most likely a dupe of https://bugs.freedesktop.org/show_bug.cgi?id=89360 .
Comment 2 Hang Yuan 2018-11-30 05:01:24 UTC
Hi Martin, could you share your test steps? Is the call trace in guest or host kernel?
Comment 3 Jani Saarinen 2018-11-30 07:21:13 UTC
This must be in quest?
Comment 4 Jani Saarinen 2018-11-30 07:24:33 UTC
This has been seen CI_DRM_5090_128 (1 week, 6 days ago)
Comment 5 Martin Peres 2018-11-30 13:43:15 UTC
(In reply to Hang Yuan from comment #2)
> Hi Martin, could you share your test steps? 

Sure thing! First of all, here is an html page with the full report (including boot logs, and the logs during the execution): https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5090_128/fi-bdw-gvtdvm/igt%40drv_selftest%40live_gem.html

(In reply to Hang Yuan from comment #2)
> Is the call trace in guest or host kernel?

This is from the guest. We do not have the infrastructure to collect the host kernel's logs, hence why Joonas went on a limb and guessed it was likely https://bugs.freedesktop.org/show_bug.cgi?id=89360.
Comment 6 Martin Peres 2018-11-30 13:48:44 UTC
(In reply to Martin Peres from comment #5)
> (In reply to Hang Yuan from comment #2)
> > Hi Martin, could you share your test steps? 
> 
> Sure thing! First of all, here is an html page with the full report
> (including boot logs, and the logs during the execution):
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5090_128/fi-bdw-gvtdvm/
> igt%40drv_selftest%40live_gem.html

Sorry, forgot to say how to reproduce:
1. Download the IGT test suite: https://gitlab.freedesktop.org/drm/igt-gpu-tools
2. Execute the test igt@i915_selftest@live_gem in the guest until it crashes

Right now, our reproduction rate is low: 1 / 390 runs. It probably has a reproduction rate of .1% or so.
Comment 7 Martin Peres 2018-12-10 13:01:21 UTC
Also seen on BSW: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5292/fi-bsw-n3050/igt@i915_selftest@live_gem.html

<4> [446.462577] ------------[ cut here ]------------
<4> [446.462586] WARN_ON(!intel_engines_are_idle(i915))
<4> [446.462764] WARNING: CPU: 1 PID: 3974 at drivers/gpu/drm/i915/i915_gem.c:5127 i915_gem_suspend+0x120/0x150 [i915]
<4> [446.462770] Modules linked in: i915(+) amdgpu chash gpu_sched ttm vgem btusb btrtl btbcm btintel bluetooth snd_hda_codec_hdmi ecdh_generic snd_hda_codec_realtek snd_hda_codec_generic coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm lpc_ich i2c_i801 pinctrl_cherryview prime_numbers [last unloaded: i915]
<4> [446.462813] CPU: 1 PID: 3974 Comm: i915_selftest Tainted: G     U            4.20.0-rc5-CI-CI_DRM_5292+ #1
<4> [446.462818] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4> [446.462918] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915]
<4> [446.462924] Code: 02 00 84 c0 74 11 48 89 ef e8 5c 0c fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 e8 62 c2 a0 48 c7 c7 7f cf c0 a0 e8 10 7d 59 e0 <0f> 0b 48 89 ef e8 c6 79 ff ff eb d0 48 c7 c6 3c d3 c0 a0 48 c7 c7
<4> [446.462929] RSP: 0018:ffffc900001ebac0 EFLAGS: 00010282
<4> [446.462935] RAX: 0000000000000000 RBX: ffff88817132b160 RCX: 0000000000000000
<4> [446.462940] RDX: 0000000000000006 RSI: ffffffff821298aa RDI: ffffffff820d7ca7
<4> [446.462945] RBP: ffff888171320000 R08: 00000000028bc995 R09: 0000000000000000
<4> [446.462949] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [446.462954] R13: ffff888171320068 R14: ffff888173d76538 R15: ffff888171320000
<4> [446.462959] FS:  00007f7d4e906980(0000) GS:ffff88817bb00000(0000) knlGS:0000000000000000
<4> [446.462964] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [446.462969] CR2: 00007f93690e65a0 CR3: 00000001774ca000 CR4: 00000000001006e0
<4> [446.462973] Call Trace:
<4> [446.463081]  igt_gem_suspend+0x80/0x130 [i915]
<4> [446.463191]  __i915_subtests+0x5e/0xf0 [i915]
<4> [446.463303]  __run_selftests+0x10b/0x190 [i915]
<4> [446.463411]  i915_live_selftests+0x2c/0x60 [i915]
<4> [446.463504]  i915_pci_probe+0x50/0xa0 [i915]
<4> [446.463518]  pci_device_probe+0xa1/0x130
<4> [446.463532]  really_probe+0xf3/0x3e0
<4> [446.463543]  driver_probe_device+0x10a/0x120
<4> [446.463552]  __driver_attach+0xdb/0x100
<4> [446.463558]  ? driver_probe_device+0x120/0x120
<4> [446.463565]  ? driver_probe_device+0x120/0x120
<4> [446.463571]  bus_for_each_dev+0x74/0xc0
<4> [446.463583]  bus_add_driver+0x15f/0x250
<4> [446.463591]  ? 0xffffffffa00fb000
<4> [446.463598]  driver_register+0x56/0xe0
<4> [446.463605]  ? 0xffffffffa00fb000
<4> [446.463612]  do_one_initcall+0x58/0x2e0
<4> [446.463622]  ? do_init_module+0x1d/0x1ea
<4> [446.463630]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [446.463637]  ? kmem_cache_alloc_trace+0x264/0x290
<4> [446.463649]  do_init_module+0x56/0x1ea
<4> [446.463659]  load_module+0x227a/0x29c0
<4> [446.463697]  ? __se_sys_finit_module+0xd3/0xf0
<4> [446.463702]  __se_sys_finit_module+0xd3/0xf0
<4> [446.463726]  do_syscall_64+0x55/0x190
<4> [446.463736]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [446.463742] RIP: 0033:0x7f7d4e1d7839
<4> [446.463748] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [446.463753] RSP: 002b:00007ffdc83bfa88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [446.463759] RAX: ffffffffffffffda RBX: 000055bf6dbb9b90 RCX: 00007f7d4e1d7839
<4> [446.463764] RDX: 0000000000000000 RSI: 000055bf6dbba8e0 RDI: 0000000000000006
<4> [446.463768] RBP: 000055bf6dbba8e0 R08: 0000000000000004 R09: 0000000000000000
<4> [446.463773] R10: 00007ffdc83bfc00 R11: 0000000000000246 R12: 0000000000000000
<4> [446.463777] R13: 000055bf6dbb3da0 R14: 0000000000000020 R15: 0000000000000037
<4> [446.463799] irq event stamp: 264646
<4> [446.463807] hardirqs last  enabled at (264645): [<ffffffff810fc764>] vprintk_emit+0x124/0x320
<4> [446.463813] hardirqs last disabled at (264646): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [446.463819] softirqs last  enabled at (264630): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [446.463826] softirqs last disabled at (264617): [<ffffffff8108e6a9>] irq_exit+0xa9/0xc0
<4> [446.463926] WARNING: CPU: 1 PID: 3974 at drivers/gpu/drm/i915/i915_gem.c:5127 i915_gem_suspend+0x120/0x150 [i915]
<4> [446.463931] ---[ end trace e16c44d0567d05c5 ]---


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.