Bug 108797 - [CI][BAT] igt@i915_selftest@live_gem / igt@gem_ctx_isolation@vcs0-s3 - dmesg-warn - WARN_ON(!intel_engines_are_idle(i915))
Summary: [CI][BAT] igt@i915_selftest@live_gem / igt@gem_ctx_isolation@vcs0-s3 - dmesg-...
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/iGVT-g (show other bugs)
Version: XOrg git
Hardware: Other All
: medium normal
Assignee: Terrence Xu
QA Contact: Terrence Xu
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-19 16:22 UTC by Martin Peres
Modified: 2019-06-14 12:17 UTC (History)
1 user (show)

See Also:
i915 platform: BDW
i915 features: GEM/Other


Attachments

Description Martin Peres 2018-11-19 16:22:06 UTC
<6> [377.158520] [drm] DRM_I915_DEBUG enabled
<6> [377.158521] [drm] DRM_I915_DEBUG_GEM enabled
<6> [377.158523] [drm] DRM_I915_DEBUG_RUNTIME_PM enabled
<6> [377.158526] i915: Performing live selftests with st_random_seed=0x164b03e4 st_timeout=1000
<7> [377.158964] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [377.165809] [drm:intel_power_well_disable [i915]] disabling always-on
<4> [377.168597] ------------[ cut here ]------------
<4> [377.168600] WARN_ON(!intel_engines_are_idle(i915))
<4> [377.168673] WARNING: CPU: 0 PID: 3594 at drivers/gpu/drm/i915/i915_gem.c:5123 i915_gem_suspend+0x120/0x150 [i915]
<4> [377.168675] Modules linked in: i915(+) amdgpu chash gpu_sched ttm snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 prime_numbers i2c_piix4 [last unloaded: i915]
<4> [377.168689] CPU: 0 PID: 3594 Comm: drv_selftest Tainted: G U 4.20.0-rc1-g756a0fd616c3-repeat_14+ #1
<4> [377.168691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [377.168736] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915]
<4> [377.168738] Code: 02 00 84 c0 74 11 48 89 ef e8 9c 25 fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 20 a1 e5 c0 48 c7 c7 4e 14 e4 c0 e8 a0 06 36 c0 <0f> 0b 48 89 ef e8 86 79 ff ff eb d0 48 c7 c6 0b 18 e4 c0 48 c7 c7
<4> [377.168740] RSP: 0018:ffffa23cc026bac0 EFLAGS: 00010282
<4> [377.168742] RAX: 0000000000000000 RBX: ffff918a22e1b258 RCX: 0000000000000000
<4> [377.168744] RDX: 0000000000000006 RSI: ffffffff820fbc9a RDI: ffffffff820a78ef
<4> [377.168746] RBP: ffff918a22e10000 R08: 0000000052570ee8 R09: 0000000000000000
<4> [377.168748] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [377.168749] R13: ffff918a22e10068 R14: ffff918a4f46dfe8 R15: ffff918a22e10000
<4> [377.168752] FS: 00007f4e2cde9980(0000) GS:ffff918a7da00000(0000) knlGS:0000000000000000
<4> [377.168754] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [377.168755] CR2: 00007fff9a65bf48 CR3: 0000000044842003 CR4: 00000000003606f0
<4> [377.168759] Call Trace:
<4> [377.168805] igt_gem_suspend+0x80/0x130 [i915]
<4> [377.168855] __i915_subtests+0x5e/0xf0 [i915]
<4> [377.168906] __run_selftests+0x10b/0x190 [i915]
<4> [377.168955] i915_live_selftests+0x2c/0x60 [i915]
<4> [377.168994] i915_pci_probe+0x50/0xa0 [i915]
<4> [377.168999] pci_device_probe+0xa1/0x130
<4> [377.169004] really_probe+0xf3/0x3e0
<4> [377.169008] driver_probe_device+0x10a/0x120
<4> [377.169014] __driver_attach+0xdb/0x100
<4> [377.169016] ? driver_probe_device+0x120/0x120
<4> [377.169033] ? driver_probe_device+0x120/0x120
<4> [377.169037] bus_for_each_dev+0x74/0xc0
<4> [377.169041] bus_add_driver+0x15f/0x250
<4> [377.169044] ? 0xffffffffc0714000
<4> [377.169047] driver_register+0x56/0xe0
<4> [377.169049] ? 0xffffffffc0714000
<4> [377.169052] do_one_initcall+0x58/0x2e0
<4> [377.169057] ? do_init_module+0x1d/0x1ea
<4> [377.169059] ? rcu_read_lock_sched_held+0x6f/0x80
<4> [377.169063] ? kmem_cache_alloc_trace+0x264/0x290
<4> [377.169068] do_init_module+0x56/0x1ea
<4> [377.169071] load_module+0x2714/0x29f0
<4> [377.169097] ? __se_sys_finit_module+0xd3/0xf0
<4> [377.169099] __se_sys_finit_module+0xd3/0xf0
<4> [377.169107] do_syscall_64+0x55/0x190
<4> [377.169111] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [377.169113] RIP: 0033:0x7f4e2c6b3839
<4> [377.169116] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [377.169117] RSP: 002b:00007ffeeb5ef568 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [377.169120] RAX: ffffffffffffffda RBX: 000056174bdca4c0 RCX: 00007f4e2c6b3839
<4> [377.169122] RDX: 0000000000000000 RSI: 000056174bdcb170 RDI: 0000000000000006
<4> [377.169123] RBP: 000056174bdcb170 R08: 0000000000000004 R09: 0000000000000000
<4> [377.169125] R10: 00007ffeeb5ef6e0 R11: 0000000000000246 R12: 0000000000000000
<4> [377.169126] R13: 000056174bdc3e30 R14: 0000000000000020 R15: 0000000000000037
<4> [377.169134] irq event stamp: 253786
<4> [377.169138] hardirqs last enabled at (253785): [<ffffffff810ff884>] vprintk_emit+0x124/0x320
<4> [377.169140] hardirqs last disabled at (253786): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [377.169143] softirqs last enabled at (253770): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [377.169146] softirqs last disabled at (253763): [<ffffffff81091cb9>] irq_exit+0xa9/0xc0
<4> [377.169191] WARNING: CPU: 0 PID: 3594 at drivers/gpu/drm/i915/i915_gem.c:5123 i915_gem_suspend+0x120/0x150 [i915]
<4> [377.169193] ---[ end trace cb8da0179d23471c ]---
Comment 1 Joonas Lahtinen 2018-11-20 13:43:45 UTC
Suspend test not passing due to engines not idling when running under GVT-d is most likely a dupe of https://bugs.freedesktop.org/show_bug.cgi?id=89360 .
Comment 2 Hang Yuan 2018-11-30 05:01:24 UTC
Hi Martin, could you share your test steps? Is the call trace in guest or host kernel?
Comment 3 Jani Saarinen 2018-11-30 07:21:13 UTC
This must be in quest?
Comment 4 Jani Saarinen 2018-11-30 07:24:33 UTC
This has been seen CI_DRM_5090_128 (1 week, 6 days ago)
Comment 5 Martin Peres 2018-11-30 13:43:15 UTC
(In reply to Hang Yuan from comment #2)
> Hi Martin, could you share your test steps? 

Sure thing! First of all, here is an html page with the full report (including boot logs, and the logs during the execution): https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5090_128/fi-bdw-gvtdvm/igt%40drv_selftest%40live_gem.html

(In reply to Hang Yuan from comment #2)
> Is the call trace in guest or host kernel?

This is from the guest. We do not have the infrastructure to collect the host kernel's logs, hence why Joonas went on a limb and guessed it was likely https://bugs.freedesktop.org/show_bug.cgi?id=89360.
Comment 6 Martin Peres 2018-11-30 13:48:44 UTC
(In reply to Martin Peres from comment #5)
> (In reply to Hang Yuan from comment #2)
> > Hi Martin, could you share your test steps? 
> 
> Sure thing! First of all, here is an html page with the full report
> (including boot logs, and the logs during the execution):
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5090_128/fi-bdw-gvtdvm/
> igt%40drv_selftest%40live_gem.html

Sorry, forgot to say how to reproduce:
1. Download the IGT test suite: https://gitlab.freedesktop.org/drm/igt-gpu-tools
2. Execute the test igt@i915_selftest@live_gem in the guest until it crashes

Right now, our reproduction rate is low: 1 / 390 runs. It probably has a reproduction rate of .1% or so.
Comment 7 Martin Peres 2018-12-10 13:01:21 UTC
Also seen on BSW: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5292/fi-bsw-n3050/igt@i915_selftest@live_gem.html

<4> [446.462577] ------------[ cut here ]------------
<4> [446.462586] WARN_ON(!intel_engines_are_idle(i915))
<4> [446.462764] WARNING: CPU: 1 PID: 3974 at drivers/gpu/drm/i915/i915_gem.c:5127 i915_gem_suspend+0x120/0x150 [i915]
<4> [446.462770] Modules linked in: i915(+) amdgpu chash gpu_sched ttm vgem btusb btrtl btbcm btintel bluetooth snd_hda_codec_hdmi ecdh_generic snd_hda_codec_realtek snd_hda_codec_generic coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm lpc_ich i2c_i801 pinctrl_cherryview prime_numbers [last unloaded: i915]
<4> [446.462813] CPU: 1 PID: 3974 Comm: i915_selftest Tainted: G     U            4.20.0-rc5-CI-CI_DRM_5292+ #1
<4> [446.462818] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4> [446.462918] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915]
<4> [446.462924] Code: 02 00 84 c0 74 11 48 89 ef e8 5c 0c fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 e8 62 c2 a0 48 c7 c7 7f cf c0 a0 e8 10 7d 59 e0 <0f> 0b 48 89 ef e8 c6 79 ff ff eb d0 48 c7 c6 3c d3 c0 a0 48 c7 c7
<4> [446.462929] RSP: 0018:ffffc900001ebac0 EFLAGS: 00010282
<4> [446.462935] RAX: 0000000000000000 RBX: ffff88817132b160 RCX: 0000000000000000
<4> [446.462940] RDX: 0000000000000006 RSI: ffffffff821298aa RDI: ffffffff820d7ca7
<4> [446.462945] RBP: ffff888171320000 R08: 00000000028bc995 R09: 0000000000000000
<4> [446.462949] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [446.462954] R13: ffff888171320068 R14: ffff888173d76538 R15: ffff888171320000
<4> [446.462959] FS:  00007f7d4e906980(0000) GS:ffff88817bb00000(0000) knlGS:0000000000000000
<4> [446.462964] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [446.462969] CR2: 00007f93690e65a0 CR3: 00000001774ca000 CR4: 00000000001006e0
<4> [446.462973] Call Trace:
<4> [446.463081]  igt_gem_suspend+0x80/0x130 [i915]
<4> [446.463191]  __i915_subtests+0x5e/0xf0 [i915]
<4> [446.463303]  __run_selftests+0x10b/0x190 [i915]
<4> [446.463411]  i915_live_selftests+0x2c/0x60 [i915]
<4> [446.463504]  i915_pci_probe+0x50/0xa0 [i915]
<4> [446.463518]  pci_device_probe+0xa1/0x130
<4> [446.463532]  really_probe+0xf3/0x3e0
<4> [446.463543]  driver_probe_device+0x10a/0x120
<4> [446.463552]  __driver_attach+0xdb/0x100
<4> [446.463558]  ? driver_probe_device+0x120/0x120
<4> [446.463565]  ? driver_probe_device+0x120/0x120
<4> [446.463571]  bus_for_each_dev+0x74/0xc0
<4> [446.463583]  bus_add_driver+0x15f/0x250
<4> [446.463591]  ? 0xffffffffa00fb000
<4> [446.463598]  driver_register+0x56/0xe0
<4> [446.463605]  ? 0xffffffffa00fb000
<4> [446.463612]  do_one_initcall+0x58/0x2e0
<4> [446.463622]  ? do_init_module+0x1d/0x1ea
<4> [446.463630]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [446.463637]  ? kmem_cache_alloc_trace+0x264/0x290
<4> [446.463649]  do_init_module+0x56/0x1ea
<4> [446.463659]  load_module+0x227a/0x29c0
<4> [446.463697]  ? __se_sys_finit_module+0xd3/0xf0
<4> [446.463702]  __se_sys_finit_module+0xd3/0xf0
<4> [446.463726]  do_syscall_64+0x55/0x190
<4> [446.463736]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [446.463742] RIP: 0033:0x7f7d4e1d7839
<4> [446.463748] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [446.463753] RSP: 002b:00007ffdc83bfa88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [446.463759] RAX: ffffffffffffffda RBX: 000055bf6dbb9b90 RCX: 00007f7d4e1d7839
<4> [446.463764] RDX: 0000000000000000 RSI: 000055bf6dbba8e0 RDI: 0000000000000006
<4> [446.463768] RBP: 000055bf6dbba8e0 R08: 0000000000000004 R09: 0000000000000000
<4> [446.463773] R10: 00007ffdc83bfc00 R11: 0000000000000246 R12: 0000000000000000
<4> [446.463777] R13: 000055bf6dbb3da0 R14: 0000000000000020 R15: 0000000000000037
<4> [446.463799] irq event stamp: 264646
<4> [446.463807] hardirqs last  enabled at (264645): [<ffffffff810fc764>] vprintk_emit+0x124/0x320
<4> [446.463813] hardirqs last disabled at (264646): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [446.463819] softirqs last  enabled at (264630): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [446.463826] softirqs last disabled at (264617): [<ffffffff8108e6a9>] irq_exit+0xa9/0xc0
<4> [446.463926] WARNING: CPU: 1 PID: 3974 at drivers/gpu/drm/i915/i915_gem.c:5127 i915_gem_suspend+0x120/0x150 [i915]
<4> [446.463931] ---[ end trace e16c44d0567d05c5 ]---
Comment 8 Martin Peres 2018-12-11 16:08:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_152/fi-bsw-n3050/igt@gem_ctx_isolation@vcs0-s3.html

<4> [93.854930] ------------[ cut here ]------------
<4> [93.854946] WARN_ON(!intel_engines_are_idle(i915))
<4> [93.855098] WARNING: CPU: 1 PID: 1084 at drivers/gpu/drm/i915/i915_gem.c:5161 i915_gem_suspend+0x120/0x150 [i915]
<4> [93.855104] Modules linked in: btusb btrtl btbcm btintel snd_hda_codec_hdmi bluetooth snd_hda_codec_realtek snd_hda_codec_generic ecdh_generic i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm i2c_i801 lpc_ich prime_numbers pinctrl_cherryview
<4> [93.855147] CPU: 1 PID: 1084 Comm: rtcwake Tainted: G     U            4.20.0-rc4-gf9cd6a3ffb44-drmtip_152+ #1
<4> [93.855152] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4> [93.855231] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915]
<4> [93.855237] Code: 02 00 84 c0 74 11 48 89 ef e8 3c 13 fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 20 4d 30 c0 48 c7 c7 35 bd 2e c0 e8 30 a8 eb d4 <0f> 0b 48 89 ef e8 76 79 ff ff eb d0 48 c7 c6 f2 c0 2e c0 48 c7 c7
<4> [93.855243] RSP: 0018:ffffb6e0806a3c88 EFLAGS: 00010286
<4> [93.855249] RAX: 0000000000000000 RBX: ffff982c2314b1f8 RCX: 0000000000000006
<4> [93.855255] RDX: 0000000000000006 RSI: ffffffff960ff162 RDI: ffffffff960aacc7
<4> [93.855260] RBP: ffff982c23140000 R08: 0000000038a2d861 R09: 0000000000000000
<4> [93.855265] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [93.855270] R13: ffffffff96339370 R14: ffffffff962aea80 R15: ffff982c3b2f3508
<4> [93.855276] FS:  00007fa40073b540(0000) GS:ffff982c3bb00000(0000) knlGS:0000000000000000
<4> [93.855281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [93.855287] CR2: 000055e8a337bd80 CR3: 0000000163078000 CR4: 00000000001006e0
<4> [93.855292] Call Trace:
<4> [93.855371]  i915_pm_prepare+0x24/0x70 [i915]
<4> [93.855386]  pci_pm_prepare+0x26/0x50
<4> [93.855396]  dpm_prepare+0x131/0x6b0
<4> [93.855405]  ? pci_legacy_resume+0x40/0x40
<4> [93.855418]  dpm_suspend_start+0x8/0x50
<4> [93.855427]  suspend_devices_and_enter+0xcc/0xbc0
<4> [93.855448]  pm_suspend+0x573/0x870
<4> [93.855462]  state_store+0x78/0xe0
<4> [93.855479]  kernfs_fop_write+0x104/0x190
<4> [93.855494]  __vfs_write+0x31/0x180
<4> [93.855506]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [93.855512]  ? rcu_sync_lockdep_assert+0x29/0x50
<4> [93.855519]  ? __sb_start_write+0x152/0x1f0
<4> [93.855526]  ? __sb_start_write+0x163/0x1f0
<4> [93.855538]  vfs_write+0xbd/0x1b0
<4> [93.855550]  ksys_write+0x50/0xc0
<4> [93.855565]  do_syscall_64+0x55/0x190
<4> [93.855576]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [93.855582] RIP: 0033:0x7fa400256154
<4> [93.855589] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
<4> [93.855594] RSP: 002b:00007ffc82b009e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [93.855601] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa400256154
<4> [93.855606] RDX: 0000000000000004 RSI: 00005630c18195b0 RDI: 000000000000000a
<4> [93.855611] RBP: 00005630c18195b0 R08: 00005630c18165e0 R09: 00007fa40073b540
<4> [93.855616] R10: 00005630c1814010 R11: 0000000000000246 R12: 00005630c1816500
<4> [93.855621] R13: 0000000000000004 R14: 00007fa40052e2a0 R15: 00007fa40052d760
<4> [93.855650] irq event stamp: 9378
<4> [93.855658] hardirqs last  enabled at (9377): [<ffffffff95100614>] vprintk_emit+0x124/0x320
<4> [93.855664] hardirqs last disabled at (9378): [<ffffffff950019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [93.855671] softirqs last  enabled at (9360): [<ffffffff95c0033a>] __do_softirq+0x33a/0x4b9
<4> [93.855678] softirqs last disabled at (9323): [<ffffffff950929f9>] irq_exit+0xa9/0xc0
<4> [93.855756] WARNING: CPU: 1 PID: 1084 at drivers/gpu/drm/i915/i915_gem.c:5161 i915_gem_suspend+0x120/0x150 [i915]
<4> [93.855761] ---[ end trace 88ce2c29e43ece49 ]---
Comment 9 Francesco Balestrieri 2019-03-21 10:57:55 UTC
This hasn't shown in the past 3 months, before it occurred once a week. Was it fixed?
Comment 10 Lakshmi 2019-03-21 14:53:56 UTC
Dropping the priority to Medium.
Last seen this issue 3 months, 1 week ago where as the first occurrence of this issue is 4 months ago. 
From the history we can see that the second instance of this failure took 1 month 1 week or 827 runs to occur.

We can keep this bug open for 3 more weeks and close it if there are no occurrences by then.
Comment 11 Lakshmi 2019-06-14 12:17:05 UTC
On an average this issue used to occur once in 72 CI_DRM runs. 
Current CI_DRM run is 6265 and last seen on CI_DRM-5307. Closing this issue as WORKSFORME according to 10x rule.
Comment 12 CI Bug Log 2019-06-14 12:17:15 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.