Summary: | [CI][BAT] igt@i915_selftest@live_gem / igt@gem_ctx_isolation@vcs0-s3 - dmesg-warn - WARN_ON(!intel_engines_are_idle(i915)) | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/iGVT-g | Assignee: | Terrence Xu <terrence.xu> |
Status: | RESOLVED WORKSFORME | QA Contact: | Terrence Xu <terrence.xu> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | BDW | i915 features: | GEM/Other |
Description
Martin Peres
2018-11-19 16:22:06 UTC
Suspend test not passing due to engines not idling when running under GVT-d is most likely a dupe of https://bugs.freedesktop.org/show_bug.cgi?id=89360 . Hi Martin, could you share your test steps? Is the call trace in guest or host kernel? This must be in quest? This has been seen CI_DRM_5090_128 (1 week, 6 days ago) (In reply to Hang Yuan from comment #2) > Hi Martin, could you share your test steps? Sure thing! First of all, here is an html page with the full report (including boot logs, and the logs during the execution): https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5090_128/fi-bdw-gvtdvm/igt%40drv_selftest%40live_gem.html (In reply to Hang Yuan from comment #2) > Is the call trace in guest or host kernel? This is from the guest. We do not have the infrastructure to collect the host kernel's logs, hence why Joonas went on a limb and guessed it was likely https://bugs.freedesktop.org/show_bug.cgi?id=89360. (In reply to Martin Peres from comment #5) > (In reply to Hang Yuan from comment #2) > > Hi Martin, could you share your test steps? > > Sure thing! First of all, here is an html page with the full report > (including boot logs, and the logs during the execution): > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5090_128/fi-bdw-gvtdvm/ > igt%40drv_selftest%40live_gem.html Sorry, forgot to say how to reproduce: 1. Download the IGT test suite: https://gitlab.freedesktop.org/drm/igt-gpu-tools 2. Execute the test igt@i915_selftest@live_gem in the guest until it crashes Right now, our reproduction rate is low: 1 / 390 runs. It probably has a reproduction rate of .1% or so. Also seen on BSW: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5292/fi-bsw-n3050/igt@i915_selftest@live_gem.html <4> [446.462577] ------------[ cut here ]------------ <4> [446.462586] WARN_ON(!intel_engines_are_idle(i915)) <4> [446.462764] WARNING: CPU: 1 PID: 3974 at drivers/gpu/drm/i915/i915_gem.c:5127 i915_gem_suspend+0x120/0x150 [i915] <4> [446.462770] Modules linked in: i915(+) amdgpu chash gpu_sched ttm vgem btusb btrtl btbcm btintel bluetooth snd_hda_codec_hdmi ecdh_generic snd_hda_codec_realtek snd_hda_codec_generic coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm lpc_ich i2c_i801 pinctrl_cherryview prime_numbers [last unloaded: i915] <4> [446.462813] CPU: 1 PID: 3974 Comm: i915_selftest Tainted: G U 4.20.0-rc5-CI-CI_DRM_5292+ #1 <4> [446.462818] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016 <4> [446.462918] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915] <4> [446.462924] Code: 02 00 84 c0 74 11 48 89 ef e8 5c 0c fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 e8 62 c2 a0 48 c7 c7 7f cf c0 a0 e8 10 7d 59 e0 <0f> 0b 48 89 ef e8 c6 79 ff ff eb d0 48 c7 c6 3c d3 c0 a0 48 c7 c7 <4> [446.462929] RSP: 0018:ffffc900001ebac0 EFLAGS: 00010282 <4> [446.462935] RAX: 0000000000000000 RBX: ffff88817132b160 RCX: 0000000000000000 <4> [446.462940] RDX: 0000000000000006 RSI: ffffffff821298aa RDI: ffffffff820d7ca7 <4> [446.462945] RBP: ffff888171320000 R08: 00000000028bc995 R09: 0000000000000000 <4> [446.462949] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4> [446.462954] R13: ffff888171320068 R14: ffff888173d76538 R15: ffff888171320000 <4> [446.462959] FS: 00007f7d4e906980(0000) GS:ffff88817bb00000(0000) knlGS:0000000000000000 <4> [446.462964] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [446.462969] CR2: 00007f93690e65a0 CR3: 00000001774ca000 CR4: 00000000001006e0 <4> [446.462973] Call Trace: <4> [446.463081] igt_gem_suspend+0x80/0x130 [i915] <4> [446.463191] __i915_subtests+0x5e/0xf0 [i915] <4> [446.463303] __run_selftests+0x10b/0x190 [i915] <4> [446.463411] i915_live_selftests+0x2c/0x60 [i915] <4> [446.463504] i915_pci_probe+0x50/0xa0 [i915] <4> [446.463518] pci_device_probe+0xa1/0x130 <4> [446.463532] really_probe+0xf3/0x3e0 <4> [446.463543] driver_probe_device+0x10a/0x120 <4> [446.463552] __driver_attach+0xdb/0x100 <4> [446.463558] ? driver_probe_device+0x120/0x120 <4> [446.463565] ? driver_probe_device+0x120/0x120 <4> [446.463571] bus_for_each_dev+0x74/0xc0 <4> [446.463583] bus_add_driver+0x15f/0x250 <4> [446.463591] ? 0xffffffffa00fb000 <4> [446.463598] driver_register+0x56/0xe0 <4> [446.463605] ? 0xffffffffa00fb000 <4> [446.463612] do_one_initcall+0x58/0x2e0 <4> [446.463622] ? do_init_module+0x1d/0x1ea <4> [446.463630] ? rcu_read_lock_sched_held+0x6f/0x80 <4> [446.463637] ? kmem_cache_alloc_trace+0x264/0x290 <4> [446.463649] do_init_module+0x56/0x1ea <4> [446.463659] load_module+0x227a/0x29c0 <4> [446.463697] ? __se_sys_finit_module+0xd3/0xf0 <4> [446.463702] __se_sys_finit_module+0xd3/0xf0 <4> [446.463726] do_syscall_64+0x55/0x190 <4> [446.463736] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [446.463742] RIP: 0033:0x7f7d4e1d7839 <4> [446.463748] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 <4> [446.463753] RSP: 002b:00007ffdc83bfa88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4> [446.463759] RAX: ffffffffffffffda RBX: 000055bf6dbb9b90 RCX: 00007f7d4e1d7839 <4> [446.463764] RDX: 0000000000000000 RSI: 000055bf6dbba8e0 RDI: 0000000000000006 <4> [446.463768] RBP: 000055bf6dbba8e0 R08: 0000000000000004 R09: 0000000000000000 <4> [446.463773] R10: 00007ffdc83bfc00 R11: 0000000000000246 R12: 0000000000000000 <4> [446.463777] R13: 000055bf6dbb3da0 R14: 0000000000000020 R15: 0000000000000037 <4> [446.463799] irq event stamp: 264646 <4> [446.463807] hardirqs last enabled at (264645): [<ffffffff810fc764>] vprintk_emit+0x124/0x320 <4> [446.463813] hardirqs last disabled at (264646): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c <4> [446.463819] softirqs last enabled at (264630): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9 <4> [446.463826] softirqs last disabled at (264617): [<ffffffff8108e6a9>] irq_exit+0xa9/0xc0 <4> [446.463926] WARNING: CPU: 1 PID: 3974 at drivers/gpu/drm/i915/i915_gem.c:5127 i915_gem_suspend+0x120/0x150 [i915] <4> [446.463931] ---[ end trace e16c44d0567d05c5 ]--- https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_152/fi-bsw-n3050/igt@gem_ctx_isolation@vcs0-s3.html <4> [93.854930] ------------[ cut here ]------------ <4> [93.854946] WARN_ON(!intel_engines_are_idle(i915)) <4> [93.855098] WARNING: CPU: 1 PID: 1084 at drivers/gpu/drm/i915/i915_gem.c:5161 i915_gem_suspend+0x120/0x150 [i915] <4> [93.855104] Modules linked in: btusb btrtl btbcm btintel snd_hda_codec_hdmi bluetooth snd_hda_codec_realtek snd_hda_codec_generic ecdh_generic i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm i2c_i801 lpc_ich prime_numbers pinctrl_cherryview <4> [93.855147] CPU: 1 PID: 1084 Comm: rtcwake Tainted: G U 4.20.0-rc4-gf9cd6a3ffb44-drmtip_152+ #1 <4> [93.855152] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016 <4> [93.855231] RIP: 0010:i915_gem_suspend+0x120/0x150 [i915] <4> [93.855237] Code: 02 00 84 c0 74 11 48 89 ef e8 3c 13 fc ff 48 83 c4 08 31 c0 5b 5d c3 48 c7 c6 20 4d 30 c0 48 c7 c7 35 bd 2e c0 e8 30 a8 eb d4 <0f> 0b 48 89 ef e8 76 79 ff ff eb d0 48 c7 c6 f2 c0 2e c0 48 c7 c7 <4> [93.855243] RSP: 0018:ffffb6e0806a3c88 EFLAGS: 00010286 <4> [93.855249] RAX: 0000000000000000 RBX: ffff982c2314b1f8 RCX: 0000000000000006 <4> [93.855255] RDX: 0000000000000006 RSI: ffffffff960ff162 RDI: ffffffff960aacc7 <4> [93.855260] RBP: ffff982c23140000 R08: 0000000038a2d861 R09: 0000000000000000 <4> [93.855265] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4> [93.855270] R13: ffffffff96339370 R14: ffffffff962aea80 R15: ffff982c3b2f3508 <4> [93.855276] FS: 00007fa40073b540(0000) GS:ffff982c3bb00000(0000) knlGS:0000000000000000 <4> [93.855281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [93.855287] CR2: 000055e8a337bd80 CR3: 0000000163078000 CR4: 00000000001006e0 <4> [93.855292] Call Trace: <4> [93.855371] i915_pm_prepare+0x24/0x70 [i915] <4> [93.855386] pci_pm_prepare+0x26/0x50 <4> [93.855396] dpm_prepare+0x131/0x6b0 <4> [93.855405] ? pci_legacy_resume+0x40/0x40 <4> [93.855418] dpm_suspend_start+0x8/0x50 <4> [93.855427] suspend_devices_and_enter+0xcc/0xbc0 <4> [93.855448] pm_suspend+0x573/0x870 <4> [93.855462] state_store+0x78/0xe0 <4> [93.855479] kernfs_fop_write+0x104/0x190 <4> [93.855494] __vfs_write+0x31/0x180 <4> [93.855506] ? rcu_read_lock_sched_held+0x6f/0x80 <4> [93.855512] ? rcu_sync_lockdep_assert+0x29/0x50 <4> [93.855519] ? __sb_start_write+0x152/0x1f0 <4> [93.855526] ? __sb_start_write+0x163/0x1f0 <4> [93.855538] vfs_write+0xbd/0x1b0 <4> [93.855550] ksys_write+0x50/0xc0 <4> [93.855565] do_syscall_64+0x55/0x190 <4> [93.855576] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [93.855582] RIP: 0033:0x7fa400256154 <4> [93.855589] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5 <4> [93.855594] RSP: 002b:00007ffc82b009e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4> [93.855601] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa400256154 <4> [93.855606] RDX: 0000000000000004 RSI: 00005630c18195b0 RDI: 000000000000000a <4> [93.855611] RBP: 00005630c18195b0 R08: 00005630c18165e0 R09: 00007fa40073b540 <4> [93.855616] R10: 00005630c1814010 R11: 0000000000000246 R12: 00005630c1816500 <4> [93.855621] R13: 0000000000000004 R14: 00007fa40052e2a0 R15: 00007fa40052d760 <4> [93.855650] irq event stamp: 9378 <4> [93.855658] hardirqs last enabled at (9377): [<ffffffff95100614>] vprintk_emit+0x124/0x320 <4> [93.855664] hardirqs last disabled at (9378): [<ffffffff950019b0>] trace_hardirqs_off_thunk+0x1a/0x1c <4> [93.855671] softirqs last enabled at (9360): [<ffffffff95c0033a>] __do_softirq+0x33a/0x4b9 <4> [93.855678] softirqs last disabled at (9323): [<ffffffff950929f9>] irq_exit+0xa9/0xc0 <4> [93.855756] WARNING: CPU: 1 PID: 1084 at drivers/gpu/drm/i915/i915_gem.c:5161 i915_gem_suspend+0x120/0x150 [i915] <4> [93.855761] ---[ end trace 88ce2c29e43ece49 ]--- This hasn't shown in the past 3 months, before it occurred once a week. Was it fixed? Dropping the priority to Medium. Last seen this issue 3 months, 1 week ago where as the first occurrence of this issue is 4 months ago. From the history we can see that the second instance of this failure took 1 month 1 week or 827 runs to occur. We can keep this bug open for 3 more weeks and close it if there are no occurrences by then. On an average this issue used to occur once in 72 CI_DRM runs. Current CI_DRM run is 6265 and last seen on CI_DRM-5307. Closing this issue as WORKSFORME according to 10x rule. The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.