The test igt@drv_module_reload@basic-reload got the GPU wedged on CI_DRM_2587 on fi-elk-e7500.

Relevant trace:

[ 460.639411] [drm:i915_driver_unload [i915]] *ERROR* failed to idle hardware; continuing to unload!
[ 460.660016] ------------[ cut here ]------------
[ 460.660016] WARNING: CPU: 1 PID: 2402 at drivers/gpu/drm/i915/intel_ringbuffer.c:1610 intel_engine_cleanup+0xd8/0xe0 [i915]
[ 460.660016] WARN_ON(((dev_priv)->info.gen) > 2 && (dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (((engine)->mmio_base)+0x9c) })), true) & (1 << 9)) == 0)
[ 460.660016] Modules linked in: vgem i915(-) snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm lpc_ich ptp mei_me pps_core mei prime_numbers [last unloaded: snd_hda_intel]
[ 460.660016] CPU: 1 PID: 2402 Comm: drv_module_relo Tainted: G U W 4.11.0-CI-CI_DRM_2587+ #1
[ 460.660016] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT PC/3647h, BIOS 786G7 v01.02 10/22/2009
[ 460.660016] Call Trace:
[ 460.660016] dump_stack+0x67/0x92
[ 460.660016] __warn+0xc6/0xe0
[ 460.660016] warn_slowpath_fmt+0x46/0x50
[ 460.660016] ? gen2_read32+0x128/0x200 [i915]
[ 460.660016] intel_engine_cleanup+0xd8/0xe0 [i915]
[ 460.660016] i915_gem_cleanup_engines+0x2a/0x40 [i915]
[ 460.660016] i915_gem_fini+0x28/0x90 [i915]
[ 460.660016] i915_driver_unload+0x131/0x190 [i915]
[ 460.660016] i915_pci_remove+0x14/0x20 [i915]
[ 460.660016] pci_device_remove+0x34/0xb0
[ 460.660016] device_release_driver_internal+0x158/0x210
[ 460.660016] driver_detach+0x3b/0x80
[ 460.660016] bus_remove_driver+0x53/0xd0
[ 460.660016] driver_unregister+0x27/0x50
[ 460.660016] pci_unregister_driver+0x25/0xa0
[ 460.660016] i915_exit+0x1a/0xb1 [i915]
[ 460.660016] SyS_delete_module+0x193/0x1e0
[ 460.660016] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 460.660016] RIP: 0033:0x7f9529bbb687
[ 460.660016] RSP: 002b:00007ffd45d2b628 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 460.660016] RAX: ffffffffffffffda RBX: ffffffff8147ce13 RCX: 00007f9529bbb687
[ 460.660016] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000a6a748
[ 460.660016] RBP: ffffc90000657f88 R08: 0000000000000000 R09: 00007f9529c07ea0
[ 460.660016] R10: 0000000000a6a6e0 R11: 0000000000000246 R12: 0000000000000000
[ 460.660016] R13: 00007ffd45d2b800 R14: 0000000000000000 R15: 0000000000000000
[ 460.660016] ? __this_cpu_preempt_check+0x13/0x20
[ 460.667266] ---[ end trace 15d57680f6e3af9c ]---
[ 460.671600] ------------[ cut here ]------------
[ 460.671677] WARNING: CPU: 1 PID: 2402 at drivers/gpu/drm/i915/i915_drv.c:560 i915_gem_fini+0x89/0x90 [i915]
[ 460.671680] WARN_ON(!list_empty(&dev_priv->context_list))
[ 460.671682] Modules linked in: vgem i915(-) snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm lpc_ich ptp mei_me pps_core mei prime_numbers [last unloaded: snd_hda_intel]
[ 460.671723] CPU: 1 PID: 2402 Comm: drv_module_relo Tainted: G U W 4.11.0-CI-CI_DRM_2587+ #1
[ 460.671726] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT PC/3647h, BIOS 786G7 v01.02 10/22/2009
[ 460.671728] Call Trace:
[ 460.671734] dump_stack+0x67/0x92
[ 460.671739] __warn+0xc6/0xe0
[ 460.671744] warn_slowpath_fmt+0x46/0x50
[ 460.671749] ? _rcu_barrier+0x137/0x160
[ 460.671771] i915_gem_fini+0x89/0x90 [i915]
[ 460.671792] i915_driver_unload+0x131/0x190 [i915]
[ 460.671814] i915_pci_remove+0x14/0x20 [i915]
[ 460.671819] pci_device_remove+0x34/0xb0
[ 460.671824] device_release_driver_internal+0x158/0x210
[ 460.671828] driver_detach+0x3b/0x80
[ 460.671831] bus_remove_driver+0x53/0xd0
[ 460.671835] driver_unregister+0x27/0x50
[ 460.671838] pci_unregister_driver+0x25/0xa0
[ 460.671865] i915_exit+0x1a/0xb1 [i915]
[ 460.671869] SyS_delete_module+0x193/0x1e0
[ 460.671875] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 460.671878] RIP: 0033:0x7f9529bbb687
[ 460.671881] RSP: 002b:00007ffd45d2b628 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 460.671885] RAX: ffffffffffffffda RBX: ffffffff8147ce13 RCX: 00007f9529bbb687
[ 460.671887] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000a6a748
[ 460.671890] RBP: ffffc90000657f88 R08: 0000000000000000 R09: 00007f9529c07ea0
[ 460.671892] R10: 0000000000a6a6e0 R11: 0000000000000246 R12: 0000000000000000
[ 460.671894] R13: 00007ffd45d2b800 R14: 0000000000000000 R15: 0000000000000000
[ 460.671900] ? __this_cpu_preempt_check+0x13/0x20
[ 460.671906] ---[ end trace 15d57680f6e3af9d ]---
[ 462.018190] [drm:stop_ring [i915]] *ERROR* rcs0 : timed out trying to stop ring
[ 462.066806] hpet1: lost 2 rtc interrupts
[ 463.069878] [drm:stop_ring [i915]] *ERROR* rcs0 : timed out trying to stop ring
[ 463.069984] [drm:init_ring_common [i915]] *ERROR* failed to set rcs0 head to zero ctl 0001f401 head 0260c808 tail 0000c870 start 00004000
[ 463.070061] [drm:i915_gem_init [i915]] *ERROR* Failed to initialize GPU, declaring it wedged

Full logs: https://intel-gfx-ci.01.org/CI/CI_DRM_2587/fi-elk-e7500/igt@drv_module_reload@basic-reload.html
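For anyone triaging similar logs, here is a minimal Python sketch of the two checks visible in the trace. It is not driver code. The register offset (0x9c) and bit (1 << 9) are copied from the WARN_ON text; reading that bit as "ring idle" is an assumption based on the preceding "failed to idle hardware" message. The names cleanup_would_warn and parse_ring_state are hypothetical helpers, not i915 functions.

```python
import re

# Values copied from the WARN_ON expression in the trace.
MODE_REG_OFFSET = 0x9C  # added to the engine's mmio_base
IDLE_BIT = 1 << 9       # assumed to mean "ring is idle" when set


def cleanup_would_warn(gen: int, mode_reg_value: int) -> bool:
    """Mirror the WARN_ON condition: gen > 2 and the idle bit is clear."""
    return gen > 2 and (mode_reg_value & IDLE_BIT) == 0


# Matches the init_ring_common error line and captures its hex fields.
RING_STATE_RE = re.compile(
    r"failed to set (?P<engine>\w+) head to zero "
    r"ctl (?P<ctl>[0-9a-f]+) head (?P<head>[0-9a-f]+) "
    r"tail (?P<tail>[0-9a-f]+) start (?P<start>[0-9a-f]+)"
)


def parse_ring_state(line: str) -> dict:
    """Extract the engine name and ring register values from a dmesg line."""
    match = RING_STATE_RE.search(line)
    if match is None:
        raise ValueError("no ring state found in line")
    fields = match.groupdict()
    engine = fields.pop("engine")
    # Remaining fields are hexadecimal register dumps.
    return {"engine": engine, **{k: int(v, 16) for k, v in fields.items()}}


if __name__ == "__main__":
    line = (
        "[ 463.069984] [drm:init_ring_common [i915]] *ERROR* failed to set "
        "rcs0 head to zero ctl 0001f401 head 0260c808 tail 0000c870 "
        "start 00004000"
    )
    state = parse_ring_state(line)
    # A non-zero head after the reset attempt is what the driver complains about.
    print(state["engine"], "head still non-zero:", state["head"] != 0)
```

Running the parser on the init_ring_common line from this report yields a non-zero head value, which matches the "failed to set rcs0 head to zero" error that precedes the GPU being declared wedged.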
*** Bug 100940 has been marked as a duplicate of this bug. ***
Following the machine wedge in bug 100942, we failed to end up with the hw in an idle state. So this is a consequence of i915_gem_set_wedged() not doing its job correctly - but is a secondary bug, if we fix the GPU reset we might never hit this path again outside of igt/gem_eio.
Adding the ReadyForDev tag to the "Whiteboard" field. The bug is still active.
* Status is correct
* Platform is included
* Feature is included
* Priority and Severity are correctly set
(In reply to Chris Wilson from comment #2)
> Following the machine wedge in bug 100942, we failed to end up with the hw
> in an idle state. So this is a consequence of i915_gem_set_wedged() not
> doing its job correctly - but is a secondary bug, if we fix the GPU reset we
> might never hit this path again outside of igt/gem_eio.

The reset wedging the GPU should now be covered:
https://bugs.freedesktop.org/show_bug.cgi?id=100942

As this was a secondary effect of that bug (reloading the module on an already wedged GPU), we should not take any separate action here.

*** This bug has been marked as a duplicate of bug 100942 ***