Bug 100943 - [BAT][ELK] *ERROR* failed to idle hardware; continuing to unload!
Status: CLOSED DUPLICATE of bug 100942
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 100940
Depends on:
Blocks:
 
Reported: 2017-05-05 07:26 UTC by Martin Peres
Modified: 2017-07-27 16:50 UTC

See Also:
i915 platform: G45
i915 features: power/Other



Description Martin Peres 2017-05-05 07:26:20 UTC
The test igt@drv_module_reload@basic-reload left the GPU wedged on fi-elk-e7500 in CI_DRM_2587.

Relevant trace:

[  460.639411] [drm:i915_driver_unload [i915]] *ERROR* failed to idle hardware; continuing to unload!
[  460.660016] ------------[ cut here ]------------
[  460.660016] WARNING: CPU: 1 PID: 2402 at drivers/gpu/drm/i915/intel_ringbuffer.c:1610 intel_engine_cleanup+0xd8/0xe0 [i915]
[  460.660016] WARN_ON(((dev_priv)->info.gen) > 2 && (dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (((engine)->mmio_base)+0x9c) })), true) & (1 << 9)) == 0)
[  460.660016] Modules linked in: vgem i915(-) snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm lpc_ich ptp mei_me pps_core mei prime_numbers [last unloaded: snd_hda_intel]
[  460.660016] CPU: 1 PID: 2402 Comm: drv_module_relo Tainted: G     U  W       4.11.0-CI-CI_DRM_2587+ #1
[  460.660016] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT PC/3647h, BIOS 786G7 v01.02 10/22/2009
[  460.660016] Call Trace:
[  460.660016]  dump_stack+0x67/0x92
[  460.660016]  __warn+0xc6/0xe0
[  460.660016]  warn_slowpath_fmt+0x46/0x50
[  460.660016]  ? gen2_read32+0x128/0x200 [i915]
[  460.660016]  intel_engine_cleanup+0xd8/0xe0 [i915]
[  460.660016]  i915_gem_cleanup_engines+0x2a/0x40 [i915]
[  460.660016]  i915_gem_fini+0x28/0x90 [i915]
[  460.660016]  i915_driver_unload+0x131/0x190 [i915]
[  460.660016]  i915_pci_remove+0x14/0x20 [i915]
[  460.660016]  pci_device_remove+0x34/0xb0
[  460.660016]  device_release_driver_internal+0x158/0x210
[  460.660016]  driver_detach+0x3b/0x80
[  460.660016]  bus_remove_driver+0x53/0xd0
[  460.660016]  driver_unregister+0x27/0x50
[  460.660016]  pci_unregister_driver+0x25/0xa0
[  460.660016]  i915_exit+0x1a/0xb1 [i915]
[  460.660016]  SyS_delete_module+0x193/0x1e0
[  460.660016]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  460.660016] RIP: 0033:0x7f9529bbb687
[  460.660016] RSP: 002b:00007ffd45d2b628 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[  460.660016] RAX: ffffffffffffffda RBX: ffffffff8147ce13 RCX: 00007f9529bbb687
[  460.660016] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000a6a748
[  460.660016] RBP: ffffc90000657f88 R08: 0000000000000000 R09: 00007f9529c07ea0
[  460.660016] R10: 0000000000a6a6e0 R11: 0000000000000246 R12: 0000000000000000
[  460.660016] R13: 00007ffd45d2b800 R14: 0000000000000000 R15: 0000000000000000
[  460.660016]  ? __this_cpu_preempt_check+0x13/0x20
[  460.667266] ---[ end trace 15d57680f6e3af9c ]---
[  460.671600] ------------[ cut here ]------------
[  460.671677] WARNING: CPU: 1 PID: 2402 at drivers/gpu/drm/i915/i915_drv.c:560 i915_gem_fini+0x89/0x90 [i915]
[  460.671680] WARN_ON(!list_empty(&dev_priv->context_list))
[  460.671682] Modules linked in: vgem i915(-) snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm lpc_ich ptp mei_me pps_core mei prime_numbers [last unloaded: snd_hda_intel]
[  460.671723] CPU: 1 PID: 2402 Comm: drv_module_relo Tainted: G     U  W       4.11.0-CI-CI_DRM_2587+ #1
[  460.671726] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT PC/3647h, BIOS 786G7 v01.02 10/22/2009
[  460.671728] Call Trace:
[  460.671734]  dump_stack+0x67/0x92
[  460.671739]  __warn+0xc6/0xe0
[  460.671744]  warn_slowpath_fmt+0x46/0x50
[  460.671749]  ? _rcu_barrier+0x137/0x160
[  460.671771]  i915_gem_fini+0x89/0x90 [i915]
[  460.671792]  i915_driver_unload+0x131/0x190 [i915]
[  460.671814]  i915_pci_remove+0x14/0x20 [i915]
[  460.671819]  pci_device_remove+0x34/0xb0
[  460.671824]  device_release_driver_internal+0x158/0x210
[  460.671828]  driver_detach+0x3b/0x80
[  460.671831]  bus_remove_driver+0x53/0xd0
[  460.671835]  driver_unregister+0x27/0x50
[  460.671838]  pci_unregister_driver+0x25/0xa0
[  460.671865]  i915_exit+0x1a/0xb1 [i915]
[  460.671869]  SyS_delete_module+0x193/0x1e0
[  460.671875]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  460.671878] RIP: 0033:0x7f9529bbb687
[  460.671881] RSP: 002b:00007ffd45d2b628 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[  460.671885] RAX: ffffffffffffffda RBX: ffffffff8147ce13 RCX: 00007f9529bbb687
[  460.671887] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000a6a748
[  460.671890] RBP: ffffc90000657f88 R08: 0000000000000000 R09: 00007f9529c07ea0
[  460.671892] R10: 0000000000a6a6e0 R11: 0000000000000246 R12: 0000000000000000
[  460.671894] R13: 00007ffd45d2b800 R14: 0000000000000000 R15: 0000000000000000
[  460.671900]  ? __this_cpu_preempt_check+0x13/0x20
[  460.671906] ---[ end trace 15d57680f6e3af9d ]---
[  462.018190] [drm:stop_ring [i915]] *ERROR* rcs0 : timed out trying to stop ring
[  462.066806] hpet1: lost 2 rtc interrupts
[  463.069878] [drm:stop_ring [i915]] *ERROR* rcs0 : timed out trying to stop ring
[  463.069984] [drm:init_ring_common [i915]] *ERROR* failed to set rcs0 head to zero ctl 0001f401 head 0260c808 tail 0000c870 start 00004000
[  463.070061] [drm:i915_gem_init [i915]] *ERROR* Failed to initialize GPU, declaring it wedged

Full logs: https://intel-gfx-ci.01.org/CI/CI_DRM_2587/fi-elk-e7500/igt@drv_module_reload@basic-reload.html
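
For triage, the trace above can be reduced to its key events. The following is a minimal sketch, assuming the log lines keep the "[timestamp] message" layout shown in this report; the function name summarize() and the tuple layout are hypothetical and not part of any CI tooling.

```python
import re

# Kernel log prefix such as "[  460.639411] "; captures timestamp and message.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s(?P<msg>.*)$")
# drm error lines as seen in this report: "[drm:func [i915]] *ERROR* text".
DRM_ERROR_RE = re.compile(
    r"^\[drm:(?P<func>\w+)(?: \[\w+\])?\] \*ERROR\* (?P<text>.*)$"
)
# WARNING header lines: "WARNING: CPU: n PID: n at file:line symbol+off/len".
WARNING_RE = re.compile(
    r"^WARNING: CPU: \d+ PID: \d+ at (?P<where>\S+) (?P<sym>\S+)"
)


def summarize(log):
    """Return (timestamp, kind, source, detail) tuples for errors and warnings.

    Hypothetical helper: all other lines (stack frames, registers) are skipped.
    """
    events = []
    for raw in log.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue
        ts = float(line["ts"])
        msg = line["msg"].strip()
        err = DRM_ERROR_RE.match(msg)
        if err:
            events.append((ts, "error", err["func"], err["text"]))
            continue
        warn = WARNING_RE.match(msg)
        if warn:
            events.append((ts, "warning", warn["sym"], warn["where"]))
    return events


if __name__ == "__main__":
    sample = (
        "[  460.639411] [drm:i915_driver_unload [i915]] *ERROR* "
        "failed to idle hardware; continuing to unload!\n"
        "[  462.018190] [drm:stop_ring [i915]] *ERROR* rcs0 : "
        "timed out trying to stop ring\n"
    )
    for event in summarize(sample):
        print(event)
```

Run against the full trace, this would list the failed idle, the two WARNINGs raised during unload, and the stop_ring and init_ring_common errors that precede the GPU being declared wedged.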
Comment 1 Chris Wilson 2017-05-05 08:46:29 UTC
*** Bug 100940 has been marked as a duplicate of this bug. ***
Comment 2 Chris Wilson 2017-05-05 08:48:29 UTC
Following the machine wedge in bug 100942, we failed to end up with the hw in an idle state. So this is a consequence of i915_gem_set_wedged() not doing its job correctly, but it is a secondary bug: if we fix the GPU reset, we might never hit this path again outside of igt/gem_eio.
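
The first WARN_ON in the trace spells out the check that fired during unload: on gen > 2, a register at the engine's mmio_base + 0x9c is read and bit 9 is expected to be set. A minimal sketch of that condition follows, assuming bit 9 is the engine's idle flag (an interpretation based on the "failed to idle hardware" message, not stated in the trace) and assuming gen 4 for this G45 machine. The names MODE_OFFSET, IDLE_BIT, mode_register_address and cleanup_warn_fires are hypothetical.

```python
# Values copied from the expanded WARN_ON text in the trace.
MODE_OFFSET = 0x9C   # (engine)->mmio_base + 0x9c
IDLE_BIT = 1 << 9    # (1 << 9); assumed here to mean "engine is idle"


def mode_register_address(mmio_base):
    """Address of the register the WARN_ON reads for a given engine base."""
    return mmio_base + MODE_OFFSET


def cleanup_warn_fires(gen, mode_value):
    """Mirror the WARN_ON condition: gen > 2 and bit 9 of the register clear."""
    return gen > 2 and (mode_value & IDLE_BIT) == 0


if __name__ == "__main__":
    # Assumed gen 4 (G45); a register value with bit 9 clear triggers the warning.
    print(cleanup_warn_fires(4, 0x0000))
    # With bit 9 set, the engine is considered idle and no warning is raised.
    print(cleanup_warn_fires(4, IDLE_BIT))
```

Under these assumptions, the warning in intel_engine_cleanup means the engine was still not idle when the driver tore it down, which matches the failed-idle error logged just before it.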
Comment 3 Ricardo 2017-05-09 17:11:15 UTC
Adding the ReadyForDev tag to the "Whiteboard" field.
The bug is still active:
* Status is correct
* Platform is included
* Feature is included
* Priority and Severity are correctly set
Comment 4 Mika Kuoppala 2017-05-19 13:27:42 UTC
(In reply to Chris Wilson from comment #2)
> Following the machine wedge in bug 100942, we failed to end up with the hw
> in an idle state. So this is a consequence of i915_gem_set_wedged() not
> doing its job correctly, but it is a secondary bug: if we fix the GPU reset,
> we might never hit this path again outside of igt/gem_eio.

The reset wedging the GPU should now be covered by:
https://bugs.freedesktop.org/show_bug.cgi?id=100942

As this was a secondary effect of that bug (reloading the module on a wedged GPU), we should not take any further action here.

*** This bug has been marked as a duplicate of bug 100942 ***

