Summary: | [KBL][IGT] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:539! | ||
---|---|---|---|
Product: | DRI | Reporter: | Tomi Sarvela <tomi.p.sarvela> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | major | ||
Priority: | medium | CC: | intel-gfx-bugs, matwey.kornilov, tomislav.ivek |
Version: | XOrg git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | KBL | i915 features: | power/runtime PM |
Description
Tomi Sarvela
2017-08-04 14:45:16 UTC
Short story is that we are resuming with requests still in the ELSP:

[ 33.779244] [drm:gen8_init_common_ring [i915]] Restarting rcs0:0 from 0x73

We shouldn't be sleeping with the ELSP unfinished... Ah, we did go to sleep too early:

<3>[ 24.870045] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<4>[ 33.335633] WARN_ON(!intel_engines_are_idle(dev_priv))
<4>[ 33.335678] ------------[ cut here ]------------
<4>[ 33.335772] WARNING: CPU: 1 PID: 1316 at drivers/gpu/drm/i915/i915_gem.c:4573 i915_gem_suspend+0x112/0x130 [i915]
<4>[ 33.335774] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel e1000e snd_hda_codec ptp snd_hwdep snd_hda_core pps_core snd_pcm mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel i2c_hid
<4>[ 33.335842] CPU: 1 PID: 1316 Comm: kworker/u8:31 Tainted: G W 4.13.0-rc3-CI-CI_DRM_2919+ #1
<4>[ 33.335846] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
<4>[ 33.335856] Workqueue: events_unbound async_run_entry_fn
<4>[ 33.335863] task: ffff88026f16a840 task.stack: ffffc90001ddc000
<4>[ 33.335951] RIP: 0010:i915_gem_suspend+0x112/0x130 [i915]
<4>[ 33.335955] RSP: 0018:ffffc90001ddfc80 EFLAGS: 00010282
<4>[ 33.335962] RAX: 000000000000002a RBX: ffff88026cc30000 RCX: 0000000000000000
<4>[ 33.335966] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810ebd71
<4>[ 33.335969] RBP: ffffc90001ddfc98 R08: 0000000000000001 R09: 0000000000000000
<4>[ 33.335972] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 33.335976] R13: ffff88026cc380b8 R14: 0000000000000000 R15: ffffffff81ccf32f
<4>[ 33.335981] FS: 0000000000000000(0000) GS:ffff88027ec80000(0000) knlGS:0000000000000000
<4>[ 33.335985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 33.335989] CR2: 00007f140af8db04 CR3: 0000000003e0f000 CR4: 00000000003406e0
<4>[ 33.335992] Call Trace:
<4>[ 33.336066] i915_pm_suspend+0x81/0x1a0 [i915]
<4>[ 33.336077] pci_pm_suspend+0x73/0x140
<4>[ 33.336087] dpm_run_callback+0x6a/0x310
<4>[ 33.336094] ? pci_pm_resume+0x90/0x90
<4>[ 33.336102] __device_suspend+0xfd/0x380
<4>[ 33.336108] ? check_preempt_curr+0x87/0xb0
<4>[ 33.336116] ? dpm_watchdog_set+0x60/0x60
<4>[ 33.336126] async_suspend+0x1a/0x90
<4>[ 33.336134] async_run_entry_fn+0x33/0x160
<4>[ 33.336142] process_one_work+0x21f/0x630
<4>[ 33.336152] worker_thread+0x49/0x3b0
<4>[ 33.336161] kthread+0x10f/0x150
<4>[ 33.336167] ? process_one_work+0x630/0x630
<4>[ 33.336172] ? kthread_create_on_node+0x40/0x40
<4>[ 33.336180] ret_from_fork+0x27/0x40
<4>[ 33.336190] Code: 74 16 48 89 df e8 cf fe ff ff e9 59 ff ff ff 48 83 7a 08 00 74 8b 0f 0b 48 c7 c6 68 a1 2a a0 48 c7 c7 df 7b 29 a0 e8 da 36 f3 e0 <0f> ff eb d3 48 c7 c6 9c 7c 29 a0 48 c7 c7 df 7b 29 a0 e8 c3 36
<4>[ 33.336341] ---[ end trace e6fc4c5de8ebe67a ]---

commit f36325f3789c1cf7f0d795ff180cade25ec3a586
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Aug 26 12:09:34 2017 +0100

drm/i915: Clear wedged status upon resume

When we wake up from suspend, the device has been powered down and should come back afresh. We should be able to safely remove the wedged status from the previous session and start afresh.

commit fc692bd31bc9dd17c7cc59abdb514a58964fc2a7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Aug 26 12:09:35 2017 +0100

drm/i915: Discard the request queue if we fail to sleep before suspend

If we fail to clear the outstanding request queue before suspending, mark those requests as lost.

This will prevent the BUG after resuming, but leaves the underlying issue unresolved. At least it lessens the severity of the bug.
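As a rough illustration of the ordering the two commits above aim for, here is a minimal userspace C model. It is not i915 code: every struct and function name (`gpu_model`, `engine_model`, `wait_for_idle`, `model_suspend`, `model_resume`) is hypothetical, and the "hardware" is reduced to a budget of requests it can complete before the idle wait times out. If the engine fails to idle before suspend, the leftover requests are marked lost rather than left queued, and the wedged flag is cleared again on resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one engine; NOT the i915 data structures. */
struct engine_model {
	int queued; /* requests submitted but not yet completed */
	int lost;   /* requests discarded because the engine never idled */
};

struct gpu_model {
	struct engine_model rcs;
	bool wedged; /* set when outstanding work had to be abandoned */
};

/*
 * Hypothetical idle wait: the "hardware" can complete at most `budget`
 * requests before the wait times out. Returns true if the engine drained.
 */
static bool wait_for_idle(struct engine_model *e, int budget)
{
	if (e->queued <= budget) {
		e->queued = 0;
		return true;
	}
	e->queued -= budget;
	return false; /* timed out with work still queued */
}

static void model_suspend(struct gpu_model *gpu, int budget)
{
	if (!wait_for_idle(&gpu->rcs, budget)) {
		/* Do not go to sleep with work outstanding: mark it lost instead. */
		gpu->rcs.lost += gpu->rcs.queued;
		gpu->rcs.queued = 0;
		gpu->wedged = true;
	}
}

static void model_resume(struct gpu_model *gpu)
{
	/* Resume must never find stale work to restart; this assert loosely
	 * stands in for the kernel BUG named in the report's summary. */
	assert(gpu->rcs.queued == 0);
	/* The device was powered down, so start afresh without the wedged flag. */
	gpu->wedged = false;
}
```

In this model, suspending with five queued requests and a budget of two leaves three requests marked lost and the device wedged; the following resume finds nothing to restart and clears the wedged flag. As in the report, this only lessens the symptom: the model still went to sleep with work it could not finish.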
Sorry if this is SPAM, not sure if this is related to this bug: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3241/shard-kbl6/dmesg-1508176798_Panic_1.log

(In reply to Elizabeth from comment #4)
> Sorry if this is SPAM, not sure if this is related to this bug:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3241/shard-kbl6/dmesg-1508176798_Panic_1.log

No, that's MMIO death on KBL. (The working theory is that it is DMC-related.) This bug was just suspending too early (marking ourselves idle before the context-switch interrupt arrived).
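The last comment's explanation, marking ourselves idle before the context-switch interrupt arrived, can be sketched with a similarly hypothetical model. It assumes that a request's completion becomes visible before the context-switch interrupt releases its ELSP port; under that assumption, an idle check that looks only at the software request queue reports idle too early, while one that also waits for the ports to drain does not. The names `engine_state`, `idle_by_requests_only`, `idle_including_elsp`, `complete_request`, `cs_interrupt` and `model_park` are illustrative only and do not correspond to i915 functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical engine model; NOT the i915 data structures. */
struct engine_state {
	int requests;        /* requests not yet seen as complete */
	int elsp_ports_busy; /* contexts still held in the ELSP ports */
	bool parked;         /* true once the engine is allowed to sleep */
};

/* Too-early check: looks only at the software request queue. */
static bool idle_by_requests_only(const struct engine_state *e)
{
	return e->requests == 0;
}

/* Stricter check: also waits until the ELSP ports have been released. */
static bool idle_including_elsp(const struct engine_state *e)
{
	return e->requests == 0 && e->elsp_ports_busy == 0;
}

/* Assumed ordering: a request is seen as complete first... */
static void complete_request(struct engine_state *e)
{
	if (e->requests > 0)
		e->requests--;
}

/* ...and only later does the context-switch interrupt release its port. */
static void cs_interrupt(struct engine_state *e)
{
	if (e->elsp_ports_busy > 0)
		e->elsp_ports_busy--;
}

/* Park (go to sleep) only when the stricter check passes. */
static void model_park(struct engine_state *e)
{
	assert(idle_including_elsp(e));
	e->parked = true;
}
```

In this model, the window between `complete_request` and `cs_interrupt` is where the first check already says "idle" while the ELSP is still occupied; parking in that window is the kind of early sleep that leaves requests behind for resume to trip over.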