Summary: | [KVMR][drm:skl_set_power_well] *ERROR* power well 2 disable timeout | |
---|---|---|---
Product: | DRI | Reporter: | Jason A. Donenfeld <jason>
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | medium | CC: | chris, codronm+circlecode, cpaul37, david.weinehall, imre.deak, intel-gfx-bugs, lyude, mattst88, nicolopiazzalunga
Version: | DRI git | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | SKL | i915 features: | power/runtime PM
Attachments: | | |
Description
Jason A. Donenfeld
2016-11-03 08:03:23 UTC
On Thu, Nov 3, 2016 at 4:13 PM, Lyude Paul <cpaul@redhat.com> wrote:
> Sounds like some other clock got set up by the BIOS and is keeping the power well on…

We seem to clear all request bits so not sure what keeps PW#2 up. Is there any KVM related option in your BIOS setup, could you try with disabling it if so?

Also could you give a try to the following to increase the timeout and get some debug info?:

    diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
    index 0599408..9c7b841 100644
    --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
    +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
    @@ -764,9 +764,15 @@ static void skl_set_power_well(struct drm_i915_private *dev_priv,
            }
     
            if (wait_for(!!(I915_READ(HSW_PWR_WELL_DRIVER) & state_mask) == enable,
    -                    1))
    -               DRM_ERROR("%s %s timeout\n",
    -                         power_well->name, enable ? "enable" : "disable");
    +                    10))
    +               DRM_ERROR("%s %s timeout (%08x %08x %08x %08x %08x %08x)\n",
    +                         power_well->name, enable ? "enable" : "disable",
    +                         I915_READ(HSW_PWR_WELL_BIOS),
    +                         I915_READ(HSW_PWR_WELL_DRIVER),
    +                         I915_READ(HSW_PWR_WELL_KVMR),
    +                         I915_READ(HSW_PWR_WELL_DEBUG),
    +                         I915_READ(HSW_PWR_WELL_CTL5),
    +                         I915_READ(HSW_PWR_WELL_CTL6));
     
            if (check_fuse_status) {
                    if (power_well->id == SKL_DISP_PW_1) {

(In reply to Imre Deak from comment #2)
> We seem to clear all request bits so not sure what keeps PW#2 up. Is there
> any KVM related option in your BIOS setup, could you try with disabling it
> if so?

Disabling VT-x and VT-d in the BIOS has no effect. However, this error is not observed when restarting the system. Rather, it is only observed when cold booting.

>
> Also could you give a try to the following to increase the timeout and get
> some debug info?:

Sure. The added message evaluates to:

    [ 0.735000] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout (50000005 7000000f c0000000 50000005 050f0000 0000050f)

(In reply to Jason A. Donenfeld from comment #3)
> (In reply to Imre Deak from comment #2)
> > We seem to clear all request bits so not sure what keeps PW#2 up. Is there
> > any KVM related option in your BIOS setup, could you try with disabling it
> > if so?
>
> Disabling VT-x and VT-d in the BIOS has no effect.

I meant something AMT/KVM specific. If there is no such option in your BIOS or some other bootup AMT setup, I'm not sure what keeps the KVM request active. Normally this is used for remote control. If you don't need that, it will only result in increased power consumption (since PW#2 will be forced on).

> However, this error is not observed when restarting the system. Rather, it
> is only observed when cold booting.

Strange, no idea why the request would be cleared by a warm reset. In any case there is not much we can do about that: the request bit is only accessible to the Intel ME and can be set or cleared arbitrarily behind the driver's back.

> > Also could you give a try to the following to increase the timeout and get
> > some debug info?:
>
> Sure. The added message evaluates to:
>
> [ 0.735000] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout
> (50000005 7000000f c0000000 50000005 050f0000 0000050f)

Thanks. So the KVM request bit is set and we can't clear it. In any case this shouldn't cause other issues besides the increased power consumption, so I'll convert the WARN to be a note. Could you still read the following regs after a cold and a subsequent warm reboot? (need intel-gpu-tools):

    # intel_reg read 0x42314 0x45408

I had this happen on my Thinkpad P50 as well (Jason has the same).

    # intel_reg read 0x42314 0x45408
    (0x00042314): 0x00000000
    PWR_WELL_KVM (0x00045408): 0x00000000

after both cold and warm starts.

I'm on BIOS 1.35.

(In reply to Matt Turner from comment #5)
> I had this happen on my Thinkpad P50 as well (Jason has the same).
>
> # intel_reg read 0x42314 0x45408
> (0x00042314): 0x00000000
> PWR_WELL_KVM (0x00045408): 0x00000000
>
> after both cold and warm starts.
>
> I'm on BIOS 1.35.

Hi Matt, thanks for the report. Does the WARN appear only after a cold start and not after a warm start? Could you provide a drm.debug=14 dmesg log after cold start with the change in comment 2?

What is the ME firmware version on your machine? It should be visible in the "Platform Information" menu in the BIOS setup. Do you have any AMT features in BIOS enabled or use any redirection like SOL, IDEr? These are questions from AMT people who are trying to figure out why KVM would be active at all.

(In reply to Imre Deak from comment #6)

ME firmware version is 11.0.16.1000

AMT options in the BIOS are Enabled, Disabled, and Permanently Disabled. It is set to Disabled.

I do not see options that seem to correspond to SOL or IDEr.

I will collect the log you requested.

I have not been able to reproduce the problem in some time. I will continue to watch for it.

Jason, can you still reproduce?

Created attachment 130966 [details]
journalctl extract
it seems I'm facing the same problem on a thinkpad X1 Carbon 4th gen (type 20FB).
Here are the corresponding journalctl parts. Ask if I can do anything to help understand this bug.
Hello, I also get the error

    kernel: [drm:skl_set_power_well [i915]] *ERROR* power well 2 disable timeout

Does it mean the CPU is kept at a higher power level than necessary? Is there a known workaround?

(In reply to CircleCode from comment #10)
> Created attachment 130966 [details]
> journalctl extract
>
> it seems I'm facing the same problem on a thinkpad X1 Carbon 4th gen (type
> 20FB).
> Here are the corresponding journalctl parts. Ask if I can do anything to
> help understand this bug.

Do the things requested in the bug:
Apply the patch in comment #2.
Run "intel_reg read 0x42314 0x45408"

While it was said that 4.11 would fix this error, after installing 4.11 and rebooting into it, I see this in dmesg:

    [ 0.661407] ------------[ cut here ]------------
    [ 0.661413] WARNING: CPU: 2 PID: 1 at drivers/gpu/drm/i915/intel_runtime_pm.c:720 skl_set_power_well+0x5b8/0x5d0
    [ 0.661416] Clearing unexpected KVMR request for power well 2
    [ 0.661418] Modules linked in:
    [ 0.661421] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-gentoo #1
    [ 0.661423] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET65W (1.38 ) 02/09/2017
    [ 0.661426] Call Trace:
    [ 0.661429] ? dump_stack+0x46/0x5e
    [ 0.661433] ? __warn+0xb9/0xe0
    [ 0.661436] ? warn_slowpath_fmt+0x46/0x50
    [ 0.661439] ? fwtable_read32+0x7e/0x1b0
    [ 0.661442] ? skl_set_power_well+0x5b8/0x5d0
    [ 0.661444] ? intel_display_power_put+0xca/0x140
    [ 0.661447] ? intel_display_set_init_power+0x1f/0x30
    [ 0.661456] ? intel_modeset_setup_hw_state+0xafd/0xce0
    [ 0.661458] ? fwtable_read8+0x1b0/0x1b0
    [ 0.661461] ? intel_modeset_init+0xdbe/0x1830
    [ 0.661464] ? intel_i2c_reset+0x39/0x40
    [ 0.661467] ? intel_setup_gmbus+0x2e4/0x300
    [ 0.661470] ? i915_driver_load+0x938/0x1430
    [ 0.661473] ? local_pci_probe+0x38/0x90
    [ 0.661476] ? pci_device_probe+0xd4/0x130
    [ 0.661479] ? driver_probe_device+0x1ef/0x2d0
    [ 0.661482] ? __driver_attach+0x8f/0xa0
    [ 0.661485] ? driver_probe_device+0x2d0/0x2d0
    [ 0.661487] ? bus_for_each_dev+0x55/0x90
    [ 0.661490] ? bus_add_driver+0x115/0x210
    [ 0.661493] ? set_debug_rodata+0xc/0xc
    [ 0.661496] ? driver_register+0x52/0xc0
    [ 0.661499] ? mipi_dsi_bus_init+0xc/0xc
    [ 0.661502] ? do_one_initcall+0x39/0x160
    [ 0.661505] ? set_debug_rodata+0xc/0xc
    [ 0.661508] ? set_debug_rodata+0xc/0xc
    [ 0.661511] ? kernel_init_freeable+0x158/0x1d4
    [ 0.661514] ? rest_init+0x70/0x70
    [ 0.661516] ? kernel_init+0x5/0xf0
    [ 0.661518] ? ret_from_fork+0x23/0x30
    [ 0.661522] ---[ end trace 72d34686d3e43b95 ]---
    [ 0.665751] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout

Sorry, I'm not very comfortable with manual kernel compilation, so I'm afraid I won't be able to apply the requested patch :(

Fix merged to drm-tip.

(In reply to Imre Deak from comment #15)
> Fix merged to drm-tip.

Can you please give the name of the commit? Otherwise, the only way to find the fix is to grep the kernel log.

(In reply to Matt Turner from comment #16)
> (In reply to Imre Deak from comment #15)
> > Fix merged to drm-tip.
>
> Can you please give the name of the commit? Otherwise, the only way to find
> the fix is to grep the kernel log.

Yes, it's

    commit 42d9366d41a992631abaa15f5a881ae1235a8203
    Author: Imre Deak <imre.deak@intel.com>
    Date:   Thu Jun 29 18:37:01 2017 +0300

        drm/i915/gen9+: Don't remove secondary power well requests