Bug 98564 - [KVMR][drm:skl_set_power_well] *ERROR* power well 2 disable timeout
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2016-11-03 08:03 UTC by Jason A. Donenfeld
Modified: 2017-07-21 16:52 UTC
i915 platform: SKL
i915 features: power/runtime PM


Attachments
journalctl extract (3.93 KB, text/plain)
2017-04-21 12:11 UTC, CircleCode

Description Jason A. Donenfeld 2016-11-03 08:03:23 UTC
I get the following warning on "Intel(R) Xeon(R) CPU E3-1505M v5" which has a "Intel(R) HD Graphics P530" on kernel 4.8.6:

[    0.699782] [drm] Initialized drm 1.1.0 20060810
[    0.699820] i915 0000:00:02.0: enabling device (0006 -> 0007)
[    0.700556] [drm] Memory usable by graphics device = 4096M
[    0.700559] [drm] VT-d active for gfx access
[    0.700562] [drm] Replacing VGA console driver
[    0.706467] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    0.706471] [drm] Driver supports precise vblank timestamp query.
[    0.709225] [drm] Disabling framebuffer compression (FBC) to prevent screen flicker with VT-d enabled
[    0.709262] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=none:owns=mem
[    0.714707] ACPI: Battery Slot [BAT0] (battery present)
[    0.714989] [drm] Finished loading i915/skl_dmc_ver1_26.bin (v1.26)
[    0.717683] ------------[ cut here ]------------
[    0.717690] WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/i915/intel_runtime_pm.c:667 skl_set_power_well+0x5fc/0x610
[    0.717693] Clearing unexpected KVMR request for power well 2
[    0.717696] Modules linked in:
[    0.717699] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.8.6-gentoo #1
[    0.717702] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET58W (1.31 ) 09/19/2016
[    0.717705]  0000000000000000 ffffffff81307229 ffff881058bb7b68 0000000000000000
[    0.717710]  ffffffff81082309 ffff8810560e0000 ffff881058bb7bb8 0000000080000000
[    0.717715]  000000000000000f 0000000000000000 ffffffff81c89ea0 ffffffff8108237a
[    0.717719] Call Trace:
[    0.717724]  [<ffffffff81307229>] ? dump_stack+0x46/0x5d
[    0.717728]  [<ffffffff81082309>] ? __warn+0xb9/0xe0
[    0.717731]  [<ffffffff8108237a>] ? warn_slowpath_fmt+0x4a/0x50
[    0.717735]  [<ffffffff814225ce>] ? gen8_irq_power_well_pre_disable+0xde/0x100
[    0.717739]  [<ffffffff81434e2c>] ? skl_set_power_well+0x5fc/0x610
[    0.717742]  [<ffffffff814361e5>] ? intel_display_power_put+0x95/0xe0
[    0.717745]  [<ffffffff8143624f>] ? intel_display_set_init_power+0x1f/0x40
[    0.717749]  [<ffffffff81497b68>] ? intel_modeset_setup_hw_state+0xc48/0xd90
[    0.717752]  [<ffffffff81470f00>] ? gen9_write16+0x3a0/0x3a0
[    0.717755]  [<ffffffff8149aa2f>] ? intel_modeset_init+0xccf/0x16b0
[    0.717758]  [<ffffffff814c7379>] ? intel_i2c_reset+0x39/0x40
[    0.717761]  [<ffffffff814c767a>] ? intel_setup_gmbus+0x29a/0x300
[    0.717765]  [<ffffffff8141827c>] ? i915_driver_load+0x77c/0x1430
[    0.717768]  [<ffffffff8133bf0a>] ? local_pci_probe+0x3a/0x90
[    0.717771]  [<ffffffff8133cac1>] ? pci_device_probe+0xd1/0x120
[    0.717775]  [<ffffffff814d8e51>] ? driver_probe_device+0x181/0x2b0
[    0.717778]  [<ffffffff814d900f>] ? __driver_attach+0x8f/0xa0
[    0.717781]  [<ffffffff814d8f80>] ? driver_probe_device+0x2b0/0x2b0
[    0.717784]  [<ffffffff814d7075>] ? bus_for_each_dev+0x55/0x90
[    0.717787]  [<ffffffff814d832f>] ? bus_add_driver+0x19f/0x210
[    0.717791]  [<ffffffff81d23c19>] ? mipi_dsi_bus_init+0xc/0xc
[    0.717794]  [<ffffffff814d97b2>] ? driver_register+0x52/0xc0
[    0.717804]  [<ffffffff810003e3>] ? do_one_initcall+0x33/0x140
[    0.717808]  [<ffffffff81ceafa1>] ? kernel_init_freeable+0x148/0x1c7
[    0.717812]  [<ffffffff81641015>] ? kernel_init+0x5/0x100
[    0.717815]  [<ffffffff8164630f>] ? ret_from_fork+0x1f/0x40
[    0.717818]  [<ffffffff81641010>] ? rest_init+0x70/0x70
[    0.717821] ---[ end trace e87d156c40e5f35e ]---
[    0.721959] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout
[    0.724743] [drm] GuC firmware load skipped
[    0.747350] random: fast init done
[    0.757903] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
[    0.757922] ACPI: Video Device [PEGP] (multi-head: yes  rom: yes  post: no)
[    0.758073] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:01/LNXVIDEO:00/input/input3
[    0.760831] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[    0.760949] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:01/input/input4
[    0.760967] [drm] Initialized i915 1.6.0 20160711 for 0000:00:02.0 on minor 0
Comment 1 Jason A. Donenfeld 2016-11-03 15:19:58 UTC
On Thu, Nov 3, 2016 at 4:13 PM, Lyude Paul <cpaul@redhat.com> wrote:
> Sounds like some other clock got set up by the BIOS and is keeping the power well on…
Comment 2 Imre Deak 2016-11-03 17:19:23 UTC
We seem to clear all request bits so not sure what keeps PW#2 up. Is there any KVM related option in your BIOS setup, could you try with disabling it if so?

Also could you give a try to the following to increase the timeout and get some debug info?:

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 0599408..9c7b841 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -764,9 +764,15 @@ static void skl_set_power_well(struct drm_i915_private *dev_priv,
 	}
 
 	if (wait_for(!!(I915_READ(HSW_PWR_WELL_DRIVER) & state_mask) == enable,
-		     1))
-		DRM_ERROR("%s %s timeout\n",
-			  power_well->name, enable ? "enable" : "disable");
+		     10))
+		DRM_ERROR("%s %s timeout (%08x %08x %08x %08x %08x %08x)\n",
+			  power_well->name, enable ? "enable" : "disable",
+			  I915_READ(HSW_PWR_WELL_BIOS),
+			  I915_READ(HSW_PWR_WELL_DRIVER),
+			  I915_READ(HSW_PWR_WELL_KVMR),
+			  I915_READ(HSW_PWR_WELL_DEBUG),
+			  I915_READ(HSW_PWR_WELL_CTL5),
+			  I915_READ(HSW_PWR_WELL_CTL6));
 
 	if (check_fuse_status) {
 		if (power_well->id == SKL_DISP_PW_1) {
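
[Editor's note] The patch above raises the timeout given to wait_for() from 1 ms to 10 ms and dumps the request registers if the wait still fails. For readers unfamiliar with the pattern, the following is a minimal user-space sketch of "poll a register until the bits clear or a deadline passes". It is not the kernel's wait_for() macro: poll_until_clear(), reg_reader_fn, and the fake register are hypothetical names used only for illustration, and the loop busy-polls rather than sleeping.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback standing in for a register read; not a kernel API. */
typedef uint32_t (*reg_reader_fn)(void *ctx);

/* Monotonic time in microseconds. */
static int64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Poll until (reader(ctx) & mask) == 0 or timeout_ms has elapsed.
 * Returns 0 on success, -ETIMEDOUT if the bits never cleared.
 */
static int poll_until_clear(reg_reader_fn reader, void *ctx,
			    uint32_t mask, int timeout_ms)
{
	int64_t deadline = now_us() + (int64_t)timeout_ms * 1000;

	for (;;) {
		/* Check the clock before reading, so one read always
		 * happens after the deadline has been observed. */
		int expired = now_us() >= deadline;

		if ((reader(ctx) & mask) == 0)
			return 0;
		if (expired)
			return -ETIMEDOUT;
	}
}

/* Fake register for experimenting with the helper. */
struct fake_reg {
	uint32_t value;        /* value returned while still "powered" */
	int reads_until_clear; /* <= 0: the value never clears */
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *reg = ctx;

	if (reg->reads_until_clear > 0 && --reg->reads_until_clear == 0)
		reg->value = 0;
	return reg->value;
}
```

With a fake register that never clears, the helper returns -ETIMEDOUT regardless of how long the timeout is. This mirrors the situation in this bug: a longer timeout does not help when another requester keeps the well powered.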
Comment 3 Jason A. Donenfeld 2016-11-03 21:24:55 UTC
(In reply to Imre Deak from comment #2)
> We seem to clear all request bits so not sure what keeps PW#2 up. Is there
> any KVM related option in your BIOS setup, could you try with disabling it
> if so?

Disabling VT-x and VT-d in the BIOS has no effect.

However, this error is not observed when restarting the system. Rather, it is only observed when cold booting.

> 
> Also could you give a try to the following to increase the timeout and get
> some debug info?:

Sure. The added message evaluates to:

[    0.735000] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout (50000005 7000000f c0000000 50000005 050f0000 0000050f)
Comment 4 Imre Deak 2016-11-04 22:34:09 UTC
(In reply to Jason A. Donenfeld from comment #3)
> (In reply to Imre Deak from comment #2)
> > We seem to clear all request bits so not sure what keeps PW#2 up. Is there
> > any KVM related option in your BIOS setup, could you try with disabling it
> > if so?
> 
> Disabling VT-x and VT-d in the BIOS has no effect.

I meant something AMT/KVM specific. If there is no such option in your BIOS or some other bootup AMT setup, I'm not sure what keeps the KVM request active. Normally this is used for remote control. If you don't need that it will only result in increased power consumption (since PW#2 will be forced on).

> However, this error is not observed when restarting the system. Rather, it
> is only observed when cold booting.

Strange, I have no idea why the request would be cleared by a warm reset. In any case there is not much we can do about that: the request bit is accessible only to the Intel ME and can be set or cleared arbitrarily behind the driver's back.

> > Also could you give a try to the following to increase the timeout and get
> > some debug info?:
> 
> Sure. The added message evaluates to:
> 
> [    0.735000] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout
> (50000005 7000000f c0000000 50000005 050f0000 0000050f)

Thanks. So the KVM request bit is set and we can't clear it. In any case this shouldn't cause other issues besides the increased power consumption, so I'll convert the WARN to be a note.
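
[Editor's note] The conclusion above can be checked by hand from the six values in the dump. The sketch below assumes the Skylake layout used by the i915 driver of that era: two bits per power well, with the state bit at position 2*id and the request bit at 2*id+1, and power well 2 at index 15 (power well 1 at 14). Those positions are an assumption here and should be verified against i915_reg.h in your tree. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed SKL layout: state bit at 2*id, request bit at 2*id + 1. */
#define PW_STATE(id) (UINT32_C(1) << ((id) * 2))
#define PW_REQ(id)   (UINT32_C(1) << ((id) * 2 + 1))

#define PW1_ID 14 /* assumed index of power well 1 */
#define PW2_ID 15 /* assumed index of power well 2 */

/* First four values, in the order printed by the patch in comment 2. */
struct pw_dump {
	uint32_t bios, driver, kvmr, debug;
};

enum {
	REQ_BIOS   = 1 << 0,
	REQ_DRIVER = 1 << 1,
	REQ_KVMR   = 1 << 2,
	REQ_DEBUG  = 1 << 3,
};

/* Bitmask of requesters that currently hold a request on well `id`. */
static unsigned int pw_requesters(const struct pw_dump *d, int id)
{
	unsigned int who = 0;

	if (d->bios & PW_REQ(id))
		who |= REQ_BIOS;
	if (d->driver & PW_REQ(id))
		who |= REQ_DRIVER;
	if (d->kvmr & PW_REQ(id))
		who |= REQ_KVMR;
	if (d->debug & PW_REQ(id))
		who |= REQ_DEBUG;
	return who;
}

/* The driver register's state bit reports whether the well is powered. */
static int pw_is_enabled(const struct pw_dump *d, int id)
{
	return (d->driver & PW_STATE(id)) != 0;
}
```

Under these assumptions, feeding in the values from comment 3 (50000005 7000000f c0000000 50000005) shows that the driver has dropped its own request for power well 2, the well is still powered, and the only remaining requester is the KVMR register.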

Could you still read the following regs after a cold and a subsequent warm reboot? (need intel-gpu-tools):

# intel_reg read 0x42314 0x45408
Comment 5 Matt Turner 2017-02-17 02:12:39 UTC
I had this happen on my Thinkpad P50 as well (Jason has the same).

# intel_reg read 0x42314 0x45408
                                    (0x00042314): 0x00000000
                       PWR_WELL_KVM (0x00045408): 0x00000000

after both cold and warm starts.

I'm on BIOS 1.35.
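
[Editor's note] When collecting these readings across several cold and warm boots, it can help to parse the tool output rather than compare it by eye. The sketch below parses a single line in the layout shown above ("NAME (0xADDR): 0xVALUE", where NAME may be missing). parse_reg_line() is a hypothetical helper, and the line format is taken only from the output quoted in this comment; other intel_reg versions may print additional fields.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of the form "NAME (0xADDR): 0xVALUE".
 * NAME is optional and ignored. Returns 1 on success, 0 otherwise.
 */
static int parse_reg_line(const char *line, uint32_t *addr, uint32_t *value)
{
	const char *paren = strchr(line, '(');
	unsigned int a, v;

	if (!paren || sscanf(paren, "(0x%x): 0x%x", &a, &v) != 2)
		return 0;

	*addr = a;
	*value = v;
	return 1;
}
```

A value of 0x00000000 for PWR_WELL_KVM (0x45408), as reported here, means no bits were set in that register at the time of the read.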
Comment 6 Imre Deak 2017-02-22 09:51:13 UTC
(In reply to Matt Turner from comment #5)
> I had this happen on my Thinkpad P50 as well (Jason has the same).
> 
> # intel_reg read 0x42314 0x45408
>                                     (0x00042314): 0x00000000
>                        PWR_WELL_KVM (0x00045408): 0x00000000
> 
> after both cold and warm starts.
> 
> I'm on BIOS 1.35.

Hi Matt, thanks for the report.

Does the WARN appear only after a cold start and not after a warm start?
Could you provide a drm.debug=14 dmesg log after cold start with the change in comment 2?
What is the ME firmware version on your machine? It should be visible in the "Platform Information" menu in the BIOS setup.
Do you have any AMT features in BIOS enabled or use any redirection like SOL, IDEr? These are questions from AMT people who are trying to figure out why KVM would be active at all.
Comment 7 Matt Turner 2017-02-22 17:57:22 UTC
(In reply to Imre Deak from comment #6)

ME firmware version is 11.0.16.1000

AMT options in the BIOS are Enabled, Disabled, and Permanently Disabled. It is set to Disabled. I do not see options that seem to correspond to SOL or IDEr.

I will collect the log you requested.
Comment 8 Matt Turner 2017-03-08 21:52:56 UTC
I have not been able to reproduce the problem in some time. I will continue to watch for it.

Jason, can you still reproduce?
Comment 9 Jari Tahvanainen 2017-04-11 07:52:46 UTC
Jason - pinging ... see comment 8.
Comment 10 CircleCode 2017-04-21 12:11:56 UTC
Created attachment 130966 [details]
journalctl extract

It seems I'm facing the same problem on a ThinkPad X1 Carbon 4th gen (type 20FB).
Here are the corresponding journalctl parts. Ask if I can do anything to help understand this bug.
Comment 11 nicolo 2017-04-23 02:01:07 UTC
Hello, I also get the error

kernel: [drm:skl_set_power_well [i915]] *ERROR* power well 2 disable timeout

Does it mean the CPU is kept at a higher power level than necessary? Is there a known workaround?
Comment 12 Matt Turner 2017-04-23 15:49:21 UTC
(In reply to CircleCode from comment #10)
> Created attachment 130966 [details]
> journalctl extract
> 
> It seems I'm facing the same problem on a ThinkPad X1 Carbon 4th gen (type
> 20FB).
> Here are the corresponding journalctl parts. Ask if I can do anything to
> help understand this bug.


Please do what was requested earlier in this bug: apply the patch in comment 2 and run "intel_reg read 0x42314 0x45408".
Comment 13 Jason A. Donenfeld 2017-05-01 19:10:17 UTC
While it was said that 4.11 would fix this error, after installing 4.11 and rebooting into it, I see this in dmesg:

[    0.661407] ------------[ cut here ]------------
[    0.661413] WARNING: CPU: 2 PID: 1 at drivers/gpu/drm/i915/intel_runtime_pm.c:720 skl_set_power_well+0x5b8/0x5d0
[    0.661416] Clearing unexpected KVMR request for power well 2
[    0.661418] Modules linked in:
[    0.661421] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-gentoo #1
[    0.661423] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET65W (1.38 ) 02/09/2017
[    0.661426] Call Trace:
[    0.661429]  ? dump_stack+0x46/0x5e
[    0.661433]  ? __warn+0xb9/0xe0
[    0.661436]  ? warn_slowpath_fmt+0x46/0x50
[    0.661439]  ? fwtable_read32+0x7e/0x1b0
[    0.661442]  ? skl_set_power_well+0x5b8/0x5d0
[    0.661444]  ? intel_display_power_put+0xca/0x140
[    0.661447]  ? intel_display_set_init_power+0x1f/0x30
[    0.661456]  ? intel_modeset_setup_hw_state+0xafd/0xce0
[    0.661458]  ? fwtable_read8+0x1b0/0x1b0
[    0.661461]  ? intel_modeset_init+0xdbe/0x1830
[    0.661464]  ? intel_i2c_reset+0x39/0x40
[    0.661467]  ? intel_setup_gmbus+0x2e4/0x300
[    0.661470]  ? i915_driver_load+0x938/0x1430
[    0.661473]  ? local_pci_probe+0x38/0x90
[    0.661476]  ? pci_device_probe+0xd4/0x130
[    0.661479]  ? driver_probe_device+0x1ef/0x2d0
[    0.661482]  ? __driver_attach+0x8f/0xa0
[    0.661485]  ? driver_probe_device+0x2d0/0x2d0
[    0.661487]  ? bus_for_each_dev+0x55/0x90
[    0.661490]  ? bus_add_driver+0x115/0x210
[    0.661493]  ? set_debug_rodata+0xc/0xc
[    0.661496]  ? driver_register+0x52/0xc0
[    0.661499]  ? mipi_dsi_bus_init+0xc/0xc
[    0.661502]  ? do_one_initcall+0x39/0x160
[    0.661505]  ? set_debug_rodata+0xc/0xc
[    0.661508]  ? set_debug_rodata+0xc/0xc
[    0.661511]  ? kernel_init_freeable+0x158/0x1d4
[    0.661514]  ? rest_init+0x70/0x70
[    0.661516]  ? kernel_init+0x5/0xf0
[    0.661518]  ? ret_from_fork+0x23/0x30
[    0.661522] ---[ end trace 72d34686d3e43b95 ]---
[    0.665751] [drm:skl_set_power_well] *ERROR* power well 2 disable timeout
Comment 14 CircleCode 2017-05-02 09:19:09 UTC
Sorry, I'm not very comfortable with manual kernel compilation, so I'm afraid I won't be able to apply the requested patch :(
Comment 15 Imre Deak 2017-07-07 20:11:10 UTC
Fix merged to drm-tip.
Comment 16 Matt Turner 2017-07-07 20:17:26 UTC
(In reply to Imre Deak from comment #15)
> Fix merged to drm-tip.

Can you please give the name of the commit? Otherwise, the only way to find the fix is to grep the kernel log.
Comment 17 Imre Deak 2017-07-07 20:23:15 UTC
(In reply to Matt Turner from comment #16)
> (In reply to Imre Deak from comment #15)
> > Fix merged to drm-tip.
> 
> Can you please give the name of the commit? Otherwise, the only way to find
> the fix is to grep the kernel log.

Yes, it's
commit 42d9366d41a992631abaa15f5a881ae1235a8203
Author: Imre Deak <imre.deak@intel.com>
Date:   Thu Jun 29 18:37:01 2017 +0300

    drm/i915/gen9+: Don't remove secondary power well requests
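
[Editor's note] The commit title suggests a policy change: the driver stops clearing requests that belong to other agents (BIOS, KVMR, debug) and manages only its own. The sketch below illustrates that policy in isolation. It is not the actual patch, whose contents are not reproduced in this thread. The function names are hypothetical, and the request-bit position (2*id+1, with power well 2 at index 15) is the same assumption about the Skylake layout noted earlier in this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SKL request-bit position for power well `id`. */
#define PW_REQ(id) (UINT32_C(1) << ((id) * 2 + 1))

/*
 * Drop only the driver's own request. The BIOS, KVMR and debug
 * request registers are deliberately left untouched.
 */
static uint32_t driver_release(uint32_t driver_reg, int id)
{
	return driver_reg & ~PW_REQ(id);
}

/*
 * A well can only be expected to power down if no secondary
 * requester still holds it; otherwise a disable timeout is not
 * an error on the driver's side.
 */
static bool expect_power_down(uint32_t bios, uint32_t kvmr,
			      uint32_t debug, int id)
{
	return ((bios | kvmr | debug) & PW_REQ(id)) == 0;
}
```

Applied to the register values from comment 3, expect_power_down() returns false for power well 2 because the KVMR register still requests it, so under this policy the driver would have no reason to report a timeout.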

