Bug 65497

Summary: [pnv] Warnings after resume in i915 (misbehaving bios?)
Product: DRI Reporter: Zoltán Böszörményi <zboszor>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED INVALID QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: low CC: ricardo.vega, sangshuduo
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: PNV i915 features: display/Other
Attachments:
Description Flags
Dmesg after resuming
none
Kernel .config
none
Verbose lspci with device names
none
Verbose lspci with vendor:device IDs
none
Dmesg with drm.debug=0x0e
none
Dmesg with drm.debug=0x0e
none
Dmesg with drm.debug=0x0e
none
dmesg after resume on 3.16.3
none
dmesg after resume on 3.16.4
none
dmesg after resuming with 3.18.10 none

Description Zoltán Böszörményi 2013-06-07 08:11:10 UTC
We are working on a prototype Atom-based computer and dmesg shows this warning on boot:

[drm:intel_enable_lvds] *ERROR* timed out waiting for panel to power on

The machine does have LVDS and is working great.

However, after resuming from S3 suspend, these warnings also appear in dmesg:

WARNING: at drivers/gpu/drm/i915/intel_display.c:3927 intel_modeset_check_state+0xbd/0x539()
active connector not linked to encoder

WARNING: at drivers/gpu/drm/i915/intel_display.c:3933 intel_modeset_check_state+0x119/0x539()

WARNING: at drivers/gpu/drm/i915/intel_display.c:7873 intel_modeset_check_state+0x4d3/0x539()
encoder's active_connectors set, but no crtc

WARNING: at drivers/gpu/drm/i915/intel_display.c:7892 intel_modeset_check_state+0x2c8/0x539()
encoder's computed active state doesn't match tracked active state (expected 0, found 1)

Full dmesg is also attached. The machine otherwise works okay with modesetting.

Are the warnings unneeded or is the video chip and the LVDS wired improperly somehow?
Comment 1 Zoltán Böszörményi 2013-06-07 08:11:57 UTC
Created attachment 80459 [details]
Dmesg after resuming
Comment 2 Zoltán Böszörményi 2013-06-07 08:12:17 UTC
Created attachment 80460 [details]
Kernel .config
Comment 3 Zoltán Böszörményi 2013-06-07 08:12:39 UTC
Created attachment 80461 [details]
Verbose lspci with device names
Comment 4 Zoltán Böszörményi 2013-06-07 08:12:56 UTC
Created attachment 80462 [details]
Verbose lspci with vendor:device IDs
Comment 5 Daniel Vetter 2013-06-07 08:33:22 UTC
For the time out warning, can you please check whether it goes away if you increase the timeout:

diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c
index 10c3d56..7e0ee01 100644
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -200,7 +200,7 @@ static void intel_enable_lvds(struct intel_encoder *encoder)
 
        I915_WRITE(ctl_reg, I915_READ(ctl_reg) | POWER_TARGET_ON);
        POSTING_READ(lvds_encoder->reg);
-       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 1000))
+       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 2000))
                DRM_ERROR("timed out waiting for panel to power on\n");
 
        intel_panel_enable_backlight(dev, intel_crtc->pipe);

If that works then most likely the firmware on your board sets up a ridiculously high panel power on delay.

For the WARNs we need dmesg with drm.debug=0xe added to your kernel commandline. They are self-checks of our modeset code to make sure the hw state agrees with our sw tracking. Usually they do indicate real issues, but sometimes there's also a tiny bug in the checker itself.
Comment 6 Chris Wilson 2013-06-07 08:53:05 UTC
The synopsis of the warnings here is that the machine resumes with the bios setting up a pipe unexpectedly - which could also account for our inability to manipulate the LVDS registers. I think that is consistent enough to suggest something is amiss inside the BIOS upon resume -- hopefully the debug logs bear that out.
Comment 7 Zoltán Böszörményi 2013-06-07 10:24:43 UTC
(In reply to comment #5)
> For the time out warning, can you please check whether it goes away if you
> increase the timeout:
> 
> diff --git a/drivers/gpu/drm/i915/intel_lvds.c
> b/drivers/gpu/drm/i915/intel_lvds.c
> index 10c3d56..7e0ee01 100644
> --- a/drivers/gpu/drm/i915/intel_lvds.c
> +++ b/drivers/gpu/drm/i915/intel_lvds.c
> @@ -200,7 +200,7 @@ static void intel_enable_lvds(struct intel_encoder
> *encoder)
>  
>         I915_WRITE(ctl_reg, I915_READ(ctl_reg) | POWER_TARGET_ON);
>         POSTING_READ(lvds_encoder->reg);
> -       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 1000))
> +       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 2000))
>                 DRM_ERROR("timed out waiting for panel to power on\n");
>  
>         intel_panel_enable_backlight(dev, intel_crtc->pipe);
> 
> If that works then most likely the firmware on your board sets up a
> ridiculously high panel power on delay.

It seems to be the case, since this error doesn't appear with the patch anymore.

> For the WARNs we need dmesg with drm.debug=0xe added to your kernel
> commandline. They are self-checks of our modeset code to make sure the hw
> state agrees with our sw tracking. Usually they do indicate real issues, but
> sometimes there's also a tiny bug in the checker itself.

The new dmesg will be attached in a minute.
Comment 8 Zoltán Böszörményi 2013-06-07 10:25:11 UTC
Created attachment 80465 [details]
Dmesg with drm.debug=0x0e
Comment 9 Zoltán Böszörményi 2013-06-07 10:36:44 UTC
Created attachment 80468 [details]
Dmesg with drm.debug=0x0e

The previous one was without a suspend/resume cycle, there are two in this one.
Comment 10 Zoltán Böszörményi 2013-06-07 10:40:18 UTC
Created attachment 80469 [details]
Dmesg with drm.debug=0x0e

The third suspend resulted in an immediate resume.
Comment 11 Chris Wilson 2013-06-07 12:17:21 UTC
That the BIOS lit up a pipe, also means that it probably scribbled over quite a few of our data structures in device memory.
Comment 12 Zoltán Böszörményi 2013-06-07 12:54:29 UTC
(In reply to comment #11)
> That the BIOS lit up a pipe, also means that it probably scribbled over
> quite a few of our data structures in device memory.

I see.

Does it mean that my other bug reported at
https://bugzilla.kernel.org/show_bug.cgi?id=59401
is a consequence of this (supposedly BIOS) bug?
The bug is about immediate resuming in about 40% of the suspend attempts.

Also how can it be worked around? Windows (XP and 7) must keep its data
somewhere else otherwise it wouldn't be able to suspend the machine reliably.
Comment 13 Chris Wilson 2013-06-07 13:08:18 UTC
At this moment in time I am very suspicious of that BIOS :)
Comment 14 Chris Wilson 2013-10-02 12:00:01 UTC
Any news?
Comment 15 Zoltán Böszörményi 2013-12-10 18:37:19 UTC
The news is that https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=linux-next (up to commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0) makes the suspend/resume cycles reliable on this machine, so it's not conclusive anymore that it's the BIOS that's buggy.

However, the previously seen warnings still appear in the logs after resuming but the i915 driver can recover from that.
Comment 16 Jani Nikula 2014-08-14 13:32:21 UTC
(In reply to comment #15)
> The news is that
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/
> ?h=linux-next (up to commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0) makes
> the suspend/resume cycles reliable on this machine, so it's not conclusive
> anymore that it's the BIOS that's buggy.
> 
> However, the previously seen warnings still appear in the logs after
> resuming but the i915 driver can recover from that.

Do the warnings still appear with recent kernels?
Comment 17 Zoltán Böszörményi 2014-09-23 11:14:30 UTC
Every warnings except the one about the timeout has gone, suspend/resume works after 3.14.0, currently tested with 3.16.3. The panel seems to be a little slow to wake up, the patch in comment 5 is still needed.
Comment 18 Zoltán Böszörményi 2014-09-23 11:31:33 UTC
Spoke too soon, 3.16.3 has all the warnings after resume.
Comment 19 Zoltán Böszörményi 2014-09-23 11:32:29 UTC
Created attachment 106725 [details]
dmesg after resume on 3.16.3

dmesg after resume on 3.16.3
Comment 20 Zoltán Böszörményi 2014-10-06 13:24:04 UTC
Created attachment 107428 [details]
dmesg after resume on 3.16.4

The kernel still has those warnings after resume on 3.16.4.
Comment 21 Jani Nikula 2015-01-29 13:23:01 UTC
Sadly, we don't have any news. Do you, with new kernels?
Comment 22 Jesse Barnes 2015-03-20 21:20:15 UTC
Well hopefully Zoltan figured this one out and it's been fixed so we haven't heard anything.  If not, please re-open after testing against current drm-intel-nightly code.
Comment 23 Zoltán Böszörményi 2015-04-16 10:00:58 UTC
Created attachment 115117 [details]
dmesg after resuming with 3.18.10

3.18.10 still has the problem.
Comment 24 Zoltán Böszörményi 2015-04-16 10:01:56 UTC
I will try kernel 4.0 soon, currently it is at the same level as drm-intel-nightly.
Comment 25 Zoltán Böszörményi 2015-06-22 16:45:33 UTC
I have tried kernel 4.1.0 that was just released and the warnings are still there.
For now, I use the Fedora kernel patch called
  drm-i915-Disable-verbose-state-checks.patch
to tone down these messages from scary backtraces to one-line DRM_ERRORs.
Comment 26 Jani Nikula 2016-01-18 12:58:54 UTC
Please try kernel v4.4.
Comment 27 Ricardo 2017-02-21 01:11:55 UTC
Request to test with newer kernel was made - one year ago.. Will leave bug open for 30 days if not will be closed due to lack of information
Comment 28 Zoltán Böszörményi 2017-02-22 08:52:38 UTC
I have carried the same patch adapted to different kernels for over one and a half years.

Now I tested kernel 4.10 without the patch and the warning doesn't occur after resume.
Comment 29 yann 2017-02-22 08:56:24 UTC
(In reply to Zoltán Böszörményi from comment #28)
> I have carried the same patch adapted to different kernels for over one and
> a half years.
> 
> Now I tested kernel 4.10 without the patch and the warning doesn't occur
> after resume.

Thanks Zoltán Böszörményi for your confirmation. Closing then the ticket.
Comment 30 Zoltán Böszörményi 2017-02-23 12:22:13 UTC
The previous test was wrong, the patch to increase the panel power on timeout from 1 second to 2 seconds was still applied accidentally.

Since the other patch to tone down the DRM messages to one liners, I get this at boot:

[    8.897390] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on

and later when resuming from S3 suspend:

[  496.056118] Suspended for 8.704 seconds
[  496.056118] Enabling non-boot CPUs ...
[  496.059143] x86: Booting SMP configuration:
[  496.059145] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  496.028018] Initializing CPU#1
[  496.061759]  cache: parent cpu1 should not be sleeping
[  496.062227] CPU1 is up
[  496.066129] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  496.037243] Initializing CPU#2
[  496.068754]  cache: parent cpu2 should not be sleeping
[  496.069177] CPU2 is up
[  496.078180] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  496.049174] Initializing CPU#3
[  496.080774]  cache: parent cpu3 should not be sleeping
[  496.081222] CPU3 is up
[  496.083107] ACPI: Waking up from system sleep state S3
[  496.084549] uhci_hcd 0000:00:1d.0: System wakeup disabled by ACPI
[  496.084650] uhci_hcd 0000:00:1d.1: System wakeup disabled by ACPI
[  496.084727] uhci_hcd 0000:00:1d.2: System wakeup disabled by ACPI
[  496.084790] uhci_hcd 0000:00:1d.3: System wakeup disabled by ACPI
[  496.096184] ehci-pci 0000:00:1d.7: System wakeup disabled by ACPI
[  496.096479] PM: noirq resume of devices complete after 12.407 msecs
[  496.097219] PM: early resume of devices complete after 0.694 msecs
[  496.098609] usb usb2: root hub lost power or was reset
[  496.098612] usb usb4: root hub lost power or was reset
[  496.098618] usb usb3: root hub lost power or was reset
[  496.098681] usb usb5: root hub lost power or was reset
[  496.098700] rtc_cmos 00:01: System wakeup disabled by ACPI
[  496.098723] pcieport 0000:00:1c.1: System wakeup disabled by ACPI
[  496.099209] i8042 kbd 00:02: System wakeup disabled by ACPI
[  496.101167] sd 0:0:0:0: [sda] Starting disk
[  496.104216] serial 00:03: activated
[  496.108043] serial 00:04: activated
[  496.110331] serial 00:05: activated
[  496.112595] serial 00:06: activated
[  496.114970] serial 00:07: activated
[  496.219414] r8169 0000:02:00.0 enp2s0: link down
[  496.404549] ata2: SATA link down (SStatus 0 SControl 300)
[  496.560066] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  496.561125] ata1.00: configured for UDMA/66
[  496.571065] usb 2-2: reset low-speed USB device number 2 using uhci_hcd
[  497.173783] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[  497.175153] PM: resume of devices complete after 1077.922 msecs
[  497.175676] PM: Finishing wakeup.
[  497.175679] Restarting tasks ... done.

So, we still need to raise the panel power on timeout.
Comment 31 Florent Flament 2017-03-04 17:32:21 UTC
Hi,

I have a similar 'error' message in dmesg at startup on my laptop:

[    2.585176] psmouse serio1: synaptics: Touchpad model: 1, fw: 7.5, id: 0x1e0b1, caps: 0xf00473/0x240000/0xa2400/0x0, board id: 1514, fw id: 1008586
[    2.617552] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input7
[    2.931350] clocksource: Switched to clocksource tsc
[    3.348561] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[    3.412467] Console: switching to colour frame buffer device 170x48
[    3.436206] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    3.708987] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)

Additional information:

$ uname -a
Linux toshnux 4.9.10-200.fc25.x86_64 #1 SMP Wed Feb 15 23:28:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Laptop model: toshiba satellite z930

Apart from this error message at startup, everything seems to be working fine. I'll try to test the latest Linux Kernel without, then with the increased timeout as described in comment #5.
Comment 32 Elizabeth 2017-08-04 19:10:18 UTC
(In reply to Florent Flament from comment #31)
> Apart from this error message at startup, everything seems to be working
> fine. I'll try to test the latest Linux Kernel without, then with the
> increased timeout as described in comment #5.
Hello Florent,
Still the same with latest kernel https://www.kernel.org/? 
Since this case is some specific hardware, could you please file a new bug with HW and SW information, and dmesg with drm.debug=0xe with latest kernel for your case?
Thanks.
Comment 33 Elizabeth 2017-08-04 19:11:53 UTC
(In reply to Zoltán Böszörményi from comment #30)
> The previous test was wrong, the patch to increase the panel power on
> timeout from 1 second to 2 seconds was still applied accidentally.
Hello Zoltan, 
Any news with the latest kernels/drm-tip?
Thank you.
Comment 34 Elizabeth 2017-12-05 18:21:31 UTC
(In reply to Elizabeth from comment #33)
> (In reply to Zoltán Böszörményi from comment #30)
> > The previous test was wrong, the patch to increase the panel power on
> > timeout from 1 second to 2 seconds was still applied accidentally.
> Hello Zoltan, 
> Any news with the latest kernels/drm-tip?
> Thank you.
Hello Zoltan, this bug seems to be about two issues, first the timeout, that as per comment #5 to #13 points to BIOS. And as it seems in comment #30 the issue may exist while BIOS keep the same. So this would be taken as a BIOS problem instead of kernel, at least while proved otherwise.
And the second, the warns messages. Do this messages keep constant with latest kernel? As we don't have a way to reproduce, if we don't get feedback from reporters, it is assumed to be fixed or non-reproducible upstream. Hopefully the problem will be fixed already over 4.14 and up. Can you confirm? Thank you.
Comment 35 Elizabeth 2018-01-18 21:36:11 UTC
Due to lack of update from reporter, I'm closing the current bug. If warn messages keep being an issue in newest kernel versions, please file a new bug to give a fresh follow up to this issue. Thanks.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.