Bug 65497

Summary:

[pnv] Warnings after resume in i915 (misbehaving bios?)

Product:

DRI

Reporter:

Zoltán Böszörményi <zboszor>

Component:

DRM/Intel

Assignee:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Status:

CLOSED INVALID

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

low

CC:

ricardo.vega, sangshuduo

Version:

XOrg git

Hardware:

Other

OS:

All

Whiteboard:

i915 platform:

PNV

i915 features:

display/Other

Attachments:

Description	Flags
Dmesg after resuming	none
Kernel .config	none
Verbose lspci with device names	none
Verbose lspci with vendor:device IDs	none
Dmesg with drm.debug=0x0e	none
Dmesg with drm.debug=0x0e	none
Dmesg with drm.debug=0x0e	none
dmesg after resume on 3.16.3	none
dmesg after resume on 3.16.4	none
dmesg after resuming with 3.18.10	none

Description Zoltán Böszörményi 2013-06-07 08:11:10 UTC

We are working on a prototype Atom-based computer and dmesg shows this warning on boot:

[drm:intel_enable_lvds] *ERROR* timed out waiting for panel to power on

The machine does have LVDS and is working great.

However, after resuming from S3 suspend, these warnings also appear in dmesg:

WARNING: at drivers/gpu/drm/i915/intel_display.c:3927 intel_modeset_check_state+0xbd/0x539()
active connector not linked to encoder

WARNING: at drivers/gpu/drm/i915/intel_display.c:3933 intel_modeset_check_state+0x119/0x539()

WARNING: at drivers/gpu/drm/i915/intel_display.c:7873 intel_modeset_check_state+0x4d3/0x539()
encoder's active_connectors set, but no crtc

WARNING: at drivers/gpu/drm/i915/intel_display.c:7892 intel_modeset_check_state+0x2c8/0x539()
encoder's computed active state doesn't match tracked active state (expected 0, found 1)

Full dmesg is also attached. The machine otherwise works okay with modesetting.

Are the warnings unneeded or is the video chip and the LVDS wired improperly somehow?

Comment 1 Zoltán Böszörményi 2013-06-07 08:11:57 UTC

Created attachment 80459 [details]
Dmesg after resuming

Comment 2 Zoltán Böszörményi 2013-06-07 08:12:17 UTC

Created attachment 80460 [details]
Kernel .config

Comment 3 Zoltán Böszörményi 2013-06-07 08:12:39 UTC

Created attachment 80461 [details]
Verbose lspci with device names

Comment 4 Zoltán Böszörményi 2013-06-07 08:12:56 UTC

Created attachment 80462 [details]
Verbose lspci with vendor:device IDs

Comment 5 Daniel Vetter 2013-06-07 08:33:22 UTC

For the time out warning, can you please check whether it goes away if you increase the timeout:

diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c
index 10c3d56..7e0ee01 100644
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -200,7 +200,7 @@ static void intel_enable_lvds(struct intel_encoder *encoder)
 
        I915_WRITE(ctl_reg, I915_READ(ctl_reg) | POWER_TARGET_ON);
        POSTING_READ(lvds_encoder->reg);
-       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 1000))
+       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 2000))
                DRM_ERROR("timed out waiting for panel to power on\n");
 
        intel_panel_enable_backlight(dev, intel_crtc->pipe);

If that works then most likely the firmware on your board sets up a ridiculously high panel power on delay.

For the WARNs we need dmesg with drm.debug=0xe added to your kernel commandline. They are self-checks of our modeset code to make sure the hw state agrees with our sw tracking. Usually they do indicate real issues, but sometimes there's also a tiny bug in the checker itself.

Comment 6 Chris Wilson 2013-06-07 08:53:05 UTC

The synopsis of the warnings here is that the machine resumes with the bios setting up a pipe unexpectedly - which could also account for our inability to manipulate the LVDS registers. I think that is consistent enough to suggest something is amiss inside the BIOS upon resume -- hopefully the debug logs bear that out.

Comment 7 Zoltán Böszörményi 2013-06-07 10:24:43 UTC

(In reply to comment #5)
> For the time out warning, can you please check whether it goes away if you
> increase the timeout:
> 
> diff --git a/drivers/gpu/drm/i915/intel_lvds.c
> b/drivers/gpu/drm/i915/intel_lvds.c
> index 10c3d56..7e0ee01 100644
> --- a/drivers/gpu/drm/i915/intel_lvds.c
> +++ b/drivers/gpu/drm/i915/intel_lvds.c
> @@ -200,7 +200,7 @@ static void intel_enable_lvds(struct intel_encoder
> *encoder)
>  
>         I915_WRITE(ctl_reg, I915_READ(ctl_reg) | POWER_TARGET_ON);
>         POSTING_READ(lvds_encoder->reg);
> -       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 1000))
> +       if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 2000))
>                 DRM_ERROR("timed out waiting for panel to power on\n");
>  
>         intel_panel_enable_backlight(dev, intel_crtc->pipe);
> 
> If that works then most likely the firmware on your board sets up a
> ridiculously high panel power on delay.

It seems to be the case, since this error doesn't appear with the patch anymore.

> For the WARNs we need dmesg with drm.debug=0xe added to your kernel
> commandline. They are self-checks of our modeset code to make sure the hw
> state agrees with our sw tracking. Usually they do indicate real issues, but
> sometimes there's also a tiny bug in the checker itself.

The new dmesg will be attached in a minute.

Comment 8 Zoltán Böszörményi 2013-06-07 10:25:11 UTC

Created attachment 80465 [details]
Dmesg with drm.debug=0x0e

Comment 9 Zoltán Böszörményi 2013-06-07 10:36:44 UTC

Created attachment 80468 [details]
Dmesg with drm.debug=0x0e

The previous one was without a suspend/resume cycle, there are two in this one.

Comment 10 Zoltán Böszörményi 2013-06-07 10:40:18 UTC

Created attachment 80469 [details]
Dmesg with drm.debug=0x0e

The third suspend resulted in an immediate resume.

Comment 11 Chris Wilson 2013-06-07 12:17:21 UTC

That the BIOS lit up a pipe, also means that it probably scribbled over quite a few of our data structures in device memory.

Comment 12 Zoltán Böszörményi 2013-06-07 12:54:29 UTC

(In reply to comment #11)
> That the BIOS lit up a pipe, also means that it probably scribbled over
> quite a few of our data structures in device memory.

I see.

Does it mean that my other bug reported at
https://bugzilla.kernel.org/show_bug.cgi?id=59401
is a consequence of this (supposedly BIOS) bug?
The bug is about immediate resuming in about 40% of the suspend attempts.

Also how can it be worked around? Windows (XP and 7) must keep its data
somewhere else otherwise it wouldn't be able to suspend the machine reliably.

Comment 13 Chris Wilson 2013-06-07 13:08:18 UTC

At this moment in time I am very suspicious of that BIOS :)

Comment 14 Chris Wilson 2013-10-02 12:00:01 UTC

Any news?

Comment 15 Zoltán Böszörményi 2013-12-10 18:37:19 UTC

The news is that https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=linux-next (up to commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0) makes the suspend/resume cycles reliable on this machine, so it's not conclusive anymore that it's the BIOS that's buggy.

However, the previously seen warnings still appear in the logs after resuming but the i915 driver can recover from that.

Comment 16 Jani Nikula 2014-08-14 13:32:21 UTC

(In reply to comment #15)
> The news is that
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/
> ?h=linux-next (up to commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0) makes
> the suspend/resume cycles reliable on this machine, so it's not conclusive
> anymore that it's the BIOS that's buggy.
> 
> However, the previously seen warnings still appear in the logs after
> resuming but the i915 driver can recover from that.

Do the warnings still appear with recent kernels?

Comment 17 Zoltán Böszörményi 2014-09-23 11:14:30 UTC

Every warnings except the one about the timeout has gone, suspend/resume works after 3.14.0, currently tested with 3.16.3. The panel seems to be a little slow to wake up, the patch in comment 5 is still needed.

Comment 18 Zoltán Böszörményi 2014-09-23 11:31:33 UTC

Spoke too soon, 3.16.3 has all the warnings after resume.

Comment 19 Zoltán Böszörményi 2014-09-23 11:32:29 UTC

Created attachment 106725 [details]
dmesg after resume on 3.16.3

dmesg after resume on 3.16.3

Comment 20 Zoltán Böszörményi 2014-10-06 13:24:04 UTC

Created attachment 107428 [details]
dmesg after resume on 3.16.4

The kernel still has those warnings after resume on 3.16.4.

Comment 21 Jani Nikula 2015-01-29 13:23:01 UTC

Sadly, we don't have any news. Do you, with new kernels?

Comment 22 Jesse Barnes 2015-03-20 21:20:15 UTC

Well hopefully Zoltan figured this one out and it's been fixed so we haven't heard anything.  If not, please re-open after testing against current drm-intel-nightly code.

Comment 23 Zoltán Böszörményi 2015-04-16 10:00:58 UTC

Created attachment 115117 [details]
dmesg after resuming with 3.18.10

3.18.10 still has the problem.

Comment 24 Zoltán Böszörményi 2015-04-16 10:01:56 UTC

I will try kernel 4.0 soon, currently it is at the same level as drm-intel-nightly.

Comment 25 Zoltán Böszörményi 2015-06-22 16:45:33 UTC

I have tried kernel 4.1.0 that was just released and the warnings are still there.
For now, I use the Fedora kernel patch called
  drm-i915-Disable-verbose-state-checks.patch
to tone down these messages from scary backtraces to one-line DRM_ERRORs.

Comment 26 Jani Nikula 2016-01-18 12:58:54 UTC

Please try kernel v4.4.

Comment 27 Ricardo 2017-02-21 01:11:55 UTC

Request to test with newer kernel was made - one year ago.. Will leave bug open for 30 days if not will be closed due to lack of information

Comment 28 Zoltán Böszörményi 2017-02-22 08:52:38 UTC

I have carried the same patch adapted to different kernels for over one and a half years.

Now I tested kernel 4.10 without the patch and the warning doesn't occur after resume.

Comment 29 yann 2017-02-22 08:56:24 UTC

(In reply to Zoltán Böszörményi from comment #28)
> I have carried the same patch adapted to different kernels for over one and
> a half years.
> 
> Now I tested kernel 4.10 without the patch and the warning doesn't occur
> after resume.

Thanks Zoltán Böszörményi for your confirmation. Closing then the ticket.

Comment 30 Zoltán Böszörményi 2017-02-23 12:22:13 UTC

The previous test was wrong, the patch to increase the panel power on timeout from 1 second to 2 seconds was still applied accidentally.

Since the other patch to tone down the DRM messages to one liners, I get this at boot:

[    8.897390] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on

and later when resuming from S3 suspend:

[  496.056118] Suspended for 8.704 seconds
[  496.056118] Enabling non-boot CPUs ...
[  496.059143] x86: Booting SMP configuration:
[  496.059145] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  496.028018] Initializing CPU#1
[  496.061759]  cache: parent cpu1 should not be sleeping
[  496.062227] CPU1 is up
[  496.066129] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  496.037243] Initializing CPU#2
[  496.068754]  cache: parent cpu2 should not be sleeping
[  496.069177] CPU2 is up
[  496.078180] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  496.049174] Initializing CPU#3
[  496.080774]  cache: parent cpu3 should not be sleeping
[  496.081222] CPU3 is up
[  496.083107] ACPI: Waking up from system sleep state S3
[  496.084549] uhci_hcd 0000:00:1d.0: System wakeup disabled by ACPI
[  496.084650] uhci_hcd 0000:00:1d.1: System wakeup disabled by ACPI
[  496.084727] uhci_hcd 0000:00:1d.2: System wakeup disabled by ACPI
[  496.084790] uhci_hcd 0000:00:1d.3: System wakeup disabled by ACPI
[  496.096184] ehci-pci 0000:00:1d.7: System wakeup disabled by ACPI
[  496.096479] PM: noirq resume of devices complete after 12.407 msecs
[  496.097219] PM: early resume of devices complete after 0.694 msecs
[  496.098609] usb usb2: root hub lost power or was reset
[  496.098612] usb usb4: root hub lost power or was reset
[  496.098618] usb usb3: root hub lost power or was reset
[  496.098681] usb usb5: root hub lost power or was reset
[  496.098700] rtc_cmos 00:01: System wakeup disabled by ACPI
[  496.098723] pcieport 0000:00:1c.1: System wakeup disabled by ACPI
[  496.099209] i8042 kbd 00:02: System wakeup disabled by ACPI
[  496.101167] sd 0:0:0:0: [sda] Starting disk
[  496.104216] serial 00:03: activated
[  496.108043] serial 00:04: activated
[  496.110331] serial 00:05: activated
[  496.112595] serial 00:06: activated
[  496.114970] serial 00:07: activated
[  496.219414] r8169 0000:02:00.0 enp2s0: link down
[  496.404549] ata2: SATA link down (SStatus 0 SControl 300)
[  496.560066] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  496.561125] ata1.00: configured for UDMA/66
[  496.571065] usb 2-2: reset low-speed USB device number 2 using uhci_hcd
[  497.173783] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[  497.175153] PM: resume of devices complete after 1077.922 msecs
[  497.175676] PM: Finishing wakeup.
[  497.175679] Restarting tasks ... done.

So, we still need to raise the panel power on timeout.

Comment 31 Florent Flament 2017-03-04 17:32:21 UTC

Hi,

I have a similar 'error' message in dmesg at startup on my laptop:

[    2.585176] psmouse serio1: synaptics: Touchpad model: 1, fw: 7.5, id: 0x1e0b1, caps: 0xf00473/0x240000/0xa2400/0x0, board id: 1514, fw id: 1008586
[    2.617552] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input7
[    2.931350] clocksource: Switched to clocksource tsc
[    3.348561] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[    3.412467] Console: switching to colour frame buffer device 170x48
[    3.436206] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    3.708987] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)

Additional information:

$ uname -a
Linux toshnux 4.9.10-200.fc25.x86_64 #1 SMP Wed Feb 15 23:28:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Laptop model: toshiba satellite z930

Apart from this error message at startup, everything seems to be working fine. I'll try to test the latest Linux Kernel without, then with the increased timeout as described in comment #5.

Comment 32 Elizabeth 2017-08-04 19:10:18 UTC

(In reply to Florent Flament from comment #31)
> Apart from this error message at startup, everything seems to be working
> fine. I'll try to test the latest Linux Kernel without, then with the
> increased timeout as described in comment #5.
Hello Florent,
Still the same with latest kernel https://www.kernel.org/? 
Since this case is some specific hardware, could you please file a new bug with HW and SW information, and dmesg with drm.debug=0xe with latest kernel for your case?
Thanks.

Comment 33 Elizabeth 2017-08-04 19:11:53 UTC

(In reply to Zoltán Böszörményi from comment #30)
> The previous test was wrong, the patch to increase the panel power on
> timeout from 1 second to 2 seconds was still applied accidentally.
Hello Zoltan, 
Any news with the latest kernels/drm-tip?
Thank you.

Comment 34 Elizabeth 2017-12-05 18:21:31 UTC

(In reply to Elizabeth from comment #33)
> (In reply to Zoltán Böszörményi from comment #30)
> > The previous test was wrong, the patch to increase the panel power on
> > timeout from 1 second to 2 seconds was still applied accidentally.
> Hello Zoltan, 
> Any news with the latest kernels/drm-tip?
> Thank you.
Hello Zoltan, this bug seems to be about two issues, first the timeout, that as per comment #5 to #13 points to BIOS. And as it seems in comment #30 the issue may exist while BIOS keep the same. So this would be taken as a BIOS problem instead of kernel, at least while proved otherwise.
And the second, the warns messages. Do this messages keep constant with latest kernel? As we don't have a way to reproduce, if we don't get feedback from reporters, it is assumed to be fixed or non-reproducible upstream. Hopefully the problem will be fixed already over 4.14 and up. Can you confirm? Thank you.

Comment 35 Elizabeth 2018-01-18 21:36:11 UTC

Due to lack of update from reporter, I'm closing the current bug. If warn messages keep being an issue in newest kernel versions, please file a new bug to give a fresh follow up to this issue. Thanks.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.