Summary: | [pnv] Warnings after resume in i915 (misbehaving bios?) | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Zoltán Böszörményi <zboszor> | ||||||||||||||||||||||
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||||||||||||||||||
Status: | CLOSED INVALID | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||||||||||||||||||
Severity: | normal | ||||||||||||||||||||||||
Priority: | low | CC: | ricardo.vega, sangshuduo | ||||||||||||||||||||||
Version: | XOrg git | ||||||||||||||||||||||||
Hardware: | Other | ||||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||||
Whiteboard: | |||||||||||||||||||||||||
i915 platform: | PNV | i915 features: | display/Other | ||||||||||||||||||||||
Attachments: |
|
Description
Zoltán Böszörményi
2013-06-07 08:11:10 UTC
Created attachment 80459 [details]
Dmesg after resuming
Created attachment 80460 [details]
Kernel .config
Created attachment 80461 [details]
Verbose lspci with device names
Created attachment 80462 [details]
Verbose lspci with vendor:device IDs
For the time out warning, can you please check whether it goes away if you increase the timeout: diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c index 10c3d56..7e0ee01 100644 --- a/drivers/gpu/drm/i915/intel_lvds.c +++ b/drivers/gpu/drm/i915/intel_lvds.c @@ -200,7 +200,7 @@ static void intel_enable_lvds(struct intel_encoder *encoder) I915_WRITE(ctl_reg, I915_READ(ctl_reg) | POWER_TARGET_ON); POSTING_READ(lvds_encoder->reg); - if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 1000)) + if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 2000)) DRM_ERROR("timed out waiting for panel to power on\n"); intel_panel_enable_backlight(dev, intel_crtc->pipe); If that works then most likely the firmware on your board sets up a ridiculously high panel power on delay. For the WARNs we need dmesg with drm.debug=0xe added to your kernel commandline. They are self-checks of our modeset code to make sure the hw state agrees with our sw tracking. Usually they do indicate real issues, but sometimes there's also a tiny bug in the checker itself. The synopsis of the warnings here is that the machine resumes with the bios setting up a pipe unexpectedly - which could also account for our inability to manipulate the LVDS registers. I think that is consistent enough to suggest something is amiss inside the BIOS upon resume -- hopefully the debug logs bear that out. (In reply to comment #5) > For the time out warning, can you please check whether it goes away if you > increase the timeout: > > diff --git a/drivers/gpu/drm/i915/intel_lvds.c > b/drivers/gpu/drm/i915/intel_lvds.c > index 10c3d56..7e0ee01 100644 > --- a/drivers/gpu/drm/i915/intel_lvds.c > +++ b/drivers/gpu/drm/i915/intel_lvds.c > @@ -200,7 +200,7 @@ static void intel_enable_lvds(struct intel_encoder > *encoder) > > I915_WRITE(ctl_reg, I915_READ(ctl_reg) | POWER_TARGET_ON); > POSTING_READ(lvds_encoder->reg); > - if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 1000)) > + if (wait_for((I915_READ(stat_reg) & PP_ON) != 0, 2000)) > DRM_ERROR("timed out waiting for panel to power on\n"); > > intel_panel_enable_backlight(dev, intel_crtc->pipe); > > If that works then most likely the firmware on your board sets up a > ridiculously high panel power on delay. It seems to be the case, since this error doesn't appear with the patch anymore. > For the WARNs we need dmesg with drm.debug=0xe added to your kernel > commandline. They are self-checks of our modeset code to make sure the hw > state agrees with our sw tracking. Usually they do indicate real issues, but > sometimes there's also a tiny bug in the checker itself. The new dmesg will be attached in a minute. Created attachment 80465 [details]
Dmesg with drm.debug=0x0e
Created attachment 80468 [details]
Dmesg with drm.debug=0x0e
The previous one was without a suspend/resume cycle, there are two in this one.
Created attachment 80469 [details]
Dmesg with drm.debug=0x0e
The third suspend resulted in an immediate resume.
That the BIOS lit up a pipe, also means that it probably scribbled over quite a few of our data structures in device memory. (In reply to comment #11) > That the BIOS lit up a pipe, also means that it probably scribbled over > quite a few of our data structures in device memory. I see. Does it mean that my other bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=59401 is a consequence of this (supposedly BIOS) bug? The bug is about immediate resuming in about 40% of the suspend attempts. Also how can it be worked around? Windows (XP and 7) must keep its data somewhere else otherwise it wouldn't be able to suspend the machine reliably. At this moment in time I am very suspicious of that BIOS :) Any news? The news is that https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=linux-next (up to commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0) makes the suspend/resume cycles reliable on this machine, so it's not conclusive anymore that it's the BIOS that's buggy. However, the previously seen warnings still appear in the logs after resuming but the i915 driver can recover from that. (In reply to comment #15) > The news is that > https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/ > ?h=linux-next (up to commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0) makes > the suspend/resume cycles reliable on this machine, so it's not conclusive > anymore that it's the BIOS that's buggy. > > However, the previously seen warnings still appear in the logs after > resuming but the i915 driver can recover from that. Do the warnings still appear with recent kernels? Every warnings except the one about the timeout has gone, suspend/resume works after 3.14.0, currently tested with 3.16.3. The panel seems to be a little slow to wake up, the patch in comment 5 is still needed. Spoke too soon, 3.16.3 has all the warnings after resume. Created attachment 106725 [details]
dmesg after resume on 3.16.3
dmesg after resume on 3.16.3
Created attachment 107428 [details]
dmesg after resume on 3.16.4
The kernel still has those warnings after resume on 3.16.4.
Sadly, we don't have any news. Do you, with new kernels? Well hopefully Zoltan figured this one out and it's been fixed so we haven't heard anything. If not, please re-open after testing against current drm-intel-nightly code. Created attachment 115117 [details]
dmesg after resuming with 3.18.10
3.18.10 still has the problem.
I will try kernel 4.0 soon, currently it is at the same level as drm-intel-nightly. I have tried kernel 4.1.0 that was just released and the warnings are still there. For now, I use the Fedora kernel patch called drm-i915-Disable-verbose-state-checks.patch to tone down these messages from scary backtraces to one-line DRM_ERRORs. Please try kernel v4.4. Request to test with newer kernel was made - one year ago.. Will leave bug open for 30 days if not will be closed due to lack of information I have carried the same patch adapted to different kernels for over one and a half years. Now I tested kernel 4.10 without the patch and the warning doesn't occur after resume. (In reply to Zoltán Böszörményi from comment #28) > I have carried the same patch adapted to different kernels for over one and > a half years. > > Now I tested kernel 4.10 without the patch and the warning doesn't occur > after resume. Thanks Zoltán Böszörményi for your confirmation. Closing then the ticket. The previous test was wrong, the patch to increase the panel power on timeout from 1 second to 2 seconds was still applied accidentally. Since the other patch to tone down the DRM messages to one liners, I get this at boot: [ 8.897390] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on and later when resuming from S3 suspend: [ 496.056118] Suspended for 8.704 seconds [ 496.056118] Enabling non-boot CPUs ... [ 496.059143] x86: Booting SMP configuration: [ 496.059145] smpboot: Booting Node 0 Processor 1 APIC 0x2 [ 496.028018] Initializing CPU#1 [ 496.061759] cache: parent cpu1 should not be sleeping [ 496.062227] CPU1 is up [ 496.066129] smpboot: Booting Node 0 Processor 2 APIC 0x1 [ 496.037243] Initializing CPU#2 [ 496.068754] cache: parent cpu2 should not be sleeping [ 496.069177] CPU2 is up [ 496.078180] smpboot: Booting Node 0 Processor 3 APIC 0x3 [ 496.049174] Initializing CPU#3 [ 496.080774] cache: parent cpu3 should not be sleeping [ 496.081222] CPU3 is up [ 496.083107] ACPI: Waking up from system sleep state S3 [ 496.084549] uhci_hcd 0000:00:1d.0: System wakeup disabled by ACPI [ 496.084650] uhci_hcd 0000:00:1d.1: System wakeup disabled by ACPI [ 496.084727] uhci_hcd 0000:00:1d.2: System wakeup disabled by ACPI [ 496.084790] uhci_hcd 0000:00:1d.3: System wakeup disabled by ACPI [ 496.096184] ehci-pci 0000:00:1d.7: System wakeup disabled by ACPI [ 496.096479] PM: noirq resume of devices complete after 12.407 msecs [ 496.097219] PM: early resume of devices complete after 0.694 msecs [ 496.098609] usb usb2: root hub lost power or was reset [ 496.098612] usb usb4: root hub lost power or was reset [ 496.098618] usb usb3: root hub lost power or was reset [ 496.098681] usb usb5: root hub lost power or was reset [ 496.098700] rtc_cmos 00:01: System wakeup disabled by ACPI [ 496.098723] pcieport 0000:00:1c.1: System wakeup disabled by ACPI [ 496.099209] i8042 kbd 00:02: System wakeup disabled by ACPI [ 496.101167] sd 0:0:0:0: [sda] Starting disk [ 496.104216] serial 00:03: activated [ 496.108043] serial 00:04: activated [ 496.110331] serial 00:05: activated [ 496.112595] serial 00:06: activated [ 496.114970] serial 00:07: activated [ 496.219414] r8169 0000:02:00.0 enp2s0: link down [ 496.404549] ata2: SATA link down (SStatus 0 SControl 300) [ 496.560066] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 496.561125] ata1.00: configured for UDMA/66 [ 496.571065] usb 2-2: reset low-speed USB device number 2 using uhci_hcd [ 497.173783] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on [ 497.175153] PM: resume of devices complete after 1077.922 msecs [ 497.175676] PM: Finishing wakeup. [ 497.175679] Restarting tasks ... done. So, we still need to raise the panel power on timeout. Hi, I have a similar 'error' message in dmesg at startup on my laptop: [ 2.585176] psmouse serio1: synaptics: Touchpad model: 1, fw: 7.5, id: 0x1e0b1, caps: 0xf00473/0x240000/0xa2400/0x0, board id: 1514, fw id: 1008586 [ 2.617552] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input7 [ 2.931350] clocksource: Switched to clocksource tsc [ 3.348561] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on [ 3.412467] Console: switching to colour frame buffer device 170x48 [ 3.436206] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device [ 3.708987] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null) Additional information: $ uname -a Linux toshnux 4.9.10-200.fc25.x86_64 #1 SMP Wed Feb 15 23:28:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Laptop model: toshiba satellite z930 Apart from this error message at startup, everything seems to be working fine. I'll try to test the latest Linux Kernel without, then with the increased timeout as described in comment #5. (In reply to Florent Flament from comment #31) > Apart from this error message at startup, everything seems to be working > fine. I'll try to test the latest Linux Kernel without, then with the > increased timeout as described in comment #5. Hello Florent, Still the same with latest kernel https://www.kernel.org/? Since this case is some specific hardware, could you please file a new bug with HW and SW information, and dmesg with drm.debug=0xe with latest kernel for your case? Thanks. (In reply to Zoltán Böszörményi from comment #30) > The previous test was wrong, the patch to increase the panel power on > timeout from 1 second to 2 seconds was still applied accidentally. Hello Zoltan, Any news with the latest kernels/drm-tip? Thank you. (In reply to Elizabeth from comment #33) > (In reply to Zoltán Böszörményi from comment #30) > > The previous test was wrong, the patch to increase the panel power on > > timeout from 1 second to 2 seconds was still applied accidentally. > Hello Zoltan, > Any news with the latest kernels/drm-tip? > Thank you. Hello Zoltan, this bug seems to be about two issues, first the timeout, that as per comment #5 to #13 points to BIOS. And as it seems in comment #30 the issue may exist while BIOS keep the same. So this would be taken as a BIOS problem instead of kernel, at least while proved otherwise. And the second, the warns messages. Do this messages keep constant with latest kernel? As we don't have a way to reproduce, if we don't get feedback from reporters, it is assumed to be fixed or non-reproducible upstream. Hopefully the problem will be fixed already over 4.14 and up. Can you confirm? Thank you. Due to lack of update from reporter, I'm closing the current bug. If warn messages keep being an issue in newest kernel versions, please file a new bug to give a fresh follow up to this issue. Thanks. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.