Summary: | Failure to wake monitor after sleep on Haswell graphics driver under 4.4.1 | ||
---|---|---|---|
Product: | DRI | Reporter: | John <da_audiophile> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | brovvnout+bugzilla, da_audiophile, intel-gfx-bugs, lauwers.michael, luke, shashank.sharma, sonika.jindal |
Version: | XOrg git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | HSW | i915 features: | power/suspend-resume |
Attachments: |
Created attachment 121543: dmesg with debug flag enabled
I neglected to mention that if I SSH into the box after the bug occurs, /sys/class/drm/card0/error reports no errors:

    # cat /sys/class/drm/card0/error
    no error state collected

I am still experiencing this under 4.4.3. Is the lack of attention to this report due to incomplete data on my part? Is there anything else I need to include to get it into the hands of the right people? Thanks!

Please attach dmesg again, all the way from boot, with the drm.debug=14 module parameter set.

Created attachment 122185: fresh boot, monitor off after several minutes, will not wake up
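For readers unfamiliar with the request above: drm.debug is a bitmask of debug categories, not a verbosity level. A minimal sketch of how the value 14 decodes, assuming the category bits of the 4.x-era drm headers (DRM_UT_CORE = 0x01, DRM_UT_DRIVER = 0x02, DRM_UT_KMS = 0x04, DRM_UT_PRIME = 0x08). The enum and function names below are hypothetical, and newer kernels define further bits that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed category bits, mirroring DRM_UT_* in the 4.x drm headers. */
enum drm_debug_category {
    DBG_CORE   = 0x01, /* DRM core messages */
    DBG_DRIVER = 0x02, /* driver-specific messages, e.g. from i915 */
    DBG_KMS    = 0x04, /* modesetting: connectors, CRTCs, hotplug */
    DBG_PRIME  = 0x08, /* PRIME buffer sharing */
};

/* Writes a '|'-separated list of the categories enabled in mask into buf. */
static void describe_drm_debug(unsigned int mask, char *buf, size_t len)
{
    static const struct {
        unsigned int bit;
        const char *name;
    } names[] = {
        { DBG_CORE, "CORE" },
        { DBG_DRIVER, "DRIVER" },
        { DBG_KMS, "KMS" },
        { DBG_PRIME, "PRIME" },
    };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(mask & names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, names[i].name, len - strlen(buf) - 1);
    }
}
```

Under that assumption, drm.debug=14 enables driver, KMS, and PRIME messages but not core messages; the KMS category is presumably what produces `[drm:intel_hdmi_detect]` lines such as the ones quoted in this thread.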
Attached. Thank you! Note: the bug is present in 4.4.4 and under 4.5-rc7.

@Jani - Is this related to bug #94024? Your comment there[1] is echoed in the recent dmesg I posted:

    % grep Live dmesg_with_debug_flags
    [ 0.582711] [drm:intel_hdmi_detect] Live status not up!
    [ 0.752726] [drm:intel_hdmi_detect] Live status not up!
    [ 5.739370] [drm:intel_hdmi_detect] Live status not up!
    [ 303.456558] [drm:intel_hdmi_detect] Live status not up!
    [ 303.563225] [drm:intel_hdmi_detect] Live status not up!
    [ 316.623147] [drm:intel_hdmi_detect] Live status not up!
    [ 317.973124] [drm:intel_hdmi_detect] Live status not up!
    [ 318.093122] [drm:intel_hdmi_detect] Live status not up!
    [ 318.199787] [drm:intel_hdmi_detect] Live status not up!
    [ 318.383118] [drm:intel_hdmi_detect] Live status not up!

1. https://bugs.freedesktop.org/show_bug.cgi?id=94024#c6

I started a git bisect on this and am posting the log here. Note that commit 8b417c266b715b3797cd3e65342149372b9ac0c8 does wake up as expected, but it also wakes up spontaneously, so in retrospect my choice of "good" for it was probably wrong. I hope this helps narrow it down.
    % git bisect log
    git bisect start
    # good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
    git bisect good 6a13feb9c82803e2b815eca72fa7a9f5561d7861
    # bad: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
    git bisect bad afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
    # good: [f66477a0aeb77f97a7de5f791700dadc42f3f792] Merge tag 'clk-for-linus-20151104' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
    git bisect good f66477a0aeb77f97a7de5f791700dadc42f3f792
    # bad: [56e0464980febfa50432a070261579415c72664e] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
    git bisect bad 56e0464980febfa50432a070261579415c72664e
    # good: [22402cd0af685c1a5d067c87db3051db7fff7709] Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
    git bisect good 22402cd0af685c1a5d067c87db3051db7fff7709
    # bad: [816d2206f0f9953ca854e4ff1a2749a5cbd62715] Merge tag 'drm-intel-next-fixes-2015-11-06' of git://anongit.freedesktop.org/drm-intel into drm-next
    git bisect bad 816d2206f0f9953ca854e4ff1a2749a5cbd62715
    # bad: [26148bd3c0f1fbd8f2b0dae994f3195316f677db] drm/i915/bxt: Set time interval unit to 0.833us
    git bisect bad 26148bd3c0f1fbd8f2b0dae994f3195316f677db
    # bad: [7b24c9a696c1c68eaa471a27bf467e97a9986fa9] drm/i915: don't enable FBC when pixel rate exceeds 95% on HSW/BDW
    git bisect bad 7b24c9a696c1c68eaa471a27bf467e97a9986fa9
    # bad: [85a62bf9d8ef8d533635270ae985281c58e8c974] drm/i915: Also record time difference if vblank evasion fails, v2.
    git bisect bad 85a62bf9d8ef8d533635270ae985281c58e8c974
    # bad: [901c2daf05c8ae6c3f85370fc96b9b6892f5da2d] drm/i915: Put back lane_count into intel_dp and add link_rate too
    git bisect bad 901c2daf05c8ae6c3f85370fc96b9b6892f5da2d
    # good: [cf1d58833f07afbb4534b15caa3fd48baa313b2c] drm/i915/bxt: WA for swapped HPD pins in A stepping
    git bisect good cf1d58833f07afbb4534b15caa3fd48baa313b2c
    # good: [919f1f55d90b5487a9f38e94842e486509474f09] drm/i915: Expose one LRC function for GuC submission mode
    git bisect good 919f1f55d90b5487a9f38e94842e486509474f09
    # good: [8b417c266b715b3797cd3e65342149372b9ac0c8] drm/i915: Debugfs interface for GuC submission statistics
    git bisect good 8b417c266b715b3797cd3e65342149372b9ac0c8
    # good: [f1afe24f0e736b9d7f2275e2b1504af3fe612f2a] drm/i915: Change SRM, LRM instructions to use correct length
    git bisect good f1afe24f0e736b9d7f2275e2b1504af3fe612f2a

OK... I went back and flagged cf1d58833f07afbb4534b15caa3fd48baa313b2c as bad, since the monitor would wake up on its own, i.e. without keyboard or mouse input. I am continuing to bisect and will only flag a commit good if the monitor both sleeps and wakes as expected, not just wakes as expected. Please disregard my comment #8; I will update accordingly shortly.

OK... I finished the bisect:

    09120d4e88b13967d44d46280fb74d3ac4ac2f73 is the first bad commit
    commit 09120d4e88b13967d44d46280fb74d3ac4ac2f73
    Author: Michel Thierry <michel.thierry@intel.com>
    Date:   Wed Jul 29 17:23:45 2015 +0100

        drm/i915: Remove unnecessary gen8_clamp_pd

        gen8_clamp_pd clamps to the next page directory boundary, but the macro
        gen8_for_each_pde already has a check to stop at the page directory
        boundary.

        Furthermore, i915_pte_count also restricts to the next page table
        boundary.

        v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

        Suggested-by: Akash Goel <akash.goel@intel.com>
        Signed-off-by: Michel Thierry <michel.thierry@intel.com>
        Reviewed-by: "Akash Goel" <akash.goel@intel.com>
        Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

    :040000 040000 d5ad200fb348cab8518d6f7efdf9a15dcde161b8 0f9248add7ae0d95a7e17fd1a0c975b577245fd0 M	drivers

    % git bisect log
    git bisect start
    # good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
    git bisect good 6a13feb9c82803e2b815eca72fa7a9f5561d7861
    # bad: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
    git bisect bad afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
    # good: [f66477a0aeb77f97a7de5f791700dadc42f3f792] Merge tag 'clk-for-linus-20151104' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
    git bisect good f66477a0aeb77f97a7de5f791700dadc42f3f792
    # bad: [56e0464980febfa50432a070261579415c72664e] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
    git bisect bad 56e0464980febfa50432a070261579415c72664e
    # good: [22402cd0af685c1a5d067c87db3051db7fff7709] Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
    git bisect good 22402cd0af685c1a5d067c87db3051db7fff7709
    # bad: [816d2206f0f9953ca854e4ff1a2749a5cbd62715] Merge tag 'drm-intel-next-fixes-2015-11-06' of git://anongit.freedesktop.org/drm-intel into drm-next
    git bisect bad 816d2206f0f9953ca854e4ff1a2749a5cbd62715
    # bad: [26148bd3c0f1fbd8f2b0dae994f3195316f677db] drm/i915/bxt: Set time interval unit to 0.833us
    git bisect bad 26148bd3c0f1fbd8f2b0dae994f3195316f677db
    # bad: [7b24c9a696c1c68eaa471a27bf467e97a9986fa9] drm/i915: don't enable FBC when pixel rate exceeds 95% on HSW/BDW
    git bisect bad 7b24c9a696c1c68eaa471a27bf467e97a9986fa9
    # bad: [85a62bf9d8ef8d533635270ae985281c58e8c974] drm/i915: Also record time difference if vblank evasion fails, v2.
    git bisect bad 85a62bf9d8ef8d533635270ae985281c58e8c974
    # bad: [901c2daf05c8ae6c3f85370fc96b9b6892f5da2d] drm/i915: Put back lane_count into intel_dp and add link_rate too
    git bisect bad 901c2daf05c8ae6c3f85370fc96b9b6892f5da2d
    # bad: [cf1d58833f07afbb4534b15caa3fd48baa313b2c] drm/i915/bxt: WA for swapped HPD pins in A stepping
    git bisect bad cf1d58833f07afbb4534b15caa3fd48baa313b2c
    # bad: [e1f123257a1f7d3af36a31a0fb2d4c6f40039fed] drm/i915: Expand error state's address width to 64b
    git bisect bad e1f123257a1f7d3af36a31a0fb2d4c6f40039fed
    # bad: [81ba8aefd03803a8aec3395d18f7b1dda5942105] drm/i915/gen8: Add PML4 structure
    git bisect bad 81ba8aefd03803a8aec3395d18f7b1dda5942105
    # bad: [d4ec9da0e17cb6a86c0b76c5b254981601d25031] drm/i915/gen8: Abstract PDP usage
    git bisect bad d4ec9da0e17cb6a86c0b76c5b254981601d25031
    # bad: [6ac1850220732f47bc6ae767fa41542009674ad7] drm/i915/gen8: Make pdp allocation more dynamic
    git bisect bad 6ac1850220732f47bc6ae767fa41542009674ad7
    # bad: [09120d4e88b13967d44d46280fb74d3ac4ac2f73] drm/i915: Remove unnecessary gen8_clamp_pd
    git bisect bad 09120d4e88b13967d44d46280fb74d3ac4ac2f73
    # first bad commit: [09120d4e88b13967d44d46280fb74d3ac4ac2f73] drm/i915: Remove unnecessary gen8_clamp_pd

I am issuing the following command to force the monitor into sleep mode:

    xset dpms force suspend

I am repeating the bisect and am now on commit 85a62bf... Booted into this kernel, the monitor did not go to sleep 2 out of 5 times, but it did wake up correctly 3 out of 3 times. Should that be considered a "good" or a "bad" commit?

I'm getting the same behavior on Fedora 23 (4.4.4-301.fc23.x86_64, but these issues started with an earlier 4.4 kernel), Ivy Bridge with integrated Intel HD Graphics 2500, monitor on HDMI, xorg-x11-drv-intel-2.99.917-19.20151206 from Fedora.
Xorg.0.log shows this line when switching from a text VT to display :0:

    (EE) intel(0): sna_mode_shutdown_crtc: invalid state found on pipe 0, disabling CRTC:21

I've already tried to wake up the monitor via xset dpms, to no avail. I noticed that if the monitor is already powered up while resuming from suspend-to-RAM, it exits standby mode; otherwise it stays at "no signal/power save" until X is killed.

Brownout - I do not get the same error message you are seeing, so your bug might not be exactly the same as this one. Which DE are you using?

John, I'm using XFCE4. I landed here from the bugs you opened on the kernel and Arch trackers (https://bugs.archlinux.org/task/47997 and https://bugzilla.kernel.org/show_bug.cgi?id=111831); both mention the same error message.

OK. It seems to be the kernel + xfce4: when I tried under GNOME, it worked as expected. The best I can do to describe this bug is that something in the kernel (?) changed between 4.3.6 and 4.4 so that once the monitor goes to sleep and is awakened by a key press or mouse movement, it wakes up but doesn't find an active video signal, so it happily goes back to sleep. What do people think of my bisect in comment #10? Does it help pinpoint anything?

I can confirm that downgrading to kernel 4.3 solves the issue. I must also add that even turning off the display triggers the problem: it remains in sleep mode after being turned back on, and killing X brings it back to life.

I'm getting something very similar. Older 4.2 kernels are fine, but from 4.3 onwards, once the monitor has been turned off by the screensaver (under xfce4), it can never be woken up again. Switching to a VT partially fixes it (I can see the VT), but the only way to get X working again is `/etc/init.d/xdm restart`. This is on Gentoo, with:

    00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

and an Acer G246HYL 1080p HDMI monitor.
Doing some bisecting, I came up with this change:

    237ed86 drm/i915: Check live status before reading edid

That change modifies the way intel_hdmi_detect() waits for the HDMI hotplug to complete. Before this change I think it more or less ignored the HPD, but now it seems to care whether it came up or not. That change was in turn fixed up later in:

    f8d03ea drm/i915: increase the tries for HDMI hotplug live status checking

But I wonder whether my monitor is just too slow even for that. I tried playing around with the timeout, which seemed to improve things but did not actually fix them (it now works about 50% of the time vs. previously none of the time).

Created attachment 123537: possible fix

Created attachment 123538: hdmi status logging
Attachment 123537 fixes the problem for me. It's a patch against 4.6-rc4, and I've been using it today without seeing the usual black-screen problem.
In an earlier version, I added tracing of the result of trying to get the HPD status. It shows that the check either succeeds immediately or times out after 90ms. I find that a bit bizarre; I don't think I really understand what the calling code is doing with this function.
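The pattern in this trace (either immediate success or a fixed timeout) is what a bounded polling loop produces when the sink's live status never asserts within the budget. A minimal sketch of that pattern, assuming a probe callback and illustrative retry parameters; `wait_for_live_status`, `slow_sink`, and the numbers used are hypothetical and are not the driver's actual code or values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true once the sink's HPD live status is up. */
typedef bool (*live_status_probe)(void *ctx);

struct detect_result {
    bool live;      /* did live status come up within the budget? */
    int tries_used; /* number of probes made */
    int waited_ms;  /* total delay spent between probes */
};

/*
 * Probe live status up to max_tries times with delay_ms between probes.
 * The delay is only added to waited_ms, not actually slept, so the sketch
 * runs instantly; a driver would sleep at that point instead. A caller
 * following the "check live status before reading EDID" approach would
 * read the EDID only when r.live is true.
 */
static struct detect_result wait_for_live_status(live_status_probe probe,
                                                 void *ctx, int max_tries,
                                                 int delay_ms)
{
    struct detect_result r = { false, 0, 0 };

    assert(probe != NULL && max_tries > 0 && delay_ms >= 0);
    for (;;) {
        r.tries_used++;
        if (probe(ctx)) {
            r.live = true;
            break;
        }
        if (r.tries_used == max_tries)
            break; /* budget exhausted: treat the sink as disconnected */
        r.waited_ms += delay_ms;
    }
    return r;
}

/* Simulated sink that raises live status only after `pending` failed probes. */
struct slow_sink {
    int pending;
};

static bool slow_sink_probe(void *ctx)
{
    struct slow_sink *s = ctx;

    if (s->pending > 0) {
        s->pending--;
        return false;
    }
    return true;
}
```

For example, with 10 tries spaced 10 ms apart, a sink that asserts immediately costs nothing, a sink that fails three probes first costs 30 ms, and a sink that never asserts gives up after 90 ms (nine waits between ten probes). Raising the budget, as f8d03ea did and as Luke experimented with, only helps if the sink eventually asserts live status at all.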
John, can you re-test using Luke's fix and confirm that you are no longer reproducing the issue?

I think there may be a couple of problems here. What I'm seeing is that before my fix, once the display went off it would never come back on. With the fix it now comes back on if I haven't left it too long, but if I leave it overnight it is always dead. The screensaver is set to switch off after an hour (but I've had it as low as 1 minute, and it still fails overnight). Last night I tried the drm-intel-next branch, checked out at:

    e3d5457 drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)

It still fails. I think it may be enough to ensure live_status is always set to true. The latest version of this function does that for generations < 7; my chipset seems to be generation 7, so I guess its HDMI HPD is broken as well, i.e.:

    - if (INTEL_INFO(dev_priv)->gen < 7 || IS_IVYBRIDGE(dev_priv))
    + if (INTEL_INFO(dev_priv)->gen < 8 || IS_IVYBRIDGE(dev_priv))

Created attachment 124010: Updated fix - force live status to TRUE for generation 7

This updated patch forces live_status to TRUE for generation 7, extending what 4f4a818 ("drm/i915: Fake HDMI live status") already does for earlier generations. This seems to fix the problem completely for me.

It looks like this bug has been around for a while: https://lists.freedesktop.org/archives/intel-gfx/2016-February/088400.html

> After a kernel bisect, it was found that reverting the following commit resolved this bug:
>
> commit 237ed86c693d8a8e4db476976aeb30df4deac74b
> Author: Sonika Jindal <sonika.jindal at intel.com>
> Date: Tue Sep 15 09:44:20 2015 +0530
>
> drm/i915: Check live status before reading edid

Adding Sonika to comment on what should happen next with this.

Adding Shashank. Earlier it was observed that live status was not reliable on older platforms; the other issue was with single-link DVI-to-HDMI cables in certain cases.
Is this also with a single-link cable? If not, it is worrying that live status is not reliable even on gen7. With gen7 VLV, we had done enough testing and found this solution pretty reliable.

I'm using an HDMI/HDMI cable. Just to confirm the generation, it's got one of these:

> Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz

lspci says:

> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Xorg says:

> Integrated Graphics Chipset: Intel(R) HD Graphics 4600

I put a printk in the live_status code, which reported generation 7.

(In reply to Luke from comment #29)
> I put in a printk in the live_status code, which thought that it was
> generation 7.

Can you please give the output of:

    cat /sys/devices/pci0000\:00/0000\:00\:02.0/device

The output:

    $ cat /sys/devices/pci0000\:00/0000\:00\:02.0/device
    0x0412

So the panel is taking longer to raise the live status after suspend/resume. We can bypass this check for other generations as well. Shashank, what do you suggest?

I remember there being a flag called 'is_suspending' or 'is_resuming' on the i915 PM side, which indicated that the system is resuming and that live_status can be slow. I would suggest trying something like that instead of removing one more gen from the live_status optimization.

This problem report isn't about coming out of s2ram or s2disk. It's just about the monitor being put to sleep and then coming out of that. (At least, that's my reading of the original report, and it's what's happening for me.)

I am curious, Luke. I guess the monitor can go into a suspend state only when the source goes into suspend first? Or can a monitor go to sleep while there are still flips coming from the source side? Shashank

Good question. The monitor is being switched off by xscreensaver, which as far as I can tell uses DPMS via functions called DPMSEnable() and DPMSDisable(). I don't know whether those switch off the pixels or not.
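Shashank's suggestion of an 'is_resuming'-style flag, rather than dropping another generation from the live status check, could look roughly like the following. This is a minimal sketch under stated assumptions: `struct hdmi_state`, `choose_detect_policy()` and `hdmi_detect()` are hypothetical and are not the i915 code, and because the failure in this report follows a DPMS wake rather than s2ram, the flag is assumed to be set on whatever path re-enables the display, not only on system resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connector state; not the real i915 structures. */
struct hdmi_state {
    bool is_resuming; /* assumed to be set on the resume / display re-enable path */
};

enum detect_policy {
    DETECT_TRUST_LIVE_STATUS,  /* live status down => report disconnected */
    DETECT_IGNORE_LIVE_STATUS, /* live status may lag: go by the EDID read alone */
};

static enum detect_policy choose_detect_policy(const struct hdmi_state *st)
{
    assert(st != NULL);
    return st->is_resuming ? DETECT_IGNORE_LIVE_STATUS : DETECT_TRUST_LIVE_STATUS;
}

/*
 * Returns true if the connector should be reported as connected.
 * live_status and edid_readable stand in for the hardware queries.
 * The flag is cleared afterwards, so only the first detect after the
 * display comes back bypasses the live status check.
 */
static bool hdmi_detect(struct hdmi_state *st, bool live_status, bool edid_readable)
{
    bool connected;

    if (choose_detect_policy(st) == DETECT_TRUST_LIVE_STATUS && !live_status)
        connected = false;
    else
        connected = edid_readable;
    st->is_resuming = false;
    return connected;
}
```

The design choice here is that the flag only relaxes the check for the first detect after the display comes back, so ordinary hotplug detection keeps relying on live status; whether a single detect cycle is enough for a slow sink is itself an assumption.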
Agreed. So DPMS-off first disables the display, and similarly DPMS-on enables it. We have seen a similar issue where the live_status register takes time to set, which sometimes blocks the detection path. So if we add a flag (crtc->is_resuming) on the resume path, this might help handle it. Shashank

Worth a try. I did try a fix where the timeout was turned up to 90ms, which worked better but still was not completely reliable. What happens next? It's currently working perfectly for me with my fix, but I would like it upstreamed so I don't have to keep applying it.

This is happening on Intel's generation 9 as well.

cpu: i3-6100
lspci: 00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 06)

As Luke did, I've just made the following change to intel_hdmi.c:

    - if (INTEL_INFO(dev_priv)->gen < 7 || IS_IVYBRIDGE(dev_priv))
    + if (INTEL_INFO(dev_priv)->gen < 10 || IS_IVYBRIDGE(dev_priv))

This will hopefully fix the problem for me as well, but as Luke says, it would be nice if this got upstreamed. I should know by tomorrow whether this change has fixed it. The behaviour for me has been:

- the monitor wakes up if it's been less than 1 or 2 hours
- the monitor doesn't wake up the next morning, i.e. after 8+ hours of inactivity

I need to go back to a console and restart xdm to get the monitor to wake up in X (using xfce as the desktop). I'm running a 4.7 kernel, the most recent, and the live_status fix there still only applies to generations < 7. It would have saved me some headaches if this had been fixed, but OK :) At least I'm glad this is a known issue.

Update: it's been working perfectly for 2 days now, after applying the change in my previous post.

Bump.

Fixed by

    commit 23f889bdf6ee5cfff012d8b09f6bec920c691696
    Author: David Weinehall <david.weinehall@linux.intel.com>
    Date:   Wed Aug 17 15:47:48 2016 +0300

        Revert "drm/i915: Check live status before reading edid"

in drm-intel-nightly.
Please reopen if the problem persists with that commit. |
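For reference, the local workarounds posted in this thread (before the revert landed) amount to widening a single generation check. A minimal sketch of that predicate follows; `struct gfx_device` and both functions are hypothetical stand-ins for the driver's `INTEL_INFO(dev_priv)->gen` and `IS_IVYBRIDGE(dev_priv)` tests, and the generation numbers follow the thread (Haswell reported as generation 7 by Luke's printk, Skylake as generation 9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device info the driver consults. */
struct gfx_device {
    const char *name;
    int gen;           /* like INTEL_INFO(dev_priv)->gen */
    bool is_ivybridge; /* like IS_IVYBRIDGE(dev_priv) */
};

/* The condition quoted in the thread: fake live status on gen < 7 or Ivy Bridge. */
static bool fakes_live_status_upstream(const struct gfx_device *d)
{
    assert(d != NULL);
    return d->gen < 7 || d->is_ivybridge;
}

/*
 * The local workarounds from the thread, parameterised by the first
 * generation whose live status is still trusted: 8 in Luke's patch,
 * 10 in the Skylake (gen 9) variant.
 */
static bool fakes_live_status_patched(const struct gfx_device *d, int first_trusted_gen)
{
    assert(d != NULL);
    return d->gen < first_trusted_gen || d->is_ivybridge;
}
```

With `first_trusted_gen` set to 8, the `is_ivybridge` term becomes redundant, since Ivy Bridge is itself a gen 7 part. The fix that was actually committed (23f889b) reverts "drm/i915: Check live status before reading edid" rather than widening this condition.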
Created attachment 121542: Xorg.0.log after the monitor will not wake up

I am running 4.4.1 on Arch x86_64 and have noticed that this update (coming from 4.3.5) causes the monitor to not wake up from its sleep state. I have to `systemctl restart display-manager` to revive the Xorg display.

Exact symptoms:

1) Leave the monitor on and wait for power saving to kick in; the LED on the monitor is blue.
2) Power saving kicks in, the LED goes amber, and the screen goes to sleep.
3) Hit any key to wake up the monitor. The LED goes blue, but "no signal" is displayed on the screen.
4) The LED goes amber again.

This is 100% reproducible (I have confirmed it 4 times in a row).

This machine is an Intel i7-4790K (Haswell) using the onboard graphics chip, connected to the monitor via HDMI. The motherboard is an MSI Z97 Mpower Max AC running the latest BIOS. I am using the distro-provided drivers (xf86-video-intel 1:2.99.917+519+g8229390). I am glad to provide additional debug info upon request.

The attached dmesg includes the 'log_buf_len=1M' kernel parameter, which seems to have eclipsed the initial lines of my dmesg output. I attached my full /var/log/Xorg.log as well.

Note that I have tried letting either xscreensaver or the xfce4 power manager control the monitor's power state, and I am left with the same result described above. Downgrading to kernel 4.3.5 with no other changes solves the problem.