Bug 50227 - [INTEL]Resume from suspend leaves me with black screen or a screen of the desktop before it suspended (though the mouse still moves/changes cursor)
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assigned To: Daniel Vetter
https://bugs.launchpad.net/ubuntu/+so...
 
Reported: 2012-05-22 12:04 UTC by brettcornwall
Modified: 2012-10-07 09:35 UTC (History)

Attachments
Xorg.0.log (41.84 KB, text/plain)
2012-05-22 20:40 UTC, brettcornwall
dmesg with drm.debug=6 (245.72 KB, text/plain)
2012-05-22 20:41 UTC, brettcornwall
dmesg-drm.debug=6 with git version (115.73 KB, text/plain)
2012-06-03 20:12 UTC, brettcornwall
xorg.0.log with git version (30.63 KB, text/plain)
2012-06-03 20:12 UTC, brettcornwall
drm.debug=6 After patches applied (245.95 KB, text/plain)
2012-09-02 21:16 UTC, brettcornwall
Xorg.0.log After patches applied (46.91 KB, text/plain)
2012-09-02 21:16 UTC, brettcornwall

Description brettcornwall 2012-05-22 12:04:23 UTC
This problem occurred after upgrading from Ubuntu 11.10 (2:2.15.901-1ubuntu2) to Ubuntu 12.04 (2:2.17.0-1ubuntu4). We have attempted to update to 2.19, but it still occurs.

Sometimes upon resume, all I see is a black screen and the cursor. The mouse and keyboard respond (the mouse moves), but nothing else changes. Switching to a console and back doesn't fix it, and neither does killing Compiz. Further, if the password lock on resume is disabled, the user is stuck with a frozen display of whatever was last shown on the desktop before suspend. The mouse cursor still changes as it hovers over various elements, but the display is frozen solid. One can switch to another TTY via Ctrl+Alt+F1, but killing the X session is the only way to get back to the desktop.

This is reproducible under Metacity, not just Compiz. Trying the 3.4 kernel also did not fix the issue. All of the affected machines are Intel-based.

BTW, I have nothing connected to the laptop before suspend or when I resume. This is a very annoying issue because the only way to recover is to kill the X session, which means any unsaved work in open files is lost.

The launchpad page in which this is documented can be found here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/966744

Thank you for your time. Please let me know how it can be debugged further (I understand that this is likely not enough information).
Comment 1 Chris Wilson 2012-05-22 12:15:11 UTC
This sounds like the Xserver damage bug. First install all the driver packages from ppa:xorg-edgers. Then try to reproduce the problem and attach a drm.debug=6 dmesg and the Xorg.log covering the suspend and resume.
Comment 2 brettcornwall 2012-05-22 20:40:38 UTC
Created attachment 61993
Xorg.0.log

I got the bug to occur after upgrading and enabling debugging. Here is my dmesg output and xorg log.
Comment 3 brettcornwall 2012-05-22 20:41:39 UTC
Created attachment 61994
dmesg with drm.debug=6
Comment 4 Chris Wilson 2012-05-23 04:41:39 UTC
We encounter one warning there that is suspicious:

[   304.920] (WW) intel(0): flip queue failed: Invalid argument
[   304.920] (WW) intel(0): Page flip failed: Invalid argument

and nothing that corresponds with it in the dmesg. To hit EINVAL suggests that the display was disabled, which given that it is an LVDS panel is quite, quite bizarre. There is a patch in xf86-video-intel.git (with --enable-sna) that should behave better in such circumstances. Can you try building from git with SNA enabled and see if the bug reoccurs?
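For anyone checking their own logs for the same warnings, the following is a minimal sketch that scans Xorg log text for the two page-flip failures quoted above. The regular expression is inferred only from those two lines, and the function name, the sample text, and the unrelated `(II)` line are assumptions made for illustration; real logs may format the messages differently.

```python
import re

# Pattern inferred from the two warning lines quoted in this comment, e.g.
#   [   304.920] (WW) intel(0): Page flip failed: Invalid argument
# It is an assumption that other driver versions use the same wording.
FLIP_WARNING = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+\(WW\)\s+intel\(\d+\):\s+"
    r"(?P<what>flip queue failed|Page flip failed):\s+(?P<error>.+)$"
)


def find_flip_failures(log_text):
    """Return (timestamp, message, error) for each page-flip warning found."""
    hits = []
    for line in log_text.splitlines():
        match = FLIP_WARNING.match(line.strip())
        if match:
            hits.append((float(match["time"]), match["what"], match["error"]))
    return hits


# The first line is a made-up unrelated entry; the other two are the
# warnings quoted in the comment above.
sample = "\n".join([
    "[   304.900] (II) intel(0): hypothetical unrelated message",
    "[   304.920] (WW) intel(0): flip queue failed: Invalid argument",
    "[   304.920] (WW) intel(0): Page flip failed: Invalid argument",
])

for when, what, error in find_flip_failures(sample):
    print(f"{when:.3f}s: {what} ({error})")
```

To run it against a real log, read the file contents (commonly /var/log/Xorg.0.log) and pass the text to `find_flip_failures`.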
Comment 5 Chris Wilson 2012-05-25 08:40:33 UTC
ppa:xorg-edgers has an updated version of UXA that should help with one of the issues you encountered across resume. How does it fare under your testing?
Comment 6 brettcornwall 2012-05-28 11:34:42 UTC
Apologies for my silence. I have been away from any machine that I can test on. I shall test with the PPA once more.
Comment 7 brettcornwall 2012-06-01 06:38:46 UTC
While I seemed to have success, two other people on the bug report have reported that the issue still exists.
Comment 8 brettcornwall 2012-06-01 09:21:52 UTC
Annoyingly, after those few days of testing, I, too, had the session freeze up on me again. So three people have confirmed that it didn't work :)
Comment 9 Chris Wilson 2012-06-01 09:33:15 UTC
Don't forget to attach debugging info from the recent freeze so that we can be sure that you are still hitting the same issue every time.
Comment 10 brettcornwall 2012-06-03 20:10:52 UTC
I finally got it to reproduce again under the PPA:

test@test-Aspire-5734Z:~$ apt-cache policy xserver-xorg-video-intel
xserver-xorg-video-intel:
  Installed: 2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise
  Candidate: 2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise
  Version table:
 *** 2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise 0
        500 http://ppa.launchpad.net/xorg-edgers/ppa/ubuntu/ precise/main amd64 

I'll attach dmesg and xorg.0.log again.
Comment 11 brettcornwall 2012-06-03 20:12:30 UTC
Created attachment 62482
dmesg-drm.debug=6 with git version
Comment 12 brettcornwall 2012-06-03 20:12:59 UTC
Created attachment 62483
xorg.0.log with git version
Comment 13 brettcornwall 2012-06-16 20:16:07 UTC
Has this been sufficient information?
Comment 14 brettcornwall 2012-08-03 04:04:44 UTC
The Launchpad report contains some stack traces from compiz, which appears to be the package that triggers the problem. They may show why the driver isn't being particularly kind to compiz.
Comment 15 brettcornwall 2012-08-09 03:31:59 UTC
One user reports that the recent updates to the xorg-edgers PPA have worked fine on his machine for the last few weeks. Are there any particular commits to pay attention to that may have resolved this issue?
Comment 16 Chris Wilson 2012-08-23 18:03:25 UTC
The cause of the EINVAL is an attempt to pageflip with the pipe disabled due to DPMS off. This should be fixed by:

commit c4eb5528a456b65c673f7c984d14a622ac67cdca
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jun 5 16:04:16 2012 +0100

    uxa: Check for DPMS off before scheduling a WAIT_ON_EVENT
    
    Regression from commit 3f3bde4f0c72f6f31aae322bcdc20b95eade6631
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Thu May 24 11:58:46 2012 +0100
    
        uxa: Only consider an output valid if the kernel reports it attached
    
    When backporting from SNA, a key difference that UXA does not track DPMS
    state in its enabled flag and that a DPMS off CRTC is still bound to the
    fb. So we do need to rescan the outputs and check that we have a
    connector enabled *and* the pipe is running prior to emitting a scanline
    wait.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=50668
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
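
The condition described in the commit message (a connector enabled *and* the pipe running before a scanline wait is emitted) can be illustrated with a small model. This is only a sketch of the decision logic, not the xf86-video-intel code: the `Output` and `Crtc` classes, their fields, and both function names are hypothetical. The DPMS constants use the X11 values for DPMSModeOn and DPMSModeOff.

```python
from dataclasses import dataclass, field
from typing import List

DPMS_ON = 0   # X11 DPMSModeOn
DPMS_OFF = 3  # X11 DPMSModeOff


@dataclass
class Output:
    """Hypothetical stand-in for a connector/output."""
    name: str
    connected: bool            # the kernel reports the connector as attached
    dpms_mode: int = DPMS_ON


@dataclass
class Crtc:
    """Hypothetical stand-in for a CRTC; it can stay bound to an fb while DPMS is off."""
    has_framebuffer: bool
    outputs: List[Output] = field(default_factory=list)


def pipe_is_running(crtc):
    """True only if the CRTC has an fb and at least one attached output with DPMS on."""
    if not crtc.has_framebuffer:
        return False
    return any(o.connected and o.dpms_mode == DPMS_ON for o in crtc.outputs)


def schedule_scanline_wait(crtc):
    """Pick an action: only wait on a scanline when the pipe can deliver one."""
    return "wait-for-scanline" if pipe_is_running(crtc) else "skip-wait"


panel_on = Crtc(True, [Output("LVDS1", connected=True, dpms_mode=DPMS_ON)])
panel_off = Crtc(True, [Output("LVDS1", connected=True, dpms_mode=DPMS_OFF)])

print(schedule_scanline_wait(panel_on))
print(schedule_scanline_wait(panel_off))
```

The second case is the one the commit message singles out: the CRTC is still bound to a framebuffer, so a check on the fb alone would wrongly allow the wait.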
Comment 17 brettcornwall 2012-08-26 01:29:48 UTC
Hi, Chris - I'm afraid we've had a few confirmations that applying these fixes didn't fix the issue. The same problem remains unabated.

Would you like yet another dmesg output with the patches applied?
Comment 18 Chris Wilson 2012-09-02 20:26:29 UTC
Yes, I think we need a fresh set of debug logs with all the known fixes applied.
Comment 19 brettcornwall 2012-09-02 21:16:21 UTC
Created attachment 66517
drm.debug=6 After patches applied

Here you go.
Comment 20 brettcornwall 2012-09-02 21:16:51 UTC
Created attachment 66518
Xorg.0.log After patches applied
Comment 21 Chris Wilson 2012-09-23 10:35:13 UTC
Hmm, the X.log indicates 2.17. There were a few related fixes as well, so can you please install 2.20.8 from your distribution updates?
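
A quick way to confirm which driver version is actually installed before collecting logs is to compare the "Installed:" line of `apt-cache policy xserver-xorg-video-intel` with the version requested here. The sketch below is a simplification under stated assumptions: both function names are hypothetical, and the comparison looks only at the leading numeric upstream version, ignoring the epoch, the Debian revision, and any git snapshot suffix (so a `+git` build may contain newer fixes than its number suggests).

```python
import re


def installed_version(policy_output):
    """Return the value of the 'Installed:' line of apt-cache policy output, or None."""
    for line in policy_output.splitlines():
        line = line.strip()
        if line.startswith("Installed:"):
            return line.split(":", 1)[1].strip()
    return None


def upstream_version(package_version):
    """Reduce a package version to a tuple of ints, e.g. '2:2.19.0+git...' -> (2, 19, 0)."""
    without_epoch = package_version.split(":", 1)[-1]
    match = re.match(r"\d+(?:\.\d+)*", without_epoch)
    if match is None:
        raise ValueError(f"no numeric version in {package_version!r}")
    return tuple(int(part) for part in match.group().split("."))


# Output format taken from the apt-cache policy listing earlier in this report.
policy = """\
xserver-xorg-video-intel:
  Installed: 2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise
  Candidate: 2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise
"""

requested = upstream_version("2.20.8")
current = upstream_version(installed_version(policy))
print("older than requested" if current < requested else "at least the requested version")
```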
Comment 22 Timo Aaltonen 2012-09-25 14:37:54 UTC
I have a couple of users on Ubuntu 12.10 (2.20.8) who also tried 3.6-rc7, and the hang happens every time the screen is closed or the screensaver kicks in. I can't seem to reproduce it myself, though.
Comment 23 Chris Wilson 2012-09-25 14:45:11 UTC
Given the batch submit immediately after the vsync'ed copy, I can't see what else userspace can do to prevent the WAIT_FOR_EVENT hang...
Comment 24 Chris Wilson 2012-09-25 14:46:51 UTC
Hmm, there is some similarity here between this and bug 51616 if in both cases the kernel reports the pipe as active, but in reality it is disabled.
Comment 25 brettcornwall 2012-10-06 19:06:05 UTC
Actually, someone built a package including some other recent fixes and it looks like the problem has been resolved for a number of people. You can check out the LP report for specifics (sorry, in a bit of a rush).

Regardless, I've installed the package from his PPA and have been using normal lid-closing suspend (had been switching to TTY1, logging in as root, then using pm-suspend for the past few months).

I haven't had a single issue since. So unless anyone claims otherwise, this bug is likely fixed.
Comment 26 Chris Wilson 2012-10-07 09:35:03 UTC
Closing as the UXA DPMS off vs pageflip race.