This problem occurred after upgrading from Ubuntu 11.10 (2:2.15.901-1ubuntu2) to Ubuntu 12.04 (2:2.17.0-1ubuntu4). We have attempted to update to 2.19 but it still occurs.
Sometimes upon resume, all I see is a black screen and the cursor. The mouse and keyboard respond (the mouse moves) but nothing changes. Switching to console and back doesn't fix it. Killing Compiz also does not fix the issue. Further, if one has password lock disabled on resume, the user will be stuck with a frozen display of whatever was last shown on the desktop before suspend. The mouse cursor still changes as you hover over various elements but the display is frozen solid. One can switch to another TTY via ctrl+alt+F1 but killing the X session is the only way to get back to the desktop.
This is reproducible in metacity, not just compiz. Trying the 3.4 kernel also did not fix the issue. All of those affected are Intel-based machines.
BTW, I have nothing connected to the laptop when I resume or before suspend. This is a very annoying issue because the only way to recover is to kill the X session, which means all unsaved work in open files will be lost.
The Launchpad page on which this is documented can be found here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/966744
Thank you for your time. Please let me know how it can be debugged further (I understand that this is likely not enough information).
This sounds like the Xserver damage bug. First install all the driver packages from ppa:xorg-edgers. Then try and reproduce and attach a drm.debug=6 dmesg and Xorg.log from across the suspend and resume.
Created attachment 61993
I got the bug to occur after upgrading and enabling debugging. Here is my dmesg output and xorg log.
Created attachment 61994
dmesg with drm.debug=6
We encounter one warning there that is suspicious:
[ 304.920] (WW) intel(0): flip queue failed: Invalid argument
[ 304.920] (WW) intel(0): Page flip failed: Invalid argument
and nothing that corresponds with it in the dmesg. To hit EINVAL suggests that the display was disabled, which given that it is an LVDS panel is quite, quite bizarre. There is a patch in xf86-video-intel.git (with --enable-sna) that should behave better in such circumstances. Can you try building from git with SNA enabled and see if the bug reoccurs?
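For anyone repeating this comparison on their own logs, here is a minimal Python sketch that pulls the intel `(WW)` lines and their timestamps out of an Xorg log so they can be checked against the dmesg by hand. The function name `intel_warnings` and the regular expression are illustrative assumptions based only on the two lines quoted above; other log lines may be formatted differently.

```python
import re
from typing import List, Tuple

# Assumed line shape, taken from the two warnings quoted in this report:
#   [ 304.920] (WW) intel(0): Page flip failed: Invalid argument
WARNING_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\(WW\)\s+intel\(\d+\):\s+(.*)$")


def intel_warnings(log_text: str) -> List[Tuple[float, str]]:
    """Return (timestamp_seconds, message) for each intel (WW) line."""
    found = []
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            found.append((float(match.group(1)), match.group(2)))
    return found


# The two lines quoted above, used as sample input.
sample = """\
[ 304.920] (WW) intel(0): flip queue failed: Invalid argument
[ 304.920] (WW) intel(0): Page flip failed: Invalid argument
"""

for stamp, message in intel_warnings(sample):
    print(f"{stamp:.3f}s  {message}")
```

To run it on a real log, read the file contents into a string and pass that to `intel_warnings` instead of `sample`.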
ppa:xorg-edgers has an updated version of UXA that should help with one of the issues you encountered across resume. How does it fare under your testing?
Apologies for my reticence. I have been away from any machine on which I can test. I shall test with the PPA once more.
While I seemed to have success, there are two other people on the bug report that have reported the issue still exists.
Annoyingly, after those few days of testing, I, too, had the session freeze up on me again. So three people have confirmed that it didn't work :)
Don't forget to attach debugging info from the recent freeze so that we can be sure that you are still hitting the same issue every time.
Finally, finally got it to reproduce again under the PPA:
test@test-Aspire-5734Z:~$ apt-cache policy xserver-xorg-video-intel
*** 2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise 0
500 http://ppa.launchpad.net/xorg-edgers/ppa/ubuntu/ precise/main amd64
I'll attach dmesg and xorg.0.log again.
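The version string above appears to embed a git snapshot date and a short commit hash after `+git`. Here is a small Python sketch for extracting both, which may help when checking whether an installed build predates a particular upstream commit. The `+gitYYYYMMDD.<hash>` layout is inferred from this one string and is an assumption; `git_snapshot` is a hypothetical helper name.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed layout, inferred from the version string quoted in this comment:
#   ...+git<YYYYMMDD>.<short hash>-...
SNAPSHOT_RE = re.compile(r"\+git(\d{4})(\d{2})(\d{2})\.([0-9a-f]+)")


def git_snapshot(version: str) -> Optional[Tuple[date, str]]:
    """Return (snapshot_date, short_hash), or None if there is no +git part."""
    match = SNAPSHOT_RE.search(version)
    if match is None:
        return None
    year, month, day, commit = match.groups()
    return date(int(year), int(month), int(day)), commit


version = "2:2.19.0+git20120530.cf5b3e2e-0ubuntu0sarvatt~precise"
print(git_snapshot(version))
```

A version without a `+git` component, such as `2:2.17.0-1ubuntu4`, yields `None`.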
Created attachment 62482
dmesg-drm.debug=6 with git version
Created attachment 62483
xorg.0.log with git version
Has this been sufficient information?
There are some stack traces of compiz in the Launchpad report that may detail why the driver isn't being particularly kind to compiz, which appears to be the package that lights the powder keg.
One user reports that the recent updates to the xorg-edgers PPA have been fine on his machine for the last few weeks. Are there any particular commits to pay attention to that may have resolved this issue?
The cause of the EINVAL is an attempt to pageflip with the pipe disabled due to DPMS off. This should be fixed by:
Author: Chris Wilson <email@example.com>
Date: Tue Jun 5 16:04:16 2012 +0100
uxa: Check for DPMS off before scheduling a WAIT_ON_EVENT
Regression from commit 3f3bde4f0c72f6f31aae322bcdc20b95eade6631
Author: Chris Wilson <firstname.lastname@example.org>
Date: Thu May 24 11:58:46 2012 +0100
uxa: Only consider an output valid if the kernel reports it attached
When backporting from SNA, a key difference that UXA does not track DPMS
state in its enabled flag and that a DPMS off CRTC is still bound to the
fb. So we do need to rescan the outputs and check that we have a
connector enabled *and* the pipe is running prior to emitting a scanline
Signed-off-by: Chris Wilson <email@example.com>
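As a reading aid for the commit message, the condition it describes can be modelled in a few lines of Python. This is a toy model and not the driver code: the `Output` and `Crtc` classes, their fields, and `may_wait_for_scanline` are all hypothetical names invented for illustration. It only shows the idea that a scanline wait is safe when some connector is enabled and the pipe is actually running.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Output:
    """Hypothetical stand-in for a connector such as the LVDS panel."""
    name: str
    attached: bool  # connector is bound to the CRTC
    dpms_on: bool   # False while the output is in DPMS off


@dataclass
class Crtc:
    """Hypothetical stand-in for a CRTC; may stay bound to the fb in DPMS off."""
    pipe_running: bool
    outputs: List[Output] = field(default_factory=list)


def may_wait_for_scanline(crtc: Crtc) -> bool:
    """True only if a connector is enabled *and* the pipe is running."""
    has_live_output = any(o.attached and o.dpms_on for o in crtc.outputs)
    return has_live_output and crtc.pipe_running


# A panel that is still attached but has been put into DPMS off.
panel = Output(name="LVDS", attached=True, dpms_on=False)
crtc = Crtc(pipe_running=False, outputs=[panel])
print(may_wait_for_scanline(crtc))  # False: skip the wait rather than hang
```

In this model, checking only `attached` would let the wait through for a DPMS-off panel, which is the situation the commit message describes.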
Hi, Chris - I'm afraid we've had a few confirmations that applying these fixes didn't fix the issue. The same problem remains unabated.
Would you like yet another dmesg output with the patches applied?
Yes, I think we need a fresh set of debug logs with all the known fixes applied.
Created attachment 66517
drm.debug=6 After patches applied
Here you go.
Created attachment 66518
Xorg.0.log After patches applied
Hmm, the X.log indicates 2.17. There were a few related fixes as well, so can you please install 2.20.8 from your distribution updates?
I have a couple of users on Ubuntu 12.10 (2.20.8) who also tried 3.6rc7, and the hang happens every time the screen is closed or the screensaver kicks in. I can't seem to reproduce it myself, though.
Given the batch submit immediately after the vsync'ed copy, I can't see what else userspace can do to prevent the WAIT_FOR_EVENT hang...
Hmm, there is some similarity here between this and bug 51616 if in both cases the kernel reports the pipe as active, but in reality it is disabled.
Actually, someone built a package including some other recent fixes and it looks like the problem has been resolved for a number of people. You can check out the LP report for specifics (sorry, in a bit of a rush).
Regardless, I've installed the package from his PPA and have been using normal lid-closing suspend (had been switching to TTY1, logging in as root, then using pm-suspend for the past few months).
I haven't had a single issue since. So unless anyone reports otherwise, this bug is likely fixed.
Closing as the UXA DPMS off vs pageflip race.