Created attachment 30944 [details]
Failing Xorg.0.log

When suspending and then resuming a compiz desktop by closing and opening the lid of my Dell Inspiron 11z laptop, X dies. That obviously results in logging me out, and then X respawning with the login screen.

X itself dies with this message:

    Fatal server error:
    Failed to map batchbuffer: Input/output error

in the Xorg.0.log file, and the kernel has various messages that seem to boil down to the GPU being hung:

    [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

I'm attaching the full Xorg.0.log and more complete kernel messages.
Created attachment 30945 [details]
dmesg of two successful suspends, followed by two failing ones

Note how the successful suspend/resume cycle has just a single

    [drm] LVDS-8: set mode 1366x768 c

message at resume time, while the failing ones have two:

    [drm] LVDS-8: set mode 1366x768 c
    [drm] LVDS-8: set mode <corrupt> 1d

and the failing ones eventually then result in a

    Aborting core
    [drm] LVDS-8: set mode 1366x768 c

which is apparently X killing itself off, and then the new X starting (successfully).

So that dmesg is all from one single boot - the machine stayed up, and was usable, but in the failure case the session had been killed on resume.
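The difference described above (a second "set mode" message with a garbled mode name) can be picked out of a dmesg capture mechanically. The following is only a sketch: the function name `suspicious_set_modes` is hypothetical, and it assumes the messages have the shape "[drm] <connector>: set mode <mode name> <hex id>" seen in this report. It would also flag legitimate mode names that are not plain WIDTHxHEIGHT.

```python
import re

# Assumed message shape, taken from the lines quoted in this report:
#   [drm] <connector>: set mode <mode name> <hex id>
SET_MODE = re.compile(r"\[drm\] (\S+): set mode (.*) ([0-9a-f]+)\s*$")

# A "sane" mode name is assumed to look like 1366x768.
MODE_NAME = re.compile(r"^\d+x\d+$")


def suspicious_set_modes(lines):
    """Return (connector, mode name, mode id) for odd-looking set-mode lines."""
    hits = []
    for line in lines:
        m = SET_MODE.search(line)
        if m is None:
            continue  # not a set-mode message
        connector, name = m.group(1), m.group(2)
        mode_id = int(m.group(3), 16)
        if not MODE_NAME.match(name):
            hits.append((connector, name, mode_id))
    return hits
```

Fed the two messages from a failing resume, this would report only the second one, the one with the unreadable mode name.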
Hi Linus,

Thanks for your bug report. There seems to have been a recent spate of Intel driver problems with resume (though most are things like a missing backlight; yours is the first I've seen resulting in X server death).

I'm doing my best to get minions to bisect things. We'll see what turns up.

Eric, Keith, and Jesse, any other immediate thoughts?

Thanks,
-Carl
Created attachment 31206 [details]
gzipped intel_gpu_dump *before* lid close

This is intel_gpu_dump before the lid closed. I had to gzip it to make it fit the bugzilla limits.
Created attachment 31207 [details] gzipped intel_gpu_dump *after* lid open when gpu is hung I _think_ I caught the actual "hung" case rather than the case that happens a bit afterwards when hangcheck timers eventually force the GPU reset.
Hopefully the above two attachments make sense to somebody. I seem to be able to trigger the hang without actually suspending or resuming the machine at all, which made debugging much easier. I just need to close and open the lid a few times, it will hang after a couple of those events. If intel_gpu_dump isn't the right tool, then please point me to something better.
Some additional rumblings and thoughts about this:

- I seem to be able to close and open the lid as much as I want, if I first just make sure that X and all other applications are stopped.

IOW, I logged in from the network, and did a simple

    kill -STOP -1
    killall -STOP Xorg

and then I close and open the lid repeatedly, and nothing bad happens. The kernel catches the lid event, and my /var/log/messages looks like this:

    Nov 14 15:22:01 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:22:25 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:22:36 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:22:46 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:22:51 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:22:56 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:23:52 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:23:57 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:24:06 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:24:18 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:24:22 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:24:31 localhost kernel: [drm] LVDS-8: set mode 1366x768 c

In fact, I seem to be able to do that even with X itself not stopped, but if I revive all the actual drawing programs, then the lid open/close will end up resulting in a GPU hang very soon:

    Nov 14 15:28:51 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:29:06 localhost kernel: [drm] LVDS-8: set mode 1366x768 c
    Nov 14 15:29:07 localhost kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
    Nov 14 15:29:07 localhost kernel: render error detected, EIR: 0x00000000
    Nov 14 15:29:07 localhost kernel: i915: Waking up sleeping processes
    ...

so there is definitely some interaction with the lid open/close code and the actual direct rendering.
Which explains why everything works fine if I'm at the GDM login screen or if I don't have compiz enabled, but quickly goes to hell if I'm using desktop effects.
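To check how often a hang follows a lid-triggered mode set in a longer /var/log/messages capture, the pattern above can be scanned for automatically. This is a rough sketch, not a diagnostic tool: the helper names, the 5-second window, and the placeholder year (syslog timestamps carry none) are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix, e.g. "Nov 14 15:29:06 localhost kernel: ..."
STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def parse_stamp(line, year=2000):
    """Parse the syslog timestamp; the year is a placeholder assumption."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def hangs_after_modeset(lines, window=timedelta(seconds=5)):
    """Return (modeset time, hang time) pairs where a 'GPU hung' message
    follows the most recent 'set mode' message within `window`."""
    last_modeset = None
    hits = []
    for line in lines:
        when = parse_stamp(line)
        if when is None:
            continue  # line without a recognisable timestamp
        if "set mode" in line:
            last_modeset = when
        elif "GPU hung" in line and last_modeset is not None:
            if when - last_modeset <= window:
                hits.append((last_modeset, when))
    return hits
```

On the failing excerpt above, this would pair the 15:29:06 mode set with the 15:29:07 hangcheck message; on the excerpt taken with everything stopped, it would return nothing.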
Similarities with bug 27922 and bug 27285? In particular the xrandr/modechange during 3D activity.
after.lid.gz:

    IPEHR: 0x01800020

I now recognise this little tell-tale. It's a MI_WAIT_FOR_EVENT. The GPU is waiting for a scanline on a disabled pipe.

Given that the GPU is idled prior to suspend (we wait for the completion of all batch buffers in the pipeline), this instruction should not be executing at the time of suspend/resume. The only source of this in the ringbuffer (apart from the old UMS path) is the new overlay code, which is unlikely to be the cause given compiz/GL rendering during suspend.

Similar to bug 27146.
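For anyone wanting to repeat that reading of IPEHR, here is a small sketch of the decode. It assumes the MI command layout implied by the i915 `MI_INSTR(opcode, flags)` macro, with the command type in bits 31:29 and the opcode in bits 28:23. The function name `decode_mi` and the short opcode table are illustrative only, and the remaining flag bits are left as a raw value rather than interpreted.

```python
# Partial opcode table for MI commands (command type 0); assumed to match
# the MI_INSTR(opcode, flags) definitions in the i915 register headers.
MI_OPCODES = {
    0x00: "MI_NOOP",
    0x03: "MI_WAIT_FOR_EVENT",
    0x0A: "MI_BATCH_BUFFER_END",
}


def decode_mi(dword):
    """Split an instruction header into (command type, opcode, raw flags)."""
    cmd_type = (dword >> 29) & 0x7      # 0 means an MI command
    opcode = (dword >> 23) & 0x3F       # MI opcode field
    flags = dword & ((1 << 23) - 1)     # everything below the opcode
    return cmd_type, opcode, flags


cmd_type, opcode, flags = decode_mi(0x01800020)
name = MI_OPCODES.get(opcode, "unknown") if cmd_type == 0 else "not an MI command"
print(f"type={cmd_type} opcode={opcode:#04x} ({name}) flags={flags:#x}")
```

Under those assumptions the value from the dump decodes to opcode 0x03, which is consistent with the MI_WAIT_FOR_EVENT reading above; which event the flag bits select is left to the hardware documentation.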
Pinging Jesse, as I think he knows best whether we are now safe from WAIT_FOR_EVENT + modeswitch.
I hope this is a dupe of one of the earlier ones, because I don't see how we could emit a wait_for_event after the display was off in current code. Linus, do you still see this with the latest 2D driver?
Ah and if you do see it, I was just reminded about an old patch that might fix it. Don't know why it's not upstream though, I'll check on that. https://patchwork.kernel.org/patch/80474/
On Fri, Jul 9, 2010 at 8:06 AM, <bugzilla-daemon@freedesktop.org> wrote:
>
> Linus, do you still see this with the latest 2D driver?

This is my daughter's laptop, and I haven't checked lately if it's still flaky. She's not been complaining, but at one point the work-around was to log out before suspending (which gets rid of compiz), so it's possible that it's still there.

But at this point, you might as well close the bugzilla, especially if you think it's a dup.

Linus
Ok, thanks. I'll close it out and get the other potential fix upstream just in case.