Version: xserver-xorg-video-intel 2:2.9.99.901+git20091208.47416b1e-0ubuntu0tormod~karmic (Ubuntu xorg-edgers package)

During normal Firefox usage, the screen either goes black and the console freezes, or an image remains and the mouse cursor is movable but nothing else responds. (I can't be very precise about the failure modes, because they are being described to me remotely by a computer-phobic user.)

Anyway, here is what I see in the X server log file:

----Xorg.0.log----
(II) intel(0): Modeline "640x480"x59.9 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz)
(II) intel(0): Modeline "720x400"x70.1 28.32 720 738 846 900 400 412 414 449 -hsync +vsync (31.5 kHz)
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error (repeated 9 times)
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error. (repeated 5385 times)
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error (repeated 79 times)
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
----end----

Restarting the server (with Alt-SysRq-K) appears to lead to another black screen / movable mouse cursor / dead console hang, with the new server log peppered with "Failed to submit batch buffer" messages. There are no core dumps for these, and grabbing a batchbuffer dump at one point actually horked the system completely (presumably a kernel panic, as SSH no longer responded), so I'm not sure what additional information I could provide that would help track this bug down.
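For anyone triaging a similar log, here is a rough sketch of how the (WW) and (EE) lines in the excerpt above could be tallied. The function name `tally` and the sample text are illustrative only, and the script assumes that a trailing "(repeated N times)" stands for N occurrences of the message, which may not match exactly how the log was collapsed.

```python
import re
from collections import Counter

# Matches a warning or error line and captures its level and message.
LINE = re.compile(r"^\((EE|WW)\)\s+(.*)$")
# Matches the trailing "(repeated N times)" suffix, if present.
REPEAT = re.compile(r"\s*\(repeated (\d+) times\)$")


def tally(log_text):
    """Count (WW)/(EE) messages, treating '(repeated N times)' as N (assumption)."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip (II) and any other non-warning, non-error lines
        level, message = match.groups()
        repeat = REPEAT.search(message)
        occurrences = 1
        if repeat:
            occurrences = int(repeat.group(1))
            message = message[:repeat.start()]
        counts[(level, message)] += occurrences
    return counts


SAMPLE = """\
(II) intel(0): Modeline "640x480"x59.9 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz)
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error (repeated 9 times)
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error. (repeated 5385 times)
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error (repeated 79 times)
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
"""

if __name__ == "__main__":
    for (level, message), n in tally(SAMPLE).most_common():
        print(f"{n:6d}  ({level}) {message}")
```

The same function could be pointed at the full contents of Xorg.0.log to see which message dominates after a hang.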
Thanks for the bug report. The critical details we need are basically the chipset id (i845, i915, i965 etc), the gpu dump (along with dmesg, Xorg.log and preferably the entire contents of /sys/kernel/debug/dri/0/*) and steps to reproduce. As you can probably guess, by the time this error is reported some time has elapsed since the submission of the erroneous batch (long enough for the kernel to have spotted that the GPU has stalled), which makes getting the correct details more difficult. Carl, do we have any new tricks on how to grab the info at the time of the stall?
Yeah, I'm definitely going to need a different approach... "cp -a /sys/kernel/debug/dri/0 /tmp" crashes the system with a (presumable) kernel panic. (It's not supposed to do that, right? :-)
Saw this come in in the latest package update:

commit 08371bc29013370558728dcbeeed6a23ad2f5a70
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Dec 8 22:35:24 2009 +0000

    intel: Clear virtual after failing to mmap_gtt.

    Don't store the error return in bo_gem->gtt_virtual or else we will attempt to use that as a valid pointer in future mappings.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This looks related. Chris, do you think this change addresses the failure mode I was seeing? So far, the hang hasn't been reproducible after the update.
Apologies Daniel for the slow update, but that patch alone should not address the issue you reported. (Instead it should fix up a crash that might occur in a situation like this.) What is happening is that an earlier batchbuffer (a sequence of commands sent to the GPU) is causing the GPU itself to hang, and we only detect this much later, when attempts to use buffers result in an I/O error. I am in the middle of developing a kernel patch to capture the batchbuffer that triggers the error - but I am away from my test machines and a reliable internet connection at the moment, hence the delay.
Created attachment 32456 Capture error batch buffer This is the patch to capture the batch buffer that is likely to have triggered the error. Hopefully it will be applied to Eric's drm-intel-next tree soon and so be available via xorg-edgers. Once applied and after the next error, can you upload the contents of /sys/kernel/debug/dri/0/i915_error_state, please?
Chris, the xorg-edgers PPA doesn't seem to include any kernel packages. Where should I go to get a packaged kernel with your patch?
I think this bug will be fixed by this libdrm update:

commit 4f0f871730b76730ca58209181d16725b0c40184
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Feb 10 09:45:13 2010 +0000

    intel: Handle resetting of input params after EINTR during SET_TILING

    The SET_TILING is pernicious in that it overwrites the input arguments following an error in order to report the current tiling state of the buffer. This caught us by surprise as we then fed those arguments back into the ioctl unmodified following an EINTR and so the kernel then reported success for the no-op. We interpreted this success as meaning that the tiling on the buffer had changed so updated our state and started using the buffer incorrectly in the new tiled/untiled manner. This led to all sorts of random corruption and GPU hangs, even though the batch buffers would look sane (when the GPU had not wandered off into forbidden territory).

    References:

    Bug 25475 - [i915] Xorg crash / Execbuf while wedged
    http://bugs.freedesktop.org/show_bug.cgi?id=25475

    Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
    http://bugs.freedesktop.org/show_bug.cgi?id=25554

    (And probably every other weird bug in the last few months.)

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
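To make the failure mode in that commit message concrete, here is a toy simulation of the retry pattern it describes. This is not the libdrm code or the real ioctl: `FakeKernel`, `set_tiling_buggy`, `set_tiling_fixed` and the `TILING_*` constants are all hypothetical names, and the fake kernel only models the one behaviour mentioned above (overwriting the caller's arguments with the current tiling state when the call fails).

```python
import errno

# Hypothetical tiling modes for the simulation.
TILING_NONE, TILING_X = 0, 1


class FakeKernel:
    """Toy stand-in for the kernel side of SET_TILING (an assumption, not i915)."""

    def __init__(self, current_tiling, interruptions):
        self.current_tiling = current_tiling
        self.interruptions = interruptions  # how many calls fail with EINTR first

    def set_tiling(self, args):
        if self.interruptions > 0:
            self.interruptions -= 1
            # On error, the caller's arguments are overwritten with the
            # buffer's current tiling state before the error is returned.
            args["tiling_mode"] = self.current_tiling
            raise OSError(errno.EINTR, "Interrupted system call")
        self.current_tiling = args["tiling_mode"]


def set_tiling_buggy(kernel, wanted):
    """Retry on EINTR but reuse the (possibly clobbered) argument block."""
    args = {"tiling_mode": wanted}
    while True:
        try:
            kernel.set_tiling(args)  # after EINTR this asks for the current mode
            return wanted  # caller assumes the requested mode took effect
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise


def set_tiling_fixed(kernel, wanted):
    """Retry on EINTR, rebuilding the arguments before every attempt."""
    while True:
        args = {"tiling_mode": wanted}
        try:
            kernel.set_tiling(args)
            return args["tiling_mode"]
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise


if __name__ == "__main__":
    buggy = FakeKernel(TILING_NONE, interruptions=1)
    believed = set_tiling_buggy(buggy, TILING_X)
    print("buggy: userspace believes", believed, "kernel has", buggy.current_tiling)

    fixed = FakeKernel(TILING_NONE, interruptions=1)
    believed = set_tiling_fixed(fixed, TILING_X)
    print("fixed: userspace believes", believed, "kernel has", fixed.current_tiling)
```

In the buggy variant a single interrupted call is enough to leave userspace and the simulated kernel disagreeing about the tiling mode, which is the kind of mismatch the commit message blames for the corruption and hangs.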