Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assigned To: Carl Worth
QA Contact: Xorg Project Team
Depends on:
Blocks:
Reported: 2009-12-09 21:44 UTC by Daniel Richard G.
Modified: 2010-02-10 01:56 UTC (History)

Attachments
Capture error batch buffer (7.83 KB, patch)
2010-01-05 03:17 UTC, Chris Wilson

Description Daniel Richard G. 2009-12-09 21:44:37 UTC
Version: xserver-xorg-video-intel 2:2.9.99.901+git20091208.47416b1e-0ubuntu0tormod~karmic (Ubuntu xorg-edgers package)

During normal Firefox usage, the screen either goes black and the console freezes, or an image remains and the mouse cursor is movable but nothing else responds. (I can't be very precise about the failure modes, because they are being described to me remotely by a computer-phobic user.)

Anyway, here is what I see in the X server log file:

----Xorg.0.log----
(II) intel(0): Modeline "640x480"x59.9   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz)
(II) intel(0): Modeline "720x400"x70.1   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz)
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(repeated 9 times)
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
(repeated 5385 times)
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(repeated 79 times)
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
----end----

Restarting the server (with Alt-SysRq-K) appears to lead to another black screen / movable mouse cursor / dead console hang, with the new server log peppered with "Failed to submit batch buffer" messages.

There are no core dumps for these, and grabbing a batchbuffer dump at one point actually horked the system completely (presumably a kernel panic, as SSH no longer responded), so I'm not sure what additional information I could provide that would help track this bug down.
Comment 1 Chris Wilson 2009-12-10 00:46:33 UTC
Thanks for the bug report. The critical details we need are the chipset id (i845, i915, i965 etc), the GPU dump (along with dmesg, Xorg.log and preferably the entire contents of /sys/kernel/debug/dri/0/*) and steps to reproduce. As you can probably guess, by the time this error is reported some time has elapsed since the submission of the erroneous batch (long enough for the kernel to have spotted that the GPU has stalled), which makes getting the correct details more difficult.

Carl, do we have any new tricks on how to grab the info at the time of the stall?
Comment 2 Daniel Richard G. 2009-12-10 07:49:56 UTC
Yeah, I'm definitely going to need a different approach... "cp -a /sys/kernel/debug/dri/0 /tmp" crashes the system with a (presumed) kernel panic.

(It's not supposed to do that, right? :-)
Comment 3 Daniel Richard G. 2009-12-13 20:59:06 UTC
Saw this come in with the latest package update:

commit 08371bc29013370558728dcbeeed6a23ad2f5a70
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 8 22:35:24 2009 +0000

    intel: Clear virtual after failing to mmap_gtt.
    
    Don't store the error return in bo_gem->gtt_virtual or else we will
    attempt to use that as a valid pointer in future mappings.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This looks related. Chris, do you think this change addresses the failure mode I was seeing? So far, the hang hasn't been reproducible after the update.
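
For readers following along, the idea in the commit quoted above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual libdrm code: the struct sketch_bo and the function sketch_map_gtt are hypothetical names, and only the gtt_virtual field name comes from the commit message. The point is that mmap() reports failure by returning MAP_FAILED rather than NULL, so caching that value would make later code treat it as a valid mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical stand-in for the driver's buffer object. Only the field
 * name gtt_virtual is taken from the commit message. */
struct sketch_bo {
    size_t size;
    void *gtt_virtual;   /* NULL means "not mapped" */
};

/* Map the buffer and cache the pointer only on success.
 * Returns 0 on success or a negative errno value on failure. */
static int sketch_map_gtt(struct sketch_bo *bo, int fd, off_t offset)
{
    void *ptr;

    assert(bo != NULL);

    if (bo->gtt_virtual != NULL)
        return 0;   /* already mapped, reuse the cached pointer */

    ptr = mmap(NULL, bo->size, PROT_READ | PROT_WRITE, MAP_SHARED,
               fd, offset);
    if (ptr == MAP_FAILED) {
        int err = errno;
        /* Do not keep MAP_FAILED around: a later call would see a
         * non-NULL pointer and use it as if it were a mapping. */
        bo->gtt_virtual = NULL;
        return -err;
    }

    bo->gtt_virtual = ptr;
    return 0;
}
```

With this shape, a failed map leaves the object in the "not mapped" state, so a later call retries the mmap instead of dereferencing an error value.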
Comment 4 Chris Wilson 2009-12-21 09:06:46 UTC
Apologies, Daniel, for the slow update, but that patch alone should not address the issue you reported. (Instead, it should fix up a crash that might occur in a situation like this.) What is happening is that an earlier batchbuffer (a sequence of commands sent to the GPU) is causing the GPU itself to hang, and we only detect this much later, when attempts to use buffers result in an I/O error.

I am in the middle of developing a kernel patch to capture the batchbuffer that triggers the error, but at the moment I am away from my test machines and a reliable internet connection, hence the delay.
Comment 5 Chris Wilson 2010-01-05 03:17:29 UTC
Created attachment 32456
Capture error batch buffer

This is the patch to capture the batch buffer that is likely to have triggered the error. Hopefully it will be applied to Eric's drm-intel-next tree soon and so be available via xorg-edgers. Once applied and after the next error, can you upload the contents of /sys/kernel/debug/dri/0/i915_error_state, please?
Comment 6 Daniel Richard G. 2010-01-26 00:18:57 UTC
Chris, the xorg-edgers PPA doesn't seem to include any kernel packages. Where should I go to get a packaged kernel with your patch?
Comment 7 Chris Wilson 2010-02-10 01:56:44 UTC
I think this bug will be fixed by this libdrm update:

commit 4f0f871730b76730ca58209181d16725b0c40184
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 10 09:45:13 2010 +0000

    intel: Handle resetting of input params after EINTR during SET_TILING
    
    The SET_TILING is pernicious in that it overwrites the input arguments
    following an error in order to report the current tiling state of the
    buffer. This caught us by surprise as we then fed those arguments back
    into the ioctl unmodified following an EINTR and so the kernel then
    reported success for the no-op. We interpreted this success as meaning
    that the tiling on the buffer had changed so updated our state and
    started using the buffer incorrectly in the new tiled/untiled manner.
    This led to all sorts of random corruption and GPU hangs, even though
    the batch buffers would look sane (when the GPU had not wandered off
    into forbidden territory).
    
    References:
    
      Bug 25475 - [i915] Xorg crash / Execbuf while wedged
      http://bugs.freedesktop.org/show_bug.cgi?id=25475
    
      Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
      http://bugs.freedesktop.org/show_bug.cgi?id=25554
    
    (And probably every other weird bug in the last few months.)
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
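
The failure mode described in the commit message above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the libdrm change itself: sketch_kernel, sketch_set_tiling, sketch_ioctl_set_tiling and the two retry helpers are hypothetical names, and the fake ioctl only imitates the behaviour the commit describes (on an interrupted call it overwrites the caller's arguments with the buffer's current tiling state).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_TILING_NONE = 0, SKETCH_TILING_X = 1 };

/* Hypothetical in/out argument block, loosely modelled on SET_TILING. */
struct sketch_set_tiling {
    uint32_t tiling_mode;
    uint32_t stride;
};

/* Hypothetical kernel-side state for one buffer. */
struct sketch_kernel {
    uint32_t tiling_mode;
    uint32_t stride;
    int pending_interrupts;   /* how many calls will fail with EINTR */
};

/* Fake ioctl: on EINTR it reports the CURRENT state through arg,
 * clobbering what the caller asked for. */
static int sketch_ioctl_set_tiling(struct sketch_kernel *k,
                                   struct sketch_set_tiling *arg)
{
    assert(k != NULL && arg != NULL);
    if (k->pending_interrupts > 0) {
        k->pending_interrupts--;
        arg->tiling_mode = k->tiling_mode;
        arg->stride = k->stride;
        errno = EINTR;
        return -1;
    }
    k->tiling_mode = arg->tiling_mode;
    k->stride = arg->stride;
    return 0;
}

/* Buggy retry: feeds the clobbered arguments back in, so the retry is a
 * no-op that still reports success. */
static int sketch_set_tiling_buggy(struct sketch_kernel *k,
                                   uint32_t tiling, uint32_t stride)
{
    struct sketch_set_tiling arg = { tiling, stride };
    int ret;

    do {
        ret = sketch_ioctl_set_tiling(k, &arg);
    } while (ret == -1 && errno == EINTR);
    return ret;
}

/* Fixed retry: resets the input arguments before every attempt. */
static int sketch_set_tiling_fixed(struct sketch_kernel *k,
                                   uint32_t tiling, uint32_t stride)
{
    struct sketch_set_tiling arg;
    int ret;

    do {
        arg.tiling_mode = tiling;
        arg.stride = stride;
        ret = sketch_ioctl_set_tiling(k, &arg);
    } while (ret == -1 && errno == EINTR);
    return ret;
}
```

In this model, both helpers return 0 after a single interrupted call, but only the fixed one leaves the simulated kernel state matching what the caller requested. With the buggy one, the caller would believe the buffer is tiled while the kernel still treats it as untiled.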