Bug 29046 - [G45] Blank/corrupted windows, then GPU hang -- garbage in the batchbuffer
Summary: [G45] Blank/corrupted windows, then GPU hang -- garbage in the batchbuffer
Status: CLOSED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Carl Worth
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 40282 40564 41098 41102
Depends on:
Blocks:
 
Reported: 2010-07-13 11:04 UTC by Glen Peterson
Modified: 2016-11-03 12:08 UTC

See Also:
i915 platform:
i915 features:


Attachments
Display corruption - elongated font, or sometimes just vertical lines. (131.62 KB, image/png)
2010-07-13 11:04 UTC, Glen Peterson
Sometimes I see this flash on the screen every 4 seconds after a crash. (82.81 KB, image/jpeg)
2010-07-13 11:05 UTC, Glen Peterson
Problem still happens with DRI disabled - screen photo shows GPU hung (466.90 KB, image/jpeg)
2010-07-14 04:59 UTC, Glen Peterson
dmesg from immediately after GPU hang (kernel: 2.6.32-23-generic x86_64) (44.07 KB, application/octet-stream)
2010-07-14 06:47 UTC, Glen Peterson
i915_error_state from immediately after GPU hang (kernel: 2.6.32-23-generic x86_64) (217 bytes, text/plain)
2010-07-14 06:48 UTC, Glen Peterson
Xorg.0.log from immediately after GPU hang (kernel: 2.6.32-23-generic x86_64) (26.23 KB, text/plain)
2010-07-14 06:49 UTC, Glen Peterson
New Xorg.0.log from after crash (before reboot) using ppa:xorg-edgers (same old kernel) (48.86 KB, text/plain)
2010-07-14 11:09 UTC, Glen Peterson
dmesg after hang with ppa:xorg-edgers (same old kernel) (43.20 KB, text/plain)
2010-07-14 11:13 UTC, Glen Peterson
i915_error_state with ppa:xorg-edgers, kernel 2.6.35-8 x86_64 (763.68 KB, application/octet-stream)
2010-07-17 10:16 UTC, Glen Peterson
dmesg with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64 (48.89 KB, application/octet-stream)
2010-07-17 10:16 UTC, Glen Peterson
dmesg.0 with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64 (49.99 KB, application/octet-stream)
2010-07-17 10:16 UTC, Glen Peterson
Xorg.0.log with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64 (74.64 KB, text/x-log)
2010-07-17 10:17 UTC, Glen Peterson
Xorg.0.log.old with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64 (40.88 KB, application/x-trash)
2010-07-17 10:36 UTC, Glen Peterson

Description Glen Peterson 2010-07-13 11:04:31 UTC
Created attachment 36995
Display corruption - elongated font, or sometimes just vertical lines.

After a few non-fatal display issues (attachment: corruptBeforeCrash.png), the whole screen goes blank except for a single underscore in the upper left-hand corner (presumably it's a cursor in a character display mode).  After 2.5 seconds, the mouse pointer appears in the center. A half-second later, I can move the mouse. After 4 seconds (total), I sometimes see a flicker of what looks like startup messages (attachment: flashAfterCrash.jpg - this was very hard to capture).  Then the screen blanks and the process repeats with the single solid underscore in the upper-left-hand corner for 2.5 seconds.

I'm not sure if this is an issue with the intel video driver or DRI, or what.

This problem only occurs under 64-bit Linux. Under 32-bit, I saw a little corruption here and there, but no crash.  I think this is a display issue more than a 64-bit issue, but under 32-bit it doesn't seem to crash.


System environment: 
-- chipset: Intel G45
-- system architecture: x86_64
-- xf86-video-intel: ?? 
-- xserver: 
-- mesa: 7.7.1 ???
-- libdrm:
-- kernel: 2.6.32-23-generic
-- Linux distribution: Ubuntu 10.04 64-bit
-- Machine or mobo model: SuperMicro C2SEA
-- Display connector: VGA


Reproducing steps:

Open Firefox.
Go to Gmail.
Begin typing a message.
Grab the lower-left corner of the window and drag rapidly and randomly.
First, notice display corruption, then a totally blank screen a few seconds later.

Alternately:
Similar with gnome-terminal or GIMP.  Type, or do something else that changes the screen inside the application, then wiggle the lower-left corner of the window (resizing it) until it crashes.  It should only take a few seconds.

Problem occurs with NO xorg.conf.  Creating an xorg.conf using "Xorg -configure" and adding the following to the "Device" section "fixes" the problem entirely:

Option "DRI" "False"

I'm happy to provide more info if you tell me what you need.
Comment 1 Glen Peterson 2010-07-13 11:05:46 UTC
Created attachment 36996
Sometimes I see this flash on the screen every 4 seconds after a crash.
Comment 2 Chris Wilson 2010-07-13 11:55:55 UTC
Hmm, the kernel is a little too old for automatic error state dumping on a GPU crash (and we may not gather sufficient information for G45 in any case). If you could install the current drivers from the Ubuntu ppa:xorg-edgers (and even the kernel!) and try to reproduce the hang, that would be a vital first step. Then, after a hang, upload /sys/kernel/debug/dri/0/i915_error_state, dmesg and Xorg.log.

Thanks.
Comment 3 Glen Peterson 2010-07-14 04:59:29 UTC
Created attachment 37037
Problem still happens with DRI disabled - screen photo shows GPU hung

It might take me a couple of days to do what you ask - I'm going to make a separate partition and install so I don't hose my main desktop.

In the meantime, I was wrong when I said that disabling DRI fixed it (and I remembered that I had also increased the starting graphics memory size in the BIOS).  I crashed it after 8 hours yesterday.  Things slowly started getting corrupted (same elongated fonts and/or vertical lines as before).  In the end, I was adjusting a table background in OpenOffice Writer when it crashed.  Here's a transcript of the error message I found when I hit Ctrl-Alt-F1 (this is what the attached image shows):

[35732.200015] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[35732.201369] render error detected, EIR: 0x00000000
[35732.201410] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 2016657 at 2016602)

After a minute of nothing good happening, I pressed CTRL-C and rebooted.  I guess I should have copied some logs first - next time!
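
For anyone comparing hangs across logs, the two numbers in the i915_do_wait_request line can be pulled out mechanically. The following is a minimal Python sketch, not something from the original report. It assumes the message reads as "awaiting sequence number X while the GPU is at sequence number Y", which is an interpretation of the log text and is not confirmed in this thread; `parse_wait_error` is a hypothetical helper name.

```python
import errno
import re

# Matches the "returns -5 (awaiting 2016657 at 2016602)" tail of the dmesg line.
WAIT_RE = re.compile(r"returns (-?\d+) \(awaiting (\d+) at (\d+)\)")


def parse_wait_error(line):
    """Return the error name and seqno gap from an i915_do_wait_request line, or None."""
    match = WAIT_RE.search(line)
    if match is None:
        return None
    code, awaited, current = (int(group) for group in match.groups())
    return {
        # Kernel functions return negated errno values; 5 is EIO on Linux.
        "error": errno.errorcode.get(-code, "unknown"),
        "awaited": awaited,
        "current": current,
        # Assumption: both values are sequence numbers and no wrap-around occurred.
        "gap": awaited - current,
    }


LINE = ("[35732.201410] [drm:i915_do_wait_request] *ERROR* "
        "i915_do_wait_request returns -5 (awaiting 2016657 at 2016602)")
print(parse_wait_error(LINE))
```

Under that reading, the hang in this transcript left the GPU 55 requests short of the one being waited for, and the wait failed with an I/O error.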
Comment 4 Glen Peterson 2010-07-14 06:47:30 UTC
Created attachment 37039
dmesg from immediately after GPU hang (kernel: 2.6.32-23-generic x86_64)
Comment 5 Glen Peterson 2010-07-14 06:48:30 UTC
Created attachment 37040
i915_error_state from immediately after GPU hang (kernel: 2.6.32-23-generic x86_64)
Comment 6 Glen Peterson 2010-07-14 06:49:31 UTC
Created attachment 37041
Xorg.0.log from immediately after GPU hang (kernel: 2.6.32-23-generic x86_64)
Comment 7 Glen Peterson 2010-07-14 07:08:57 UTC
I think I added the requested files from right after a crash (though I attached Xorg.0.log instead of Xorg.log), also a photo of a new error message about the GPU being hung.

Pardon my sloth and ignorance, but do I still need to upgrade to the latest (unstable ppa:xorg-edgers) kernel and driver, make it crash, and attach those files again?  I'll do it, but it's work and I just need to hear you say that it's still helpful.  I'm clearing the NEEDINFO tag, awaiting your response.
Comment 8 Chris Wilson 2010-07-14 07:25:35 UTC
Please do take the time to update the components of the gfx stack. The stock drivers in the Ubuntu install are quite old (i.e. three stable releases ago), so it is worth checking that the bug is still present in the current tree. Also, an updated kernel (post-2.6.34) is required for automated batchbuffer dumping into i915_error_state. Finally, if you restarted X (or it restarted automatically), the log file to look for will be Xorg.0.log.old or /var/log/gdm/:0.log.1 -- the one containing the crash may have additional relevant information.
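
As a quick way to check whether a given kernel is new enough for the batchbuffer dump mentioned above, the release string can be compared against 2.6.34. This is a rough Python sketch under one assumption: "post-2.6.34" is read here as strictly newer than 2.6.34. The helper names `kernel_version` and `has_batchbuffer_dump` are made up for illustration.

```python
import re


def kernel_version(release):
    """Parse the leading 'major.minor.patch' of a release string such as '2.6.32-23-generic'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_batchbuffer_dump(release, baseline=(2, 6, 34)):
    """Return True if the release is newer than the baseline kernel version."""
    # Assumption: "post-2.6.34" means strictly newer than 2.6.34.
    return kernel_version(release) > baseline


for release in ("2.6.32-23-generic", "2.6.35-8-generic"):
    print(release, has_batchbuffer_dump(release))
```

On a live system, `platform.release()` from the Python standard library returns the running kernel's release string, which can be passed straight to `has_batchbuffer_dump`.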

Thanks for taking the time to update the bug.
Comment 9 Glen Peterson 2010-07-14 07:53:20 UTC
I had thought about X overwriting/rotating logs when it restarts and made myself a little script to copy the files you requested into a timestamped directory.  When it crashed, I CTRL-ALT-F1'd (had to do it a few times for it to take effect) and ran my script *before* rebooting to be sure nothing was rotated/overwritten.  
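
The script itself was not attached to this bug. A minimal Python sketch of the same idea might look like the following. The i915_error_state, Xorg.0.log.old and gdm paths are the ones named earlier in this thread; placing Xorg.0.log under /var/log is an assumption, and `collect_logs` is a hypothetical name. Reading the debugfs file normally requires root.

```python
from datetime import datetime
from pathlib import Path

# Paths mentioned in this thread; the /var/log location of Xorg.0.log is assumed.
DEFAULT_SOURCES = [
    "/sys/kernel/debug/dri/0/i915_error_state",
    "/var/log/Xorg.0.log",
    "/var/log/Xorg.0.log.old",
    "/var/log/gdm/:0.log.1",
]


def collect_logs(sources, dest_root, stamp=None):
    """Copy every readable source file into a timestamped directory under dest_root."""
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    dest_dir = Path(dest_root) / f"gpu-hang-{stamp}"
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, sources):
        try:
            # Read the whole file; debugfs entries report size 0 but still have content.
            data = src.read_bytes()
        except OSError:
            continue  # missing or unreadable files are skipped, not fatal
        target = dest_dir / src.name
        target.write_bytes(data)
        copied.append(target)
    return dest_dir, copied
```

A call such as `collect_logs(DEFAULT_SOURCES, Path.home() / "hang-logs")` run from the text console before rebooting would keep the files from being rotated. The dmesg output is not a file, so it would need to be captured separately, for example by redirecting `dmesg` into the same directory.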

But I wonder... the 4-second flash in the blank screen in my original report may be X trying to restart and overwriting the logs.  I'll have to check the timestamps next time.

I'll try to make my system into a dual-boot in the next few days so I can run any kernel you want without jeopardizing my main OS - also to test a fix later if one is made.  I'll upload new files once I'm done.

Thanks for your patience.
Comment 10 Glen Peterson 2010-07-14 11:09:27 UTC
Created attachment 37044
New Xorg.0.log from after crash (before reboot) using ppa:xorg-edgers (same old kernel)

Bug is still there in the default configuration (no xorg.conf) with the latest drivers from ppa:xorg-edgers.  I'm still not sure what's causing it, but here is what I did.

 - Install Ubuntu 10.04 64-bit

 - Add ppa:xorg-edgers to package list.

 - Run updater, apply, reboot

 - Set Appearance: Visual Effects: None

 - Open a few dozen random applications and arrange them so they overlap randomly around the screen.  The more, the better.  Good ones are terminal, Ubuntu Software Center, Firefox, and OpenOffice Writer (these are the ones that I've seen corruption in so far)

 - Type an email using Gmail in Firefox.  Type a paragraph with wrapped text, then paste it several times to more than fill the text area.

 - Grab a corner of the Firefox window with your mouse to start resizing it, then wiggle the mouse erratically in all directions, in a vaguely circular motion.

 - Repeat with different applications, different corners

 - Open more apps and wiggle/type more as necessary until...

*boom*

So, where do you recommend I get the latest kernel from?  In any case, I have a separate install for testing now.
Comment 11 Glen Peterson 2010-07-14 11:13:21 UTC
Created attachment 37045
dmesg after hang with ppa:xorg-edgers (same old kernel)
Comment 12 Chris Wilson 2010-07-14 11:20:53 UTC
The ppa:xorg-edgers also has a mainline kernel for testing.
Comment 13 Glen Peterson 2010-07-15 07:38:34 UTC
(In reply to comment #12)
> The ppa:xorg-edgers also has a mainline kernel for testing.

I looked, but I didn't see what I think you want me to install.  I don't think Lucid has a 2.6.35+ kernel - I seem to always see Maverick associated with the newer kernel versions.  Should I upgrade to Maverick, and if so, what instructions should I follow (I installed a dedicated OS for this testing yesterday)?  Can you give me a link and/or a few sentences to point me in the right direction?

P.S.
When I reported "display corruption" in my previous posts, I think that always consisted of vertical font stretching (as in the first attachment).  The font stretching always precedes and intensifies just before hanging the GPU.  I hung the GPU once when resuming from suspend.

Also, I now believe that every time I said, "crashed" I meant "hung the GPU."

I switched my Appearance Preferences: Visual Effects from "None" to "Normal" yesterday and didn't have a GPU hang in the 5 hours or so since I did that.  I was trying not to hang the GPU and I saw just a little font stretching, but that might be a record.  I think dynamically resizing the window contents (which only happens with Visual Effects = "None") in combination with the font stretching may be a key to causing the GPU hang.
Comment 14 Glen Peterson 2010-07-17 10:16:14 UTC
Created attachment 37151
i915_error_state with ppa:xorg-edgers, kernel 2.6.35-8 x86_64
Comment 15 Glen Peterson 2010-07-17 10:16:37 UTC
Created attachment 37152
dmesg with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64
Comment 16 Glen Peterson 2010-07-17 10:16:55 UTC
Created attachment 37153
dmesg.0 with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64
Comment 17 Glen Peterson 2010-07-17 10:17:25 UTC
Created attachment 37154
Xorg.0.log with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64
Comment 18 Chris Wilson 2010-07-17 10:33:43 UTC
This is the culprit:

0x08fe03b4:      0x7b003c04: 3DPRIMITIVE: rect list sequential
0x08fe03b8:      0x00000003:    vertex count
0x08fe03bc:      0x00000000:    start vertex
0x08fe03c0:      0x00000001:    instance count
0x08fe03c4:      0x00000000:    start instance
0x08fe03c8:      0x00000000:    index bias
0x08fe03cc:      0x14300804: MI UNKNOWN
0x08fe03d0: HEAD 0x03f00100: MI_REPORT_HEAD
0x08fe03d4:      0x00000000: MI_NOOP
0x08fe03d8:      0x000e00ce: MI_NOOP
0x08fe03dc:      0x0b4d1000: MI UNKNOWN
0x08fe03e0:      0x00000000: MI_NOOP
0x08fe03e4:      0x00000000: MI_NOOP
0x08fe03e8:      0x00000000: MI_NOOP
0x08fe03ec:      0x00000000: MI_NOOP
0x08fe03f0:      0x00000000: MI_NOOP
0x08fe03f4:      0x00000000: MI_NOOP
0x08fe03f8:      0x00000000: MI_NOOP
0x08fe03fc:      0x00000000: MI_NOOP
0x08fe0400:      0x0200000a: MI_FLUSH
0x08fe0404:      0x69040000: 3DSTATE_PIPELINE_SELECT

Odd. Odd. Odd.
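
To scan a longer decoded dump for similar entries, the text format shown above can be parsed line by line. The Python sketch below is only an illustration: it relies on the layout visible in this excerpt (offset, optional HEAD marker, dword, decode text). Treating a non-zero dword decoded as MI_NOOP as suspicious is a heuristic assumed here, not a rule stated in this thread. All function names are hypothetical.

```python
import re

# offset: [HEAD] dword: decode text
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):\s+(HEAD\s+)?(0x[0-9a-fA-F]+):\s*(.*)$")

SAMPLE = """\
0x08fe03c8:      0x00000000:    index bias
0x08fe03cc:      0x14300804: MI UNKNOWN
0x08fe03d0: HEAD 0x03f00100: MI_REPORT_HEAD
0x08fe03d4:      0x00000000: MI_NOOP
0x08fe03d8:      0x000e00ce: MI_NOOP
0x08fe03dc:      0x0b4d1000: MI UNKNOWN
"""


def parse_dump(text):
    """Turn decoded batchbuffer lines into dictionaries, skipping lines that do not match."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue
        entries.append({
            "offset": int(match.group(1), 16),
            "head": match.group(2) is not None,
            "dword": int(match.group(3), 16),
            "decode": match.group(4).strip(),
        })
    return entries


def head_offset(entries):
    """Return the offset of the entry marked HEAD, or None if no entry is marked."""
    return next((e["offset"] for e in entries if e["head"]), None)


def flag_suspicious(entries):
    """Heuristic: undecodable commands, plus MI_NOOP entries whose dword is not zero."""
    return [
        e for e in entries
        if e["decode"] == "MI UNKNOWN" or (e["decode"] == "MI_NOOP" and e["dword"] != 0)
    ]


entries = parse_dump(SAMPLE)
print(hex(head_offset(entries)))
print([hex(e["offset"]) for e in flag_suspicious(entries)])
```

Run on the excerpt, this reports the HEAD pointer sitting between the two MI UNKNOWN entries, which matches the region highlighted in the comment.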
Comment 19 Glen Peterson 2010-07-17 10:36:47 UTC
Created attachment 37155
Xorg.0.log.old with ppa:xorg-edgers, kernel 2.6.35-8-generic x86_64

I upgraded to Maverick per Ubuntu instructions, reinstalled xorg-edgers, caused a hang, and uploaded requested files.

Better steps to reproduce in about 3 minutes.  Note that none of these steps are strictly necessary; they just seem to make it hang quicker.

 - Use Lucid or Maverick 64-bit Ubuntu with Intel G45 chipset, either default or latest i915 driver.

 - Set Appearance Preferences: Visual Effects = "None."

 - Go to Gmail with Firefox and start composing a message with several paragraphs so that it scrolls the little textarea.

 - Go through the Applications menu opening them all, or as many as your system will handle (I think you need to use up some memory - I have 8GB).

 - Drag lower-left corner of Firefox window (resizing it) in a small circle (to make it resize vertically and horizontally).

 - Notice some text corruption before GPU hang.

Firefox seems to be the easiest to hang it with when composing a big Gmail message, but it's not Firefox-specific.  I've hung it (accidentally) with each of the following:

IntelliJ IDEA 9
OpenOffice Writer
gnome-terminal
Emacs
Update Manager
the Applications Menu
Resuming from Hibernate and trying to log in.

The only app I've never hung it with or seen text stretching in is the Chromium web browser.  The text stretching always precedes a hang.

Sorry to be daft and needy before.  Let me know if I can do any more to help.
Comment 20 Chris Wilson 2012-03-26 01:44:55 UTC
I believe these are all related to the underlying bug:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish
    
    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.
    
    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.
    
    Such as:
    
      https://bugs.freedesktop.org/show_bug.cgi?id=29046
      https://bugs.freedesktop.org/show_bug.cgi?id=35863
      https://bugs.freedesktop.org/show_bug.cgi?id=38952
      https://bugs.freedesktop.org/show_bug.cgi?id=40282
      https://bugs.freedesktop.org/show_bug.cgi?id=41098
      https://bugs.freedesktop.org/show_bug.cgi?id=41102
      https://bugs.freedesktop.org/show_bug.cgi?id=41284
      https://bugs.freedesktop.org/show_bug.cgi?id=42141
    
    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
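
The ordering problem described in the commit message can be illustrated with a toy model. The Python sketch below is not the kernel code and does not reproduce any i915 function. The classes, fields and function names are invented for illustration, and the "interrupted" flag stands in for a signal arriving during the wait.

```python
class Interrupted(Exception):
    """Stands in for a wait that is interrupted by a signal."""


class Buffer:
    def __init__(self):
        self.gpu_read_domains = 1   # non-zero: the GPU may still be reading
        self.gpu_busy = True        # ground truth the driver cannot see directly


def wait_rendering(buf, interrupted):
    if interrupted:
        raise Interrupted()
    buf.gpu_busy = False            # the wait ran to completion


def finish_gpu_early_clear(buf, interrupted):
    if buf.gpu_read_domains == 0:
        return                      # looks idle, so the wait is skipped
    buf.gpu_read_domains = 0        # cleared BEFORE the wait
    wait_rendering(buf, interrupted)


def finish_gpu_clear_on_success(buf, interrupted):
    if buf.gpu_read_domains == 0:
        return
    wait_rendering(buf, interrupted)
    buf.gpu_read_domains = 0        # cleared only after a successful wait


def busy_after_retry(finish):
    """Interrupt the first wait, retry, and report whether the GPU is still busy."""
    buf = Buffer()
    try:
        finish(buf, interrupted=True)
    except Interrupted:
        pass                        # userspace handles the signal, then retries
    finish(buf, interrupted=False)
    return buf.gpu_busy


print(busy_after_retry(finish_gpu_early_clear))
print(busy_after_retry(finish_gpu_clear_on_success))
```

In the early-clear variant the retry sees zeroed read domains and returns without waiting, so the buffer is treated as idle while the model GPU is still busy. Clearing only after a successful wait avoids that.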
Comment 21 Gordon Jin 2012-04-01 01:56:06 UTC
*** Bug 40282 has been marked as a duplicate of this bug. ***
Comment 22 Gordon Jin 2012-04-01 01:57:20 UTC
*** Bug 40564 has been marked as a duplicate of this bug. ***
Comment 23 Gordon Jin 2012-04-01 01:58:00 UTC
*** Bug 41092 has been marked as a duplicate of this bug. ***
Comment 24 Gordon Jin 2012-04-01 01:58:18 UTC
*** Bug 41098 has been marked as a duplicate of this bug. ***
Comment 25 Gordon Jin 2012-04-01 01:58:47 UTC
*** Bug 41102 has been marked as a duplicate of this bug. ***
Comment 26 Jari Tahvanainen 2016-11-03 12:08:20 UTC
Closing resolved+fixed. No activity for >3 years.

