Bug 52382 - [ivb gt1] Severe image corruption and GPU Hang, too many PS threads
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 8.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest critical
Assigned To: Ian Romanick
Duplicates: 52442 52473
Depends on:
Blocks:
Reported: 2012-07-23 05:21 UTC by Jonathan deBoer
Modified: 2012-10-08 08:27 UTC (History)

Attachments
Ubuntu 12.04 login screen (396.42 KB, image/jpeg)
2012-07-23 05:21 UTC, Jonathan deBoer
Picture of screen artifacts (driconf open in twm) (402.52 KB, image/jpeg)
2012-07-23 05:23 UTC, Jonathan deBoer
Xorg log (49.20 KB, text/plain)
2012-07-23 05:24 UTC, Jonathan deBoer
dmesg output (74.30 KB, text/plain)
2012-07-23 05:24 UTC, Jonathan deBoer
Listing of installed packages and versions (226.85 KB, text/plain)
2012-07-23 05:25 UTC, Jonathan deBoer
/etc/drirc (1.55 KB, text/plain)
2012-07-23 05:26 UTC, Jonathan deBoer
glxinfo (12.64 KB, text/plain)
2012-07-23 05:26 UTC, Jonathan deBoer
i915 error state (2.02 MB, application/octet-stream)
2012-07-23 05:27 UTC, Jonathan deBoer
lspci -vvv output (29.56 KB, text/plain)
2012-07-23 05:28 UTC, Jonathan deBoer
intel_reg_dumper output (11.60 KB, text/plain)
2012-07-23 05:29 UTC, Jonathan deBoer
Output from lspci -n (618 bytes, text/plain)
2012-07-26 15:29 UTC, Jonathan deBoer
Logs from running driver snapshot 4a7334eb... from git (330.50 KB, application/octet-stream)
2012-07-26 18:07 UTC, Jonathan deBoer

Description Jonathan deBoer 2012-07-23 05:21:10 UTC
Created attachment 64520 [details]
Ubuntu 12.04 login screen

I'm getting severe graphics corruption on my system. Graphics are fine in Windows 7. I have had the motherboard and CPU replaced, but the issue still exists. Memory tests fine.

Symptoms:

1) The screen is covered in small wrong-color squares, almost like a corrupt JPG. Some screen elements render fine (e.g. the twm menu) while others are unreadable. Screen elements are not always drawn over when they go away. See the attached pictures for examples.

2) The driver hangs on any 3D activity. glxgears, for example, will cause the screen to basically freeze. The driver will then reset (as shown in dmesg). The xorg-edgers drivers do not appear to freeze, but glxgears exits with "intel_do_flush_locked failed: Input/output error". glxgears registers 0.163 fps or lower.

3) The image seems to display properly using the VESA driver. (Boot Gentoo live-dvd with -nofb option)

4) With Ubuntu and the xorg-edgers PPA, the corruption is present using both the "uxa" and "sna" AccelMethod options in xorg.conf.

5) Sometimes, switching to a VT and back will clean up the image for a moment (i.e. until something moves). Also, after glxgears crashed with the error above, most applications were readable. However, redraw problems were still rampant, and trying to run glxgears again gives the intel_do_flush_locked error above.

I have confirmed these issues are present when booting the following distros:

Linux Mint 13 (64bit) Live-dvd
Ubuntu 12.04 (64bit) Install CD and Standard install
Gentoo 64bit Live-DVD 12.1 (regular boot, -nofb boots into vesa mode)
Ubuntu 12.04 with xorg-edgers PPA (As of July 22, 2012)

Hardware:
CPU: Intel i5-3470
Chipset: Intel Z77 (M/B: Asus P8Z77-V, BIOS rev 1205)
Memory: 2x 8GB Patriot G3 DDR3-12800 1600 MHz
Monitor: VGA connected CRT or HDMI connected LCD

Currently Running kernel:
Linux Hoita 3.5.0-5-generic #5-Ubuntu SMP Wed Jul 18 07:35:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Installed packages (up to date with xorg-edgers PPA as of July 22, 2012):
linux-generic            3.5.0.5.5
xorg                     1:7.6+12ubuntu1
libdrm2                  2.4.37+git20120713.992e2afd-0ubuntu0ricotz~precise
libgl1-mesa-dri          8.1~git20120720.cdad337f-0ubuntu0sarvatt~precise
xserver-xorg-video-intel   2:2.20.0+git20120720.f92a64dd-0ubuntu0sarvatt~precise
Comment 1 Jonathan deBoer 2012-07-23 05:23:51 UTC
Created attachment 64521 [details]
Picture of screen artifacts (driconf open in twm)
Comment 2 Jonathan deBoer 2012-07-23 05:24:14 UTC
Created attachment 64522 [details]
Xorg log
Comment 3 Jonathan deBoer 2012-07-23 05:24:36 UTC
Created attachment 64523 [details]
dmesg output
Comment 4 Jonathan deBoer 2012-07-23 05:25:10 UTC
Created attachment 64524 [details]
Listing of installed packages and versions
Comment 5 Jonathan deBoer 2012-07-23 05:26:04 UTC
Created attachment 64525 [details]
/etc/drirc
Comment 6 Jonathan deBoer 2012-07-23 05:26:32 UTC
Created attachment 64526 [details]
glxinfo
Comment 7 Jonathan deBoer 2012-07-23 05:27:09 UTC
Created attachment 64527 [details]
i915 error state
Comment 8 Jonathan deBoer 2012-07-23 05:28:27 UTC
Created attachment 64528 [details]
lspci -vvv output
Comment 9 Jonathan deBoer 2012-07-23 05:29:27 UTC
Created attachment 64529 [details]
intel_reg_dumper output
Comment 10 Jonathan deBoer 2012-07-23 05:34:41 UTC
Turns out that the GPU is hanging with the latest xorg-edgers packages, same as before (see the dmesg output). However, the monitor is not resetting like it was (as if the resolution had changed) which is why I had thought otherwise.
Comment 11 Chris Wilson 2012-07-23 08:20:15 UTC
We have obviously angered the hw gods here.

Can you try sacrificing some rc6 to see if that appeases them? Please append i915.i915_enable_rc6=0 to your grub boot parameters and see if that makes a difference.
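Before comparing results, it may be worth confirming that the parameter actually reached the kernel after the reboot. A minimal sketch, assuming the running command line is available from /proc/cmdline; the helper name `has_kernel_param` is made up for illustration:

```python
def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as a whole token on the kernel command line."""
    # Parameters on the kernel command line are separated by whitespace,
    # so comparing whole tokens avoids matching e.g. "...rc6=01".
    return f"{name}={value}" in cmdline.split()
```

Passing the contents of /proc/cmdline together with "i915.i915_enable_rc6" and "0" should return True once the boot parameter is in effect.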
Comment 12 Jonathan deBoer 2012-07-23 15:34:38 UTC
Adding "i915.i915_enable_rc6=0" to the kernel boot parameters did not make any difference that I can see.
Comment 13 Jonathan deBoer 2012-07-25 02:10:05 UTC
If it would help, I'm willing to checkout the required drivers and compile them with whatever debugging turned on you would like.

Just let me know, I'd really like to see this resolved.

Thanks!
Comment 14 Chris Wilson 2012-07-25 08:12:25 UTC
Do you have any overclocking settings in your BIOS?
Comment 15 Jonathan deBoer 2012-07-25 18:17:58 UTC
There are some, but none of them seem to make any difference as far as I can tell.

The BIOS has a "Normal" setting in basic mode, which is supposed to disable all the overclocking features. It was the first thing I tried when I got the new motherboard and noticed the problem. In advanced mode, it gives you a LOT more overclocking options, but I'm not into overclocking, so I haven't used many.

Right now, almost everything is set to factory default except the RAM. It's set to 1600MHz, which is what the RAM is rated for. Previously, it was 1333MHz, and the problem still existed.

I have not tried fiddling with any of the more advanced settings. Right now, nearly all of them are set to auto.

If there is any particular change you would like me to make, please let me know.

Thanks!
Comment 16 Chris Wilson 2012-07-26 08:36:41 UTC
Can you please run lspci -n? The question of the hour is whether this is a 0x0152 (IvyBridge desktop GT1). I have two other bug reports for that specific chip, could this be a third?
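For anyone checking their own machine, the lookup can be scripted. This is only a sketch: the function name is hypothetical, and it assumes the usual `lspci -n` line layout of slot, class, then a vendor:device pair (8086 being Intel's PCI vendor ID).

```python
import re

# Matches a "vendor:device" pair of four hex digits each, e.g. "8086:0152".
_ID_RE = re.compile(r"\b([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def find_device(lspci_n_output: str, vendor: int, device: int) -> bool:
    """Return True if any line of `lspci -n` output lists vendor:device."""
    for line in lspci_n_output.splitlines():
        for ven, dev in _ID_RE.findall(line):
            if int(ven, 16) == vendor and int(dev, 16) == device:
                return True
    return False
```

Calling `find_device(text, 0x8086, 0x0152)` on the captured output answers the question above for a given system.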
Comment 17 Chris Wilson 2012-07-26 15:00:57 UTC
I pushed a potential fix to http://cgit.freedesktop.org/~ickle/xf86-video-intel/log/?h=ivb-gt1; if you are able to test it, that would be fantastic.
Comment 18 Jonathan deBoer 2012-07-26 15:29:44 UTC
Created attachment 64746 [details]
Output from lspci -n

This is the output of lspci -n as requested
Comment 19 Chris Wilson 2012-07-26 15:34:59 UTC
Thanks, so three very similar deaths, each on a 0x0152.
Comment 20 Jonathan deBoer 2012-07-26 18:07:04 UTC
Created attachment 64751 [details]
Logs from running driver snapshot 4a7334eb... from git

These are the log files generated by running the driver version found here:

http://cgit.freedesktop.org/~ickle/xf86-video-intel/commit/?h=ivb-gt1&id=4a7334ebb0e31fa603139350160772ae37171990

Results: This _appears_ to fix the "corrupted jpg" look of the graphics on the initial login screen, and when running TWM. 

However, when launching Unity (logging in), the computer crashes HARD. Previously, I could switch to a VT, and kill X. Now, I cannot. (I needed to ssh into the system to get these logs)

Also, glxgears crashes without showing anything.
Comment 21 Chris Wilson 2012-07-26 21:07:02 UTC
I believe the ddx portion of this to be fixed with:

commit 1ced4f1ddcf30b518e1760c7aa4a5ed4f934b9f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 26 10:50:31 2012 +0100

    Reduce maximum thread count for IVB GT1 to avoid spontaneous combustion
    
    Somewhere along the way it seems that IVB GT1 was reduced to only allow
    a maximum of 48 threads, as revealed in the lastest bspecs.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52473
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Note that a corresponding patch for mesa is also required.
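
The idea behind the change can be sketched as a per-device ceiling on the thread count. This is not the code from either driver: the table layout and the name `clamp_ps_threads` are hypothetical, and the only value taken from this report is the limit of 48 for device 0x0152.

```python
# Per-device ceiling on pixel shader (PS) threads. Only the 0x0152 entry
# (IVB desktop GT1, 48 threads) comes from this report; everything else
# here is an illustrative assumption.
MAX_PS_THREADS = {0x0152: 48}


def clamp_ps_threads(device_id: int, requested: int) -> int:
    """Limit `requested` to the device's ceiling, if one is known."""
    limit = MAX_PS_THREADS.get(device_id)
    if limit is None:
        return requested  # unknown device: leave the request untouched
    return min(requested, limit)
```

A request above the ceiling on a 0x0152 comes back as 48, while devices without an entry pass through unchanged.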
Comment 22 Chris Wilson 2012-07-26 21:10:22 UTC
*** Bug 52442 has been marked as a duplicate of this bug. ***
Comment 23 Chris Wilson 2012-07-27 08:25:23 UTC
*** Bug 52473 has been marked as a duplicate of this bug. ***
Comment 24 Eric Anholt 2012-07-27 18:57:06 UTC
commit fbf86c7f0f1f12e52b927e3870535073879d0a4d
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Jul 27 11:34:07 2012 -0700

    i965/gen7: Reduce GT1 WM thread count according to updated BSpec.
    
    Acked-by: Kenneth Graunke <kenneth@whitecape.org>
    
    https://bugs.freedesktop.org/show_bug.cgi?id=52382

also pushed to 8.0.
Comment 25 Gordon Jin 2012-10-08 08:27:34 UTC
*** Bug 52473 has been marked as a duplicate of this bug. ***