Bug 42538 - Sandybridge GPU hang + reset while running OpenGL application
Summary: Sandybridge GPU hang + reset while running OpenGL application
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Ian Romanick
 
Reported: 2011-11-02 17:25 UTC by Tom Fogal
Modified: 2012-04-23 17:19 UTC
CC List: 4 users


Attachments
gzipped apitrace log which reproduces the issue (2.21 MB, application/octet-stream)
2011-11-02 17:27 UTC, Tom Fogal
'dmesg' log from the system which hangs. (60.21 KB, text/plain)
2011-11-02 17:27 UTC, Tom Fogal
output of /sys/kernel/debug/dri/0/i915_error_state after 'glretrace' on the apitrace log. (126.21 KB, application/octet-stream)
2011-11-02 17:28 UTC, Tom Fogal
apitrace of 1516 calls (4 frames) to reproduce the gpu hangup (53.20 KB, application/octet-stream)
2012-01-07 10:05 UTC, Martin
output of /sys/kernel/debug/dri/0/i915_error_state after calling retrace (1.49 MB, application/octet-stream)
2012-01-07 10:06 UTC, Martin
dmesg log (64.58 KB, text/x-log)
2012-01-07 10:10 UTC, Martin
output of lspci -vvn (964 bytes, text/plain)
2012-01-27 04:21 UTC, Martin
32bit apitrace which produces the gpu hang (51.66 KB, application/octet-stream)
2012-03-14 14:25 UTC, Martin

Description Tom Fogal 2011-11-02 17:25:57 UTC
My application reliably produces a GPU hang using the attached apitrace log.

A few seconds before the screen flickers / GPU resets, the X server does not
seem to receive/process updates (i.e. mouse movements).  Then the screen
flickers once and after that mouse movements are processed as normal.  I've
made multiple apitraces, and sometimes this appears to happen twice per run,
other times it just occurs once... might be a timing thing with how quickly I
can close the window after the first hang.

The hang+reset is reproducible with the attached apitrace log.
Comment 1 Tom Fogal 2011-11-02 17:27:00 UTC
Created attachment 53090 [details]
gzipped apitrace log which reproduces the issue
Comment 2 Tom Fogal 2011-11-02 17:27:41 UTC
Created attachment 53091 [details]
'dmesg' log from the system which hangs.
Comment 3 Tom Fogal 2011-11-02 17:28:58 UTC
Created attachment 53092 [details]
output of /sys/kernel/debug/dri/0/i915_error_state after 'glretrace' on the apitrace log.
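
The reproduction steps described above (replay the trace with glretrace, then save the kernel's error state) could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual procedure: the function names, the trace filename, and the output filename are hypothetical, and reading debugfs is assumed to require root.

```python
import subprocess
from pathlib import Path

# Path taken from the attachment description in this report.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def replay_command(trace_path):
    """Build the glretrace command line for a given trace file."""
    return ["glretrace", str(trace_path)]


def replay_and_capture(trace_path, out_path="i915_error_state.txt"):
    """Replay the trace, then copy the error state to out_path.

    Assumes glretrace is on PATH and that the caller may read debugfs
    (normally root only). Returns glretrace's exit code.
    """
    result = subprocess.run(replay_command(trace_path))
    # debugfs files report a size of 0, so read until EOF instead of
    # relying on the reported file size.
    Path(out_path).write_bytes(ERROR_STATE.read_bytes())
    return result.returncode
```

A call such as replay_and_capture("app.trace") would then leave a copy of the error state next to the trace; "app.trace" is only a placeholder name.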
Comment 4 Martin 2012-01-07 10:03:28 UTC
This bug affects my ThinkPad T410 too. Although it is not a Sandybridge GPU, starting an OpenGL application leads to the same error state. The GPU was declared wedged because it hangs too quickly.
After starting the application, a weird part of the Qt4 GUI appears on the screen. Then it hangs for 3 seconds and the process segfaults in i965_dri.so (version: 7.12.0~git20120106.e60daf7e-0ubuntu0sarvatt~oneiric)

My attached apitrace is much smaller; perhaps that makes it possible to pin this down.

Greetings,
Martin
Comment 5 Martin 2012-01-07 10:05:07 UTC
Created attachment 55266 [details]
apitrace of 1516 calls (4 frames) to reproduce the gpu hangup
Comment 6 Martin 2012-01-07 10:06:29 UTC
Created attachment 55267 [details]
output of /sys/kernel/debug/dri/0/i915_error_state after calling retrace
Comment 7 Martin 2012-01-07 10:10:21 UTC
Created attachment 55268 [details]
dmesg log
Comment 8 Martin 2012-01-07 10:16:14 UTC
Comment on attachment 55266 [details]
apitrace of 1516 calls (4 frames) to reproduce the gpu hangup

Call 187 (frame 1) leads to the GPU hang/segfault.
Comment 9 Martin 2012-01-27 04:19:49 UTC
This bug is fixed in my case with git version 2:2.17.0+git20120126.b1f9415b-0ubuntu0sarvatt~oneiric

Thank you!
Comment 10 Martin 2012-01-27 04:21:15 UTC
Created attachment 56218 [details]
output of lspci -vvn
Comment 11 Kenneth Graunke 2012-01-27 12:31:08 UTC
Reopening.  While I'm glad that Martin's issue is fixed, it doesn't appear to be related to Tom's original report (which still hangs for me).
Comment 12 Bryce Harrington 2012-02-03 14:58:24 UTC
We believe this is the same bug that some Ubuntu users have been seeing since Mesa 7.11, and which is confirmed as still happening in 8.0-rc2:
https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/899159
GPU lockup render.IPEHR: 0x7b009004

https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/899159/+attachment/2616330/+files/i915_error_state.txt
https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/899159/+attachment/2665023/+files/i915_error_state
https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/899159/+attachment/2712085/+files/i915_error_state

Various 3D games are affected, including ones run under Wine, both full screen and windowed: Ski Challenge 2012, NightSkyHD (infrequently), Saga of Ryzom (quickly and frequently).
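
The IPEHR value quoted above (0x7b009004) can be split into command-header fields to see which instruction the render ring was executing when it hung. The sketch below assumes the usual Intel GFXPIPE header layout (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, DWord length in 7:0). That layout and the function name are assumptions brought in for illustration; they are not stated in this report.

```python
def decode_gfxpipe_header(dword):
    """Split a 32-bit command header into fields.

    Bit positions follow the assumed GFXPIPE layout described in the
    lead-in; check them against the hardware documentation before
    relying on the result.
    """
    return {
        "command_type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        # Upper 16 bits together, the form in which drivers usually
        # name a command.
        "opcode_word": (dword >> 16) & 0xFFFF,
        "dword_length": dword & 0xFF,
    }


fields = decode_gfxpipe_header(0x7B009004)
print({name: hex(value) for name, value in fields.items()})
```

Under that assumed layout the upper 16 bits are 0x7b00, which, as far as I can tell, is the value Mesa's i965 driver calls CMD_3D_PRIM (3DPRIMITIVE). That would mean the hang occurred while processing a draw call.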
Comment 13 kgerstl 2012-02-03 15:18:06 UTC
I'd like to add that the "Ski Challenge 2012" game mentioned on Launchpad produces the same hangs as the attached apitrace log with glretrace.
Comment 14 Kenneth Graunke 2012-02-28 18:12:34 UTC
Tom,

It turns out that your GPU hang was due to a bug in our MRT support.  It should be fixed on master by:

commit 172bb92db1a3c317867d9cfec6f15c09c37a0f6c
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Feb 18 21:29:29 2012 -0800

    i965: Only set Last Render Target Select on the last FB write.
    
    Fixes GPU hangs in OilRush, Trine, and Amnesia: The Dark Descent,
    which all use MRT (multiple render targets).
    
    NOTE: This is a candidate for release branches.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38720
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40059
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45216
    Reviewed-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

That patch is also on the 8.0 branch as of today.  Your apitrace file no longer hangs my system.  Could you confirm?

Thanks!  Sorry this took so long.
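
For readers unfamiliar with MRT, the rule named in the commit title ("Only set Last Render Target Select on the last FB write") can be illustrated as follows. This is only a sketch of the rule, not Mesa's code: the function name and the dictionary fields are hypothetical, and it assumes one framebuffer write is emitted per render target.

```python
def plan_fb_writes(num_render_targets):
    """Return one framebuffer-write descriptor per render target.

    Only the final write carries the last-render-target flag, which is
    the rule stated in the commit title. The descriptor format is a
    made-up illustration.
    """
    writes = []
    for target in range(num_render_targets):
        is_last = target == num_render_targets - 1
        writes.append({"target": target, "last_render_target": is_last})
    return writes


for write in plan_fb_writes(3):
    print(write)
```

With a single render target, the only write is also the last one, so non-MRT rendering sets the flag on every write.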
Comment 15 Kenneth Graunke 2012-03-06 09:37:56 UTC
This is on the 8.0 branch as 16cc79f975816c0741711560be48fc498d4b4794.

Closing since I believe it's fixed and haven't heard otherwise.
Comment 16 Martin 2012-03-14 14:24:22 UTC
With the Ubuntu stable kernel (3.0.0-16-generic) on amd64, my 32-bit OpenGL application still causes the described GPU hangs.

Perhaps it has something to do with multiarch (64 -> 32 bit) support? I built apitrace with -m32 and created another trace file for my app.

I hope the error can be reproduced.

Thanks in advance!
Martin
Comment 17 Martin 2012-03-14 14:25:34 UTC
Created attachment 58449 [details]
32bit apitrace which produces the gpu hang
Comment 18 Kenneth Graunke 2012-03-14 14:54:22 UTC
Hi Martin,

Your trace works fine on my Sandybridge system.  No hangs.

In order to run a 32-bit application, you'll need to use a 32-bit build of Mesa (libGL.so and i965_dri.so).  Note that your system 'glxinfo' binary is likely 64-bit, so it won't give an accurate indication of what version of Mesa you're using.  (You could use a 32-bit build of glxinfo, though.)

I'm guessing that perhaps you accidentally got the (old) 32-bit Mesa installed on your system, rather than the newly built git version you thought you were testing.  It's really easy to do.

Or I could be wrong and there's an actual bug. :)
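
One way to check which flavour of a library or binary is actually installed is to read the class byte in its ELF header (byte 4, EI_CLASS: 1 means 32-bit, 2 means 64-bit). The sketch below does only that; the function names are hypothetical, and it says nothing about which Mesa version a file contains.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class_from_header(header):
    """Classify an ELF file from its first five bytes."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class(path):
    """Open a file and report whether it is a 32-bit or 64-bit ELF."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))
```

Calling elf_class() on the i965_dri.so and libGL.so that the application really loads (their install paths vary by distribution) should report "32-bit" for a 32-bit application.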
Comment 19 Kenneth Graunke 2012-04-23 17:19:11 UTC
Closing, as it works for me and there has been no response for a month.

