Bug 38771 - [[GM45] DRI] GPU hangs with current Mesa GIT when running certain OpenGL applications
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest major
Assigned To: Eric Anholt
Duplicates: 38666 38901 38942
Reported: 2011-06-29 06:32 UTC by Julius Schwartzenberg
Modified: 2011-07-09 10:35 UTC (History)

Attachments
X.org log (54.49 KB, text/plain)
2011-06-29 06:32 UTC, Julius Schwartzenberg
The i915_error_state file from /sys/kernel/debug/dri/0/i915_error_state (842.58 KB, text/plain)
2011-06-29 13:57 UTC, Julius Schwartzenberg
The i915_fbc_status file from /sys/kernel/debug/dri/0/i915_fbc_status (41 bytes, text/plain)
2011-06-29 14:02 UTC, Julius Schwartzenberg

Description Julius Schwartzenberg 2011-06-29 06:32:11 UTC
Created attachment 48549 [details]
X.org log

When I use the latest git version of Mesa, my GPU hangs and the graphics break. dmesg shows these errors:
[48559.150043] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[48559.151434] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 3474558 at 3474557, next 3474559)
[48559.670097] [drm:i915_reset] *ERROR* Failed to reset chip.

I have to reboot to get my graphics back to normal again after this.

I bisected the regression and it pointed me to this commit:
c173541d9769d41a85cc899bc49699a3587df4bf is the first bad commit
commit c173541d9769d41a85cc899bc49699a3587df4bf
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Apr 27 13:33:10 2011 -0700

    i965: Use state streaming on programs, and state base address on gen5+.

    There will be a little bit of thrashing of the program cache BO as the
    cache warms up, but once the application is in steady state, this
    reduces relocations on gen5 and later.

    On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
    +/- 1.3% (n=6).  No statistically significant performance difference
    on nexuiz (n=5).

:040000 040000 e35a232f47ae01c2e228e2778448642a9e34e112 58a3f87c3c7368c3949bc939f457b8d9afa3a358 M      src


My system info:
Ubuntu Lucid 10.04
kernel version 2.6.38-8-generic
X.org driver: 2:2.15.0+git20110616.17bf0019-0ubuntu0sarvatt

My hardware info:
Intel X4500HD (G45)
Intel Core2 Duo P9500
4 GB RAM
Comment 1 Julius Schwartzenberg 2011-06-29 06:45:23 UTC
I forgot to mention: this problem does not occur with glxgears. I used the demo from Starry to test: http://indefini.org/starry/
Comment 2 Julius Schwartzenberg 2011-06-29 13:56:14 UTC
I still have this problem with git-d44f821.
Comment 3 Julius Schwartzenberg 2011-06-29 13:57:19 UTC
Created attachment 48569 [details]
The i915_error_state file from /sys/kernel/debug/dri/0/i915_error_state
Comment 4 Julius Schwartzenberg 2011-06-29 14:02:51 UTC
Created attachment 48570 [details]
The i915_fbc_status file from /sys/kernel/debug/dri/0/i915_fbc_status
Comment 5 Ian Romanick 2011-06-30 15:13:11 UTC
*** Bug 38666 has been marked as a duplicate of this bug. ***
Comment 6 Md Imam Hossain 2011-06-30 17:27:20 UTC
This happens with almost all programs except basic programs like glxgears.
Comment 7 Md Imam Hossain 2011-07-05 18:40:00 UTC
I still can't use Mesa from git because of this bug.
Comment 8 Ian Romanick 2011-07-06 10:58:04 UTC
Reassigning to Eric since this bisected to his commit.
Comment 9 Andreas Rocznik 2011-07-08 03:12:44 UTC
I tried today and did a git rebase and removed commit:

c173541d9769d41a85cc899bc49699a3587df4bf

I also had to remove 18d4a44bdc2ed91ec9511d816acddc4a0bd7f9be (I could not merge it) and 3de9405763ad4b9e78577699ec206be7dda03374 to be able to build.

That seems to have fixed it. I can run the Starry demo (I just have problems with OpenAL), and now I can play Eve Online, which always caused a GPU hang before.
Comment 10 Eric Anholt 2011-07-08 14:42:08 UTC
Reproduced a GPU hang in glean's teapot test (sigh) as run by piglit.  Working on getting the debug information updated that I want in order to look at this...
Comment 11 Eric Anholt 2011-07-09 07:53:39 UTC
commit 804995807dfea9cbdbd676e52b95d42715101913
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Jul 8 15:30:48 2011 -0700

    i965/gen4: Fix GPU hangs since the program streaming change.
    
    This was tricky.  We were doing a use-before-initialize of
    grf_reg_count, but the value usually got overwritten anyway -- when we
    didn't have to do a relocation (typical), or on gen5 when we didn't
    have relocations at all.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38771
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    (cherry picked from commit d03fdc4cdefdfdc5b59547945704c6037a5061c7)
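
The failure pattern described in the commit message above can be sketched in C. This is a minimal illustration under stated assumptions, not the actual Mesa code: the struct, the function names, the bit layout, and the way a relocation is modeled (the whole dword rewritten as new base plus a precomputed delta) are all hypothetical. It shows why a value read before it is initialized can stay hidden when a later direct write overwrites it, and only surface on the path where a relocation is applied.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified unit state. */
struct unit_state {
    uint32_t grf_reg_count; /* register count, packed into the low bits */
    uint32_t dword0;        /* the value the hardware would read */
};

/* Models a relocation: the whole dword is rewritten as base + delta. */
static uint32_t apply_reloc(uint32_t new_base, uint32_t delta)
{
    return new_base + delta;
}

/* Buggy ordering: the delta is computed before grf_reg_count is set. */
uint32_t build_buggy(uint32_t prog_offset, uint32_t reg_count,
                     int relocated, uint32_t new_base)
{
    struct unit_state s;
    /* Zeroed so this sketch has defined behavior; in real code the
     * stale value could be anything. */
    memset(&s, 0, sizeof(s));

    /* BUG: s.grf_reg_count is read here but only assigned below. */
    uint32_t delta = prog_offset + (s.grf_reg_count << 1);

    s.grf_reg_count = reg_count;
    /* Direct write with a presumed base of 0: this value is correct. */
    s.dword0 = prog_offset | (s.grf_reg_count << 1);

    if (relocated)
        s.dword0 = apply_reloc(new_base, delta); /* register count lost */
    return s.dword0;
}

/* Fixed ordering: grf_reg_count is set before anything reads it. */
uint32_t build_fixed(uint32_t prog_offset, uint32_t reg_count,
                     int relocated, uint32_t new_base)
{
    struct unit_state s;
    memset(&s, 0, sizeof(s));

    s.grf_reg_count = reg_count;
    uint32_t delta = prog_offset + (s.grf_reg_count << 1);
    s.dword0 = prog_offset | (s.grf_reg_count << 1);

    if (relocated)
        s.dword0 = apply_reloc(new_base, delta);
    return s.dword0;
}
```

In this sketch both functions return the same value when no relocation is applied, so the bug is invisible on that path. They differ only when `relocated` is nonzero, where the buggy version drops the register count bits.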
Comment 12 Shawn Starr 2011-07-09 09:35:49 UTC
*** Bug 38901 has been marked as a duplicate of this bug. ***
Comment 13 Ivan Iakoupov 2011-07-09 10:35:20 UTC
*** Bug 38942 has been marked as a duplicate of this bug. ***