Bug 36812

Summary: GPU lockup in Team Fortress 2
Product: Mesa
Reporter: Enrico_m
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: jonjon.arnearne, sa, tstellar
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description Enrico_m 2011-05-03 11:36:23 UTC
After updating Mesa to current git, I always get a GPU lockup when the game is about to start. git bisect points to the following commit:

commit 18dcbd358f1d4fd5e4a40fa26c6d3bf99485884e
Author: Tom Stellard <tstellar@gmail.com>
Date:   Sun Mar 27 01:17:43 2011 -0700

    prog_optimize: Fix reallocating registers for shaders with loops
    
    Registers that are used inside of loops need to be considered live
    starting with the first instruction of the outermost loop.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=34370
    
    NOTE: This is a candidate for the 7.9 and 7.10 branches.
    
    Reviewed-by: Eric Anholt <eric@anholt.net>
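
The rule stated in the commit message can be illustrated with a small sketch. This is not Mesa's prog_optimize code: the instruction tuples, the `live_intervals` name and the opcode strings are hypothetical, and extending the interval to the end of the outermost loop as well is an assumption made here so that the interval covers every iteration.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction form: (opcode, registers touched).
Instr = Tuple[str, Tuple[int, ...]]


def live_intervals(program: List[Instr]) -> Dict[int, Tuple[int, int]]:
    """Return {register: (start, end)} as inclusive instruction indices."""
    # Map every instruction inside a loop to the span of its outermost loop.
    outermost: Dict[int, Tuple[int, int]] = {}
    stack: List[int] = []
    for i, (op, _) in enumerate(program):
        if op == "BGNLOOP":
            stack.append(i)
        elif op == "ENDLOOP":
            begin = stack.pop()
            if not stack:  # an outermost loop has just been closed
                for j in range(begin, i + 1):
                    outermost[j] = (begin, i)

    intervals: Dict[int, Tuple[int, int]] = {}
    for i, (_, regs) in enumerate(program):
        # Outside any loop, a use only covers its own instruction.
        # Inside a loop, it covers the whole outermost loop (the end
        # extension is this sketch's assumption, see the lead-in).
        start, end = outermost.get(i, (i, i))
        for reg in regs:
            if reg in intervals:
                s, e = intervals[reg]
                intervals[reg] = (min(s, start), max(e, end))
            else:
                intervals[reg] = (start, end)
    return intervals


program = [
    ("MOV", (0,)),      # 0
    ("BGNLOOP", ()),    # 1
    ("BGNLOOP", ()),    # 2
    ("ADD", (1, 0)),    # 3: first use of r1 is inside the nested loop
    ("ENDLOOP", ()),    # 4
    ("MUL", (1,)),      # 5
    ("ENDLOOP", ()),    # 6
    ("MOV", (2, 1)),    # 7
]
print(live_intervals(program))
```

In this example r1 is first used at instruction 3, but its interval starts at instruction 1, the first instruction of the outermost loop, so a register allocator working from these intervals would not hand r1's slot to another value anywhere inside that loop.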


Tested with kernels:
2.6.37.4
2.6.38.4
2.6.39-RC5

This is the call trace:

WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:246 0xffffffff812dc0ba()

[<ffffffff8103e2aa>] <warn_slowpath_common+7a/b0>
[<ffffffff8103e381>] <warn_slowpath_fmt+41/50>
[<ffffffff812dc0ba>] <radeon_fence_wait+35a/3c0>
[<ffffffff81059fc0>] <autoremove_wake_function+0/40>
[<ffffffff81037a41>] <get_parent_ip+11/50>
[<ffffffff812dc92c>] <radeon_sync_obj_wait+c/10>
[<ffffffff812a461f>] <ttm_bo_wait+ff/1c0>
[<ffffffff81037a41>] <get_parent_ip+11/50>
[<ffffffff812f49c1>] <radeon_gem_wait_idle_ioctl+91/110>
[<ffffffff8128f80b>] <drm_ioctl+3fb/4a0>
[<ffffffff812f4930>] <radeon_gem_wait_idle_ioctl+0/110>
[<ffffffff810386bd>] <sub_preempt_count+9d/d0>
[<ffffffff81476791>] <_raw_spin_unlock_irq+11/40>
[<ffffffff810022bd>] <do_signal+17d/7b0>
[<ffffffff8100b5ec>] <fpu_finit+1c/30>
[<ffffffff810f00cb>] <do_vfs_ioctl+9b/4f0>
[<ffffffff810386bd>] <sub_preempt_count+9d/d0>
[<ffffffff81002bcc>] <sys_rt_sigreturn+22c/240>
[<ffffffff810f056a>] <sys_ioctl+4a/80>
[<ffffffff8147747b>] <system_call_fastpath+16/1b>

glxinfo (with mesa at git head):
OpenGL renderer string: Gallium 0.4 on AMD RV770
OpenGL version string: 2.1 Mesa 7.11-devel (git-a8bbce8)
Comment 1 Enrico_m 2011-05-03 11:46:04 UTC
Added commit author (hope this is ok?).
Comment 2 Sven Arvidsson 2011-05-16 12:45:11 UTC
*** Bug 37263 has been marked as a duplicate of this bug. ***
Comment 3 Sven Arvidsson 2011-05-16 12:47:38 UTC
I'm having the same problem with Left 4 Dead on:

System environment:
-- system architecture: 32-bit
-- Linux distribution: Debian unstable
-- GPU: REDWOOD
-- Model: XFX Radeon HD 5670 1GB
-- Display connector: DVI
-- xf86-video-ati: 6.14.1
-- xserver: 1.10.1
-- mesa: git-51095f7
-- drm: 2.4.25
-- kernel: 2.6.39-rc7

Reverting the commit mentioned above works.

The apitrace uploaded here might be useful to reproduce the hang:
http://dl.dropbox.com/u/28577999/l4d.trace.7z
Comment 4 Tom Stellard 2011-05-17 21:21:18 UTC
If you run with MESA_GLSL=nopt, does it still crash?
Comment 5 Sven Arvidsson 2011-05-19 11:16:03 UTC
With nopt it's even worse: I get a GPU hang and this error on the terminal when I try to start the game: "EE r600_pipe.c:429 r600_get_param - r600: unknown param 45".
Comment 6 Enrico_m 2011-05-30 06:47:14 UTC
Here it also got worse with MESA_GLSL=nopt: the game no longer shows the menu and the GPU does not reset (the system freezes).

If there is anything I could do to help debug this issue (test patches or add more traces), please let me know.
Comment 7 Tom Stellard 2011-05-31 02:21:06 UTC
With MESA_GLSL=nopt the code that was changed by the bisected commit is not being executed, so I think the real problem might be somewhere else.  I guess you could try bisecting again with MESA_GLSL=nopt and maybe you'll come up with a different bad commit.
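
A second bisect with MESA_GLSL=nopt could be partly automated with a helper for `git bisect run`, sketched below. The exit codes are the ones `git bisect run` documents (0 for good, 125 for skip, other values from 1 to 127 for bad); everything else is an assumption: the function names are hypothetical, the reproducer command and timeout must be supplied by the tester, and the helper does not rebuild Mesa between steps. If the hang freezes the whole machine, as reported above, the steps have to be done by hand instead.

```python
import os
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(returncode, timed_out):
    """Map a reproducer result to a bisect verdict."""
    if timed_out:
        return BAD  # treat a run that never finishes as a hang
    return GOOD if returncode == 0 else BAD


def run_with_nopt(cmd, timeout_s):
    """Run cmd with MESA_GLSL=nopt set and return a bisect exit code."""
    if not cmd:
        return SKIP  # nothing to run, so this revision cannot be judged
    env = dict(os.environ, MESA_GLSL="nopt")
    try:
        proc = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return classify(None, True)
    return classify(proc.returncode, False)


def main(argv):
    # argv is the reproducer command, e.g. a piglit test or trace replay.
    return run_with_nopt(argv, timeout_s=120)
```

A wrapper script would end with `sys.exit(main(sys.argv[1:]))` and be passed to `git bisect run` together with the reproducer command.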
Comment 8 Sven Arvidsson 2011-06-02 10:15:55 UTC
(In reply to comment #7)
> With MESA_GLSL=nopt the code that was changed by the bisected commit is not
> being executed, so I think the real problem might be somewhere else.  I guess
> you could try bisecting again with MESA_GLSL=nopt and maybe you'll come up with
> a different bad commit.

I might try this. Enrico, do you remember which good revision you used for the bisect? Was it 7.10?
Comment 9 Sven Arvidsson 2011-06-03 13:54:28 UTC
I noticed that a few piglit tests cause GPU hangs/resets when run with MESA_GLSL=nopt. Is this to be expected, or is it worth filing bugs?

The failing tests are glsl-fs-atan-2, glsl-fs-lots-of-tex, glsl-orangebook-ch06-bump and possibly others. They run without problems if nopt isn't used.
Comment 10 Sven Arvidsson 2011-06-11 10:35:22 UTC
There have been a few hang-related fixes in git master over the last few days. I can no longer reproduce the TF2 hang, or the hangs in piglit with nopt.

Enrico, can you re-test to be sure it has been fixed?
Comment 11 Enrico_m 2011-06-13 15:10:23 UTC
Yes, this bug is fixed. No more GPU lockups. The game does not run with MESA_GLSL=nopt yet (something about "missing vertex shader" and "EE r600_shader.c:145 r600_pipe_shader_create - translation from TGSI failed !"), but that's another story. Thanks
