Bug 55416

Summary: [R600g] Torchlight gives GPU lockup
Product: Mesa
Reporter: Laurent carlier <lordheavym>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
CC: sobkas
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: dumped shader with R600_DUMP_SHADERS=1 /usr/local/games/Torchlight/Torchlight.bin.x86_64

Description Laurent carlier 2012-09-28 12:56:40 UTC
When I try to play the game (after character selection), I get several GPU lockups until I kill the game:

dmesg:
--8<--
[  900.867633] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  900.867645] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000012fb4 last fence id 0x0000000000012faa)
[  900.868726] radeon 0000:01:00.0: GPU softreset 
[  900.868732] radeon 0000:01:00.0:   GRBM_STATUS=0xE7730828
[  900.868737] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[  900.868743] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xFC000001
[  900.868748] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  900.868756] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  900.868863] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[  900.868868] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[  900.868873] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[  900.868878] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  900.869884] radeon 0000:01:00.0: GPU reset succeed
[  900.878264] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[  900.878371] radeon 0000:01:00.0: WB enabled
[  900.878379] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8802218a5c00
[  900.894717] [drm] ring test on 0 succeeded in 2 usecs
[  900.894770] [drm] ib test on ring 0 succeeded in 0 usecs
[  911.968025] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  911.968038] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000013018 last fence id 0x0000000000013014)
[  911.969149] radeon 0000:01:00.0: GPU softreset 
[  911.969155] radeon 0000:01:00.0:   GRBM_STATUS=0xE7730828
[  911.969161] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[  911.969167] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xFC000001
[  911.969172] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  911.969183] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  911.969291] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[  911.969297] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[  911.969302] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[  911.969307] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  911.970313] radeon 0000:01:00.0: GPU reset succeed
[  911.978920] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[  911.979030] radeon 0000:01:00.0: WB enabled
[  911.979039] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8802218a5c00
[  911.995378] [drm] ring test on 0 succeeded in 2 usecs
[  911.995432] [drm] ib test on ring 0 succeeded in 0 usecs
-->8--

uname -a:
Linux archMain 3.5.4-1-ARCH #1 SMP PREEMPT Sat Sep 15 08:12:04 CEST 2012 x86_64 GNU/Linux

glxinfo:
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD BARTS
OpenGL version string: 2.1 Mesa 9.1-devel (git-124b214)
OpenGL shading language version string: 1.30

libdrm:
local/libdrm 2.4.39-1

I get a similar lockup with kernel 3.6-rc6.
Comment 1 Laurent carlier 2012-09-28 13:47:14 UTC
Here is a trace with which I can reproduce the lockup:

http://pkgbuild.com/~lcarlier/trace/Torchlight.bin.x86_64.trace.tar.gz
Comment 2 Laurent carlier 2012-09-28 14:05:03 UTC
Mesa is built with:

 ./autogen.sh --prefix=/usr --sysconfdir=/etc --with-dri-driverdir=/usr/lib/xorg/modules/dri --with-gallium-drivers=r300,r600,radeonsi,nouveau,swrast,svga --with-egl-platforms=x11,wayland,drm --enable-gallium-llvm --enable-gallium-egl --enable-glx-tls --enable-glx --enable-gles1 --enable-gles2 --enable-egl --enable-r600-llvm-compiler --enable-shared-glapi --enable-texture-float --enable-xa --enable-gbm --enable-osmesa --enable-vdpau
Comment 3 Anthony Waters 2012-09-29 02:28:09 UTC
I bisected this a few days ago; the commit that causes the issue is
c8b06dccff9cb89e20378664f3cbc202876a180f
r600g: atomize framebuffer state
Comment 4 Krzysztof A. Sobiecki 2012-09-29 09:05:46 UTC
I'm no longer able to reproduce this bug on mesa:bb7ecb29fb6358a4c65278c2fe88936c578074cd. 

Can someone confirm this? 

I only hit that bug when using R600_LLVM=1 (R600_LLVM=0 was safe).
There was a small visual difference between these two settings: R600_LLVM=1 caused rendering errors on the main menu (not the missing face).


./configure --prefix=/usr --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --libdir=${prefix}/lib/x86_64-linux-gnu --localstatedir=/var --build=x86_64-linux-gnu --with-driver=dri --enable-r600-llvm-compiler --with-dri-drivers= --with-dri-driverdir=/usr/lib/x86_64-linux-gnu/dri --with-dri-searchpath=/usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri --enable-glx-tls --enable-shared-glapi --enable-texture-float --enable-xa --enable-driglx-direct --with-egl-platforms=x11 drm --enable-gallium-llvm --with-gallium-drivers= nouveau r600 r300 svga swrast --enable-gles1 --enable-gles2 --enable-openvg --enable-gallium-egl --disable-glu CFLAGS=-Wall -g -O2 CXXFLAGS=-Wall -g -O2

OpenGL renderer string: Gallium 0.4 on AMD JUNIPER
OpenGL version string: 3.0 Mesa 9.1-devel
OpenGL shading language version string: 1.30

Linux solis 3.6.0-rc6+ #2 SMP Wed Sep 19 11:56:23 CEST 2012 x86_64 GNU/Linux

wheezy/sid
llvm-3.1:
  Installed: 3.1-3~exp4
  Candidate: 3.1-3~exp4
  Version table:
 *** 3.1-3~exp4 0
        501 http://ftp.pl.debian.org/debian/ experimental/main amd64 Packages
        500 /var/lib/dpkg/status
     3.1-2 0
        501 http://ftp.pl.debian.org/debian/ unstable/main amd64 Packages
     3.1-1 0
        500 http://ftp.pl.debian.org/debian/ testing/main amd64 Packages

torchlight:
  Installed: 1.0.20120924-1
  Candidate: 1.0.20120924-1
  Version table:
 *** 1.0.20120924-1 0
        500 /var/lib/dpkg/status
Comment 5 Laurent carlier 2012-09-29 17:50:10 UTC
I can confirm: no lockups with R600_LLVM=0.
Comment 6 Krzysztof A. Sobiecki 2012-09-29 22:03:23 UTC
Around the time I started using mesa:bb7ecb29fb6358a4c65278c2fe88936c578074cd, the R600_LLVM=1 env var stopped causing GPU hangs. I don't know if there is a link between these two facts, but for now I'm not willing to experiment with this (the last time I tried, the computer crashed completely, not even answering pings, and the package database was corrupted).
Comment 7 Krzysztof A. Sobiecki 2012-09-30 08:15:57 UTC
Sorry, I was wrong. Hang still occurs, but is less frequent.
Comment 8 Laurent carlier 2012-10-01 15:51:54 UTC
Created attachment 67933
dumped shader with R600_DUMP_SHADERS=1 /usr/local/games/Torchlight/Torchlight.bin.x86_64
Comment 9 Anthony Waters 2012-10-03 02:42:02 UTC
I take back what I said in comment 3; for me the lockup is caused by the same issue as bug 53111 (virtual address space active on Cayman).
Comment 10 Laurent carlier 2012-12-11 05:50:54 UTC
Seems to be fixed since http://cgit.freedesktop.org/mesa/mesa/commit/?id=ffe1794e0c7efc46e7a5056ac222dd081cae4020, so closing.
