Bug 62466 - r600g hyperz lockups with KSP 0.19
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assigned To: Default DRI bug account
Duplicates: 63748

Reported: 2013-03-18 11:33 UTC by Knut Andre Tidemann
Modified: 2014-04-10 00:45 UTC (History)

Attachments
dmesg output (64.06 KB, text/plain)
2013-03-18 11:33 UTC, Knut Andre Tidemann

Description Knut Andre Tidemann 2013-03-18 11:33:23 UTC
Created attachment 76678 [details]
dmesg output

When trying out Kerbal Space Program for Linux (version 0.19 was just released with native Linux support), I get GPU lockups when constructing rockets.

Everything worked well at first, but after a few minutes I got the first GPU lockup. After that it seemed to lock up every 5 seconds or so, so I had to kill the game.

This happens every time I run the game. MSAA and vsync were not enabled.

I'm running Mesa from git: 
Mesa 9.2.0 (git-2da8ee1) on a Radeon HD 5670

and the Arch Linux kernel: 
Linux none 3.8.3-2-ARCH #1 SMP PREEMPT Sun Mar 17 13:04:22 CET 2013 x86_64 GNU/Linux

I've attached a full dmesg log.

I'm running KDE and have two monitors set up. The lockups happened both with and without desktop effects enabled.
Comment 1 Alex Deucher 2013-03-18 12:57:55 UTC
Does disabling hyperz help?  Set env var:
R600_DEBUG=nohyperz
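
For anyone unsure how to set the variable for a single run only, here is a minimal sketch. Only R600_DEBUG=nohyperz comes from this comment; the function name run_without_hyperz and the game path in the usage line are hypothetical and need adjusting to your install.

```shell
# Run any command with hyperz disabled in r600g, for that process only.
# The variable is not exported into the calling shell.
run_without_hyperz() {
    env R600_DEBUG=nohyperz "$@"
}

# Usage (the binary path is an assumption, not taken from this report):
# run_without_hyperz ./KSP.x86_64
```

Using env keeps the setting scoped to the launched program, so other GL applications started from the same shell still run with hyperz in its default state.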
Comment 2 Knut Andre Tidemann 2013-03-18 13:19:09 UTC
Disabling hyperz does indeed fix the issue. I tried a quick spin for ~15-20 minutes with no issues.

With hyperz enabled, I get lockups after a few minutes.
Comment 3 lct 2013-04-22 13:03:18 UTC
I am also affected by this bug.

I use Debian Linux Mint Edition (current Debian Testing, minus 10-20 days lag).
Further info is below.

GPU is: HD5850, 1920x1080 via DVI.
I get repeatable lockups in the game UrbanTerror 4.2 (freeware) on maps like ut4_horror (Horror), occurring within 3 minutes.

The GPU recovers, but it makes gaming really impossible. :/
GPU was in "high" profile.

I can offer 20€ for this bug to be fixed in a reliable and future-proof way, if it is fixed within 2 months. Payment would be via PayPal.



Further Info below:

$ uname
Linux linux 3.2.0-4-amd64 #1 SMP Debian 3.2.32-1 x86_64 GNU/Linux

$glxinfo (parts)
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
.....
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
,,,,
GLX version: 1.4
GLX extensions:
....
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CYPRESS
OpenGL version string: 2.1 Mesa 8.0.4
OpenGL shading language version string: 1.20

$dpkg-query -l |grep mesa
ii  libegl1-mesa:amd64                    8.0.4-2                              amd64        free implementation of the EGL API -- runtime
ii  libegl1-mesa-drivers:amd64            8.0.4-2                              amd64        free implementation of the EGL API -- hardware drivers
ii  libgl1-mesa-dev                       8.0.4-2                              amd64        free implementation of the OpenGL API -- GLX development files
ii  libgl1-mesa-dri:amd64                 8.0.4-2                              amd64        free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-dri:i386                  8.0.4-2                              i386         free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-dri-experimental:amd64    8.0.4-2                              amd64        free implementation of the OpenGL API -- Extra DRI modules
ii  libgl1-mesa-glx:amd64                 8.0.4-2                              amd64        free implementation of the OpenGL API -- GLX runtime
ii  libgl1-mesa-glx:i386                  8.0.4-2                              i386         free implementation of the OpenGL API -- GLX runtime
ii  libglapi-mesa:amd64                   8.0.4-2                              amd64        free implementation of the GL API -- shared library
ii  libglapi-mesa:i386                    8.0.4-2                              i386         free implementation of the GL API -- shared library
ii  libglu1-mesa:amd64                    8.0.4-2                              amd64        Mesa OpenGL utility library (GLU)
ii  libglu1-mesa:i386                     8.0.4-2                              i386         Mesa OpenGL utility library (GLU)
ii  libopenvg1-mesa:amd64                 8.0.4-2                              amd64        free implementation of the OpenVG API -- runtime
ii  mesa-common-dev                       8.0.4-2                              amd64        Developer documentation for Mesa
ii  mesa-utils                            8.0.1-2+b3                           amd64        Miscellaneous Mesa GL utilities

$ dpkg-query -l |grep radeon
ii  libdrm-radeon1:amd64                  2.4.33-3                             amd64        Userspace interface to radeon-specific kernel DRM services -- runtime
ii  libdrm-radeon1:i386                   2.4.33-3                             i386         Userspace interface to radeon-specific kernel DRM services -- runtime
ii  radeontool                            1.6.2-1.1                            amd64        utility to control ATI Radeon backlight functions on laptops
ii  xserver-xorg-video-radeon             1:6.14.4-5                           amd64        X.Org X server -- AMD/ATI Radeon display driver

# sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +39.6°C  (high = +70.0°C)
                       (crit = +72.0°C, hyst = +70.0°C)
radeon-pci-0100
Adapter: PCI adapter
temp1:        +64.0°C
Comment 4 lct 2013-04-22 13:20:40 UTC
This does not affect HyperZ for me, BUT the dmesg lockup message is EXACTLY the same.

I also found that it happens at certain "angles" and "positions" in the game (when the player is in a certain location and the "viewport" points in specific directions).

I have launched the game several times to confirm this, and if I stay in a specific position, the lockups occur constantly!

$ dmesg | tail -n 300
[25380.280087] radeon 0000:01:00.0: GPU lockup CP stall for more than 10008msec
[25380.280099] GPU lockup (waiting for 0x003BB1CE last fence id 0x003BB1C8)
[25380.281286] radeon 0000:01:00.0: GPU softreset 
[25380.281293] radeon 0000:01:00.0:   GRBM_STATUS=0xE77309A0
[25380.281299] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[25380.281305] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xFC000001
[25380.281311] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25380.291605] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[25380.291713] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[25380.291719] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[25380.291725] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[25380.291731] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25381.228728] radeon 0000:01:00.0: GPU reset succeed
[25382.406994] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[25382.407069] radeon 0000:01:00.0: WB enabled
[25382.423062] [drm] ring test succeeded in 0 usecs
[25382.423070] [drm] ib test succeeded in 1 usecs
[25452.156086] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[25452.156098] GPU lockup (waiting for 0x003C2295 last fence id 0x003C228F)
[25452.157288] radeon 0000:01:00.0: GPU softreset 
[25452.157295] radeon 0000:01:00.0:   GRBM_STATUS=0xE57208A0
[25452.157301] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[25452.157307] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x88000003
[25452.157313] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25452.161877] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[25452.161984] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[25452.161990] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[25452.161996] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[25452.162002] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25453.098283] radeon 0000:01:00.0: GPU reset succeed
[25454.204960] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[25454.205036] radeon 0000:01:00.0: WB enabled
[25454.221028] [drm] ring test succeeded in 0 usecs
[25454.221037] [drm] ib test succeeded in 1 usecs
[25464.535435] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[26936.708086] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[26936.708097] GPU lockup (waiting for 0x00402CE4 last fence id 0x00402CE2)
[26936.709291] radeon 0000:01:00.0: GPU softreset 
[26936.709298] radeon 0000:01:00.0:   GRBM_STATUS=0xE77308A0
[26936.709305] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[26936.709311] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xFC000001
[26936.709317] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[26936.724482] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[26936.724590] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[26936.724596] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[26936.724602] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[26936.724608] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[26937.666885] radeon 0000:01:00.0: GPU reset succeed
[26938.845013] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[26938.845142] radeon 0000:01:00.0: WB enabled
[26938.861310] [drm] ring test succeeded in 0 usecs
[26938.861328] [drm] ib test succeeded in 1 usecs
[27026.040079] radeon 0000:01:00.0: GPU lockup CP stall for more than 10040msec
[27026.040090] GPU lockup (waiting for 0x0040DE04 last fence id 0x0040DDFE)
[27026.041280] radeon 0000:01:00.0: GPU softreset 
[27026.041287] radeon 0000:01:00.0:   GRBM_STATUS=0xE57208A0
[27026.041294] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[27026.041299] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x88000003
[27026.041305] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[27026.055106] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[27026.055213] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[27026.055219] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[27026.055225] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[27026.055231] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27026.995367] radeon 0000:01:00.0: GPU reset succeed
[27028.168997] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27028.169076] radeon 0000:01:00.0: WB enabled
[27028.185065] [drm] ring test succeeded in 1 usecs
[27028.185072] [drm] ib test succeeded in 1 usecs
[27046.068108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[27046.068119] GPU lockup (waiting for 0x0040EFEF last fence id 0x0040EFEA)
[27046.069317] radeon 0000:01:00.0: GPU softreset 
[27046.069324] radeon 0000:01:00.0:   GRBM_STATUS=0xE57208A0
[27046.069330] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[27046.069336] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x88000003
[27046.069342] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27046.070199] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[27046.070306] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[27046.070312] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[27046.070318] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[27046.070324] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27047.004093] radeon 0000:01:00.0: GPU reset succeed
[27048.099496] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27048.099600] radeon 0000:01:00.0: WB enabled
[27048.115758] [drm] ring test succeeded in 1 usecs
[27048.115774] [drm] ib test succeeded in 1 usecs
[27138.696110] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
[27138.696122] GPU lockup (waiting for 0x00414EA3 last fence id 0x00414E9E)
[27138.697312] radeon 0000:01:00.0: GPU softreset 
[27138.697319] radeon 0000:01:00.0:   GRBM_STATUS=0xF0001828
[27138.697326] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x80000003
[27138.697332] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000003
[27138.697338] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27138.700353] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[27138.700461] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[27138.700467] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[27138.700473] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[27138.700479] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27139.638236] radeon 0000:01:00.0: GPU reset succeed
[27140.817595] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27140.817763] radeon 0000:01:00.0: WB enabled
[27140.833959] [drm] ring test succeeded in 0 usecs
[27140.833977] [drm] ib test succeeded in 1 usecs
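
A log like the one above is easier to compare between runs (for example with and without a workaround) if the lockups are counted. The following is a rough sketch, assuming the kernel keeps using the "GPU lockup CP stall" wording seen in this log; the function names are hypothetical.

```shell
# Count radeon "GPU lockup CP stall" events in kernel log text read from stdin.
count_gpu_lockups() {
    grep -c 'GPU lockup CP stall'
}

# Print the kernel timestamp (seconds since boot) of each lockup event.
lockup_times() {
    sed -n 's/^\[ *\([0-9.]*\)].*GPU lockup CP stall.*/\1/p'
}

# Usage:
# dmesg | count_gpu_lockups
# dmesg | lockup_times
```

The timestamps make it simple to see how closely the lockups follow each other once they start.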
Comment 5 Alex Deucher 2013-04-22 13:39:30 UTC
*** Bug 63748 has been marked as a duplicate of this bug. ***
Comment 6 Jerome Glisse 2013-04-24 19:23:51 UTC
Please check whether the patch below fixes the issue:

http://people.freedesktop.org/~glisse/0001-r600g-force-full-cache-for-hyperz.patch
Comment 7 Knut Andre Tidemann 2013-04-25 09:18:29 UTC
That patch fixes the bug! I can reliably reproduce it in a few seconds without the patch, but I have not been able to get a GPU hang after I applied the patch!

I've only done minimal testing, 5-10 min, but everything works great.
Comment 8 Jerome Glisse 2013-05-06 14:47:29 UTC
Closing. The fix is pushed to master, and I am going to push it to 9.1.
Comment 9 lct 2013-05-06 18:23:47 UTC
Hi, Jerome! 

Do I understand correctly that this patch is only for r600g AND is only for the kernel (not Mesa or DDI)?

Thanks for the fix! Unfortunately I have currently no access to the machine. 

When I have it, I will test:
(1) the source of the kernel I had before (vanilla, crashing),
(2) the same kernel with your patch,
(3) the more recent kernel 3.8 (Ubuntu Raring), to see whether it happens there,
(4) the current kernel with the patch.

I will test both performance, with the PTS Urban Terror profile, and stability in exactly the case I had, and report back the results.


That said, I am pretty sure it works, as others have already confirmed it, but still 
- I would like you to send me your PayPal account details -
for beer (green tea, coffee, etc.) money. I know that you are employed by Red Hat; it is only a personal way to thank you for the patch.

Thank you!
Comment 10 Jerome Glisse 2013-05-07 20:22:14 UTC
The patch was against Mesa. It is now included in Mesa, except in the Mesa 9.1 branch; I will push something shortly.

If you want to make a donation, make one to the EFF: https://www.eff.org/

Or buy me a beer if you ever bump into me.