62466 – r600g hyperz lockups with KSP 0.19

Summary: r600g hyperz lockups with KSP 0.19

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Gallium/r600
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	high major
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Duplicates (1):	63748
Depends on:
Blocks:

Reported:	2013-03-18 11:33 UTC by Knut Andre Tidemann
Modified:	2014-04-10 00:45 UTC
CC List:	1 user


Attachments
dmesg output (64.06 KB, text/plain) 2013-03-18 11:33 UTC, Knut Andre Tidemann

Description Knut Andre Tidemann 2013-03-18 11:33:23 UTC

Created attachment 76678
dmesg output

When trying out Kerbal Space Program for Linux (version 0.19 was just released with native Linux support), I get GPU lockups when constructing rockets.

Everything worked well at first, but after a few minutes I got the first GPU lockup, and after that it seemed to lock up every 5 seconds or so, so I had to kill the game.

This happens every time I run the game. MSAA and vsync were not enabled.

I'm running mesa from git: 
Mesa 9.2.0 (git-2da8ee1) on a Radeon HD 5670

and the Arch Linux kernel: 
Linux none 3.8.3-2-ARCH #1 SMP PREEMPT Sun Mar 17 13:04:22 CET 2013 x86_64 GNU/Linux

I've attached a full dmesg log.

I'm running KDE and have two monitors set up. The lockups happened both with and without desktop effects enabled.

Comment 1 Alex Deucher 2013-03-18 12:57:55 UTC

Does disabling hyperz help?  Set env var:
R600_DEBUG=nohyperz
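
A minimal sketch of setting this from a launcher script, assuming Python 3 is available. The helper names `env_with_r600_debug` and `run_without_hyperz` are hypothetical; only the `R600_DEBUG=nohyperz` setting comes from this comment. The sketch also assumes that `R600_DEBUG` takes a comma-separated list of flags, so that flags already present are kept.

```python
import os
import subprocess


def env_with_r600_debug(flag, base_env=None):
    """Return a copy of the environment with `flag` added to R600_DEBUG."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: R600_DEBUG is a comma-separated list of flags.
    flags = [f for f in env.get("R600_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    env["R600_DEBUG"] = ",".join(flags)
    return env


def run_without_hyperz(command):
    """Run `command` (a list of arguments) with HyperZ disabled."""
    return subprocess.call(command, env=env_with_r600_debug("nohyperz"))
```

It could be used as `run_without_hyperz(["./game-binary"])`, where `./game-binary` is a placeholder for the actual executable. For a one-off test, prefixing the command in a shell with `R600_DEBUG=nohyperz` has the same effect.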

Comment 2 Knut Andre Tidemann 2013-03-18 13:19:09 UTC

Disabling hyperz does indeed fix the issue. Tried a quick spin for ~15-20 mins with no issues now.

With hyperz enabled, I get lockups after a few minutes.

Comment 3 inbox-3VOAHXJJYNDO 2013-04-22 13:03:18 UTC

I am also affected by this bug.

I use Linux Mint Debian Edition (current Debian Testing, minus a 10-20 day lag).
Further info is below.

GPU is: HD5850, 1920x1080 via DVI.
I get repeatable lockups in the game Urban Terror 4.2 (freeware) on maps like ut4_horror (Horror), occurring within 3 minutes.

The GPU recovers, but it makes gaming really impossible. :/
GPU was in "high" profile.

I can offer 20€ for this bug to be fixed in a reliable and future-proof way, if it is done within 2 months. Payment is via PayPal.



Further Info below:

$ uname -a
Linux linux 3.2.0-4-amd64 #1 SMP Debian 3.2.32-1 x86_64 GNU/Linux

$glxinfo (parts)
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
.....
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
,,,,
GLX version: 1.4
GLX extensions:
....
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CYPRESS
OpenGL version string: 2.1 Mesa 8.0.4
OpenGL shading language version string: 1.20

$dpkg-query -l |grep mesa
ii  libegl1-mesa:amd64                    8.0.4-2                              amd64        free implementation of the EGL API -- runtime
ii  libegl1-mesa-drivers:amd64            8.0.4-2                              amd64        free implementation of the EGL API -- hardware drivers
ii  libgl1-mesa-dev                       8.0.4-2                              amd64        free implementation of the OpenGL API -- GLX development files
ii  libgl1-mesa-dri:amd64                 8.0.4-2                              amd64        free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-dri:i386                  8.0.4-2                              i386         free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-dri-experimental:amd64    8.0.4-2                              amd64        free implementation of the OpenGL API -- Extra DRI modules
ii  libgl1-mesa-glx:amd64                 8.0.4-2                              amd64        free implementation of the OpenGL API -- GLX runtime
ii  libgl1-mesa-glx:i386                  8.0.4-2                              i386         free implementation of the OpenGL API -- GLX runtime
ii  libglapi-mesa:amd64                   8.0.4-2                              amd64        free implementation of the GL API -- shared library
ii  libglapi-mesa:i386                    8.0.4-2                              i386         free implementation of the GL API -- shared library
ii  libglu1-mesa:amd64                    8.0.4-2                              amd64        Mesa OpenGL utility library (GLU)
ii  libglu1-mesa:i386                     8.0.4-2                              i386         Mesa OpenGL utility library (GLU)
ii  libopenvg1-mesa:amd64                 8.0.4-2                              amd64        free implementation of the OpenVG API -- runtime
ii  mesa-common-dev                       8.0.4-2                              amd64        Developer documentation for Mesa
ii  mesa-utils                            8.0.1-2+b3                           amd64        Miscellaneous Mesa GL utilities

$ dpkg-query -l |grep radeon
ii  libdrm-radeon1:amd64                  2.4.33-3                             amd64        Userspace interface to radeon-specific kernel DRM services -- runtime
ii  libdrm-radeon1:i386                   2.4.33-3                             i386         Userspace interface to radeon-specific kernel DRM services -- runtime
ii  radeontool                            1.6.2-1.1                            amd64        utility to control ATI Radeon backlight functions on laptops
ii  xserver-xorg-video-radeon             1:6.14.4-5                           amd64        X.Org X server -- AMD/ATI Radeon display driver

# sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +39.6°C  (high = +70.0°C)
                       (crit = +72.0°C, hyst = +70.0°C)
radeon-pci-0100
Adapter: PCI adapter
temp1:        +64.0°C

Comment 4 inbox-3VOAHXJJYNDO 2013-04-22 13:20:40 UTC

This does not affect HyperZ for me, BUT the dmesg lockup message is EXACTLY the same.

I also found out that it happens at certain "angles" and "positions" in the game (when the player is in a certain location and his "viewport" is directed along specific vectors).

I have launched the game several times to prove this, and if I stay in a specific position, the lockups occur constantly!

$dmesg|tail -n 300:
[25380.280087] radeon 0000:01:00.0: GPU lockup CP stall for more than 10008msec
[25380.280099] GPU lockup (waiting for 0x003BB1CE last fence id 0x003BB1C8)
[25380.281286] radeon 0000:01:00.0: GPU softreset 
[25380.281293] radeon 0000:01:00.0:   GRBM_STATUS=0xE77309A0
[25380.281299] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[25380.281305] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xFC000001
[25380.281311] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25380.291605] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[25380.291713] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[25380.291719] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[25380.291725] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[25380.291731] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25381.228728] radeon 0000:01:00.0: GPU reset succeed
[25382.406994] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[25382.407069] radeon 0000:01:00.0: WB enabled
[25382.423062] [drm] ring test succeeded in 0 usecs
[25382.423070] [drm] ib test succeeded in 1 usecs
[25452.156086] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[25452.156098] GPU lockup (waiting for 0x003C2295 last fence id 0x003C228F)
[25452.157288] radeon 0000:01:00.0: GPU softreset 
[25452.157295] radeon 0000:01:00.0:   GRBM_STATUS=0xE57208A0
[25452.157301] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[25452.157307] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x88000003
[25452.157313] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25452.161877] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[25452.161984] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[25452.161990] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[25452.161996] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[25452.162002] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[25453.098283] radeon 0000:01:00.0: GPU reset succeed
[25454.204960] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[25454.205036] radeon 0000:01:00.0: WB enabled
[25454.221028] [drm] ring test succeeded in 0 usecs
[25454.221037] [drm] ib test succeeded in 1 usecs
[25464.535435] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[26936.708086] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[26936.708097] GPU lockup (waiting for 0x00402CE4 last fence id 0x00402CE2)
[26936.709291] radeon 0000:01:00.0: GPU softreset 
[26936.709298] radeon 0000:01:00.0:   GRBM_STATUS=0xE77308A0
[26936.709305] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[26936.709311] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xFC000001
[26936.709317] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[26936.724482] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[26936.724590] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[26936.724596] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[26936.724602] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[26936.724608] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[26937.666885] radeon 0000:01:00.0: GPU reset succeed
[26938.845013] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[26938.845142] radeon 0000:01:00.0: WB enabled
[26938.861310] [drm] ring test succeeded in 0 usecs
[26938.861328] [drm] ib test succeeded in 1 usecs
[27026.040079] radeon 0000:01:00.0: GPU lockup CP stall for more than 10040msec
[27026.040090] GPU lockup (waiting for 0x0040DE04 last fence id 0x0040DDFE)
[27026.041280] radeon 0000:01:00.0: GPU softreset 
[27026.041287] radeon 0000:01:00.0:   GRBM_STATUS=0xE57208A0
[27026.041294] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[27026.041299] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x88000003
[27026.041305] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[27026.055106] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[27026.055213] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[27026.055219] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[27026.055225] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[27026.055231] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27026.995367] radeon 0000:01:00.0: GPU reset succeed
[27028.168997] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27028.169076] radeon 0000:01:00.0: WB enabled
[27028.185065] [drm] ring test succeeded in 1 usecs
[27028.185072] [drm] ib test succeeded in 1 usecs
[27046.068108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[27046.068119] GPU lockup (waiting for 0x0040EFEF last fence id 0x0040EFEA)
[27046.069317] radeon 0000:01:00.0: GPU softreset 
[27046.069324] radeon 0000:01:00.0:   GRBM_STATUS=0xE57208A0
[27046.069330] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xFC000001
[27046.069336] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x88000003
[27046.069342] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27046.070199] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[27046.070306] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[27046.070312] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[27046.070318] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[27046.070324] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27047.004093] radeon 0000:01:00.0: GPU reset succeed
[27048.099496] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27048.099600] radeon 0000:01:00.0: WB enabled
[27048.115758] [drm] ring test succeeded in 1 usecs
[27048.115774] [drm] ib test succeeded in 1 usecs
[27138.696110] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
[27138.696122] GPU lockup (waiting for 0x00414EA3 last fence id 0x00414E9E)
[27138.697312] radeon 0000:01:00.0: GPU softreset 
[27138.697319] radeon 0000:01:00.0:   GRBM_STATUS=0xF0001828
[27138.697326] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x80000003
[27138.697332] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000003
[27138.697338] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27138.700353] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[27138.700461] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[27138.700467] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[27138.700473] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[27138.700479] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[27139.638236] radeon 0000:01:00.0: GPU reset succeed
[27140.817595] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27140.817763] radeon 0000:01:00.0: WB enabled
[27140.833959] [drm] ring test succeeded in 0 usecs
[27140.833977] [drm] ib test succeeded in 1 usecs
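
To compare logs like the one above, the lockup lines can be pulled out mechanically. The following is a rough sketch, not a tool used in this report: the function names `summarize_lockups` and `intervals` are made up for illustration, and the regular expressions assume the exact message wording shown in this dmesg excerpt, which may differ between kernel versions.

```python
import re

# Matches e.g. "[25380.280087] radeon 0000:01:00.0: GPU lockup CP stall
# for more than 10008msec" (wording taken from the log above).
LOCKUP_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+radeon \S+ GPU lockup CP stall "
    r"for more than (?P<msec>\d+)msec"
)
# Matches e.g. "GPU lockup (waiting for 0x003BB1CE last fence id 0x003BB1C8)".
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for 0x(?P<want>[0-9A-Fa-f]+) "
    r"last fence id 0x(?P<last>[0-9A-Fa-f]+)\)"
)


def summarize_lockups(dmesg_text):
    """Return (timestamps, fence_gaps) for the lockups found in dmesg_text."""
    timestamps = []
    fence_gaps = []
    for line in dmesg_text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            timestamps.append(float(m.group("time")))
            continue
        m = FENCE_RE.search(line)
        if m:
            # Difference between the awaited fence id and the last one reported.
            gap = int(m.group("want"), 16) - int(m.group("last"), 16)
            fence_gaps.append(gap)
    return timestamps, fence_gaps


def intervals(timestamps):
    """Seconds between consecutive lockups, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(timestamps, timestamps[1:])]
```

Feeding it the output of `dmesg` gives the time of each reported lockup and, for each one, how far the awaited fence id is ahead of the last fence id in the message. For the first lockup above that difference is 0x003BB1CE - 0x003BB1C8 = 6.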

Comment 5 Alex Deucher 2013-04-22 13:39:30 UTC

*** Bug 63748 has been marked as a duplicate of this bug. ***

Comment 6 Jerome Glisse 2013-04-24 19:23:51 UTC

Please check whether the patch below fixes the issue:

http://people.freedesktop.org/~glisse/0001-r600g-force-full-cache-for-hyperz.patch

Comment 7 Knut Andre Tidemann 2013-04-25 09:18:29 UTC

That patch fixes the bug! I can reliably reproduce it in a few seconds without the patch, but I have not been able to get a GPU hang after I applied the patch!

I've only done minimal testing, 5-10 min, but everything works great.

Comment 8 Jerome Glisse 2013-05-06 14:47:29 UTC

Closing; pushed to master, and going to push to 9.1.

Comment 9 inbox-3VOAHXJJYNDO 2013-05-06 18:23:47 UTC

Hi, Jerome! 

Do I understand correctly that this patch is only for r600g, and that it is only for the kernel (not Mesa or DDI)?

Thanks for the fix! Unfortunately, I currently have no access to the machine. 

When I have it, I will test the case:
(1) on the source code of the kernel that I had (vanilla and crashing),
(2) on that same kernel with your patch,
(3) on the more recent kernel 3.8 (Ubuntu Raring), to see whether it still happens there,
(4) and finally on the current kernel with the patch.

I will test both performance, with the PTS Urban Terror profile, and stability, in exactly the case I had, and report back the results.


That said, I am pretty sure it works, as others have already confirmed it, but still 
- I would like you to send me your PayPal account data -
for beer (green tea, coffee, etc.) money. I know that you are employed by Red Hat; it's only a personal way to thank you for the patch.

Thank you!

Comment 10 Jerome Glisse 2013-05-07 20:22:14 UTC

The patch was against Mesa. It is now included in Mesa, except in the Mesa 9.1 branch; I will push something there shortly.

If you want to make a donation, make one to the EFF: https://www.eff.org/

Or buy me a beer if you ever bump into me.
