Created attachment 76678 [details] dmesg output When trying out Kerbal Space Program for Linux (version 0.19 was just released with native Linux support), I get GPU lockups when constructing rockets. Everything worked well at first, but after a few minutes I got the first GPU lockup, and after that it seemed to lock up every 5 seconds or so, so I had to kill the game. This happens every time I run the game. MSAA and vsync were not enabled. I'm running Mesa from git: Mesa 9.2.0 (git-2da8ee1) on a Radeon HD 5670 and the Arch Linux kernel: Linux none 3.8.3-2-ARCH #1 SMP PREEMPT Sun Mar 17 13:04:22 CET 2013 x86_64 GNU/Linux I've attached a full dmesg log. I'm running KDE and have two monitors set up. The lockups happened both with and without desktop effects enabled.
Does disabling hyperz help? Set env var: R600_DEBUG=nohyperz
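For anyone who wants to try the suggestion above from a script rather than a shell, here is a minimal sketch that launches a program with R600_DEBUG=nohyperz set. The helper names are made up for illustration, and merging with an already-set R600_DEBUG assumes the usual comma-separated form of Mesa debug variables.

```python
import os
import subprocess


def hyperz_disabled_env(base_env=None):
    """Return a copy of the environment with 'nohyperz' present in R600_DEBUG.

    Assumption: R600_DEBUG takes a comma-separated list of flags, so any
    flags that are already set are kept and 'nohyperz' is appended.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [flag for flag in env.get("R600_DEBUG", "").split(",") if flag]
    if "nohyperz" not in flags:
        flags.append("nohyperz")
    env["R600_DEBUG"] = ",".join(flags)
    return env


def run_without_hyperz(command):
    """Run `command` (a list of arguments) with HyperZ disabled.

    Returns the exit code of the launched program.
    """
    return subprocess.call(command, env=hyperz_disabled_env())


# Hypothetical usage; the binary path is a placeholder, not taken from this report:
# run_without_hyperz(["./path/to/game-binary"])
```

The same effect is obtained in a shell by prefixing the command with R600_DEBUG=nohyperz; the sketch is only convenient when the test run is driven from a script.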
Disabling hyperz does indeed fix the issue. I tried a quick spin for ~15-20 minutes with no issues. With hyperz enabled, I get lockups after a few minutes.
I am also affected by this bug. I use Linux Mint Debian Edition (current Debian Testing, with a 10-20 day lag). The GPU is an HD 5850 at 1920x1080 via DVI. I get repeatable lockups in the game Urban Terror 4.2 (freeware) on maps like ut4_horror (Horror), within 3 minutes. The GPU recovers, but the lockups make gaming practically impossible. :/ The GPU was in the "high" profile. I can offer 20€ for this bug to be fixed in a reliable and future-proof way, if within 2 months. Payment is via PayPal. Further info below:

$ uname
Linux linux 3.2.0-4-amd64 #1 SMP Debian 3.2.32-1 x86_64 GNU/Linux

$ glxinfo (parts)
name of display: :0.0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions: .....
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions: ,,,,
GLX version: 1.4
GLX extensions: ....
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CYPRESS
OpenGL version string: 2.1 Mesa 8.0.4
OpenGL shading language version string: 1.20

$ dpkg-query -l |grep mesa
ii libegl1-mesa:amd64 8.0.4-2 amd64 free implementation of the EGL API -- runtime
ii libegl1-mesa-drivers:amd64 8.0.4-2 amd64 free implementation of the EGL API -- hardware drivers
ii libgl1-mesa-dev 8.0.4-2 amd64 free implementation of the OpenGL API -- GLX development files
ii libgl1-mesa-dri:amd64 8.0.4-2 amd64 free implementation of the OpenGL API -- DRI modules
ii libgl1-mesa-dri:i386 8.0.4-2 i386 free implementation of the OpenGL API -- DRI modules
ii libgl1-mesa-dri-experimental:amd64 8.0.4-2 amd64 free implementation of the OpenGL API -- Extra DRI modules
ii libgl1-mesa-glx:amd64 8.0.4-2 amd64 free implementation of the OpenGL API -- GLX runtime
ii libgl1-mesa-glx:i386 8.0.4-2 i386 free implementation of the OpenGL API -- GLX runtime
ii libglapi-mesa:amd64 8.0.4-2 amd64 free implementation of the GL API -- shared library
ii libglapi-mesa:i386 8.0.4-2 i386 free implementation of the GL API -- shared library
ii libglu1-mesa:amd64 8.0.4-2 amd64 Mesa OpenGL utility library (GLU)
ii libglu1-mesa:i386 8.0.4-2 i386 Mesa OpenGL utility library (GLU)
ii libopenvg1-mesa:amd64 8.0.4-2 amd64 free implementation of the OpenVG API -- runtime
ii mesa-common-dev 8.0.4-2 amd64 Developer documentation for Mesa
ii mesa-utils 8.0.1-2+b3 amd64 Miscellaneous Mesa GL utilities

$ dpkg-query -l |grep radeon
ii libdrm-radeon1:amd64 2.4.33-3 amd64 Userspace interface to radeon-specific kernel DRM services -- runtime
ii libdrm-radeon1:i386 2.4.33-3 i386 Userspace interface to radeon-specific kernel DRM services -- runtime
ii radeontool 1.6.2-1.1 amd64 utility to control ATI Radeon backlight functions on laptops
ii xserver-xorg-video-radeon 1:6.14.4-5 amd64 X.Org X server -- AMD/ATI Radeon display driver

# sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +39.6°C (high = +70.0°C) (crit = +72.0°C, hyst = +70.0°C)

radeon-pci-0100
Adapter: PCI adapter
temp1: +64.0°C
This does not affect HyperZ for me, BUT the dmesg lockup message is EXACTLY the same. I also found out that it happens at certain "angles" and "positions" in the game (when the player is in a certain location and his "viewport" is directed along specific vectors). I have launched the game several times to confirm this, and if I stay in a specific position, the lockups occur constantly!

$dmesg|tail -n 300:
[25380.280087] radeon 0000:01:00.0: GPU lockup CP stall for more than 10008msec
[25380.280099] GPU lockup (waiting for 0x003BB1CE last fence id 0x003BB1C8)
[25380.281286] radeon 0000:01:00.0: GPU softreset
[25380.281293] radeon 0000:01:00.0: GRBM_STATUS=0xE77309A0
[25380.281299] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
[25380.281305] radeon 0000:01:00.0: GRBM_STATUS_SE1=0xFC000001
[25380.281311] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[25380.291605] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[25380.291713] radeon 0000:01:00.0: GRBM_STATUS=0x00003828
[25380.291719] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
[25380.291725] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
[25380.291731] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[25381.228728] radeon 0000:01:00.0: GPU reset succeed
[25382.406994] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[25382.407069] radeon 0000:01:00.0: WB enabled
[25382.423062] [drm] ring test succeeded in 0 usecs
[25382.423070] [drm] ib test succeeded in 1 usecs
[25452.156086] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[25452.156098] GPU lockup (waiting for 0x003C2295 last fence id 0x003C228F)
[25452.157288] radeon 0000:01:00.0: GPU softreset
[25452.157295] radeon 0000:01:00.0: GRBM_STATUS=0xE57208A0
[25452.157301] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
[25452.157307] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x88000003
[25452.157313] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[25452.161877] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[25452.161984] radeon 0000:01:00.0: GRBM_STATUS=0x00003828
[25452.161990] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
[25452.161996] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
[25452.162002] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[25453.098283] radeon 0000:01:00.0: GPU reset succeed
[25454.204960] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[25454.205036] radeon 0000:01:00.0: WB enabled
[25454.221028] [drm] ring test succeeded in 0 usecs
[25454.221037] [drm] ib test succeeded in 1 usecs
[25464.535435] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[26936.708086] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[26936.708097] GPU lockup (waiting for 0x00402CE4 last fence id 0x00402CE2)
[26936.709291] radeon 0000:01:00.0: GPU softreset
[26936.709298] radeon 0000:01:00.0: GRBM_STATUS=0xE77308A0
[26936.709305] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
[26936.709311] radeon 0000:01:00.0: GRBM_STATUS_SE1=0xFC000001
[26936.709317] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[26936.724482] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[26936.724590] radeon 0000:01:00.0: GRBM_STATUS=0x00003828
[26936.724596] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
[26936.724602] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
[26936.724608] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[26937.666885] radeon 0000:01:00.0: GPU reset succeed
[26938.845013] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[26938.845142] radeon 0000:01:00.0: WB enabled
[26938.861310] [drm] ring test succeeded in 0 usecs
[26938.861328] [drm] ib test succeeded in 1 usecs
[27026.040079] radeon 0000:01:00.0: GPU lockup CP stall for more than 10040msec
[27026.040090] GPU lockup (waiting for 0x0040DE04 last fence id 0x0040DDFE)
[27026.041280] radeon 0000:01:00.0: GPU softreset
[27026.041287] radeon 0000:01:00.0: GRBM_STATUS=0xE57208A0
[27026.041294] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
[27026.041299] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x88000003
[27026.041305] radeon 0000:01:00.0: SRBM_STATUS=0x20000AC0
[27026.055106] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[27026.055213] radeon 0000:01:00.0: GRBM_STATUS=0x00003828
[27026.055219] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
[27026.055225] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
[27026.055231] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[27026.995367] radeon 0000:01:00.0: GPU reset succeed
[27028.168997] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27028.169076] radeon 0000:01:00.0: WB enabled
[27028.185065] [drm] ring test succeeded in 1 usecs
[27028.185072] [drm] ib test succeeded in 1 usecs
[27046.068108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[27046.068119] GPU lockup (waiting for 0x0040EFEF last fence id 0x0040EFEA)
[27046.069317] radeon 0000:01:00.0: GPU softreset
[27046.069324] radeon 0000:01:00.0: GRBM_STATUS=0xE57208A0
[27046.069330] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
[27046.069336] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x88000003
[27046.069342] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[27046.070199] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[27046.070306] radeon 0000:01:00.0: GRBM_STATUS=0x00003828
[27046.070312] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
[27046.070318] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
[27046.070324] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[27047.004093] radeon 0000:01:00.0: GPU reset succeed
[27048.099496] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27048.099600] radeon 0000:01:00.0: WB enabled
[27048.115758] [drm] ring test succeeded in 1 usecs
[27048.115774] [drm] ib test succeeded in 1 usecs
[27138.696110] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
[27138.696122] GPU lockup (waiting for 0x00414EA3 last fence id 0x00414E9E)
[27138.697312] radeon 0000:01:00.0: GPU softreset
[27138.697319] radeon 0000:01:00.0: GRBM_STATUS=0xF0001828
[27138.697326] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x80000003
[27138.697332] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000003
[27138.697338] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[27138.700353] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[27138.700461] radeon 0000:01:00.0: GRBM_STATUS=0x00003828
[27138.700467] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
[27138.700473] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
[27138.700479] radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
[27139.638236] radeon 0000:01:00.0: GPU reset succeed
[27140.817595] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[27140.817763] radeon 0000:01:00.0: WB enabled
[27140.833959] [drm] ring test succeeded in 0 usecs
[27140.833977] [drm] ib test succeeded in 1 usecs
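To compare how often the lockups occur between runs (for example with and without a workaround), a log like the one above can be summarized with a short script. This is only a sketch: the function names are invented for illustration, and the pattern assumes the exact "GPU lockup CP stall for more than ...msec" wording shown in this log, which other kernel versions may phrase differently.

```python
import re

# Matches the first line of each lockup report in the dmesg output above, e.g.
# "[25380.280087] radeon 0000:01:00.0: GPU lockup CP stall for more than 10008msec"
LOCKUP_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+radeon \S+: "
    r"GPU lockup CP stall for more than (?P<ms>\d+)msec"
)


def find_lockups(dmesg_text):
    """Return a list of (timestamp_seconds, stall_msec) for each lockup found."""
    return [
        (float(match.group("time")), int(match.group("ms")))
        for match in LOCKUP_RE.finditer(dmesg_text)
    ]


def seconds_between_lockups(lockups):
    """Return the gaps in seconds between consecutive lockup timestamps."""
    times = [timestamp for timestamp, _ in lockups]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical usage, with the dmesg output saved to a file first:
# with open("dmesg.txt") as log:
#     lockups = find_lockups(log.read())
# print(len(lockups), seconds_between_lockups(lockups))
```

Because the pattern does not depend on line breaks, it works on both a properly line-separated log and one that was pasted as a single run of text.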
*** Bug 63748 has been marked as a duplicate of this bug. ***
Please check whether the patch below fixes the issue: http://people.freedesktop.org/~glisse/0001-r600g-force-full-cache-for-hyperz.patch
That patch fixes the bug! I can reliably reproduce it in a few seconds without the patch, but I have not been able to get a GPU hang since I applied the patch! I've only done minimal testing, 5-10 minutes, but everything works great.
Closing. Pushed to master, and I am going to push it to 9.1.
Hi, Gerome! Do I understand correctly that this patch is only for R600g AND that it is only for the kernel (not Mesa or DDI)? Thanks for the fix! Unfortunately I currently have no access to the machine. When I do, I will test the case on (1) the kernel source I had, vanilla and crashing, (2) the same with your patch, (3) the more recent kernel 3.8 (Ubuntu Raring), and finally (4) the current kernel with the patch. I will test both performance, with the PTS Urban Terror profile, and stability in exactly the case I had, and report back the results. That said, I am pretty sure it works, as others have already confirmed, but I would still like you to send me your PayPal account details, for beer (green tea, coffee, etc.) money. I know that you are employed by Red Hat; it's only a personal way to thank you for the patch. Thank you!
The patch was against Mesa. It is now included in Mesa except in the 9.1 branch, where I will push something shortly. If you want to make a donation, make one to the EFF: https://www.eff.org/ Or buy me a beer if you ever bump into me.