Created attachment 120427 [details] syslog (Linux 4.3, mesa git, radeon hawaii r9 390X) Hi I (re) tried Natural Selection 2 yesterday, and I it crashed (as expected), and this time the syslog said: ``` Dec 8 07:20:25 gollum kernel: [16588.241364] radeon 0000:01:00.0: GPU fault detected: 146 0x0e65e014 Dec 8 07:20:25 gollum kernel: [16588.241368] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00007673 Dec 8 07:20:25 gollum kernel: [16588.241369] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050E0014 Dec 8 07:20:25 gollum kernel: [16588.241370] VM fault (0x04, vmid 2) at page 30323, write from 'CB1' (0x43423100) (224) ---8<--- 120 000+ lines --------------------- Dec 8 07:22:58 gollum kernel: [16741.758495] radeon 0000:01:00.0: GPU fault detected: 146 0x06a59014 Dec 8 07:22:58 gollum kernel: [16741.758506] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00007673 Dec 8 07:22:58 gollum kernel: [16741.758512] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05150014 Dec 8 07:22:58 gollum kernel: [16741.758518] VM fault (0x04, vmid 2) at page 30323, write from 'CB6' (0x43423600) (336) Dec 8 07:23:03 gollum kernel: [16746.694729] ns2_linux32[16192]: segfault at 1 ip 00000000f3bf8d12 sp 00000000f0b9adf8 error 4 in libc-2.21.so[f3acc000+1b4000] ``` Well, I'm not expecting to see this game working one day, I own this game since 2013-12-15 (it's almost two years) and it never worked, so these guys proved their product is a joke and I don't expect nothing from them. But perhaps this bug can help you to track things on your side, that's why I report it. When the error happened, NS2 was loading assets since 20min to load a tutorial map. I don't have more logs currently but if you have some ideas of things I can record, I can try to reproduce it. I agree the current log is a bit quiet. I've already seen radeon errors like that before in a dpm issue (see #93288 ) but the current bug is unrelated to dpm (dpm was inactive in the current case). I use a Radeon R9 390X with Mesa git on Linux 4.3 here.
Hi Thomas! The VM faults are in backend writes. Can you try running with R600_DEBUG=nohyperz and/or nodcc?
Note that the game crash probably isn't directly related to the GPUVM faults. We'd need to see backtrace of the crash (preferably from gdb) to say if the crash is due to a bug in the game or Mesa.
(In reply to Michel Dänzer from comment #2) > Note that the game crash probably isn't directly related to the GPUVM > faults. Perhaps, but they were spamed right before the crash and stopped with it. > We'd need to see backtrace of the crash (preferably from gdb) to say > if the crash is due to a bug in the game or Mesa. I will try that. (In reply to Nicolai Hähnle from comment #1) > Hi Thomas! The VM faults are in backend writes. Can you try running with > R600_DEBUG=nohyperz and/or nodcc? Right now I started it with R600_DEBUG=nohyperz and I get the output I’ll join. On stdout side I got a 'radeon: mmap failed, errno: 12' error and on the syslog side I got a standard segfault message. There was no other message in syslog.
Created attachment 120429 [details] ns2 stdout (nohyperz)
Good to know, using R600_DEBUG=nodcc the game loaded fine. There was many 'radeon: mmap failed, errno: 1' lines printed but the games does not crashed on graphical rendering (it crashed while playing a weapon sound). I was able to load a map and move in it, so that nodcc option workarounds the graphical rendering issue.
Created attachment 120430 [details] ns2 stdout (nodcc)
Created attachment 120431 [details] ns2 stdout (default)
Created attachment 120432 [details] ns2 stdout + gdb backtrace
Created attachment 120433 [details] ns2 stdout + gdb full backtrace + libdrm-radeon1-dbg symbols
So, I added a gdb backtrace, I guess the interesting part is: ---8<------------- Thread 31 (Thread 0xe04e8b40 (LWP 26311)): #0 0xf7fd9be5 in __kernel_vsyscall () No symbol table info available. #1 0xf478025c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/i386-linux-gnu/libpthread.so.0 No symbol table info available. #2 0xe2e9cabc in radeon_drm_cs_emit_ioctl (param=0x8386300) at ../../../../../../../include/c11/threads_posix.h:159 ws = 0x8386300 cs = <optimized out> i = <optimized out> #3 0xe2e9c117 in impl_thrd_routine (p=0x856a708) at ../../../../../../../include/c11/threads_posix.h:87 pack = {func = 0xe2e9ca60 <radeon_drm_cs_emit_ioctl>, arg = 0x8386300} #4 0xf477b1aa in start_thread () from /lib/i386-linux-gnu/libpthread.so.0 No symbol table info available. #5 0xf44b5fde in clone () from /lib/i386-linux-gnu/libc.so.6 ------------->8---
Thanks for the followup. The last crash you reported
Thank you for the followup. Concerning your last backtrace: The crash is in a different thread than the one you mentioned, but the symbols in the backtrace suggest that it happens while updating a buffer, so the crash is likely a consequence of the failed mmap and the game not checking whether the map succeeded. I'm noticing a lot of other error messages in stdout, though, such as the "Failed to launch child process" and something about allocation failures that is unrelated to Mesa. Perhaps the game's virtual address space is simply getting full? It does call itself a 32 bit program. I'm a bit surprised in hindsight that nodcc had an effect. Now that I think about it, Hawaii doesn't actually have DCC, does it?
(In reply to Nicolai Hähnle from comment #12) > I'm a bit surprised in hindsight that nodcc had an effect. Now that I think > about it, Hawaii doesn't actually have DCC, does it? Right. Hawaii (CI) does not have DCC, only VI parts do.
No, Hawaii doesn't have DCC. The issue can indeed be that there is not enough address space. Mesa doesn't unmap most buffers, because mmap is a very costly operation. Such buffers are only unmapped when destroyed. This is obviously a problem on 32-bit. Possible solutions: 1) Always unmap buffers at appropriate places. This can have a significant performance cost in CPU-bound apps. 2) Clear pb_cache on mmap failure and try mmap again. (pb_cache typically contains staging buffers, which are always mapped. Therefore, pb_cache completely hides allocation, GPU mmap, and CPU mmap overhead, but it also contributes to address space usage.)
Created attachment 120440 [details] [review] 0001-XXX-winsys-radeon-limit-buffer-cache-size.patch Following the hypothesis that your game is running out of address space, here is a quick patch for Mesa that limits the buffer cache size to 4MB as a simple test. Try this (feel free to play around with the number in the patch), plus reducing any graphics quality options the game may have to the minimum. Don't be surprised if performance becomes worse. Please also set the environment variable R600_DEBUG=info and provide the output, as this allows us to see the default buffer cache size.
OK. Thanks, I will try all these things. I just have one question, since this software is running in 32 bit mode, does it mean I have to recompile mesa in 32bit too ?
(In reply to Thomas DEBESSE from comment #16) > OK. Thanks, I will try all these things. I just have one question, since > this software is running in 32 bit mode, does it mean I have to recompile > mesa in 32bit too ? You only have to recompile Mesa 32-bit, but not 64-bit.
OK, I patched and recompiled mesa, but it changes nothing (see incoming log). Setting R600_DEBUG=info I get: ---8<------------- radeon_drm_winsys_create: create cache of 4194304 bytes pci_id = 0x67b0 family = 57 chip_class = 9 gart_size = 2043 MB vram_size = 8192 MB max_sclk = 1080 max_compute_units = 44 max_se = 4 max_sh_per_se = 1 drm = 2.43.0 has_uvd = 1 vce_fw_version = 671220224 r600_num_backends = 16 r600_clock_crystal_freq = 27000 r600_tiling_config = 0x1023 r600_num_tile_pipes = 16 r600_max_pipes = 11 r600_virtual_address = 1 r600_has_dma = 1 r600_backend_map = 0 r600_backend_map_valid = 1 si_tile_mode_array_valid = 1 cik_macrotile_mode_array_valid = 1 Num displays: 1 ------------->8---
Created attachment 120443 [details] ns2 stdout + gdb full backtrace + libdrm-radeon1-dbg symbols / patched mesa + R600_DEBUG=info
(In reply to Thomas DEBESSE from comment #19) > Created attachment 120443 [details] > ns2 stdout + gdb full backtrace + libdrm-radeon1-dbg symbols / patched mesa > + R600_DEBUG=info This looks like the mmap error has been fixed.
Indeed. You'll also find that thread 1 commits suicide because TCMalloc cannot find more memory. So it's definitely still an out of memory condition, but without more evidence to the contrary I'm inclined to say it's probably not the OpenGL driver's fault.
Yeah, it could be a memory leak of some kind. I recommend running the game with this environment variable: GALLIUM_HUD=VRAM-usage,GTT-usage You should see if it's leaking GPU buffers. You can also try running it with valgrind, but exit the game before it crashes.
Created attachment 128951 [details] GTT and VRAM usage after 30 min play Here is a screenshot of GTT and VRAM usage after 30min of play. From these graphs alone I couldn't make out any memory leaks. However the game crashed soon afterwards on an new map. System Info: Card: Advanced Micro Devices [AMD/ATI] Bonaire XTX [Radeon R7 260X] glxinfo | grep OpenGL OpenGL vendor string: X.Org OpenGL renderer string: Gallium 0.4 on AMD BONAIRE (DRM 2.46.0 / 4.8.6-1-CHAKRA, LLVM 3.9.1) OpenGL core profile version string: 4.3 (Core Profile) Mesa 13.0.2 OpenGL core profile shading language version string: 4.30 OpenGL core profile context flags: (none) OpenGL core profile profile mask: core profile OpenGL core profile extensions: OpenGL version string: 3.0 Mesa 13.0.2 OpenGL shading language version string: 1.30 OpenGL context flags: (none)
Created attachment 128953 [details] GTT and VRAM on crash Here is another screenshot, taken right on a crash.
Please note that more people report crashing on radeonsi after some time of playing: http://forums.unknownworlds.com/discussion/comment/2330666/#Comment_2330666 This is with "Failed to allocate" errors. For me personally, NS2 didn't even start on older Mesa (mmap failed), now it starts, but as people report, seems to crash after some time. I personally never played that long (because loading the shaders takes an eternity). I'd guess this is simply the game running out of 32bit space, and Mesa drivers being more memory hungry than proprietary drivers. The devs had OOM problems even with nvidia drivers in the past and had to cut down on memory usage. They said they can't even add more skins because they're that close to the RAM limit. They are working on a 64bit version of the game, which should hopefully fix all this.
(In reply to Kamil Páral from comment #25) > Please note that more people report crashing on radeonsi after some time of > playing: > http://forums.unknownworlds.com/discussion/comment/2330666/#Comment_2330666 > This is with "Failed to allocate" errors. > I know (and if you compare the system info, you'll find a lot similarities ... ;-) > For me personally, NS2 didn't even start on older Mesa (mmap failed), now it > starts, but as people report, seems to crash after some time. I personally > never played that long (because loading the shaders takes an eternity). > Yeah, same here, just go it to load wite mesa 13 and llvm 3.9.1. With regard to loading times, I guess shader-on-disk cache should help here... > I'd guess this is simply the game running out of 32bit space, and Mesa > drivers being more memory hungry than proprietary drivers. The devs had OOM > problems even with nvidia drivers in the past and had to cut down on memory > usage. They said they can't even add more skins because they're that close > to the RAM limit. They are working on a 64bit version of the game, which > should hopefully fix all this.
On my end I re-tried to play ns2_linux32 with my R9 390X two days ago running Linux 4.9 (amdgpu), Mesa 17~git and LLVM 4~svn and I was able to play a complete alien tutorial without crashes and the rendering looked OK to me. By the way I haven't seen the game running on another platform so I can miss something, but everything looked acceptable, I don't know if it's the expected rendering, but it's a nice, acceptable and honest rendering. I've never played too much this game to be able to hit the 32bit memory limit.
I just want to report that with new mesa and cache NS2 runs as smooth as butter, thanks to the devs! System: Host: cray Kernel: 4.12.4-1-CHAKRA x86_64 (64 bit) Desktop: KDE Plasma 5.10.4 Distro: Chakra Machine: Device: desktop Mobo: ASUSTeK model: A88XM-PLUS v: Rev X.0x UEFI: American Megatrends v: 3003 date: 03/04/2017 CPU: Quad core AMD A10-7850K Radeon R7 12 Compute Cores 4C+8G (-MCP-) cache: 8192 KB clock speeds: max: 4200 MHz 1: 4200 MHz 2: 3000 MHz 3: 3000 MHz 4: 3000 MHz Graphics: Card: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] Display Server: x11 (X.Org 1.19.3) driver: amdgpu Resolution: 2560x1440@119.88hz OpenGL: renderer: Gallium 0.4 on AMD FIJI (DRM 3.15.0 / 4.12.4-1-CHAKRA, LLVM 4.0.1) version: 4.5 Mesa 17.1.5 Audio: Card-1 Advanced Micro Devices [AMD] FCH Azalia Controller driver: snd_hda_intel Card-2 Advanced Micro Devices [AMD/ATI] Fiji HDMI/DP Audio Controller driver: snd_hda_intel Sound: Advanced Linux Sound Architecture v: k4.12.4-1-CHAKRA Network: Card: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Sensors: System Temperatures: cpu: 44.0C mobo: 34.0C gpu: 41.0 Fan Speeds (in rpm): fan-1: 826 fan-2: 841 fan-3: 717 Info: Processes: 213 Uptime: 11:34 Memory: 4278.6/15994.4MB Client: Shell (bash) inxi: 2.3.23
Yes, I can confirm NS2 works well now. One thing to note is that when compiling the shaders, the memory usage of the game is much higher than when running from the cache (even *after* the compilation has ended), I'm not sure whether it's a game problem or driver problem of not freeing all the memory. In that case, it's still possible to hit the 32b memory limit. But when running from cache, I've seen no memory issue and rendering is fine.
This seems to be fixed. If the high memory use is still an issue some place, it seems best to open a new bug.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.