|Summary:||[amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)|
|Product:||Mesa||Reporter:||Silvan Jegen <s.jegen>|
|Component:||Drivers/Gallium/radeonsi||Assignee:||Default DRI bug account <dri-devel>|
|Status:||RESOLVED FIXED||QA Contact:||Default DRI bug account <dri-devel>|
|Priority:||medium||CC:||alexander, Andrei.Demin1996, bugs.freedesktop, capricorn.david, captaincrutches, cip91sk, dri-devel, frederic.romagne, himling, hofmann.zachary, jani.karkkainen, karolherbst, libcg, libgradev, matthew, mesa, thesaxophonist, ttimo, vedran, vrodic|
|i915 platform:||i915 features:|
Screenshot with GALLIUM_HUD of the first 10s of the game (bad)
Screenshot with GALLIUM_HUD of the first 1min30s of the game (good)
Screenshot with GALLIUM_HUD of the first 2min30s of the game (good)
Profile of the the first few seconds of a game (all components)
Profile of the the first few seconds of a game (only radeonsi)
Profile of a few seconds in the middle of the game (all components)
Profile of a few seconds in the middle of the game (only radeonsi)
perf report of RocketLeague stalling/freezing
perf report of RocketLeague stalling/freezing, including callgraphs
Shaders output folfer
patch - possible workaround
Description Silvan Jegen 2016-09-20 20:04:50 UTC
The game Rocket League has recently be ported to Linux (still in Beta at the moment). Running the game works fine but at each match start, the video output just freezes for several seconds. After these 5-10 second pauses the game resumes (though at a much later point in time since the other players in this multiplayer game kept playing during the time the output froze on my end). After the first few of those freezes, they don't happen anymore for that one match (presumably because the textures/shaders have been loaded already?). As soon as players join or leave those freezes come back though (because more loading of assets is required?). Setting the "Texture Detail" option to "Quality" instead of "High Quality" reduces the freezes to around 1 second max which makes the game much more playable. Hardware: i7 3.4GH, 8GB RAM RX 480 Sapphire Nitro 8GB Software: Linux 4.7.4-1-ARCH #1 SMP PREEMPT Thu Sep 15 15:24:29 CEST 2016 x86_64 GNU/Linux mesa 12.0.3-1 llvm 3.8.1-1 I tried to get an apitrace of a match vs CPUs. You can find it here: https://sillymon.ch/data/RocketLeague.trace.xz
Comment 1 Michel Dänzer 2016-09-21 02:05:07 UTC
Looks like most of the cycles at the start of the apitrace are spent in the GLSL compiler frontend, in particular in do_common_optimization.
Comment 2 Silvan Jegen 2016-09-21 08:05:53 UTC
(In reply to Michel Dänzer from comment #1) > Looks like most of the cycles at the start of the apitrace are spent in the > GLSL compiler frontend, in particular in do_common_optimization. Just to clarify: the trace also includes loading of the game itself (which takes a long time too) as well as in-game menu navigation. Loading the in-game menus (especially the options) takes a long time too so it *could* be related. I quit the game after about 1 minute of play time so the last one to two minutes of the trace should be the actual game. Thanks for looking into it!
Comment 3 Michel Dänzer 2016-09-21 14:59:41 UTC
(In reply to Silvan Jegen from comment #2) > Just to clarify: the trace also includes loading of the game itself (which > takes a long time too) as well as in-game menu navigation. Right, replaying the trace crashes for me after the shader compilations on startup, so my CPU profile only covered that. Maybe you can try getting a CPU profile of one of the other stalls. Otherwise, maybe try setting the environment variable GALLIUM_HUD=.dfps,requested-VRAM+VRAM-usage,requested-GTT+GTT-usage,cpu+temperature+GPU-load,.dnum-bytes-moved,.dbuffer-wait-time,.dnum-compilations+num-shaders-created for either running the game itself or replaying the trace, and taking a screenshot within one minute after one of the other stalls. That should allow at least confirming or ruling out that the other stalls are due to shader compilation as well.
Comment 4 Andreas Hartmetz 2016-09-21 18:04:03 UTC
I have the same problem. Assuming the problem was shader compilation, what could be done about it, though? Optimizing shader compilation by a factor of 10 seems unrealistic, a disk cache for shaders has been rejected (right?) and would not always help, e.g. when somebody with a car that uses a new asset joins.
Comment 5 Tobias Droste 2016-09-21 19:01:19 UTC
Created attachment 126715 [details] Screenshot with GALLIUM_HUD of the first 10s of the game (bad)
Comment 6 Tobias Droste 2016-09-21 19:02:16 UTC
Created attachment 126716 [details] Screenshot with GALLIUM_HUD of the first 1min30s of the game (good)
Comment 7 Tobias Droste 2016-09-21 19:02:46 UTC
Created attachment 126717 [details] Screenshot with GALLIUM_HUD of the first 2min30s of the game (good)
Comment 8 Tobias Droste 2016-09-21 19:07:06 UTC
On the first screenshot you can clearly see the FPS going down to 0 multiple times. Though looking at the graph only the first 2 drops could be explained by shader compiles if I interpret this correctly. I can't tell what's happening after that.
Comment 9 Tobias Droste 2016-09-21 19:15:24 UTC
Ah should have mentioned it: This is with radeon+radeonsi (no amdpgu) so it's probably not the kernel.
Comment 10 Silvan Jegen 2016-09-22 12:27:21 UTC
(In reply to Michel Dänzer from comment #3) > (In reply to Silvan Jegen from comment #2) > > Just to clarify: the trace also includes loading of the game itself (which > > takes a long time too) as well as in-game menu navigation. > > Right, replaying the trace crashes for me after the shader compilations on > startup, so my CPU profile only covered that. Maybe you can try getting a > CPU profile of one of the other stalls. I tried to replay the trace and it crashed on my machine as well... > Otherwise, maybe try setting the environment variable > GALLIUM_HUD=.dfps,requested-VRAM+VRAM-usage,requested-GTT+GTT-usage, > cpu+temperature+GPU-load,.dnum-bytes-moved,.dbuffer-wait-time,.dnum- > compilations+num-shaders-created for either running the game itself or > replaying the trace, and taking a screenshot within one minute after one of > the other stalls. That should allow at least confirming or ruling out that > the other stalls are due to shader compilation as well. Do you still need such a screenshot now that other people in the thread have posted a few? If so, I will upload one early next week (I will be away over the weekend).
Comment 11 Gregor Münch 2016-09-22 17:47:17 UTC
Experiencing same problem with Radeon HD 7970 Mesa Git radeonsi, LLVM Git, Linux 4.8 rc6. The Hangs are always there when: -start of the match 5-7sec -multiple times within the 1st minute of the game, up to 10sec per stall -upon the first explosion -upon a new player, with a new car spawns -opening option menu -end of the game, where the results are displayed Especially in the begin, you see a lot of blurrish textures. Ground textures, decals of the car, blurry "psyonix" advertisements. The grass looks like in a cell shading game, (wrong shader applied?)
Comment 12 Michel Dänzer 2016-09-23 01:15:25 UTC
(In reply to Andreas Hartmetz from comment #4) > Assuming the problem was shader compilation, what could be done about it, > though? Optimizing shader compilation by a factor of 10 seems unrealistic, Not necessarily. Let's worry about that once the code taking so much time is identified and analyzed. For that purpose, what's needed next is CPU profiles of the stalls. > a disk cache for shaders has been rejected (right?) No, it hasn't. > and would not always help, e.g. when somebody with a car that uses a new > asset joins. Right, there's only so much the driver stack can do when the game (engine) keeps compiling shaders.
Comment 13 hofmann.zachary 2016-09-24 15:19:29 UTC
I can also confirm this problem with Radeon R7 250 (radeonsi), Mesa 12.0.2, LLVM 3.8.1 and kernel version 4.6.0.
Comment 14 Tobias Droste 2016-09-24 23:58:21 UTC
Created attachment 126770 [details] Profile of the the first few seconds of a game (all components)
Comment 15 Tobias Droste 2016-09-24 23:58:39 UTC
Created attachment 126771 [details] Profile of the the first few seconds of a game (only radeonsi)
Comment 16 Tobias Droste 2016-09-25 00:00:42 UTC
I tried to do some profiling but with callgrind it's impossible because after half an hour I'm not even in the game creation menu. I tried zoom as an alternative (never used it before) and the attached screenshots are the result. I'm not 100% sure I used it how I should use it, but maybe someone can see something there.
Comment 17 Tobias Droste 2016-09-25 00:09:57 UTC
Created attachment 126772 [details] Profile of a few seconds in the middle of the game (all components)
Comment 18 Tobias Droste 2016-09-25 00:10:26 UTC
Created attachment 126773 [details] Profile of a few seconds in the middle of the game (only radeonsi)
Comment 19 Tobias Droste 2016-09-25 00:11:25 UTC
And I added screenshots of the profile for a few seconds where the game is running fine. I figured that it might be interesting where the difference between good and bad is.
Comment 20 Eero Tamminen 2016-09-26 11:12:41 UTC
Apitrace's own CPU overhead is so high that it's not very good for identifying CPU bottlenecks. Best would be to do (e.g. from SSH console): # perf record -a ^C During the game freeze. And provide profile report here: # perf report (+ install debug symbols for anything that misses symbols in the report.)
Comment 21 Silvan Jegen 2016-09-26 18:07:54 UTC
(In reply to Eero Tamminen from comment #20) > Best would be to do (e.g. from SSH console): > # perf record -a > ^C > > During the game freeze. I have a dual screen setup and ran 'perf record -a' on a terminal on the screen that the game was not running on. I captured events for a short amount of time during which the game was mostly stalling/freezing. > And provide profile report here: > # perf report > > (+ install debug symbols for anything that misses symbols in the report.) I uploaded the perf report. Most of the CPU time seems to be spent in RocketLeague itself for which there are no symbols available.
Comment 22 Silvan Jegen 2016-09-26 18:09:32 UTC
Created attachment 126796 [details] perf report of RocketLeague stalling/freezing
Comment 23 Michel Dänzer 2016-09-27 08:49:49 UTC
Please pass --call-graph to perf record (you may need to play with the different methods supported by that to find the one which works best for you, see perf record --help) and perf report.
Comment 24 Silvan Jegen 2016-09-27 09:11:13 UTC
(In reply to Michel Dänzer from comment #23) > Please pass --call-graph to perf record (you may need to play with the > different methods supported by that to find the one which works best for > you, see perf record --help) and perf report. Looks like my perf is linked to libunwind so according to 'perf help record' I should be able to run 'perf record --call-graph dwarf -a' during the stall and generate a CPU profile report including the call graphs. I will do so later today.
Comment 25 Silvan Jegen 2016-09-27 15:23:39 UTC
Created attachment 126813 [details] perf report of RocketLeague stalling/freezing, including callgraphs
Comment 26 Silvan Jegen 2016-09-27 15:26:11 UTC
I generated the report and uploaded the result. I may have hit a bug in perf while doing it. "perf report" was blocking on a pread64 call on /dev/dri/card0 when creating the report. I had to use gdb to close the associated file descriptor to get perf to continue creating the report but it seems to have worked.
Comment 27 Eero Tamminen 2016-09-27 15:47:16 UTC
(In reply to Silvan Jegen from comment #22) > Created attachment 126796 [details] > perf report of RocketLeague stalling/freezing Overview: --------------------------------- 74.86% RocketLeague 76.73% RocketLeague 16.33% [kernel.vmlinux] 2.05% libc-2.24.so 1.79% libpthread-2.24.so 1.59% [amdgpu] 10.22% RenderingThread 46.08% radeonsi_dri.so 22.35% RocketLeague 15.32% libc-2.24.so 8.99% libpthread-2.24.so 2.07% [amdgpu] 1.50% libGL.so.1.2.0 1.50% [kernel.vmlinux] 6.86% swapper 97.25% [kernel.vmlinux] 2.58% [xhci_hcd] 6.81% AsyncIOSystem 99.83% RocketLeague --------------------------------- -> If the perf data is just from the freeze(s), I'm not sure the problem here is compiler. (With the "-n" option given to "perf report" this overview could be much more accurate than calculating the info from the rounded percentages in the attachment.)
Comment 28 Micael Bergeron 2016-09-27 16:07:38 UTC
I can reproduce this every time. Using amdgpu-pro seems to fix the problem. I asked Psyonix for the debug symbols bug had no response. http://psyonix.com/forum/viewtopic.php?f=36&t=27894 I opened a bug report on Rocket League for the menu freeze. https://support.rocketleaguegame.com/hc/en-us/requests/1840 Shaders/materials seems to load very slowly compared to the Windows version. Software: Linux 4.7.4-1-ARCH #1 SMP PREEMPT Thu Sep 15 15:24:29 CEST 2016 x86_64 GNU/Linux mesa 12.0.3-1
Comment 29 Silvan Jegen 2016-09-27 18:45:01 UTC
(In reply to Micael Bergeron from comment #28) > I asked Psyonix for the debug symbols bug had no response. > http://psyonix.com/forum/viewtopic.php?f=36&t=27894 Maybe you can add a link to this bug report on their support forums? I have seen that people at Feral that do ports to Linux have commented on mesa bug reports about the games they are porting.
Comment 30 Michel Dänzer 2016-09-28 02:40:10 UTC
(In reply to Eero Tamminen from comment #20) > Apitrace's own CPU overhead is so high that it's not very good for > identifying CPU bottlenecks. That may be true in general, but taking a CPU profile while replaying the referenced apitrace clearly shows that most of the CPU cycles during the startup phase are spent in the GLSL compiler frontend code for me. Can't you reproduce that? Anyway, it looks like there may be at least one other, not shader compilation related, issue at play here. But that doesn't mean nothing can be done about the shader compilation issue.
Comment 31 Marek Olšák 2016-10-15 11:16:13 UTC
Please create apitrace while this environment variable is set: MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage -GL_AMD_pinned_memory" Then check that the apitrace replays correctly. If it does, please attach it. Thanks.
Comment 32 Micael Bergeron 2016-10-15 14:51:06 UTC
(In reply to Marek Olšák from comment #31) > Please create apitrace while this environment variable is set: > MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage -GL_AMD_pinned_memory" > > Then check that the apitrace replays correctly. If it does, please attach > it. Thanks. Here is my attempt to provide the needed apitrace. As they are large files, even compressed, I uploaded them to an external service. *Files will be deleted in 30 days.* I made 2 apitrace, 1) when you click on the Option menu and "High Quality" shaders are on. You can reproduce this by setting the "High Quality" shaders on then clicking again on Options. 2) when you enter a game, there is a long delay in the game loading, it hangs for multiples seconds at a time, multiple times until it settles. From my PoV, it seems to have something to do with the HUD. I haven't been able to replay the apitrace, as I'm not too familiar with this tool, maybe someone can fix the traces or something. I guess it's better than having nothing. Hit me up if you need anything else, thanks for your time. 1) https://ufile.io/2a49 2) https://ufile.io/0cf6
Comment 33 Marek Olšák 2016-10-15 15:35:02 UTC
The traces don't work. Let's assume it's an apitrace bug. Running the traces can be done with: glretrace --benchmark [trace file] glretrace is part of apitrace. We can at least profile the compiler: - please create an empty directory - point this environment variable to it: MESA_SHADER_CAPTURE_PATH=[path] - play the game for a while... after that, the directory should contain a lot of shader files - please pack the directory and attach it Thanks.
Comment 34 Micael Bergeron 2016-10-15 16:00:46 UTC
Created attachment 127319 [details] Shaders output folfer MESA_SHADER_CAPTURE_PATH output folder.
Comment 35 Marek Olšák 2016-10-16 11:55:05 UTC
Well, it's unreal engine 3, so it doesn't surprise me that it compiles shaders on demand. All UE3 games are affected in the same way. Possible solution 1: - persistent shader cache on the disk Possible solution 2: - convert shaders from TGSI to ISA in Mesa (we can use the LLVM assembler, but we need good register allocation for TGSI) - LLVM compilation should follow and be fully asynchronous Comments welcome.
Comment 36 Silvan Jegen 2016-10-16 12:41:14 UTC
(In reply to Marek Olšák from comment #35) > Well, it's unreal engine 3, so it doesn't surprise me that it compiles > shaders on demand. All UE3 games are affected in the same way. > > Possible solution 1: > - persistent shader cache on the disk > > Possible solution 2: > - convert shaders from TGSI to ISA in Mesa (we can use the LLVM assembler, > but we need good register allocation for TGSI) > - LLVM compilation should follow and be fully asynchronous > > Comments welcome. Is it save to assume that the amount of shaders to be compiled is what causes these stalls? I am not familiar enough with the problem space to be able to tell you which one would be better. According to the following quote (from further above) > Micael Bergeron 2016-09-27 16:07:38 UTC > I can reproduce this every time. > Using amdgpu-pro seems to fix the problem. amdgpu-pro doesn't seem to have this issue. Do we know what AMD does in its closed driver (persistent shader cache or something else)? Even if it works for AMD's closed driver it doesn't meant that the same approach would be feasible for the Open Source one of course.
Comment 37 Micael Bergeron 2016-10-16 14:29:13 UTC
The number of shaders is quite astonishing. Could using the r600 shader compiler LLVM (--enable-r600-llvm-compiler) fix this? As for the future, I think a persistent shader cache is probably the quickest way to patch this so the game is enjoyable. After that, a more subtle approach could be used: > Possible solution 2: > - convert shaders from TGSI to ISA in Mesa (we can use the LLVM assembler, > but we need good register allocation for TGSI) > - LLVM compilation should follow and be fully asynchronous Rocket League is a really popular game, which I think the Linux community should embrace. Thank you guys for making this work.
Comment 38 Vedran Miletić 2016-10-25 14:52:24 UTC
(In reply to Micael Bergeron from comment #37) > The number of shaders is quite astonishing. > > Could using the r600 shader compiler LLVM (--enable-r600-llvm-compiler) fix > this? > I don't see this option but judging by the name, this enables LLVM's R600 backend as a shader compiler on EG/NI cards and does not apply for SI+.
Comment 39 Stefano Cipriani 2016-10-28 12:49:33 UTC
Playing against bots with an AMD r9 m265x (cape verde) using DRI_PRIME=1 rendered via Intel HD Graphics 4600 the game it's not playable for about 30+ seconds, bot cars start to one every 5~10 seconds. When playing online I get the hangs for about the same time. I made a trace of an offline match with the git version of apitrace, and I can replay it with both git and stable version. You can find it at https://drive.google.com/open?id=0B41ZZ4daaP9hTmNDaUl2MHFWQlE
Comment 40 Michel Dänzer 2016-10-31 03:32:58 UTC
(In reply to Stefano Cipriani from comment #39) > https://drive.google.com/open?id=0B41ZZ4daaP9hTmNDaUl2MHFWQlE While replaying this trace, I only see pauses of fractions of seconds (for shader compilations) during the match. If someone is seeing pauses of several seconds when replaying this trace, it would be interesting to see a profile for one of those pauses.
Comment 41 Marek Olšák 2016-10-31 11:51:49 UTC
Yeah, the stalls are pretty small with apitrace. I've just pushed patches that should improve compile times for the GLSL compiler. Before that, the GLSL compilation speed always decreased with a high memory allocation count in the process. That may explain why apitrace is faster.
Comment 42 Evan Black 2016-11-02 19:13:38 UTC
Can't really contribute much in terms of any of the tracing stuff; just dropping my specs here as I have similar issues. Core i5 4690k R9 285 8 GB Memory I've tried it both on my SSD and HDD, and the issue was exactly the same: freezing in the early stages of the match or when new players join the game, obviously linked to loading their model / textures. Please keep the search up; I'd love to play this game just as smooth as it runs on Windows.
Comment 43 Steven Basart 2016-11-11 18:27:22 UTC
I also experience the issue. If someone can explain or point me to how I can profile it I can also try to provide some api traces. I'm running Mesa 12.0.3 OpenGL renderer string: Gallium 0.4 on AMD HAWAII (DRM 2.46.0 / 4.8.0-27-generic, LLVM 3.8.1) AMD 290X intel i7
Comment 44 Jani Kärkkäinen 2016-11-11 23:49:03 UTC
I'm quite sure I'm getting this as well. I'm however on a HD6850, so that'd be the r600 driver I believe? Anything I can do to help pinpoint the culprit? Some specs: OpenGL renderer string: Gallium 0.4 on AMD BARTS (DRM 2.46.0 / 4.8.7-xanmod9, LLVM 4.0.0) OpenGL core profile version string: 3.3 (Core Profile) Mesa 13.1.0-devel - padoka PPA Ubuntu Gnome 16.10 64-bit
Comment 45 Gregor Münch 2016-11-15 17:25:51 UTC
No change with GIT of today. Slowdowns remain the same.
Comment 46 George Billios 2016-11-15 20:38:26 UTC
This is also valid for Trine 3. Big delay to start ~5mins setting MESA_SHADER_CAPTURE_PATH results in about 16k files!!! specs i7 860 Fury R9 4.7-staging kernel mesa-git Mesa 13.1.0-devel (git-151aeca) llvm 3.9.0
Comment 47 Marek Olšák 2017-01-06 22:23:01 UTC
The freezes are not caused by shader compilation. I don't know what's causing them. They are too long (even 10 seconds). A CPU profiler shows the time is not spent in the driver. It's spent in RocketLeague. The game is doing something and I can't see what, because it doesn't come with debug info. Some time is spent in sched_yield, which suggests that the game is in a loop waiting for something. This issue might not be related to the GL driver even.
Comment 48 Jani Kärkkäinen 2017-01-09 07:12:10 UTC
Using AMDGPU-PRO the hangs are non-existent. So there must be SOMETHING in the whole mesa-driver stack, that causes the hangs, if not the GL drivers themselves.
Comment 49 iuno 2017-01-10 04:50:06 UTC
(In reply to Jani Kärkkäinen from comment #48) > Using AMDGPU-PRO the hangs are non-existent. So there must be SOMETHING in > the whole mesa-driver stack, that causes the hangs, if not the GL drivers > themselves. Actually, I didn't see this huge hangs with Intel. The game is not playable with radeonsi at this point.
Comment 50 Marek Olšák 2017-01-10 17:02:03 UTC
It would be useful to have a debug build of the game and run it with a profiler or debugger to see where it's looping.
Comment 51 Jani Kärkkäinen 2017-01-13 22:22:20 UTC
Sent a support ticket to Psyonix about this (and the request for a debug build). Hopefully a random persons support ticket gets to the dev team and they deem it something that's in the realm of possibility. On a good note, already got a positive response from the first-level support, so, fingers crossed. :D
Comment 52 Marek Olšák 2017-01-24 17:59:05 UTC
We don't need a debug build. We just need: 1) One person to run the debug build and use sysprof to capture where the CPU is spending time during the freeze. 2) Make a screenshot of the sysprof window and send it to the game developer. 3) The game developer should look at it and decide what to do next. sysprof is a very-easy-to-use standalone CPU profiler GUI that you run under root. It's observing all processes and also the kernel. For apps built with -g (but also keep -O2 at least), it will show the functions and % of CPU time spent in them. For apps also built with -fno-omit-frame-pointer, it will show whole call stacks.
Comment 53 Timothee Besset 2017-01-24 18:20:55 UTC
Hello! I have started working on this. I haven't found the root cause yet but I will update here when I have something. (For context, I did the initial port work for Psyonix. I just recently got a radeonsi setup together so I can look at this now.)
Comment 54 Michel Dänzer 2017-01-25 01:08:26 UTC
(In reply to Marek Olšák from comment #52) > 2) Make a screenshot of the sysprof window and send it to the game developer. Please save the profile in sysprof and send the saved data instead of a screenshot. Then the recipient can peruse the profile any way they like.
Comment 55 Timothee Besset 2017-01-25 17:02:08 UTC
Rocket League uses an async I/O thread to stream in content in the background. That thread works off of a queue and only does disk reads and decompression (it doesn't issue any OpenGL calls for instance, that happens on a dedicated render thread). The CPU usage you are seeing out of Rocket League is this thread being busy with decompression. This is generally normal and not an indication of a problem. However, when running with radeonsi, the async I/O thread is performing very poorly. Much, much worse than with amdgpupro. To the point that the main thread has to wait for async and causes a significant hitch (up to 4 seconds in some logs I captured). I tend to agree that this problem is not related to shader compilation. The bumps from shader compilation are barely showing up in profiles. But it's definitely made a lot worse by running on radeonsi. On the same system with amdgpupro those same hitches can happen but they are in the 100/200ms range. At this point of the investigation, and due to scheduling constraints, I am likely only going to come back to this once Timothy Arceri has a shader cache implementation available. I suspect this will not help significantly but let's hope I'm wrong there. In the meantime there are some log files from the async thread compared between radeonsi and amdgpu-pro if someone feels like extracting some possibly useful data: https://www.dropbox.com/s/d73jk66uhfrhv8l/amdgpupro-compressedread-timings.log?dl=0 https://www.dropbox.com/s/1xfl3l24f5buzs0/radeonsi-compressedread-timings.log?dl=0
Comment 56 Grazvydas Ignotas 2017-01-25 17:34:17 UTC
Is the async I/O thread doing a lot of memory allocations? If so, perhaps the malloc implementation is slowed down by having large amount of small allocs/frees from shader compiles and/or fragmentation of it's free space. I wonder if it's possible to put mesa on a separate allocator without affecting the rest of the app. If the game is 32bit, it could also be that the address space is almost fully mapped and mmap is struggling to find holes to fit the new allocations in. Just some random guesses there.
Comment 57 Vladislav Egorov 2017-01-26 02:28:51 UTC
Simple way to reproduce the bug - launch the game, then go to "Options" menu and get multi-second (>5s) freeze. I've tried to launch Rocket League on several GPUs. Graphics settings were the same everywhere. I wasn't able to reproduce the bug on i965 (Haswell), nouveau (GT710) and llvmpipe, it takes maybe 1 second to load "Options" menu on these drivers. I've reproduced the bug on radeonsi (R270X) and r600g (6930). The bug is reproducible on both radeon+radeonsi and amdgpu+radeonsi on fresh 4.9 kernel. So it's very unlikely it has anything to do with memory allocation or shader compilation at all. Shader cache would not help either. Narrowing the circle of suspects a bit more. LLVM can't be related, because the bug is reproducible on r600. Gallium is unlikely related, because I don't see the bug on llvmpipe and nouveau. That's something common to R600 and radeonsi.
Comment 58 Vladislav Egorov 2017-01-26 04:35:14 UTC
Correction - in fact I only tested radeon+radeonsi pair. I will try amdgpu+radeonsi later.
Comment 59 Andreas Hartmetz 2017-01-26 14:17:19 UTC
Another random guess: Maybe shader compilation is a problem insofar that it blocks in a different GL API call than in other drivers, which causes some lock to be held for much longer. Then the I/O thread won't find the lock free to submit a result when it expects to. That would require the GL / I/O thread synchronization to be done in a suboptimal way - the right synchronization primitives and techniques would avoid such problems.
Comment 60 Iaroslav Andrusyak 2017-01-26 14:22:11 UTC
if it help rocket league in wine(gallium-nine) works fine without hangs
Comment 61 Marek Olšák 2017-02-07 10:41:17 UTC
Created attachment 129379 [details] [review] patch - possible workaround Does the attached patch help?
Comment 62 freedesktop 2017-02-07 12:21:46 UTC
the patch kind of works for me. The Game loads much faster now, so does the options menu and I can play the game without any lags, but know I only get 30-40fps (60+ before) at Radeon 6950(r600g)
Comment 63 Marek Olšák 2017-02-07 13:32:33 UTC
OK. Let's wait for more people to report the same findings.
Comment 64 freedesktop 2017-02-07 17:07:46 UTC
Got it tested on another system with a rx470. The hangs are mostly gone, but fps drops from 60+ to ~25. (btw the patch was applied to mesa 126.96.36.199)
Comment 65 Andrei.Demin1996 2017-02-07 18:30:48 UTC
> Got it tested on another system with a rx470. >The hangs are mostly gone, but fps drops from 60+ to ~25. (btw the patch was >applied to mesa 188.8.131.52) Yep, same. FPS drops from 60+ to 30-40 FPS, the hangs are mostly gone, but some still there. MESA 13.0.3 RX 480
Comment 66 Captain Crutches 2017-02-07 21:34:01 UTC
I'm having the same results as everyone else with the patch: hanging seems diminished, FPS down noticeably (in multiplayer, not free play) but still playable. Patched mesa 17.0_rc2 R9 290
Comment 67 Michel Dänzer 2017-02-08 01:08:40 UTC
The patch forces most buffers to GTT, which is clearly a bad idea with a dedicated GPU. Marek, what theory did you want to test with this patch?
Comment 68 Marek Olšák 2017-02-08 12:39:49 UTC
The long hangs are spent in an upload/decompression thread the game uses. The thread doesn't use any GL calls. My theory that's now confirmed was that the game maps buffers in VRAM and does read-modify-write on that memory in that thread. We know that direct VRAM access is uncached and super slow if it's not a series of sequential writes that is write-combined on the CPU, which is like a write-only cache. Games typically upload data using a sequential copy/fill, which is why we had not seen this issue before. If the game used the MAP READ flag, the driver would return a mapping in RAM with caching enabled. You'll also get that if you set the CLIENT STORAGE bit in glBufferStorage. For completeness, you can also get a mapping in RAM that's uncached. The CPU and GPU performance is somewhere between the VRAM and cached RAM. We use it for streaming because the GPU has faster access to RAM uncached by the CPU.
Comment 69 Marek Olšák 2017-02-08 18:16:59 UTC
Actually, the only way to get cached RAM is to map a buffer with GL_MAP_READ_BIT. There is no other way.
Comment 70 Captain Crutches 2017-02-08 18:23:03 UTC
So, at the risk of sounding ignorant, does that mean the bug actually belongs to Psyonix because they're just doing it inefficiently, or is there still something that can be done on the Mesa side?
Comment 71 Marek Olšák 2017-02-08 18:36:18 UTC
The current heuristic in Mesa is that if a buffer is not mapped for read, Mesa doesn't enable CPU caches for the mapping. It's usually an optimal solution for write-only buffer uploads. Enabling CPU caches for mapped buffers adds a lot of inefficiencies, so it should be avoided.
Comment 72 Marek Olšák 2017-02-08 18:37:42 UTC
If the game does any reads or read-modify-write on the mapped buffer memory and doesn't use the GL_MAP_READ_BIT, it's a bug in the game.
Comment 73 Clemens Eisserer 2017-02-09 12:48:12 UTC
just to be curious, does it actually map the buffer with the READ flag?
Comment 74 iuno 2017-02-10 00:59:59 UTC
Do you need more testing? Has anyone yet reported that to psyonix?
Comment 75 Michel Dänzer 2017-02-10 01:05:45 UTC
(In reply to Clemens Eisserer from comment #73) > just to be curious, does it actually map the buffer with the READ flag? According to Timothee and the apitrace it does, so https://patchwork.freedesktop.org/patch/138047/ should help for this issue.
Comment 76 fin4478 2017-02-10 02:08:01 UTC
I tested the game and with default video settings it did hang sometimes, but dialing video settings down, the Rocket League works fine with Debian testing Xfce. Oipaf ppa and custom adg5f drm-next-4.11-wip kernel. Rx 460 is my gpu and I have tweaked the kernel to be fast and stable, see: https://bugzilla.kernel.org/show_bug.cgi?id=193651
Comment 77 freedesktop 2017-02-10 12:38:04 UTC
(In reply to Michel Dänzer from comment #75) > (In reply to Clemens Eisserer from comment #73) > > just to be curious, does it actually map the buffer with the READ flag? > > According to Timothee and the apitrace it does, so > https://patchwork.freedesktop.org/patch/138047/ should help for this issue. it helps, but fps drops quite a bit on my aged r660g gpu (patch applied to mesa 13.0.4). It runs better as Mareks test did before, so maybe it works out for newer hardware.
Comment 78 Timothee Besset 2017-02-10 18:10:18 UTC
Testing a game side fix for this problem atm (doesn't require the patched drivers). If someone wants to give it a spin before I send it to Psyonix, catch me on freenode (TTimo).
Comment 79 Timothee Besset 2017-02-10 20:13:15 UTC
(testing done - it may take a few days before this rolls out in a steam update)
Comment 80 Jani Kärkkäinen 2017-02-10 22:23:05 UTC
I think this is merged now? Using the padoka git ppa which just so happened to be updated, which included this patch (as it was just merged?), I can confirm the findings others had. The lag spikes effectively disappeared, but fps dropped to about 80% (on the lowest settings, to be fair). What's interesting though is that now it was possible to set the highest quality (before the patch it just lagged for me for double digit seconds, just for setting it up), and if I remember correctly, the fps was horrid, about half of the lowest settings. But now I can play just fine with 1080p quality maxed @ 74 fps. Also, before the patch the fps was quite stable, but now it fluctuates a lot more. It doesn't affect the gameplay negatively, though. Testing this on RX480.
Comment 81 Marek Olšák 2017-02-12 15:08:22 UTC
I merged a different patch that just improves GL_MAP_READ_BIT and returns a mapping with CPU caches enabled, with the downside that it does a copy from the uncached buffer into a cached temporary buffer, maps that, and if GL_MAP_WRITE_BIT is set, the temporary buffer is copied into the original buffer on unmap. The reason GPU drivers prefer uncached allocations is that the GPU access to the buffer is faster while the CPU access is slower (except for sequential writes). If we enabled caching, the GPU access would be slower, but the CPU access would be faster. Or we could have 2 buffers and do copying back and forth like for GL_MAP_READ_BIT, but again the copy also isn't free and it's worse if you have to do 2 copies - one in Map and the other in Unmap. The malloc + memcpy solution on the app side is probably the best one.
Comment 82 Timothee Besset 2017-02-12 16:45:39 UTC
Thanks Marek. The game has been updated to use a malloc and memcpy, this fix will have rolled out to everyone by now.
Comment 83 Michel Dänzer 2017-02-13 03:23:30 UTC
Thanks for the report and everyone's cooperation in getting it resolved. Fixed in Git master: Module: Mesa Branch: master Commit: d86099df0af7c22c8acfd48b38ad446d9c8df6bd URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=d86099df0af7c22c8acfd48b38ad446d9c8df6bd Author: Marek Olšák <email@example.com> Date: Thu Feb 9 03:14:22 2017 +0100 gallium/radeon: fix performance of buffer readbacks
Comment 84 Marek Olšák 2017-02-13 09:22:32 UTC
Michel, the commit in master doesn't fix it. It was fixed in the game by using mallocd memory for the upload process and then memcpy into the buffer. Therefore, the game doesn't need to use the read flag and doesn't do read-modify-write on mapped memory.
Comment 85 Jani Kärkkäinen 2017-02-13 13:13:10 UTC
Well this is confusing. The client update hasn't rolled out to me on steam yet (the last modified date of the RocketLeague -binary is 31st of January, so about 2 weeks old), but by updating to a more recent mesa (where the buffer fix is inplace) the game's lag spikes have been definitely fixed. Also, the lag spikes were gone only after updating the drivers. It can't be the client, so it must be the drivers, right?
Comment 86 Marek Olšák 2017-02-13 13:19:22 UTC
In that case, Michel is right.
Comment 87 Andreas Hartmetz 2017-02-13 14:34:29 UTC
For the record, the game side fix has NOT been shipped by Steam yet, at least for me, in Germany, not on a beta program. In case anyone else gets confused.
Comment 88 freedesktop 2017-02-13 19:53:20 UTC
(In reply to Andreas Hartmetz from comment #87) > For the record, the game side fix has NOT been shipped by Steam yet, at > least for me, in Germany, not on a beta program. In case anyone else gets > confused. it's not public yet https://steamdb.info/app/252950/depots/
Comment 89 Tobias Droste 2017-02-14 00:08:34 UTC
So today came a game update and I can also confirm that the long pauses at the beginning are gone. But there's still a small pause (<1s) every 2 or 3s now and not only at the beginning. Do you think it's a separate issue? (Mesa git from today)
Comment 90 Marek Olšák 2017-02-14 01:12:43 UTC
(In reply to Tobias Droste from comment #89) > So today came a game update and I can also confirm that the long pauses at > the beginning are gone. But there's still a small pause (<1s) every 2 or 3s > now and not only at the beginning. > > Do you think it's a separate issue? (Mesa git from today) You can find that out by yourself with Gallium HUD queries such as num-compilations for the shader compiler and num-bytes-moved, which is TTM activity.
Comment 91 Andreas Hartmetz 2017-02-14 09:51:43 UTC
Steam has now shipped the game side fix as well. It feels almost like a new game, much much better.
Comment 92 libgradev 2017-02-16 20:28:10 UTC
Working great here now guys - latest RL build and Mesa git. Thanks to all involved - great cooperation :)