Bug 106893 - Potential mem leak with radv, linked to RADV_TRACE_FILE
Summary: Potential mem leak with radv, linked to RADV_TRACE_FILE
Status: CLOSED NOTABUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
 
Reported: 2018-06-11 21:32 UTC by John
Modified: 2018-07-23 12:36 UTC

Description John 2018-06-11 21:32:01 UTC
Hello,

when I try to play, or simply benchmark, Rise of the Tomb Raider, the game ends up getting killed with an OOM error, even though my 16 GB of RAM should be more than enough. Even when I just stay in the main menu, I can see memory usage slowly creeping up to my limit.

I've tried looking at the process with memleax, and I mostly see potential leaks in libvulkan_radeon.so. Since Nvidia had some sort of memory leak with that game too, I thought it might be something similar with radv, so I opened a bug here.
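
For anyone wanting to put numbers on "slowly creeping up", here is a minimal Python sketch that samples a process's resident memory on Linux. It assumes the `VmRSS` line of `/proc/<pid>/status` is a good enough measure; the function names are hypothetical and this is not part of memleax or Mesa.

```python
import time


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found (kernel threads have none)")


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())


def sample_rss(pid: int, samples: int, interval_s: float) -> list:
    """Take `samples` readings of the process RSS, `interval_s` seconds apart."""
    readings = []
    for index in range(samples):
        readings.append(read_rss_kib(pid))
        if index + 1 < samples:
            time.sleep(interval_s)
    return readings


def growth_kib(readings: list) -> int:
    """Growth between the first and the last reading, in KiB."""
    return readings[-1] - readings[0]
```

Calling `growth_kib(sample_rss(pid, 30, 10.0))` with the game's PID while it idles at the main menu would show how much the resident size grew over roughly five minutes.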

I have a 280X running on Arch Linux with Linux 4.17 (it was the same on 4.16.x), mesa-git@135e4d434f, llvm-svn@334364, and Xorg 1.20.

I'm happy to provide any information needed or try any patch.

Thank you!
Comment 1 Alex Smith 2018-06-12 08:28:38 UTC
The game will be compiling pipelines in the background even while left idle at the main menu. This is likely what's causing the memory usage to increase.

I just tested here with a Vega 64 on Mesa commit e266b32059 and LLVM release_60 branch r333579. At the main menu, once all background pipeline creation has completed, the game's memory usage is 2.6GB, which is not far off what I'd expect (perhaps a bit higher than last time I looked, IIRC on 18.0 stable).

If you're seeing significantly higher than that, perhaps something has regressed on LLVM trunk?
Comment 2 John 2018-06-12 08:41:28 UTC
Hello Alex,

in one test I started the game with around 1.5 GB used and quit it when around 11 GB was used, just sitting in the main menu. That is far more than expected, and it was still increasing (well, maybe it would have stopped eventually).

I've dropped all settings to minimum, window instead of fullscreen, etc... but it does not seem to help.


I've had the issue since the game was released on Linux, so if it is a regression in LLVM, it is not super recent. I can always try with some release drivers instead of dev ones if you think that could be a start.

Thank you!
Comment 3 John 2018-06-12 09:18:07 UTC
I've just tried with Mesa 18.1 and LLVM 6.0, the behavior was the same.
Comment 4 John 2018-07-19 11:18:22 UTC
I have just tried again, and since I was within Alex's numbers in the menu, I tried the benchmark and was able to complete it with no issues for the first time! I'd say the game used about 4 GB of RAM during that time, so all good!

A few things have changed on my system so I'll dig a little more to see what helped.

- I'm building mesa myself instead of using mesa-git to use this patch: https://bugs.freedesktop.org/attachment.cgi?id=139672&action=edit (I cannot play Hitman in good conditions without it).

- I'm using the CK patchset for my kernel with MUQSS.

- Then the obvious new code in linux 4.17.6/llvm-svn/radv-git, but I'd rather not bisect these to find what fixed my issue...

Thanks to whoever fixed this issue!
Comment 5 Bas Nieuwenhuizen 2018-07-19 11:41:51 UTC
Thanks for the follow up!
Comment 6 John 2018-07-19 13:37:09 UTC
Of course!

But now I am quite confused: I tried with a more standard system (standard Linux kernel and standard mesa-git packages), and it made no difference.

I then rolled back Linux to 4.17, as that's what I used when I created this bug, and moved to stable LLVM 6.0.1 / Mesa 18.1.4, but still no difference.

So either it's another package I'm not thinking of (since I'm using Arch, there's an update to something every day), something that happened between Mesa 18.1 and 18.1.4 or in LLVM 6.0.1, or something obvious that I'm missing. But since it's working, oh well!

Thanks!
Comment 7 John 2018-07-20 08:54:30 UTC
Hello,

out of curiosity I rolled back llvm and mesa to find out what helped but nothing changed.

Then eventually I figured it out:

When I have RADV_TRACE_FILE=~/radv.txt set in /etc/environment, I get the leak; when that line is commented out, there's no more leak (after a reboot). Strangely, I've only seen the issue with one game.
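
As an alternative to a system-wide entry in /etc/environment, the variable can be limited to a single launch. The following Python sketch shows one way to do that; the helper names are hypothetical, and the command to run is whatever launches the application being debugged. The `~` is expanded explicitly because `subprocess` does not go through a shell here and would pass it on literally.

```python
import os
import subprocess


def trace_env(trace_path: str, base_env=None) -> dict:
    """Return a copy of the environment with RADV_TRACE_FILE set.

    The caller's environment (or `base_env`) is left untouched, so the
    variable only applies to processes started with the returned mapping.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RADV_TRACE_FILE"] = os.path.expanduser(trace_path)
    return env


def run_with_trace(command: list, trace_path: str) -> int:
    """Run one command with RADV_TRACE_FILE set for that process only."""
    completed = subprocess.run(command, env=trace_env(trace_path))
    return completed.returncode
```

With this approach nothing needs to be remembered or undone later: once the process exits, no other program sees the variable.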

Surprisingly, when Rise Of The Tomb Raider uses a lot of memory with the variable set, the radv.txt file is not created so no issue is traced there.

I'm reopening the bug as it seems there is an issue yet to be fixed after all.

Thank you!
Comment 8 Samuel Pitoiset 2018-07-20 15:38:31 UTC
Interesting, after checking memory usage with RADV_TRACE_FILE set and RoTR, there is apparently a memory leak...

Note that RADV_TRACE_FILE should only be set for debugging purposes.
Comment 9 John 2018-07-20 16:08:27 UTC
Oh, I thought it'd be somewhat safe to leave it on for a while, when I was having issues with radv, and then I forgot to unset it till recently.

Apart from this issue, is it costly to run?
Comment 10 Samuel Pitoiset 2018-07-20 16:41:11 UTC
It is costly. You shouldn't set it. Basically, you don't have to set any RADV_ environment variables except when developers ask you to. :-)
Comment 11 John 2018-07-20 16:49:14 UTC
Alright, thank you for the explanation!
I'll add a safety note to my environment file about that so that I remember in a few years :)
Comment 12 Samuel Pitoiset 2018-07-23 11:36:11 UTC
Actually, we don't have any memory leaks with RADV_TRACE_FILE. The thing is that RoTR creates a TON of pipelines for the whole game in the background while at the main menu. When RADV_TRACE_FILE is set, we keep all shader info, including SPIR-V, NIR and assembly.

After a few minutes the game will crash because we are out of memory. We can't do anything useful about that.
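
To illustrate why this looks like a leak without being one, here is a toy Python model. It is not RADV code: the class names, the sizes, and the placeholder strings are all made up for illustration. It only shows that keeping per-pipeline debug data alive makes retained memory grow with every pipeline created.

```python
from dataclasses import dataclass, field


@dataclass
class ShaderDebugInfo:
    """Hypothetical stand-in for the per-shader data kept while tracing."""
    spirv: bytes
    nir_text: str
    asm_text: str

    def size_bytes(self) -> int:
        return len(self.spirv) + len(self.nir_text) + len(self.asm_text)


@dataclass
class ToyPipelineStore:
    """Toy model of a driver holding pipelines, with or without debug info."""
    keep_debug_info: bool
    binaries: list = field(default_factory=list)
    debug_infos: list = field(default_factory=list)

    def create_pipeline(self, spirv: bytes) -> None:
        # Placeholders for compiler output; real sizes vary per shader.
        nir_text = "placeholder NIR line\n" * 50
        asm_text = "placeholder asm line\n" * 200
        self.binaries.append(bytes(64))  # the compiled binary is always kept
        if self.keep_debug_info:
            # Tracing mode: the intermediate forms stay referenced too.
            self.debug_infos.append(ShaderDebugInfo(spirv, nir_text, asm_text))

    def retained_bytes(self) -> int:
        binary_total = sum(len(binary) for binary in self.binaries)
        debug_total = sum(info.size_bytes() for info in self.debug_infos)
        return binary_total + debug_total


normal = ToyPipelineStore(keep_debug_info=False)
tracing = ToyPipelineStore(keep_debug_info=True)
for _ in range(1000):
    normal.create_pipeline(bytes(4096))
    tracing.create_pipeline(bytes(4096))
```

In this model both stores hold the same 1000 pipelines, but `tracing.retained_bytes()` is far larger than `normal.retained_bytes()`, and the gap widens with each pipeline. Everything is still referenced, so a leak detector sees growth even though nothing has been lost.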

Thanks for the report!
Comment 13 John 2018-07-23 12:20:45 UTC
I'm surely wrong but shouldn't that be flushed to a file or something when memory is becoming a problem? It seems like in such a case the trace cannot be used, isn't that problematic?

Either way, thank you for looking into it!
Comment 14 Alex Smith 2018-07-23 12:36:46 UTC
If ever there is a need to get a trace, I've given Samuel details of a game option that can be set to disable the background pipeline preloading.

