It was discovered that ANV does not free allocated memory; original discussion: https://github.com/doitsujin/dxvk/issues/632
Affected configuration: ANV + DXVK + Wine.
Created attachment 141521 [details]
Top 10 stacks with outstanding allocations:
From the github issue, it sounded like these allocations are never freed even after closing the app; is this correct?
Yes, this is correct. Even after closing the application, the memory is not freed. If the application is opened again, it does not reuse the memory it used before; it allocates new memory for itself.
Note: I'm FurretUber on GitHub.
Note: Dota 2 is causing GPU hangs, I have to set INTEL_DEBUG=nohiz to make it work.
It seems it's not a DXVK-only issue. After 12 hours, Dota 2 has allocated 719 MB of i915_request cache too. What I noticed is that it did not allocate any i915_request cache for 20 minutes; then it started allocating memory as well, but at a much slower rate than the DXVK applications.
As a comparison, Forsaken Castle, in the same 20 minutes, allocated 213 MB.
I already tried to debug this (see the GitHub issue) using perf, and the only thing that looked strange in the DXVK application was one variable whose values were way off compared to its values in other applications. Probing @<i915_gem_do_execbuffer+3647> with perf and looking at the values of in_fence and fences:
Forsaken_Castle 16943  17521.309896: probe:i915_gem_do_execbuffer: (ffffffffc07690af) in_fence=0x0 fences=0xffff952366b7bb00
Xorg 2215  17521.309986: probe:i915_gem_do_execbuffer: (ffffffffc07690af) in_fence=0x0 fences=0x2d28
rolc.exe 15796  17521.311725: probe:i915_gem_do_execbuffer: (ffffffffc07690af) in_fence=0x0 fences=0x1d8
I can do more tests if needed.
The accumulation of i915_request objects implies there is a fence leak. Assuming it is not internal (an unmatched dma_fence_get()/dma_fence_put()), all userspace owners would be tied to an fd, and eventually someone would notice fd exhaustion (after a few million, depending on rlimit). But for the fd to stick around, the process must be kept alive, which would imply the fence fd being passed to a display server. I don't think that is how fences are handled under X, which makes a simple userspace fence leak less likely.
'ls -1 /proc/$suspect/fd/ | wc -l' might be interesting to watch.
I suspect this is some sort of fence leak with syncobj which would explain why only anv hits it (the GL driver doesn't use syncobj yet).
> The accumulation of i915_request implies there is a fence leak. Assuming it is not internal (an unmatched dma_fence_get/dma_fence_put), all userspace owners would be tied to an fd and eventually one would notice the fd exhaustion (after a few million depending on rlimit).
It is not tied to a file descriptor because I915_EXEC_FENCE_OUT is not set in args->flags (worked out backward from a perf trace). This means that DEFINE_DRM_GEM_FOPS->drm_release is never called, and we never get a dma_fence_put() from this (hypothetical) stack:
The trace at github indirectly shows that out_fence_fd == -1:
My system is also affected by this. I have a Xeon E3-1276 v3. I am running Gentoo with Linux 4.18.0-rc8, Xorg 1.19.5, mesa 18.1.6 and vulkan-loader 220.127.116.11. I killed the Xorg server and the i915_request objects were not freed from the SLAB cache, which implies that the objects are not tied to a file descriptor.
Disregard that remark. That path goes through sync_file_fops, and it has a matching dma_fence_get()/dma_fence_put() pair in sync_file_create()/sync_file_release().
Created attachment 141589 [details]
Dump of long lived i915_request object
I captured one of these long-lived SLAB objects from my system using the crash utility. I am not a graphics developer, but here is what stood out to me:
1. The dma_fence refcount is 1.
2. The dma_fence seqno is 82576060.
3. The dma_fence ops is i915_fence_ops.
4. The global_seqno is 82576036.
I'll post my capture of the referenced intel_engine_cs next, although the interesting thing there is that the timeline seqno is 82620551, which implies that this particular object is old.
Created attachment 141590 [details]
Dump of intel_engine_cs to match i915_request dump
Created attachment 141603 [details]
slabtop, /proc/slabinfo, wc -l proposed and one screenshot
I'm sorry for not being more active on this report before.
The attached file has three text files and one image.
I tested playing Forsaken Castle while watching ls -1 /proc/$suspect/fd/ | wc -l (in my case, $suspect=24003); the resulting file is called proc24003fdwcl.txt. I created it using:
while true ; do ls -1 /proc/24003/fd/ | wc -l >> proc24003fdwcl.txt ; sleep 2; done
There is /proc/slabinfo and the output of slabtop -s c too. There is iGVT-g load in the logs because I was using a virtual machine when I tested this (it's not Forsaken Castle generating the iGVT-g load, don't worry).
The screenshot is from the end of the game demo, with an approximation of the time I played. Notice that in ~12 minutes (I needed some seconds to set up the logging) it allocated around 110 MB of memory.
Created attachment 141619 [details] [review]
DRM syncobj fence leak fix
Mind trying a kernel patch? I'm not in a position to experiment with kernel patches at the moment, but I think I found the bug.
I am able to patch my kernel and rebuild to test, but I will not have access to the workstation that reproduces the problem for another 12 hours.
Created attachment 141641 [details]
Logs and screenshot
Great news! It looks like the memory leak is no longer happening, or is very, very slow (see later). I played the Forsaken Castle demo for 10 minutes, a game I have become skilled at thanks to this bug, and the i915_request object count started at 100 and finished at 100. The attached file has dmesg, /proc/slabinfo and a screenshot.
Compared to the previous situation, where 12 minutes of playing allocated 110 MB, it's perfect. I'm not sure if it's a placebo effect, but the game seemed to run faster too: 125-130 FPS instead of 100-120 FPS.
Later I started Dota 2 (INTEL_DEBUG=nohiz used due to the hang-and-crash bug) and the i915_request object count rose to 650; when I closed it, the count dropped to 350. I restarted the X session and there is no sign of i915_request in slabtop; /proc/slabinfo has:
i915_request 350 450 640 25 4 : tunables 0 0 0 : slabdata 18 18 0
The values seem pretty small, but the count did not drop after restarting X; I'm not sure how relevant this is.
This patch looks great.
We would need to dump the objects from the slab cache with crash to confirm that they are long-lived and should not be there, but it sounds like there might be a small leak remaining. Unfortunately, I do not have a handy method for dumping all objects in a slab cache via the crash utility. The best I have done so far is get a list of object addresses and then dump each one manually.
I tested a bit further. I opened another DXVK game (DiRT 3 Complete Edition), also using INTEL_DEBUG=nohiz, let a replay play for one hour and checked i915_request again. It rose to 575 objects, and when I closed the game it kept that value.
What I noticed is that after I opened a Wine OpenGL game (rolc.exe), the value dropped to 100 again, the same value as at boot.
It looks like the old leak no longer exists, as that cache was cleaned up by the OpenGL application. Before this patch, the cache was never cleaned, and a reboot was required.
The curious thing is that the Vulkan/DXVK applications aren't cleaning up their part of the i915_request cache when they close; the OpenGL application is doing it for them.
Jason, applying your patch against Linux 4.18.0-rc8 (yes, I know I should upgrade) resolves the issue in Rise of Nations: Extended Edition. In my cursory test, i915_request allocations do not exceed 163 and alt-tabbing back to the KDE 5 desktop drops them to 15. Prior to the patch, it would have thousands of allocations by now. I do not see the issue that leozinho29_eu reported, although I did not try `INTEL_DEBUG=nohiz`.
Your patch looks good to me. Feel free to add my Reviewed-by and Tested-by. To be clear, I know what offering my Reviewed-by means; I followed the "Reviewer's statement of oversight" before offering it:
This might just be my lack of familiarity with the codebase, but why is ANV affected by this while RADV is not? I do not have any hardware with which to study how RADV works, but at a glance it is relatively easy to see the fence bits being handled in the i915 GEM code, while I am unable to tell how the amdgpu GEM code uses them. It seems to be done in a very abstract way in amdgpu_cs.c, and it is not clear to me how that gets invoked by RADV. I am curious whether the difference might point to ANV being overly aggressive at fencing; DXVK should be making the same API calls on both.
I've submitted the kernel patch to the mailing list. Hopefully, it will land fairly soon and we'll make sure it gets back-ported as far as needed.
The fix has landed in drm-misc-fixes; it will propagate to a kernel release near you shortly.
Yurii, do you still have the issue? Based on your confirmation, I can close the bug.
I just forwarded the issue here; I don't even have a Linux desktop now. I had asked the same question on GitHub as well.
In the original report, users say that this was fixed.
Thanks for your feedback, Yurii. I consider this bug fixed. Closing this bug.
*** Bug 108456 has been marked as a duplicate of this bug. ***