As part of the Vulkan CTS, radv creates about 30k AMDGPU contexts (though only about 1-20 are live at the same time). Each of those creates a bunch of fence contexts, one for each ring, to use for fences created from submitted jobs.

However, as part of running jobs, fences with those contexts get attached to the vm->root.base.bo->tbo.resv of the corresponding vm. This means that at some point we have tens of thousands of fences attached to it. A fence only ever gets deduplicated against a later fence from the same fence context, so fences from destroyed contexts never get removed.

Then in amdgpu_gem_va_ioctl -> amdgpu_vm_clear_freed -> amdgpu_vm_bo_update_mapping we do an amdgpu_sync_resv, which tries to add those fences to an amdgpu_sync object. That object only has a 16-entry hashtable, so adding the fences to the hashtable results in quadratic behavior. Combine this with running the sparse buffer tests at the end, which do lots of VA operations, and the tests take 20+ minutes.

So I could reduce the number of amdgpu contexts a bit in radv, but the bigger issue in my opinion is that we are pretty much leaking the fences and never reclaiming them. Any idea how best to remove some signalled fences?
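To make the quadratic claim concrete, here is a rough user-space sketch, not the kernel code. It assumes a chained hash table with 16 buckets keyed by fence context and uses a plain modulo as the hash; the struct and function names are made up for illustration, and the kernel's real hash function differs. It only counts how many key comparisons are needed to insert n fences that all come from distinct contexts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_SYNC_BUCKETS 16 /* assumption: mirrors the 16-entry table mentioned above */

/* Hypothetical stand-in for one hashtable entry, keyed by fence context. */
struct toy_sync_entry {
	uint64_t context;
	struct toy_sync_entry *next;
};

/*
 * Insert n fences with distinct contexts into a 16-bucket chained table
 * and return the number of key comparisons performed. Every insert first
 * walks its bucket looking for an entry with the same context, which is
 * where the cost comes from once the chains get long.
 */
uint64_t toy_count_comparisons(uint64_t n)
{
	struct toy_sync_entry *buckets[TOY_SYNC_BUCKETS] = { 0 };
	uint64_t comparisons = 0;

	for (uint64_t ctx = 0; ctx < n; ctx++) {
		struct toy_sync_entry **head = &buckets[ctx % TOY_SYNC_BUCKETS];
		struct toy_sync_entry *e;

		for (e = *head; e; e = e->next) {
			comparisons++;
			if (e->context == ctx)
				break;
		}
		if (!e) {
			/* No entry for this context yet: add one to the chain. */
			e = malloc(sizeof(*e));
			if (!e)
				abort();
			e->context = ctx;
			e->next = *head;
			*head = e;
		}
	}

	/* Free all chains again. */
	for (int i = 0; i < TOY_SYNC_BUCKETS; i++) {
		while (buckets[i]) {
			struct toy_sync_entry *next = buckets[i]->next;
			free(buckets[i]);
			buckets[i] = next;
		}
	}
	return comparisons;
}
```

In this model, inserting 16*k distinct contexts costs 8*k*(k-1) comparisons, so doubling the number of fences roughly quadruples the work: toy_count_comparisons(1600) gives 79200 and toy_count_comparisons(3200) gives 318400. Every VA operation repeats that walk over the whole reservation object.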
Well that should be already fixed by the following commits:

commit ca25fe5efe4ab43cc5b4f3117a205c281805a5ca
Author: Christian König <ckoenig.leichtzumerken@gmail.com>
Date:   Tue Nov 14 15:24:36 2017 +0100

    dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace

    The amdgpu issue to also need signaled fences in the reservation objects
    should be fixed by now.

    Optimize the handling by replacing a signaled fence when adding a new
    shared one.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171114142436.1360-2-christian.koenig@amd.com

commit 4d9c62e8ce69d0b0a834282a34bff5ce8eeacb1d
Author: Christian König <ckoenig.leichtzumerken@gmail.com>
Date:   Tue Nov 14 15:24:35 2017 +0100

    dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v3

    The amdgpu issue to also need signaled fences in the reservation objects
    should be fixed by now.

    Optimize the list by keeping only the not signaled yet fences around.

    v2: temporary put the signaled fences at the end of the new container
    v3: put the old fence at the end of the new container as well.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171114142436.1360-1-christian.koenig@amd.com
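For readers without the kernel tree at hand, the idea behind the first commit can be sketched roughly as follows. This is a simplified user-space model and not the actual reservation_object code: the toy_* names, the fixed-size array and the lack of any locking or reference counting are all assumptions made for illustration. When a new shared fence is added, a slot holding a fence from the same context is reused first; failing that, a slot holding an already signaled fence is overwritten; only if neither exists does the list grow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SHARED 64 /* hypothetical fixed capacity for the sketch */

/* Hypothetical, heavily simplified stand-ins for a fence and a shared list. */
struct toy_fence {
	uint64_t context;
	bool signaled;
};

struct toy_resv {
	struct toy_fence shared[TOY_MAX_SHARED];
	size_t count;
};

/* Add a shared fence, preferring to reuse a slot over growing the list. */
bool toy_add_shared(struct toy_resv *resv, struct toy_fence fence)
{
	size_t signaled_slot = resv->count; /* == count means "none found" */

	for (size_t i = 0; i < resv->count; i++) {
		if (resv->shared[i].context == fence.context) {
			/* Same context: the later fence replaces the earlier one. */
			resv->shared[i] = fence;
			return true;
		}
		if (signaled_slot == resv->count && resv->shared[i].signaled)
			signaled_slot = i; /* remember the first signaled fence */
	}

	if (signaled_slot < resv->count) {
		/* Reclaim the slot of a signaled fence instead of growing. */
		resv->shared[signaled_slot] = fence;
		return true;
	}

	if (resv->count == TOY_MAX_SHARED)
		return false;
	resv->shared[resv->count++] = fence;
	return true;
}

/* Mark every fence of the given context as signaled. */
void toy_signal_context(struct toy_resv *resv, uint64_t context)
{
	for (size_t i = 0; i < resv->count; i++)
		if (resv->shared[i].context == context)
			resv->shared[i].signaled = true;
}

/*
 * Model the CTS pattern: many short-lived contexts, each adding one fence
 * that signals before the next context shows up. Returns the final number
 * of fences left in the shared list.
 */
size_t toy_simulate(uint64_t contexts)
{
	struct toy_resv resv = { .count = 0 };

	for (uint64_t ctx = 0; ctx < contexts; ctx++) {
		struct toy_fence f = { .context = ctx, .signaled = false };

		if (!toy_add_shared(&resv, f))
			return resv.count;
		toy_signal_context(&resv, ctx);
	}
	return resv.count;
}
```

In this model toy_simulate(30000) leaves a single fence in the list, because each new fence takes over the slot of the previous, already signaled one. Without the signaled-slot reuse, the same loop would need one slot per context.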
Hmm, it seems we were only backporting amdgpu and not the changes in drivers/dma-buf, which would explain it. Thanks a lot!