As of at least Mesa 19.3/bfac462d929 on a Vega 64:
Running obs-studio, even without starting a broadcast, triggers a seemingly exponential memory leak. It is fine for a few minutes, then rapidly begins consuming what appears to be kernel memory (nothing is attributed to the app, but total usage skyrockets). With 32 GB of RAM I exhaust system memory after about three minutes, and the OOM killer doesn't know what to take down because OBS itself remains low in the list. This can then take down the whole system.
However, killing OBS frees most of the memory. I say most because after reproducing on a fresh boot, a few gigabytes of unaccounted-for memory never returned. Subsequent repros of the bug on that same boot did return to that same baseline, however. Some caching mechanism gone wrong?
I've noticed this going back at least a few weeks, but haven't done a proper bisect. It should be very easy to reproduce, and it happens on both Vega 64 systems I have available.
Steps to reproduce (not all may be necessary, but I confirmed this sequence triggers it from a fresh state):
- Launch obs-studio
- Enable Studio Mode by clicking the button on the right
- Add two sources: "desktop capture" (select any monitor) and a single "Image" source (any image)
- Press Fade/Cut up top to make that state live. No need to actually start recording/broadcasting.
- Wait a few minutes or until your system hangs. Memory usage will appear stable for at least a full minute before taking off unprompted. It will not be attributed to the app, however, as it is apparently kernel memory.
Reproduces with 19.3 - bfac462d929
Does not reproduce with 19.1.4
Kernel versions 5.2.8 and 5.2.11 show the same behavior
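For anyone trying to confirm the symptom described above (total usage climbing while the app's own usage stays low), here is a minimal sketch of a watcher that compares system-wide used memory from /proc/meminfo against the resident size of a single process. The function names are made up for illustration, "used" is approximated as MemTotal minus MemAvailable, and the sampling interval is arbitrary; none of this comes from OBS or Mesa.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Keep only fields whose first token is a plain number (kB for most).
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def process_rss_kib(pid):
    """Return VmRSS (kB) for a process from /proc/<pid>/status, or None."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_meminfo(f.read()).get("VmRSS")
    except OSError:
        return None


def leak_rate_mib_per_s(samples):
    """Average growth in MiB/s between the first and last (seconds, kB) sample."""
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (used1 - used0) / 1024 / (t1 - t0)


def watch(pid, interval=5.0, count=60):
    """Print system-wide used memory next to one process's RSS."""
    samples = []
    for _ in range(count):
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        # Assumption: MemTotal - MemAvailable is a good enough "used" figure.
        used = mem["MemTotal"] - mem["MemAvailable"]
        samples.append((time.monotonic(), used))
        rss = process_rss_kib(pid)
        print(f"system used: {used // 1024} MiB, "
              f"process RSS: {rss // 1024 if rss is not None else '?'} MiB, "
              f"avg growth: {leak_rate_mib_per_s(samples):.2f} MiB/s")
        time.sleep(interval)

# Example (Linux only): watch(<pid of obs>) while following the steps above.
```

If the leak is in kernel-side buffers, the "system used" column should keep growing while "process RSS" stays roughly flat.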
> Reproduces with 19.3 - bfac462d929
> Does not reproduce with 19.1.4
Could you bisect to find when the issue was introduced?
Thanks for the clear steps to reproduce this issue. I managed to reproduce this on my RX 480 and it bisected to:
Author: Michel Dänzer <firstname.lastname@example.org>
Date: Fri Jun 28 18:35:56 2019 +0200
winsys/amdgpu: Make KMS handles valid for original DRM file descriptor
Getting a DMA-buf fd and converting that to a handle using our duplicate
of that file descriptor (getting at which requires passing a
radeon_winsys pointer to the buffer_get_handle hook) makes sure of this,
since duplicated file descriptors reference the same file description
and therefore the same GEM handle namespace.
This is necessary because libdrm_amdgpu may use a different DRM file
descriptor with a separate handle namespace internally, e.g. because it
always reuses any existing amdgpu_device_handle for the same device.
amdgpu_bo_export returns a handle which is valid for that internal
Reviewed-by: Marek Olšák <email@example.com>
Tested-by: Pierre-Eric Pelloux-Prayer <firstname.lastname@example.org>
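The commit message relies on the distinction between a file descriptor and the open file description behind it. As a rough analogy only (it uses an ordinary file and its seek offset, not a DRM device or GEM handles), the sketch below shows that a duplicated descriptor shares state with the original while a separately opened descriptor does not. The helper name is hypothetical.

```python
import os
import tempfile


def offsets_after_seek():
    """Seek on one fd, then read the offset seen by a dup and by a fresh open."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 100)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        dup_fd = os.dup(fd)                        # same open file description
        other_fd = os.open(tmp.name, os.O_RDONLY)  # separate file description
        try:
            os.lseek(fd, 42, os.SEEK_SET)
            dup_offset = os.lseek(dup_fd, 0, os.SEEK_CUR)
            other_offset = os.lseek(other_fd, 0, os.SEEK_CUR)
        finally:
            for f in (fd, dup_fd, other_fd):
                os.close(f)
    return dup_offset, other_offset


print(offsets_after_seek())  # (42, 0) on a POSIX system
```

In the same way, per the commit message, a handle obtained through a duplicate of the original DRM file descriptor lives in the same GEM handle namespace, whereas one obtained through a separately opened descriptor inside libdrm_amdgpu may not.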
While testing I saw a slow leak of 0.8 to 1 MB/s which appeared immediately on opening OBS with the test scene. It seemed to consistently reach some hidden threshold, around 64 MB, before the major memory leak started, which helped bisect the issue.
I reverted the commit on top of f8887909c6683986990474b61afd6d4335a69e41 with good results.
Does https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1907 help by any chance?
I reproduced the issue with 7d28e9ddd62eeccf6c528beee6b1a58fdfb7f5a0 + merge request 1907. No visible effect.