Bug 109389 - memory leak in `amdgpu_bo_create()`
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Duplicates: 109390
 
Reported: 2019-01-18 21:11 UTC by Paul Menzel
Modified: 2019-09-06 21:16 UTC (History)

Attachments
Output of kmemleak (4.80 MB, text/plain)
2019-01-18 21:11 UTC, Paul Menzel
Linux kernel messages (dmesg) (79.30 KB, text/plain)
2019-01-19 01:16 UTC, Paul Menzel
Galactic Civilizations III memleak log without DXVK (6.73 KB, text/plain)
2019-09-03 20:41 UTC, Czcibor Bohusz-Dobosz
Galactic Civilizations III memleak log with DXVK (6.00 KB, text/plain)
2019-09-03 20:49 UTC, Czcibor Bohusz-Dobosz
Galactic Civilizations III memleak log with DXVK (6.62 KB, text/plain)
2019-09-03 21:31 UTC, Czcibor Bohusz-Dobosz
DRM/Radeon glxgears memleak log (5.46 KB, text/plain)
2019-09-06 21:14 UTC, Czcibor Bohusz-Dobosz
DRM/AMDgpu glxgears memleak log (7.27 KB, text/plain)
2019-09-06 21:16 UTC, Czcibor Bohusz-Dobosz

Description Paul Menzel 2019-01-18 21:11:33 UTC
Created attachment 143156
Output of kmemleak

With Linux 5.0-rc2+ the memory leaks below are reported by kmemleak.

```
unreferenced object 0xffff9f83850c5000 (size 2048):
  comm "gnome-shell", pid 569, jiffies 4294682217 (age 9133.583s)
  hex dump (first 32 bytes):
    02 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ................
    02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001aec1dd8>] amdgpu_bo_create+0x40/0x220 [amdgpu]
    [<000000007da39c30>] amdgpu_gem_object_create+0x9e/0x120 [amdgpu]
    [<00000000099484e9>] amdgpu_gem_create_ioctl+0x1d3/0x290 [amdgpu]
    [<000000009d8251d3>] drm_ioctl_kernel+0xa9/0xf0
    [<0000000050b61811>] drm_ioctl+0x201/0x3a0
    [<000000007c88aae3>] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
    [<0000000083291747>] do_vfs_ioctl+0xa4/0x630
    [<00000000722b6176>] ksys_ioctl+0x60/0x90
    [<000000001bfa30dc>] __x64_sys_ioctl+0x16/0x20
    [<000000007862c966>] do_syscall_64+0x55/0x170
    [<00000000a8eeee88>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000001242345f>] 0xffffffffffffffff
unreferenced object 0xffff9f837cdd36c0 (size 64):
  comm "gnome-shell", pid 569, jiffies 4294682217 (age 9133.583s)
  hex dump (first 32 bytes):
    d4 23 b4 d5 ab e2 45 94 d0 53 43 86 83 9f ff ff  .#....E..SC.....
    01 00 00 00 04 00 00 00 60 ab cc 5c 83 9f ff ff  ........`..\....
  backtrace:
    [<000000000768e015>] ttm_bo_mem_space+0x41/0x4a0
    [<00000000f11076b2>] ttm_bo_validate+0xc7/0x130
    [<00000000c820992e>] ttm_bo_init_reserved+0x32f/0x390
    [<00000000fcfd5ce2>] amdgpu_bo_do_create+0x1ed/0x420 [amdgpu]
    [<000000001aec1dd8>] amdgpu_bo_create+0x40/0x220 [amdgpu]
    [<000000007da39c30>] amdgpu_gem_object_create+0x9e/0x120 [amdgpu]
    [<00000000099484e9>] amdgpu_gem_create_ioctl+0x1d3/0x290 [amdgpu]
    [<000000009d8251d3>] drm_ioctl_kernel+0xa9/0xf0
    [<0000000050b61811>] drm_ioctl+0x201/0x3a0
    [<000000007c88aae3>] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
    [<0000000083291747>] do_vfs_ioctl+0xa4/0x630
    [<00000000722b6176>] ksys_ioctl+0x60/0x90
    [<000000001bfa30dc>] __x64_sys_ioctl+0x16/0x20
    [<000000007862c966>] do_syscall_64+0x55/0x170
    [<00000000a8eeee88>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000001242345f>] 0xffffffffffffffff
```
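
A report like the one above can run to megabytes (the attached one is 4.80 MB), so it helps to total the leaked bytes per allocating function. The following is a minimal sketch, not part of the original report: it assumes the text format shown above, and the helper name `summarize_kmemleak` and the trimmed `SAMPLE` are illustrative only.

```python
import re
from collections import defaultdict

# Header of each kmemleak record, e.g.
# "unreferenced object 0xffff9f83850c5000 (size 2048):"
OBJECT_RE = re.compile(r"^unreferenced object 0x[0-9a-f]+ \(size (\d+)\):")
# One backtrace frame, e.g.
# "    [<000000001aec1dd8>] amdgpu_bo_create+0x40/0x220 [amdgpu]"
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]+>\]\s+(\S+)")

# Two records trimmed from the report above (hex dumps and deeper frames omitted).
SAMPLE = """\
unreferenced object 0xffff9f83850c5000 (size 2048):
  comm "gnome-shell", pid 569, jiffies 4294682217 (age 9133.583s)
  backtrace:
    [<000000001aec1dd8>] amdgpu_bo_create+0x40/0x220 [amdgpu]
    [<000000007da39c30>] amdgpu_gem_object_create+0x9e/0x120 [amdgpu]
unreferenced object 0xffff9f837cdd36c0 (size 64):
  comm "gnome-shell", pid 569, jiffies 4294682217 (age 9133.583s)
  backtrace:
    [<000000000768e015>] ttm_bo_mem_space+0x41/0x4a0
    [<00000000f11076b2>] ttm_bo_validate+0xc7/0x130
"""


def summarize_kmemleak(report):
    """Sum leaked bytes per function found in the first backtrace frame."""
    totals = defaultdict(int)
    pending_size = None  # size of the record whose first frame is still unseen
    for line in report.splitlines():
        header = OBJECT_RE.match(line)
        if header:
            pending_size = int(header.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and pending_size is not None:
            # Drop the "+offset/length" suffix to keep the bare symbol name.
            symbol = frame.group(1).split("+")[0]
            totals[symbol] += pending_size
            pending_size = None
    return dict(totals)


if __name__ == "__main__":
    for symbol, size in sorted(summarize_kmemleak(SAMPLE).items()):
        print(f"{symbol}: {size} bytes")
```

On a kernel built with kmemleak, the full report can be read from `/sys/kernel/debug/kmemleak` (root and a mounted debugfs are required) and passed to the same function.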
Comment 1 Paul Menzel 2019-01-18 21:13:53 UTC
*** Bug 109390 has been marked as a duplicate of this bug. ***
Comment 2 Paul Menzel 2019-01-19 01:16:22 UTC
Created attachment 143157
Linux kernel messages (dmesg)
Comment 3 Michel Dänzer 2019-01-25 16:19:05 UTC
Does this also happen with 4.20.y? If not, can you bisect?
Comment 4 Czcibor Bohusz-Dobosz 2019-09-03 20:41:32 UTC
Created attachment 145255
Galactic Civilizations III memleak log without DXVK

As far as I can tell from the logs I have collected, this memory leak still occurs with Linux 5.2.11-arch1-1-ARCH and Mesa 1.9.15.

In my case, it is most prevalent when a Direct3D game is launched through Wine with the DXVK translation layer, which converts D3D calls to Vulkan. Just reaching a game's main menu can consume a large amount of memory, which is never freed, not even after the game is closed, until the caches are manually dropped with a command.

However, this also seems to occur, to a much smaller extent, with DXVK turned off. I attach a bcc memleak log that shows the issue with Galactic Civilizations III v3.9. The smaller amounts of memory leaked without DXVK make it easier to trace the exact call that permanently leaked memory; if I'm not mistaken, that would be the one that leaked 68550656 bytes in this log.
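
For readers who want to pick the largest outstanding stack out of a bcc memleak log automatically, here is a minimal sketch. It assumes the tool's usual `N bytes in M allocations from stack` summary lines. Only the 68550656-byte figure comes from this comment; the allocation counts and stack frames in `SAMPLE` are invented for illustration, and `parse_memleak` is a hypothetical helper.

```python
import re

# Summary line printed for every outstanding stack, e.g.
# "\t68550656 bytes in 16736 allocations from stack"
ENTRY_RE = re.compile(r"^\s*(\d+) bytes in (\d+) allocations from stack\s*$")

# Illustrative input: counts and frames are made up, only the byte total
# of the second entry is taken from the comment above.
SAMPLE = """\
[20:41:32] Top 10 stacks with outstanding allocations:
\t4096 bytes in 1 allocations from stack
\t\tkmem_cache_alloc+0x0 [kernel]
\t68550656 bytes in 16736 allocations from stack
\t\tamdgpu_bo_create+0x40 [kernel]
\t\tamdgpu_gem_object_create+0x9e [kernel]
"""


def parse_memleak(log):
    """Return a list of {'bytes', 'count', 'stack'} dicts, one per stack."""
    entries = []
    current = None
    for line in log.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            current = {
                "bytes": int(match.group(1)),
                "count": int(match.group(2)),
                "stack": [],
            }
            entries.append(current)
        elif line.startswith("["):
            current = None  # timestamp header starts a new report interval
        elif current is not None and line.strip():
            current["stack"].append(line.strip())
    return entries


def largest_entry(log):
    """Return the entry holding the most outstanding bytes, or None."""
    entries = parse_memleak(log)
    return max(entries, key=lambda e: e["bytes"]) if entries else None


if __name__ == "__main__":
    top = largest_entry(SAMPLE)
    print(f"{top['bytes']} bytes = {top['bytes'] / 2**20:.3f} MiB")
    print("allocated via:", top["stack"][0])
```

Note that 68550656 bytes is 65.375 MiB, which is the roughly 65 megabytes referred to later in this thread.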
Comment 5 Czcibor Bohusz-Dobosz 2019-09-03 20:49:04 UTC
Created attachment 145256
Galactic Civilizations III memleak log with DXVK

For comparison, I also attach a similar log made with the DXVK translation layer enabled, which caused the game to leak much larger amounts of memory, to the point of making it unplayable.

While Galactic Civilizations III is the only game that I have confirmed to permanently leak memory through this call when DXVK is not used, virtually all D3D games I have tried to translate to Vulkan so far show leaks like this. Unfortunately, I don't currently have any native Vulkan title to test with. In the logs I only launch the game and wait until it reaches the main menu, so the leak is, well, pretty serious in my case... :)

The Vulkan driver reports itself as AMD RADV KAVERI (LLVM 8.0.1) 1.9.15.
Comment 6 Czcibor Bohusz-Dobosz 2019-09-03 21:31:08 UTC
Created attachment 145257
Galactic Civilizations III memleak log with DXVK

Apologies, it looks like I forgot to update the methodology in several places in the DXVK memleak log; this one should be much more accurate.

The updated methodology, however, showed something I had not expected: the memory allocated by amdgpu_bo_create() apparently does not accumulate linearly. Instead, it seems to be replaced the second time the game is launched. Because of that, more than the 65 megabytes may actually have been unavailable after the test without DXVK, perhaps the sum of all the amdgpu_bo_create() calls' allocations.
Comment 7 Czcibor Bohusz-Dobosz 2019-09-06 21:14:42 UTC
Created attachment 145288
DRM/Radeon glxgears memleak log

It took a while to perform some more tests, and it turns out that running glxgears with amdgpu also leaks memory: launching a hundred glxgears instances leaks about 400 megabytes, which are only freed after the instances are killed and the caches are manually dropped with the command `echo 3 > /proc/sys/vm/drop_caches`.
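
A coarse way to cross-check a figure like "about 400 megabytes for a hundred instances" is to compare two `/proc/meminfo` snapshots, one taken before launching the instances and one after killing them. This is only a sketch and an assumption about method: the measurements in this thread were made with bcc memleak, the snapshot values below are hypothetical, and `parse_meminfo` and `leaked_kib` are illustrative helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values as reported (kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def leaked_kib(before, after, field="MemAvailable"):
    """Return how much `field` shrank between two snapshots, in kB."""
    return parse_meminfo(before)[field] - parse_meminfo(after)[field]


# Hypothetical snapshots chosen so that the difference is 400 MiB.
BEFORE = "MemTotal:       16000000 kB\nMemAvailable:    8000000 kB\n"
AFTER = "MemTotal:       16000000 kB\nMemAvailable:    7590400 kB\n"

if __name__ == "__main__":
    instances = 100
    lost = leaked_kib(BEFORE, AFTER)
    print(f"lost {lost / 1024:.0f} MiB, {lost / instances:.0f} KiB per instance")
```

On a live system the two strings would come from reading `/proc/meminfo` directly. Other activity on the machine also moves `MemAvailable`, so this gives a rough bound rather than an attribution to a specific call.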

Because glxgears does not need Vulkan support, I was also able to confirm that the massive persistent leak is definitely caused by the amdgpu driver. Attached is a bcc memleak log of glxgears taken with the radeon driver.

On a side note, launching vkcube seems to leak memory through the described call at a very similar rate as well.
Comment 8 Czcibor Bohusz-Dobosz 2019-09-06 21:16:12 UTC
Created attachment 145289
DRM/AMDgpu glxgears memleak log

For comparison, I attach the bcc memleak log of glxgears taken with amdgpu.

