Bug 111077 - link_shader and deserialize_glsl_program suddenly consume huge amount of RAM
Summary: link_shader and deserialize_glsl_program suddenly consume huge amount of RAM
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: 18.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords: bisected
Depends on:
Blocks:
 
Reported: 2019-07-06 17:46 UTC by roland@rptd.ch
Modified: 2019-09-25 18:50 UTC

Attachments
Valgrind massif log (56.79 KB, text/plain)
2019-07-06 17:46 UTC, roland@rptd.ch
Linking fail log (4.76 KB, text/plain)
2019-08-17 07:25 UTC, roland@rptd.ch
attachment-32202-0.html (81 bytes, text/html)
2019-09-10 20:21 UTC, Matt Turner

Description roland@rptd.ch 2019-07-06 17:46:22 UTC
Created attachment 144715 [details]
Valgrind massif log

Since an update a few days ago, an application that normally uses barely 2 GB of RAM at full load has become very slow to load. Compiling shaders drives memory consumption above 16 GB, at which point the application crashes.

I don't know exactly which part of the update caused the problem, but Mesa, the amdgpu driver and LLVM were all updated.

I also tried using Mesa 19.x but the problem is the same.

The driver is xf86-video-amdgpu-19.0.1. LLVM is 7.0.x.

I've already deleted the Mesa shader cache and all caches the application creates. I've recompiled the entire system (Gentoo) to rule out any stray build problems. I've also tried running the app as a completely fresh user.

Using valgrind --tool=massif, the culprit seems to be ralloc_size, which is called by the two functions named in the summary. I've attached a massif log covering a couple of seconds of the application running; I shut it down before memory usage skyrocketed even further. At that point the app shows only an empty scene with a simple shader drawing a sky-box. The rest is non-OpenGL UI.
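
To compare heap growth between Mesa versions, the peak recorded in a massif output file can be pulled out with a few lines of Python. This is a minimal sketch, assuming the plain-text massif.out.<pid> format in which every snapshot carries a "mem_heap_B=<bytes>" line. The function name peak_heap_bytes and the sample numbers are made up for illustration and are not taken from the attached log.

```python
import re


def peak_heap_bytes(massif_text):
    """Return the largest mem_heap_B value found in massif output text."""
    # Each snapshot in a massif.out file has one line "mem_heap_B=<int>".
    # Anchoring at line start keeps "mem_heap_extra_B=" lines out.
    sizes = [
        int(value)
        for value in re.findall(r"^mem_heap_B=(\d+)$", massif_text, re.MULTILINE)
    ]
    if not sizes:
        raise ValueError("no mem_heap_B entries found")
    return max(sizes)


# Hypothetical excerpt for illustration only, not the real attachment.
SAMPLE = """\
snapshot=0
time=0
mem_heap_B=0
mem_heap_extra_B=0
snapshot=1
time=1200
mem_heap_B=2147483648
mem_heap_extra_B=16
"""

if __name__ == "__main__":
    print(peak_heap_bytes(SAMPLE) / 2**30, "GiB")
```

The input file comes from running the application under valgrind --tool=massif; ms_print renders the same file as a readable report.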

I classified this as a blocker because, as soon as more shaders are loaded, not even 32 GB seems to be enough for the GLSL compiler.
Comment 1 roland@rptd.ch 2019-07-06 18:01:14 UTC
I've now also run the application in a debugger and loaded a simple model, which causes the shader compiler to consume a huge amount of RAM. I interrupted it with GDB and took a backtrace:

#0 0x00007f650ee794e7 in __memcpy_ssse3 () from /lib64/libc.so.6
#1 0x00007f650776a390 in blob_write_bytes () from /usr/lib64/dri/radeonsi_dri.so
#2 0x00007f650776a4e8 in blob_write_uint32 () from /usr/lib64/dri/radeonsi_dri.so
#3 0x00007f6507636421 in serialize_glsl_program () from /usr/lib64/dri/radeonsi_dri.so
#4 0x00007f6507638132 in shader_cache_write_program_metadata(gl_context*, gl_shader_program*) () from /usr/lib64/dri/radeonsi_dri.so
#5 0x00007f65074a9a38 in link_program_error () from /usr/lib64/dri/radeonsi_dri.so
#6 0x00007f6509d85a3d in deoglShaderLanguage::pLinkShader (this=0x7f65004360c0, handle=298) at src/modules/graphic/opengl/src/shaders/deoglShaderLanguage.cpp:1272
#7 0x00007f6509d86537 in deoglShaderLanguage::CompileShader (this=0x7f65004360c0, program=...) at src/modules/graphic/opengl/src/shaders/deoglShaderLanguage.cpp:530

Mesa gets stuck inside "link_program_error" => "shader_cache_write_program_metadata" => "serialize_glsl_program". Most probably serialize_glsl_program is what runs wild there, but I have no idea whether this is the real cause. According to the massif log, though, ralloc_size is called with gigabytes of data multiple times somewhere in there.
Comment 2 roland@rptd.ch 2019-07-07 17:57:42 UTC
I don't know what other information can help so I collected information about the state that worked (before the update) and the state that does not work anymore (after the update):

before update (working state):
- media-libs/mesa-18.2.8
- x11-drivers/xf86-video-amdgpu-18.1.0
- x11-libs/libdrm-2.4.96
- sys-devel/llvm-6.0.1
- sys-devel/llvmgold-6
- sys-devel/llvm-common-6.0.1

after update (memory consumption bug present):
- media-libs/mesa-18.3.6 (I also tested media-libs/mesa-19.0.6 and
  media-libs/mesa-19.1.1 with same result)
- x11-drivers/xf86-video-amdgpu-19.0.1
- x11-libs/libdrm-2.4.97
- sys-devel/llvm-7.1.0
- sys-devel/llvmgold-7
- sys-devel/llvm-common-7.1.0

Is there anything else that can help?
Comment 3 Timothy Arceri 2019-07-08 02:04:59 UTC
Are you able to build mesa from git and do a git bisect to find the problem commit?
Comment 4 roland@rptd.ch 2019-07-12 16:02:10 UTC
I tried compiling from source but it does not work. It seems to have trouble with libdrm.

configure: error: Package requirements (libdrm >= 2.4.75 libdrm_intel >= 2.4.75) were not met:

Can't seem to get past this one.
Comment 5 roland@rptd.ch 2019-07-12 16:03:29 UTC
For the record: the installed libdrm version is 2.4.97.
Comment 6 Timothy Arceri 2019-07-15 23:59:29 UTC
(In reply to roland@rptd.ch from comment #4)
> I tried compiling from source but it does not work. Seems to have troubles
> with libdrm.
> 
> configure: error: Package requirements (libdrm >= 2.4.75 libdrm_intel >=
> 2.4.75) were not met:
> 
> Can't seem to get past this one.

Are you sure you have the libdrm-devel package installed?
Comment 7 roland@rptd.ch 2019-07-16 16:42:38 UTC
I'm on Gentoo. It's a source-based distro, so every installed package comes with its headers and libraries; otherwise the distro wouldn't work. So I'm positive everything required for compiling Mesa is there (since Mesa itself was installed with emerge).
Comment 8 Matt Turner 2019-07-18 07:05:26 UTC
(In reply to roland@rptd.ch from comment #4)
> I tried compiling from source but it does not work. Seems to have troubles
> with libdrm.
> 
> configure: error: Package requirements (libdrm >= 2.4.75 libdrm_intel >=
> 2.4.75) were not met:
> 
> Can't seem to get past this one.

You must be building with more things enabled than your system Mesa is built with.

Run

> ebuild /path/to/mesa-19.0.8.ebuild configure clean

and copy the line that it uses to configure with meson. Use that in your build that you're using to bisect.

Alternatively, add "intel" to your VIDEO_CARDS setting and rebuild libdrm.
Comment 9 roland@rptd.ch 2019-07-18 16:24:29 UTC
I get two config attempts. This is the second one. Do you see anything out of place here?

meson --buildtype plain --libdir lib64 --localstatedir /var/lib --prefix /usr --sysconfdir /etc --wrap-mode nodownload -Dplatforms=x11,surfaceless,wayland,drm -Dllvm=true -Dlmsensors=true -Dlibunwind=false -Dgallium-nine=false -Dgallium-va=false -Dgallium-vdpau=false -Dgallium-xa=false -Dgallium-xvmc=false -Dgallium-opencl=icd -Dosmesa=none -Dbuild-tests=false -Dglx=dri -Dshared-glapi=true -Ddri3=true -Degl=true -Dgbm=true -Dgles1=false -Dgles2=true -Dglvnd=false -Dselinux=false -Dvalgrind=false -Ddri-drivers=r100,r200 -Dgallium-drivers=r300,r600,radeonsi,swrast -Dvulkan-drivers=amd --buildtype plain -Db_ndebug=true /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8-abi_x86_64.amd64
Comment 10 Matt Turner 2019-07-20 17:46:48 UTC
(In reply to roland@rptd.ch from comment #9)
> I get two config attempts. This is the second one. Do you see anything out
> of place here?
> 
> meson --buildtype plain --libdir lib64 --localstatedir /var/lib --prefix
> /usr --sysconfdir /etc --wrap-mode nodownload
> -Dplatforms=x11,surfaceless,wayland,drm -Dllvm=true -Dlmsensors=true
> -Dlibunwind=false -Dgallium-nine=false -Dgallium-va=false
> -Dgallium-vdpau=false -Dgallium-xa=false -Dgallium-xvmc=false
> -Dgallium-opencl=icd -Dosmesa=none -Dbuild-tests=false -Dglx=dri
> -Dshared-glapi=true -Ddri3=true -Degl=true -Dgbm=true -Dgles1=false
> -Dgles2=true -Dglvnd=false -Dselinux=false -Dvalgrind=false
> -Ddri-drivers=r100,r200 -Dgallium-drivers=r300,r600,radeonsi,swrast
> -Dvulkan-drivers=amd --buildtype plain -Db_ndebug=true
> /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8
> /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8-abi_x86_64.amd64

Just change the /var/tmp/portage/... path to your own build directory and you should be good :)
Comment 11 roland@rptd.ch 2019-07-20 23:37:32 UTC
Unfortunately it does not compile like this:

FAILED: src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o 
cc -Isrc/amd/common/2a96a08@@amd_common@sta -Isrc/amd/common -I../src/amd/common -Isrc/../include -I../src/../include -Isrc -I../src -Isrc/mapi -I../src/mapi -Isrc/mesa -I../src/mesa -I../src/gallium/include -Isrc/gallium/auxiliary -I../src/gallium/auxiliary -Isrc/compiler -I../src/compiler -Isrc/amd -I../src/amd -Isrc/compiler/nir -I../src/compiler/nir -I/usr/lib64/llvm/7/include -I/usr/include/libdrm -fdiagnostics-color=always -DNDEBUG -pipe -D_FILE_OFFSET_BITS=64 -std=c99 '-DVERSION="18.0.0-rc2"' -DPACKAGE_VERSION=VERSION '-DPACKAGE_BUGREPORT="https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa"' -DGLX_USE_TLS -DHAVE_X11_PLATFORM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_DRM -DHAVE_DRM_PLATFORM -DHAVE_SURFACELESS_PLATFORM -DENABLE_SHADER_CACHE -DHAVE___BUILTIN_BSWAP32 -DHAVE___BUILTIN_BSWAP64 -DHAVE___BUILTIN_CLZ -DHAVE___BUILTIN_CLZLL -DHAVE___BUILTIN_CTZ -DHAVE___BUILTIN_EXPECT -DHAVE___BUILTIN_FFS -DHAVE___BUILTIN_FFSLL -DHAVE___BUILTIN_POPCOUNT -DHAVE___BUILTIN_POPCOUNTLL -DHAVE___BUILTIN_UNREACHABLE -DHAVE_FUNC_ATTRIBUTE_CONST -DHAVE_FUNC_ATTRIBUTE_FLATTEN -DHAVE_FUNC_ATTRIBUTE_MALLOC -DHAVE_FUNC_ATTRIBUTE_PURE -DHAVE_FUNC_ATTRIBUTE_UNUSED -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT -DHAVE_FUNC_ATTRIBUTE_WEAK -DHAVE_FUNC_ATTRIBUTE_FORMAT -DHAVE_FUNC_ATTRIBUTE_PACKED -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL -DHAVE_FUNC_ATTRIBUTE_VISIBILITY -DHAVE_FUNC_ATTRIBUTE_ALIAS -DHAVE_FUNC_ATTRIBUTE_NORETURN -DUSE_SSE41 -DUSE_GCC_ATOMIC_BUILTINS -DUSE_X86_64_ASM -DMAJOR_IN_SYSMACROS -DHAVE_SYS_SYSCTL_H -DHAVE_LINUX_FUTEX_H -DHAVE_STRTOF -DHAVE_MKOSTEMP -DHAVE_POSIX_MEMALIGN -DHAVE_TIMESPEC_GET -DHAVE_MEMFD_CREATE -DHAVE_STRTOD_L -DHAVE_DLADDR -DHAVE_DL_ITERATE_PHDR -DHAVE_LIBDRM -DHAVE_ZLIB -DHAVE_PTHREAD -DHAVE_LLVM=0x0710 -DMESA_LLVM_VERSION_PATCH=0 -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED -DHAVE_DRI3 -DHAVE_LIBSENSORS=1 -Wall -Werror=implicit-function-declaration -Werror=missing-prototypes -fno-math-errno -fno-trapping-math -fPIC -pthread 
-D__STDC_LIMIT_MACROS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -fvisibility=hidden -MD -MQ 'src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o' -MF 'src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o.d' -o 'src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o' -c ../src/amd/common/ac_nir_to_llvm.c
../src/amd/common/ac_nir_to_llvm.c: In function ‘ac_llvm_finalize_module’:
../src/amd/common/ac_nir_to_llvm.c:6614:2: error: implicit declaration of function ‘LLVMAddPromoteMemoryToRegisterPass’; did you mean ‘LLVMAddDemoteMemoryToRegisterPass’? [-Werror=implicit-function-declaration]
  LLVMAddPromoteMemoryToRegisterPass(passmgr);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  LLVMAddDemoteMemoryToRegisterPass
cc1: some warnings being treated as errors
[425/1549] Compiling C object 'src/mesa/bbe4a73@@mesa_gallium@sta/main_format_utils.c.o'.
ninja: build stopped: subcommand failed.
Comment 12 Matt Turner 2019-07-22 16:57:19 UTC
(In reply to roland@rptd.ch from comment #11)
> ../src/amd/common/ac_nir_to_llvm.c
> ../src/amd/common/ac_nir_to_llvm.c: In function ‘ac_llvm_finalize_module’:
> ../src/amd/common/ac_nir_to_llvm.c:6614:2: error: implicit declaration of
> function ‘LLVMAddPromoteMemoryToRegisterPass’; did you mean
> ‘LLVMAddDemoteMemoryToRegisterPass’? [-Werror=implicit-function-declaration]
>   LLVMAddPromoteMemoryToRegisterPass(passmgr);
>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   LLVMAddDemoteMemoryToRegisterPass
> cc1: some warnings being treated as errors
> [425/1549] Compiling C object
> 'src/mesa/bbe4a73@@mesa_gallium@sta/main_format_utils.c.o'.
> ninja: build stopped: subcommand failed.

Is this during the bisect, or when?

Try adding 

#if HAVE_LLVM >= 0x0700
#include <llvm-c/Transforms/Utils.h>
#endif

to the #include section of src/amd/common/ac_nir_to_llvm.c
Comment 13 roland@rptd.ch 2019-07-24 11:49:47 UTC
I checked out the 18.2 branch which I assume should work (if the theory is correct).

Modifying files won't work with bisecting, right?
Comment 14 Matt Turner 2019-07-26 17:11:55 UTC
(In reply to roland@rptd.ch from comment #13)
> I checked out the 18.2 branch which I assume should work (if the theory is
> correct).
> 
> Modifying files won't work with bisecting, right?

You'll probably have to 'git apply' the patch after each bisect step, and run 'git checkout -f' each time before you do 'git bisect good/bad'.
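
The apply-then-reset cycle can be scripted so that it is harder to forget a step. This is a rough sketch, assuming the LLVM include fix has been saved as a patch file. The names prepare_step and finish_step and the injectable run parameter are hypothetical, and building and testing the application still happen by hand between the two calls.

```python
import subprocess


def prepare_step(patch, run=subprocess.run):
    """Apply the local build fix on top of the commit git bisect checked out."""
    run(["git", "apply", patch], check=True)


def finish_step(verdict, run=subprocess.run):
    """Drop the local fix, then report the verdict so the bisect moves on."""
    if verdict not in ("good", "bad", "skip"):
        raise ValueError("verdict must be 'good', 'bad' or 'skip'")
    # Discard the patched files first, so the next bisect checkout starts
    # from a clean working tree.
    run(["git", "checkout", "-f"], check=True)
    run(["git", "bisect", verdict], check=True)
```

A bisect round would then be: prepare_step("fix.patch"), build, run the application, and finally finish_step("good") or finish_step("bad"), where fix.patch is whatever the patch file is called locally.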
Comment 15 roland@rptd.ch 2019-08-17 07:24:44 UTC
I've now tried the patching approach. I had to patch three files in total:

src/amd/common/ac_nir_to_llvm.c
src/gallium/auxiliary/gallivm/lp_bld_init.c
src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c

Nevertheless, the build still fails at the linking stage (see the attached linking fail log).
Comment 16 roland@rptd.ch 2019-08-17 07:25:12 UTC
Created attachment 145083 [details]
Linking fail log
Comment 17 Matt Turner 2019-08-17 15:29:30 UTC
The build failure is in Clover, the OpenCL implementation. If the application that triggers the huge amount of RAM problem is not using OpenCL, disable OpenCL in meson configure and try to get past that.
Comment 18 Matt Turner 2019-08-24 16:33:06 UTC
Cc'ing myself in case more help is needed bisecting.
Comment 19 roland@rptd.ch 2019-08-25 18:45:42 UTC
I need to shift this back to Gentoo, since the problem right now is not compiling (i.e. bisecting) but Gentoo itself (which is actually the only distro I've seen this problem on so far).
Comment 20 Matt Turner 2019-08-28 18:35:03 UTC
Great. We've got a bisect, and reverting the commit from 19.0.8 fixes the issue.

commit 9176703788c66de8287c6224650b1ff8d4238126
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Wed Aug 8 15:37:21 2018 -0400

    radeonsi: increase the maximum UBO size to 2 GB
    
    Same as the closed driver.
    
    This causes a failure in GL45-CTS.compute_shader.max, which has a trivial
    bug.
    
    Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

(see https://bugs.gentoo.org/690066#c33)
Comment 21 Marek Olšák 2019-08-28 19:24:44 UTC
How do I reproduce it?
Comment 22 roland@rptd.ch 2019-09-02 17:47:14 UTC
Hi Marek,

This is going to be complicated. The application is not yet free for others to use (I'm working on getting it into release shape). I would first have to figure out how to break this down into a reproducible test case, since I don't know myself what triggers the bug.

If you can think of some corner values that would narrow down where to search, I can freely experiment with the source code over here. The faulty commit talks about the maximum UBO size, so this might be a start. The OpenGL capabilities reported by the GPU are:

- UBO Maximum Block Size = 65536
- UBO Buffer Offset Alignment = 4

So the maximum size used by the application is 65536 bytes.

UBOs are used as shared buffers so blocks of data are placed next to each other respecting alignment and updated.

UBOs are created like this:

glBindBuffer(GL_UNIFORM_BUFFER, pUBO)  // <= done once
glBufferData(GL_UNIFORM_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW)  // <= done once
glMapBufferRange(GL_UNIFORM_BUFFER, stride * elementCount, elementStride, GL_WRITE_ONLY | GL_MAP_INVALIDATE_RANGE_BIT)  // <= done for each data block written

The data is then written and the buffer unmapped.

In particular, this means a larger UBO is created once and individual blocks are then written to it using ranged mapping. Just a wild guess, but could the problem be related to this kind of usage pattern?
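
The packing described above (blocks placed next to each other, respecting the offset alignment) comes down to the following arithmetic. This is only an illustration under assumed numbers: align_up and block_offsets are hypothetical helpers, the 4-byte alignment and the 65536-byte limit are the capability values listed above, and the 102-byte element size is invented.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def block_offsets(element_size, element_count, alignment):
    """Return (stride, offsets) for equally sized blocks packed in one buffer."""
    stride = align_up(element_size, alignment)
    return stride, [index * stride for index in range(element_count)]


if __name__ == "__main__":
    MAX_BLOCK_SIZE = 65536  # "UBO Maximum Block Size" from the list above
    OFFSET_ALIGNMENT = 4    # "UBO Buffer Offset Alignment" from the list above
    stride, offsets = block_offsets(102, 3, OFFSET_ALIGNMENT)
    # Each mapped range covers one block, so only the stride has to fit.
    assert stride <= MAX_BLOCK_SIZE
    print(stride, offsets)
```

Each offset is what would be passed as the start of the mapped range for one block.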
Comment 23 Matt Turner 2019-09-03 23:24:24 UTC
Can you make an apitrace of the application that demonstrates the problem?
Comment 24 roland@rptd.ch 2019-09-04 17:30:00 UTC
(In reply to Matt Turner from comment #23)
> Can you make an apitrace of the application that demonstrates the problem?

I can only try RenderDoc. But can you export an API trace with it?
Comment 25 Matt Turner 2019-09-04 18:17:15 UTC
(In reply to roland@rptd.ch from comment #24)
> (In reply to Matt Turner from comment #23)
> > Can you make an apitrace of the application that demonstrates the problem?
> 
> I can only try RenderDoc. But can you export an API trace with it?

A RenderDoc trace would probably be fine. Any way to reproduce it would be fine.
Comment 26 roland@rptd.ch 2019-09-06 00:27:13 UTC
I tried RenderDoc but did not get anywhere. I can capture the scene in the application (basically just a sky-box), but that's not really helpful. The problem happens while shaders are compiled, so RAM consumption goes through the roof during loading. Once the shaders are loaded, RAM usage does not rise any further while rendering. But I can't seem to get RenderDoc to trace the shader compilation steps. Any ideas how to trace this situation?
Comment 27 Matt Turner 2019-09-07 18:11:46 UTC
Is the shader-cache causing the shaders to be loaded from disk instead of compiled?

Try with the environment variable MESA_GLSL_CACHE_DISABLE=1
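
One way to make sure the variable is set only for the test run, rather than exported in the whole shell session, is to launch the application from a small wrapper. This is a sketch: env_without_shader_cache and launch are hypothetical names, and the command passed in would be whatever starts the application.

```python
import os
import subprocess


def env_without_shader_cache(base=None):
    """Copy an environment and set MESA_GLSL_CACHE_DISABLE=1 in the copy."""
    env = dict(os.environ if base is None else base)
    env["MESA_GLSL_CACHE_DISABLE"] = "1"
    return env


def launch(command):
    """Run command (a list of arguments) with the shader cache disabled."""
    return subprocess.run(command, env=env_without_shader_cache())
```

Because the environment is copied, nothing leaks into later runs that are meant to use the cache again.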
Comment 28 roland@rptd.ch 2019-09-08 08:57:39 UTC
I had already deleted the cache, so I doubt this is the problem. I tried extracting all the shaders and writing a small console app that just compiles them, but so far no luck. I guess I need to check out the culprit commit to better narrow down what the console app needs in order to show the problem.
Comment 29 Timothy Arceri 2019-09-09 00:58:42 UTC
(In reply to roland@rptd.ch from comment #22)
> Hi Marek,
> 
> This is going to be complicated. The application is not yet free to use by
> others (working on getting it to release shape). I would have to figure out
> first how to break this down into a reproducible test case since I don't
> know myself what triggers the bug.
> 
> If you can think of some corner values to narrow down in what direction to
> search I can fully mess with the source code over here. The faulty commit
> talks about UBO maximum size so this might be a start. The OpenGL
> Capabilities from the GPU is this:
> 
> - UBO Maximum Block Size = 65536
> - UBO Buffer Offset Alignment = 4
> 
> So the maximum size used by the application is 65536 bytes.
> 
> UBOs are used as shared buffers so blocks of data are placed next to each
> other respecting alignment and updated.
> 
> UBOs are created like this:
> 
> glBindBuffer(GL_UNIFORM_BUFFER, pUBO)  // <= done once
> glBufferData(GL_UNIFORM_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW)  // <=
> done once
> glMapBufferRange(GL_UNIFORM_BUFFER, stride * elementCount, elementStride,
> GL_WRITE_ONLY | GL_MAP_INVALIDATE_RANGE_BIT)  // <= done for each data block
> written
> 
> Data then written and unmapped
> 
> In particular this means a larger UBO is created once then individual blocks
> are written to it using ranged mapping. Just a wild guess but could the
> problem be related to this kind of usage pattern?

UBOs are written to the shader cache, and shader cache items are kept in memory in a queue when all threads are busy compiling. If you have large UBOs this could indeed be your problem.

The following merge request might be helpful. Are you able to give this a test?

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852
Comment 30 roland@rptd.ch 2019-09-09 17:03:33 UTC
How do I best test this? Just check out the branch and build it, or apply it somehow to the Mesa branch I have here from bisecting?
Comment 31 Matt Turner 2019-09-09 17:47:20 UTC
(In reply to roland@rptd.ch from comment #30)
> How I best test this? Just check out the branch and build it or apply it
> somehow to the mesa branch I have here from bisecting?

wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch' and store it in /etc/portage/patches/media-libs/mesa/

Then just rebuild (with portage) a version of Mesa that is known to have the bug.
Comment 32 roland@rptd.ch 2019-09-10 19:26:19 UTC
Is there a way I can verify that the rebuild has picked up the patch file? I rebuilt and started the application but the result is the same. Now I'm not sure whether the patch is not working or was not picked up properly.
Comment 33 Matt Turner 2019-09-10 20:21:30 UTC
Created attachment 145325 [details]
attachment-32202-0.html

It will show the patch being applied before configuration.
Comment 34 roland@rptd.ch 2019-09-10 22:36:16 UTC
I do not see anything mentioned there. I downloaded the file with wget into the /etc/portage/patches/media-libs/mesa/ directory, which did not exist before. Is there anything else I need to do to get Gentoo to pick up the patch?
Comment 35 Matt Turner 2019-09-11 00:10:15 UTC
(In reply to roland@rptd.ch from comment #34)
> I do not see anything mentioned there. I've wget the file into
> /etc/portage/patches/media-libs/mesa/ directory, which did not exist.
> Anything else I need to do to get Gentoo to pick up the patch?

p50-ethernet ~ # mkdir -p /etc/portage/patches/media-libs/mesa/
p50-ethernet ~ # cd /etc/portage/patches/media-libs/mesa/
p50-ethernet /etc/portage/patches/media-libs/mesa # wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch'
--2019-09-10 17:09:04--  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch
Resolving gitlab.freedesktop.org... 35.185.111.185
Connecting to gitlab.freedesktop.org|35.185.111.185|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1852.patch’

1852.patch                                               [ <=>                                                                                                                  ]  13.33K  --.-KB/s    in 0.08s   

2019-09-10 17:09:04 (163 KB/s) - ‘1852.patch’ saved [13646]

p50-ethernet /etc/portage/patches/media-libs/mesa # ebuild /var/db/repos/gentoo/media-libs/mesa/mesa-19.2.0_rc2.ebuild prepare
 * mesa-19.2.0-rc2.tar.xz BLAKE2B SHA512 size ;-) ...                                                                                                                                                       [ ok ]
>>> Unpacking source...
>>> Unpacking mesa-19.2.0-rc2.tar.xz to /var/tmp/portage/media-libs/mesa-19.2.0_rc2/work
>>> Source unpacked in /var/tmp/portage/media-libs/mesa-19.2.0_rc2/work
>>> Preparing source in /var/tmp/portage/media-libs/mesa-19.2.0_rc2/work/mesa-19.2.0-rc2 ...
 * Applying 1852.patch ...                                                                                                                                                                                  [ ok ]
 * User patches applied.
>>> Source prepared.


"Applying 1852.patch" is what you should see.

There's nothing else you need to do other than putting the patch into that directory.
Comment 36 roland@rptd.ch 2019-09-11 16:39:58 UTC
# mkdir -p /etc/portage/patches/media-libs/mesa/
# cd /etc/portage/patches/media-libs/mesa/
wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch'
--2019-09-11 18:35:51--  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch
Resolving gitlab.freedesktop.org... 35.185.111.185
Connecting to gitlab.freedesktop.org|35.185.111.185|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1852.patch’

1852.patch                                 [ <=>                                                                         ]  13.33K  --.-KB/s    in 0.1s    

2019-09-11 18:35:52 (113 KB/s) - ‘1852.patch’ saved [13646]
# ebuild /usr/portage/media-libs/mesa/mesa-19.0.8.ebuild prepare
 * mesa-19.0.8.tar.xz BLAKE2B SHA512 size ;-) ...                                                                                                    [ ok ]
 * checking ebuild checksums ;-) ...                                                                                                                 [ ok ]
 * checking miscfile checksums ;-) ...                                                                                                               [ ok ]
>>> Unpacking source...
>>> Unpacking mesa-19.0.8.tar.xz to /var/tmp/portage/media-libs/mesa-19.0.8/work
>>> Source unpacked in /var/tmp/portage/media-libs/mesa-19.0.8/work
>>> Preparing source in /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 ...
>>> Source prepared.

I'm sorry but this procedure does not pick up the patch. Any ideas why this could be the case? The directory /var/db/repos/gentoo/media-libs/mesa/ in your version does not exist on my system.
Comment 37 Matt Turner 2019-09-11 16:50:26 UTC
(In reply to roland@rptd.ch from comment #36)
> # mkdir -p /etc/portage/patches/media-libs/mesa/
> # cd /etc/portage/patches/media-libs/mesa/
> wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch'
> --2019-09-11 18:35:51-- 
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch
> Resolving gitlab.freedesktop.org... 35.185.111.185
> Connecting to gitlab.freedesktop.org|35.185.111.185|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/plain]
> Saving to: ‘1852.patch’
> 
> 1852.patch                                 [ <=>                            
> ]  13.33K  --.-KB/s    in 0.1s    
> 
> 2019-09-11 18:35:52 (113 KB/s) - ‘1852.patch’ saved [13646]
> # ebuild /usr/portage/media-libs/mesa/mesa-19.0.8.ebuild prepare
>  * mesa-19.0.8.tar.xz BLAKE2B SHA512 size ;-) ...                           
> [ ok ]
>  * checking ebuild checksums ;-) ...                                        
> [ ok ]
>  * checking miscfile checksums ;-) ...                                      
> [ ok ]
> >>> Unpacking source...
> >>> Unpacking mesa-19.0.8.tar.xz to /var/tmp/portage/media-libs/mesa-19.0.8/work
> >>> Source unpacked in /var/tmp/portage/media-libs/mesa-19.0.8/work
> >>> Preparing source in /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 ...
> >>> Source prepared.
> 
> I'm sorry but this procedure does not pick up the patch. Any ideas why this
> could be the case? The directory /var/db/repos/gentoo/media-libs/mesa/ in
> your version does not exist on my system.

Huh. Very strange. :|

(The /var/db/repos/gentoo path is the new path for ebuild repositories. It's exactly the same as /usr/portage on your system. It's a pretty recent change. See https://wiki.gentoo.org/wiki//var/db/repos/gentoo)

Maybe just build and test from a git checkout, like you were doing in the git bisect. I'm not sure why portage isn't applying the patch :(
Comment 38 roland@rptd.ch 2019-09-12 16:40:41 UTC
Could it be that your patch is not for version "mesa-19.0.8" as I'm using? (newer ones are keyworded)
Comment 39 Matt Turner 2019-09-13 22:55:38 UTC
(In reply to roland@rptd.ch from comment #38)
> Could it be that your patch is not for version "mesa-19.0.8" as I'm using?
> (newer ones are keyworded)

Perhaps try 19.2.0_rc3. That's much closer to master.

Whatever the case, if the patch doesn't apply to 19.0.8 and portage tried (and as a result failed) to apply it, it would give an error and stop.
Comment 40 roland@rptd.ch 2019-09-15 13:31:45 UTC
I have now tested the patch against Mesa git master. It makes no difference. RAM consumption still skyrockets and the application then crashes.
Comment 41 roland@rptd.ch 2019-09-15 13:35:39 UTC
EDIT: to sum up:
1) GIT master + patch => bug present
2) GIT master - patch + revert-commit => bug fixed
Comment 42 Matt Turner 2019-09-19 05:01:39 UTC
(In reply to roland@rptd.ch from comment #41)
> EDIT: to sum up:
> 1) GIT master + patch => bug present
> 2) GIT master - patch + revert-commit => bug fixed

If you don't have a way for developers to reproduce the issue themselves, it's likely to go unfixed.
Comment 43 roland@rptd.ch 2019-09-19 17:23:12 UTC
Sorry, but this is rude. I waited for an answer on my result of testing the patch (as requested) and that's what I get as answer? I can continue trying to find a reproducible task but then please say that you acknowledged my finding so I can continue in a helpful way.
Comment 44 Matt Turner 2019-09-19 20:20:18 UTC
(In reply to roland@rptd.ch from comment #43)
> Sorry, but this is rude. I waited for an answer on my result of testing the
> patch (as requested) and that's what I get as answer? I can continue trying
> to find a reproducible task but then please say that you acknowledged my
> finding so I can continue in a helpful way.

Huh? Thanks for testing, but I'm just informing you, now that the stab-in-the-dark attempted fix didn't solve it, that you're probably not going to see this fixed unless you can find a way for developers to reproduce it. I'm not sure why you think that's rude.

And honestly I've spent soooo much time trying to help you and it's been *painful* at every step. I've done this kind of triage for 10 years and I don't remember a more frustrating experience helping a user. I'm un'Cc'ing myself since I don't work on this driver and I feel that my time is better spent elsewhere.

Best of luck.
Comment 45 roland@rptd.ch 2019-09-20 16:03:46 UTC
That was now really rude. Anyway, I'm interested in a solution, not a discussion, so I've captured an apitrace of the situation. It does reproduce the problem. It is 60M in size, though (8M compressed). Can I attach such a file here, or shall I put it on an external file host?
Comment 46 roland@rptd.ch 2019-09-21 09:11:06 UTC
(In reply to Marek Olšák from comment #21)
> How do I reproduce it?

Looks like the trace file is too large to be attached here. I've put the file on my server to access: http://rptd.ch/misc/debug/deigde.trace.7z

Replaying that trace shows the problematic behavior.

With the commit reverted the trace loads very fast and memory consumption goes up normally. With the commit not reverted loading takes long and memory consumption jumps by 4G roughly.

Frame 0 may be a bit tricky to evaluate. Here the game engine does OpenGL capability detection and OpenGL implementation error checks. So you will find various warnings, errors and the like, which only serve to figure out whether what the OpenGL implementation claims to support is actually supported.

Frame 2279 is where the loading takes place. This is no more detection code so if anything is wrong in there then it's of interest.
Comment 47 GitLab Migration User 2019-09-25 18:50:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1416.

