Created attachment 144715 [details] Valgrind massif log
Since an update a few days ago, an application that barely consumes 2 GB of RAM at full load has become very slow to load, and compiling shaders consumes over 16 GB of RAM before the app eventually crashes. I don't know what exactly in the update caused the problem, but Mesa, the amdgpu driver and LLVM all got updates. I also tried Mesa 19.x, but the problem is the same. The driver is xf86-video-amdgpu-19.0.1 and LLVM is 7.0.x. I've already deleted the Mesa shader cache and all caches the application creates. I've recompiled the whole system (Gentoo) to rule out stale builds, and I've also run the app as a completely fresh user. Using valgrind --tool=massif, the culprit seems to be ralloc_size, which is called by the two above-mentioned methods. I've attached a massif log covering a few seconds of running the application and shutting it down before memory skyrockets even more. At that point the app shows only an empty scene with a simple shader drawing a sky-box; the rest is non-OpenGL UI. I classified this as a blocker because as soon as you try loading more shaders, not even 32 GB seems to be enough to cope with the rampaging GLSL compiler.
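For anyone who wants to take a comparable heap profile, a minimal sketch follows. `valgrind --tool=massif` is the invocation mentioned in the report; the helper name `massif_profile`, the output file name and the use of `ms_print` to render the result are assumptions added for illustration and are not taken from the report.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): profile a command's heap with massif.
# Usage: massif_profile massif.out ./my-app --some-arg
massif_profile() {
    out_file=$1
    shift
    # --massif-out-file fixes the output name so it can be found afterwards.
    valgrind --tool=massif --massif-out-file="$out_file" "$@"
    status=$?
    # ms_print renders the recorded snapshots as a text graph plus allocation trees.
    ms_print "$out_file" > "$out_file.txt"
    return "$status"
}
```

The allocation trees in the rendered text file are where a large caller such as ralloc_size would be expected to show up.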
I've now also started the application in a debugger and loaded a simple model, which causes tons of RAM to be consumed by the shader compiler. I interrupted with GDB and took a backtrace:
#0 0x00007f650ee794e7 in __memcpy_ssse3 () from /lib64/libc.so.6
#1 0x00007f650776a390 in blob_write_bytes () from /usr/lib64/dri/radeonsi_dri.so
#2 0x00007f650776a4e8 in blob_write_uint32 () from /usr/lib64/dri/radeonsi_dri.so
#3 0x00007f6507636421 in serialize_glsl_program () from /usr/lib64/dri/radeonsi_dri.so
#4 0x00007f6507638132 in shader_cache_write_program_metadata(gl_context*, gl_shader_program*) () from /usr/lib64/dri/radeonsi_dri.so
#5 0x00007f65074a9a38 in link_program_error () from /usr/lib64/dri/radeonsi_dri.so
#6 0x00007f6509d85a3d in deoglShaderLanguage::pLinkShader (this=0x7f65004360c0, handle=298) at src/modules/graphic/opengl/src/shaders/deoglShaderLanguage.cpp:1272
#7 0x00007f6509d86537 in deoglShaderLanguage::CompileShader (this=0x7f65004360c0, program=...) at src/modules/graphic/opengl/src/shaders/deoglShaderLanguage.cpp:530
Mesa gets stuck inside "link_program_error" => "shader_cache_write_program_metadata" => "serialize_glsl_program". Most probably serialize_glsl_program goes on a rampage there, but I have no idea if this is the real reason. According to the massif log, though, ralloc_size is called with gigabytes of data multiple times somewhere in there.
I don't know what other information can help, so I collected the package versions for the state that worked (before the update) and the state that no longer works (after the update):
before update (working state):
- media-libs/mesa-18.2.8
- x11-drivers/xf86-video-amdgpu-18.1.0
- x11-libs/libdrm-2.4.96
- sys-devel/llvm-6.0.1
- sys-devel/llvmgold-6
- sys-devel/llvm-common-6.0.1
after update (memory consumption bug present):
- media-libs/mesa-18.3.6 (I also tested media-libs/mesa-19.0.6 and media-libs/mesa-19.1.1 with the same result)
- x11-drivers/xf86-video-amdgpu-19.0.1
- x11-libs/libdrm-2.4.97
- sys-devel/llvm-7.1.0
- sys-devel/llvmgold-7
- sys-devel/llvm-common-7.1.0
Is there anything else that can help?
Are you able to build mesa from git and do a git bisect to find the problem commit?
I tried compiling from source but it does not work. It seems to have trouble with libdrm:
configure: error: Package requirements (libdrm >= 2.4.75 libdrm_intel >= 2.4.75) were not met:
I can't seem to get past this one.
For the record: the installed libdrm version is 2.4.97.
(In reply to roland@rptd.ch from comment #4) > I tried compiling from source but it does not work. Seems to have troubles > with libdrm. > > configure: error: Package requirements (libdrm >= 2.4.75 libdrm_intel >= > 2.4.75) were not met: > > Can't seem to get past this one. Are you sure you have the libdrm-devel package installed?
I'm on Gentoo. It's a source-based distro, so every installed package has its headers and libraries installed too; otherwise the distro wouldn't work. So I'm positive everything required for compiling Mesa is there (as I have Mesa "emerged").
(In reply to roland@rptd.ch from comment #4)
> I tried compiling from source but it does not work. Seems to have troubles
> with libdrm.
>
> configure: error: Package requirements (libdrm >= 2.4.75 libdrm_intel >=
> 2.4.75) were not met:
>
> Can't seem to get past this one.
You must be building with more things enabled than your system Mesa is built with. Run
    ebuild /path/to/mesa-19.0.8.ebuild configure clean
and copy the line that it uses to configure with meson. Use that in your build that you're using to bisect. Alternatively, add "intel" to your VIDEO_CARDS setting and rebuild libdrm.
I get two config attempts. This is the second one. Do you see anything out of place here? meson --buildtype plain --libdir lib64 --localstatedir /var/lib --prefix /usr --sysconfdir /etc --wrap-mode nodownload -Dplatforms=x11,surfaceless,wayland,drm -Dllvm=true -Dlmsensors=true -Dlibunwind=false -Dgallium-nine=false -Dgallium-va=false -Dgallium-vdpau=false -Dgallium-xa=false -Dgallium-xvmc=false -Dgallium-opencl=icd -Dosmesa=none -Dbuild-tests=false -Dglx=dri -Dshared-glapi=true -Ddri3=true -Degl=true -Dgbm=true -Dgles1=false -Dgles2=true -Dglvnd=false -Dselinux=false -Dvalgrind=false -Ddri-drivers=r100,r200 -Dgallium-drivers=r300,r600,radeonsi,swrast -Dvulkan-drivers=amd --buildtype plain -Db_ndebug=true /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8-abi_x86_64.amd64
(In reply to roland@rptd.ch from comment #9) > I get two config attempts. This is the second one. Do you see anything out > of place here? > > meson --buildtype plain --libdir lib64 --localstatedir /var/lib --prefix > /usr --sysconfdir /etc --wrap-mode nodownload > -Dplatforms=x11,surfaceless,wayland,drm -Dllvm=true -Dlmsensors=true > -Dlibunwind=false -Dgallium-nine=false -Dgallium-va=false > -Dgallium-vdpau=false -Dgallium-xa=false -Dgallium-xvmc=false > -Dgallium-opencl=icd -Dosmesa=none -Dbuild-tests=false -Dglx=dri > -Dshared-glapi=true -Ddri3=true -Degl=true -Dgbm=true -Dgles1=false > -Dgles2=true -Dglvnd=false -Dselinux=false -Dvalgrind=false > -Ddri-drivers=r100,r200 -Dgallium-drivers=r300,r600,radeonsi,swrast > -Dvulkan-drivers=amd --buildtype plain -Db_ndebug=true > /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 > /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8-abi_x86_64.amd64 Just change the /var/tmp/portage/... path to your own build directory and you should be good :)
Unfortunately it does not compile like this: FAILED: src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o cc -Isrc/amd/common/2a96a08@@amd_common@sta -Isrc/amd/common -I../src/amd/common -Isrc/../include -I../src/../include -Isrc -I../src -Isrc/mapi -I../src/mapi -Isrc/mesa -I../src/mesa -I../src/gallium/include -Isrc/gallium/auxiliary -I../src/gallium/auxiliary -Isrc/compiler -I../src/compiler -Isrc/amd -I../src/amd -Isrc/compiler/nir -I../src/compiler/nir -I/usr/lib64/llvm/7/include -I/usr/include/libdrm -fdiagnostics-color=always -DNDEBUG -pipe -D_FILE_OFFSET_BITS=64 -std=c99 '-DVERSION="18.0.0-rc2"' -DPACKAGE_VERSION=VERSION '-DPACKAGE_BUGREPORT="https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa"' -DGLX_USE_TLS -DHAVE_X11_PLATFORM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_DRM -DHAVE_DRM_PLATFORM -DHAVE_SURFACELESS_PLATFORM -DENABLE_SHADER_CACHE -DHAVE___BUILTIN_BSWAP32 -DHAVE___BUILTIN_BSWAP64 -DHAVE___BUILTIN_CLZ -DHAVE___BUILTIN_CLZLL -DHAVE___BUILTIN_CTZ -DHAVE___BUILTIN_EXPECT -DHAVE___BUILTIN_FFS -DHAVE___BUILTIN_FFSLL -DHAVE___BUILTIN_POPCOUNT -DHAVE___BUILTIN_POPCOUNTLL -DHAVE___BUILTIN_UNREACHABLE -DHAVE_FUNC_ATTRIBUTE_CONST -DHAVE_FUNC_ATTRIBUTE_FLATTEN -DHAVE_FUNC_ATTRIBUTE_MALLOC -DHAVE_FUNC_ATTRIBUTE_PURE -DHAVE_FUNC_ATTRIBUTE_UNUSED -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT -DHAVE_FUNC_ATTRIBUTE_WEAK -DHAVE_FUNC_ATTRIBUTE_FORMAT -DHAVE_FUNC_ATTRIBUTE_PACKED -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL -DHAVE_FUNC_ATTRIBUTE_VISIBILITY -DHAVE_FUNC_ATTRIBUTE_ALIAS -DHAVE_FUNC_ATTRIBUTE_NORETURN -DUSE_SSE41 -DUSE_GCC_ATOMIC_BUILTINS -DUSE_X86_64_ASM -DMAJOR_IN_SYSMACROS -DHAVE_SYS_SYSCTL_H -DHAVE_LINUX_FUTEX_H -DHAVE_STRTOF -DHAVE_MKOSTEMP -DHAVE_POSIX_MEMALIGN -DHAVE_TIMESPEC_GET -DHAVE_MEMFD_CREATE -DHAVE_STRTOD_L -DHAVE_DLADDR -DHAVE_DL_ITERATE_PHDR -DHAVE_LIBDRM -DHAVE_ZLIB -DHAVE_PTHREAD -DHAVE_LLVM=0x0710 -DMESA_LLVM_VERSION_PATCH=0 -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED -DHAVE_DRI3 -DHAVE_LIBSENSORS=1 -Wall 
-Werror=implicit-function-declaration -Werror=missing-prototypes -fno-math-errno -fno-trapping-math -fPIC -pthread -D__STDC_LIMIT_MACROS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -fvisibility=hidden -MD -MQ 'src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o' -MF 'src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o.d' -o 'src/amd/common/2a96a08@@amd_common@sta/ac_nir_to_llvm.c.o' -c ../src/amd/common/ac_nir_to_llvm.c ../src/amd/common/ac_nir_to_llvm.c: In function ‘ac_llvm_finalize_module’: ../src/amd/common/ac_nir_to_llvm.c:6614:2: error: implicit declaration of function ‘LLVMAddPromoteMemoryToRegisterPass’; did you mean ‘LLVMAddDemoteMemoryToRegisterPass’? [-Werror=implicit-function-declaration] LLVMAddPromoteMemoryToRegisterPass(passmgr); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LLVMAddDemoteMemoryToRegisterPass cc1: some warnings being treated as errors [425/1549] Compiling C object 'src/mesa/bbe4a73@@mesa_gallium@sta/main_format_utils.c.o'. ninja: build stopped: subcommand failed.
(In reply to roland@rptd.ch from comment #11)
> ../src/amd/common/ac_nir_to_llvm.c
> ../src/amd/common/ac_nir_to_llvm.c: In function ‘ac_llvm_finalize_module’:
> ../src/amd/common/ac_nir_to_llvm.c:6614:2: error: implicit declaration of
> function ‘LLVMAddPromoteMemoryToRegisterPass’; did you mean
> ‘LLVMAddDemoteMemoryToRegisterPass’? [-Werror=implicit-function-declaration]
> LLVMAddPromoteMemoryToRegisterPass(passmgr);
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> LLVMAddDemoteMemoryToRegisterPass
> cc1: some warnings being treated as errors
> [425/1549] Compiling C object
> 'src/mesa/bbe4a73@@mesa_gallium@sta/main_format_utils.c.o'.
> ninja: build stopped: subcommand failed.
Is this during the bisect, or when? Try adding
    #if HAVE_LLVM >= 0x0700
    #include <llvm-c/Transforms/Utils.h>
    #endif
to the #include section of src/amd/common/ac_nir_to_llvm.c
I checked out the 18.2 branch which I assume should work (if the theory is correct). Modifying files won't work with bisecting, right?
(In reply to roland@rptd.ch from comment #13) > I checked out the 18.2 branch which I assume should work (if the theory is > correct). > > Modifying files won't work with bisecting, right? You'll probably have to 'git apply' the patch after each step, and 'git checkout -f' before each time you do 'git bisect good/bad'
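The apply/test/reset cycle described above could be wrapped in a small helper, sketched below. The function name `bisect_step`, the patch file name and the build-and-test script are hypothetical; only `git apply`, `git checkout -f` and `git bisect good/bad` come from the comment. The "skip" verdict for a patch that does not apply is an added assumption.

```shell
#!/bin/sh
# Hypothetical helper: run one bisect step with a local build fix applied.
# Usage: git bisect "$(bisect_step /path/to/build-fix.patch ./build-and-test.sh)"
# Prints "good", "bad" or "skip" and leaves the working tree clean.
bisect_step() {
    patch_file=$1
    shift
    # Apply the local fix on top of the commit that git bisect checked out.
    if ! git apply "$patch_file"; then
        git checkout -q -f    # make sure nothing is left modified
        echo skip             # the patch does not fit this commit
        return 0
    fi
    # Run the caller's build/test command; its output goes to stderr so that
    # only the verdict ends up on stdout.
    if "$@" >&2; then verdict=good; else verdict=bad; fi
    # Drop the patch again so git bisect can switch to the next commit.
    git checkout -q -f
    echo "$verdict"
}
```

Printing the verdict instead of calling `git bisect` directly keeps the helper usable both by hand and from a wrapper script.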
I've now tried the patching approach. I had to patch three files in total:
src/amd/common/ac_nir_to_llvm.c
src/gallium/auxiliary/gallivm/lp_bld_init.c
src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
Nevertheless, the build still fails at the linking stage (see the fail_log attachment).
Created attachment 145083 [details] Linking fail log
The build failure is in Clover, the OpenCL implementation. If the application that triggers the huge amount of RAM problem is not using OpenCL, disable OpenCL in meson configure and try to get past that.
Cc'ing myself in case more help is needed bisecting.
I need to shift this back to Gentoo, since the problem right now is not compiling (i.e. bisecting) but Gentoo itself, which is actually the only distro I've seen this problem on so far.
Great. We've got a bisect, and reverting the commit from 19.0.8 fixes the issue.
commit 9176703788c66de8287c6224650b1ff8d4238126
Author: Marek Olšák <marek.olsak@amd.com>
Date: Wed Aug 8 15:37:21 2018 -0400
    radeonsi: increase the maximum UBO size to 2 GB
    Same as the closed driver.
    This causes a failure in GL45-CTS.compute_shader.max, which has a trivial bug.
    Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(see https://bugs.gentoo.org/690066#c33)
How do I reproduce it?
Hi Marek, This is going to be complicated. The application is not yet free for others to use (I'm working on getting it into release shape). I would first have to figure out how to break this down into a reproducible test case, since I don't know myself what triggers the bug. If you can think of some corner values to narrow down which direction to search in, I can freely mess with the source code over here. The faulty commit talks about the maximum UBO size, so this might be a start. The OpenGL capabilities reported for the GPU are:
- UBO Maximum Block Size = 65536
- UBO Buffer Offset Alignment = 4
So the maximum size used by the application is 65536 bytes. UBOs are used as shared buffers, so blocks of data are placed next to each other, respecting alignment, and updated. UBOs are created like this:
glBindBuffer(GL_UNIFORM_BUFFER, pUBO) // <= done once
glBufferData(GL_UNIFORM_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW) // <= done once
glMapBufferRange(GL_UNIFORM_BUFFER, stride * elementCount, elementStride, GL_WRITE_ONLY | GL_MAP_INVALIDATE_RANGE_BIT) // <= done for each data block written
The data is then written and the buffer unmapped. In particular, this means a larger UBO is created once and individual blocks are then written to it using ranged mapping. Just a wild guess, but could the problem be related to this kind of usage pattern?
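The layout described above, with blocks placed next to each other and each starting on an aligned offset, could be computed as in the sketch below. The stride rule (block size rounded up to the next multiple of the offset alignment) and both function names are assumptions made for illustration; the application's actual layout code is not shown in this thread.

```shell
#!/bin/sh
# Round a block size up to the next multiple of the buffer offset alignment.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Byte offset of block number "index" (0-based) inside the shared buffer.
# Usage: ubo_block_offset <index> <block-size> <alignment>
ubo_block_offset() {
    index=$1
    size=$2
    alignment=$3
    stride=$(align_up "$size" "$alignment")
    echo $(( $index * $stride ))
}

# An 18-byte block with the reported alignment of 4 gets a 20-byte stride,
# so block 3 starts at byte 60.
ubo_block_offset 3 18 4
```

An offset computed this way would be the second argument of the ranged mapping call, with the block size as the third.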
Can you make an apitrace of the application that demonstrates the problem?
(In reply to Matt Turner from comment #23) > Can you make an apitrace of the application that demonstrates the problem? I can only try RenderDoc. But can you export an API trace with it?
(In reply to roland@rptd.ch from comment #24) > (In reply to Matt Turner from comment #23) > > Can you make an apitrace of the application that demonstrates the problem? > > I can only try RenderDoc. But can you export an API trace with it? A RenderDoc trace would probably be fine. Any way to reproduce it would be fine.
I've tried getting something done with RenderDoc but I did not get anywhere. I can capture the scene in the application (basically just a sky-box) but that's not really helpful. The problem happens while shaders are compiled, so the RAM consumption goes through the roof while loading. Once the shaders are loaded, RAM usage does not rise anymore while rendering. But I can't seem to get RenderDoc to trace the shader compilation steps. Any ideas how to trace this situation?
Is the shader-cache causing the shaders to be loaded from disk instead of compiled? Try with the environment variable MESA_GLSL_CACHE_DISABLE=1
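One way to try this without changing the rest of the environment is a small wrapper that sets the variable for a single run, sketched below. The variable name is the one given above; the wrapper name `run_without_shader_cache` and the application path in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with the Mesa GLSL shader cache disabled.
# Usage: run_without_shader_cache ./my-app --some-arg
run_without_shader_cache() {
    # env sets the variable only for this one child process.
    env MESA_GLSL_CACHE_DISABLE=1 "$@"
}
```

Running the application once through the wrapper and once without it gives a direct comparison of loading behaviour with and without the cache.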
I had deleted the cache already so I doubt this is the problem. I tried pulling all the shaders and trying to make a small console app just compiling them but so far no luck. I guess I need to check out the culprit commit to better narrow down what I need in the console app to show the problem.
(In reply to roland@rptd.ch from comment #22)
> Hi Marek,
>
> This is going to be complicated. The application is not yet free to use by
> others (working on getting it to release shape). I would have to figure out
> first how to break this down into a reproducible test case since I don't
> know myself what triggers the bug.
>
> If you can think of some corner values to narrow down in what direction to
> search I can fully mess with the source code over here. The faulty commit
> talks about UBO maximum size so this might be a start. The OpenGL
> Capabilities from the GPU is this:
>
> - UBO Maximum Block Size = 65536
> - UBO Buffer Offset Alignment = 4
>
> So the maximum size used by the application is 65536 bytes.
>
> UBOs are used as shared buffers so blocks of data are placed next to each
> other respecting alignment and updated.
>
> UBOs are created like this:
>
> glBindBuffer(GL_UNIFORM_BUFFER, pUBO) // <= done once
> glBufferData(GL_UNIFORM_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW) // <=
> done once
> glMapBufferRange(GL_UNIFORM_BUFFER, stride * elementCount, elementStride,
> GL_WRITE_ONLY | GL_MAP_INVALIDATE_RANGE_BIT) // <= done for each data block
> written
>
> Data then written and unmapped
>
> In particular this means a larger UBO is created once then individual blocks
> are written to it using ranged mapping. Just a wield guess but could the
> problem be related to this kind of usage pattern?
UBOs are written to the shader cache, and shader cache items are kept in memory in a queue when all threads are busy compiling. If you have large UBOs this could indeed be your problem. The following merge request might be helpful. Are you able to give this a test? https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852
How do I best test this? Just check out the branch and build it, or apply it somehow to the Mesa branch I have here from bisecting?
(In reply to roland@rptd.ch from comment #30) > How I best test this? Just check out the branch and build it or apply it > somehow to the mesa branch I have here from bisecting? wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch' and store it in /etc/portage/patches/media-libs/mesa/ Then just rebuild (with portage) a version of Mesa that is known to have the bug.
Is there a way I can verify that the rebuild has picked up the patch file? I rebuilt and started the application but the result is the same. Now I'm not sure whether the patch is not working or was not picked up properly.
Created attachment 145325 [details] attachment-32202-0.html It will show the patch being applied before configuration.
I do not see anything mentioned there. I've downloaded the file with wget into the /etc/portage/patches/media-libs/mesa/ directory, which did not exist before. Is there anything else I need to do to get Gentoo to pick up the patch?
(In reply to roland@rptd.ch from comment #34)
> I do not see anything mentioned there. I've wget the file into
> /etc/portage/patches/media-libs/mesa/ directory, which did not exist.
> Anything else I need to do to get Gentoo to pick up the patch?
p50-ethernet ~ # mkdir -p /etc/portage/patches/media-libs/mesa/
p50-ethernet ~ # cd /etc/portage/patches/media-libs/mesa/
p50-ethernet /etc/portage/patches/media-libs/mesa # wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch'
--2019-09-10 17:09:04-- https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch
Resolving gitlab.freedesktop.org... 35.185.111.185
Connecting to gitlab.freedesktop.org|35.185.111.185|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1852.patch’
1852.patch [ <=> ] 13.33K --.-KB/s in 0.08s
2019-09-10 17:09:04 (163 KB/s) - ‘1852.patch’ saved [13646]
p50-ethernet /etc/portage/patches/media-libs/mesa # ebuild /var/db/repos/gentoo/media-libs/mesa/mesa-19.2.0_rc2.ebuild prepare
 * mesa-19.2.0-rc2.tar.xz BLAKE2B SHA512 size ;-) ... [ ok ]
>>> Unpacking source...
>>> Unpacking mesa-19.2.0-rc2.tar.xz to /var/tmp/portage/media-libs/mesa-19.2.0_rc2/work
>>> Source unpacked in /var/tmp/portage/media-libs/mesa-19.2.0_rc2/work
>>> Preparing source in /var/tmp/portage/media-libs/mesa-19.2.0_rc2/work/mesa-19.2.0-rc2 ...
 * Applying 1852.patch ... [ ok ]
 * User patches applied.
>>> Source prepared.
"Applying 1852.patch" is what you should see. There's nothing else you need to do other than putting the patch into that directory.
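The procedure above, plus the check for the "Applying" line, could be scripted roughly as follows. Both function names are hypothetical, and saving the `ebuild ... prepare` output to a log file is an assumption; the default directory and the "Applying 1852.patch" message are the ones shown in the session above.

```shell
#!/bin/sh
# Hypothetical helper: copy a patch into the Portage user-patch directory.
# Usage: install_user_patch 1852.patch media-libs/mesa [patches-root]
install_user_patch() {
    patch_file=$1
    package=$2
    root=${3:-/etc/portage/patches}
    mkdir -p "$root/$package" || return 1
    cp "$patch_file" "$root/$package/" || return 1
}

# Hypothetical helper: check a saved 'ebuild ... prepare' log for the line
# that confirms the patch was picked up.
# Usage: patch_was_applied prepare.log 1852.patch
patch_was_applied() {
    log_file=$1
    patch_name=$2
    grep -qF "Applying $patch_name" "$log_file"
}
```

A possible use is `ebuild /path/to/mesa.ebuild prepare > prepare.log 2>&1` followed by `patch_was_applied prepare.log 1852.patch`, which returns non-zero when the line is missing.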
# mkdir -p /etc/portage/patches/media-libs/mesa/
# cd /etc/portage/patches/media-libs/mesa/
wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch'
--2019-09-11 18:35:51-- https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch
Resolving gitlab.freedesktop.org... 35.185.111.185
Connecting to gitlab.freedesktop.org|35.185.111.185|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1852.patch’
1852.patch [ <=> ] 13.33K --.-KB/s in 0.1s
2019-09-11 18:35:52 (113 KB/s) - ‘1852.patch’ saved [13646]
# ebuild /usr/portage/media-libs/mesa/mesa-19.0.8.ebuild prepare
 * mesa-19.0.8.tar.xz BLAKE2B SHA512 size ;-) ... [ ok ]
 * checking ebuild checksums ;-) ... [ ok ]
 * checking miscfile checksums ;-) ... [ ok ]
>>> Unpacking source...
>>> Unpacking mesa-19.0.8.tar.xz to /var/tmp/portage/media-libs/mesa-19.0.8/work
>>> Source unpacked in /var/tmp/portage/media-libs/mesa-19.0.8/work
>>> Preparing source in /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 ...
>>> Source prepared.
I'm sorry but this procedure does not pick up the patch. Any ideas why this could be the case? The directory /var/db/repos/gentoo/media-libs/mesa/ in your version does not exist on my system.
(In reply to roland@rptd.ch from comment #36) > # mkdir -p /etc/portage/patches/media-libs/mesa/ > # cd /etc/portage/patches/media-libs/mesa/ > wget 'https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch' > --2019-09-11 18:35:51-- > https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1852.patch > Resolving gitlab.freedesktop.org... 35.185.111.185 > Connecting to gitlab.freedesktop.org|35.185.111.185|:443... connected. > HTTP request sent, awaiting response... 200 OK > Length: unspecified [text/plain] > Saving to: ‘1852.patch’ > > 1852.patch [ <=> > ] 13.33K --.-KB/s in 0.1s > > 2019-09-11 18:35:52 (113 KB/s) - ‘1852.patch’ saved [13646] > # ebuild /usr/portage/media-libs/mesa/mesa-19.0.8.ebuild prepare > * mesa-19.0.8.tar.xz BLAKE2B SHA512 size ;-) ... > [ ok ] > * checking ebuild checksums ;-) ... > [ ok ] > * checking miscfile checksums ;-) ... > [ ok ] > >>> Unpacking source... > >>> Unpacking mesa-19.0.8.tar.xz to /var/tmp/portage/media-libs/mesa-19.0.8/work > >>> Source unpacked in /var/tmp/portage/media-libs/mesa-19.0.8/work > >>> Preparing source in /var/tmp/portage/media-libs/mesa-19.0.8/work/mesa-19.0.8 ... > >>> Source prepared. > > I'm sorry but this procedure does not pick up the patch. Any ideas why this > could be the case? The directory /var/db/repos/gentoo/media-libs/mesa/ in > your version does not exist on my system. Huh. Very strange. :| (The /var/db/repos/gentoo path is the new path for ebuild repositories. It's exactly the same as /usr/portage on your system. It's a pretty recent change. See https://wiki.gentoo.org/wiki//var/db/repos/gentoo) Maybe just build and test from a git checkout, like you were doing in the git bisect. I'm not sure why portage isn't applying the patch :(
Could it be that your patch is not for version "mesa-19.0.8" as I'm using? (newer ones are keyworded)
(In reply to roland@rptd.ch from comment #38) > Could it be that your patch is not for version "mesa-19.0.8" as I'm using? > (newer ones are keyworded) Perhaps try 19.2.0_rc3. That's much closer to master. Whatever the case, if the patch doesn't apply to 19.0.8 and portage tried (and as a result failed) to apply it, it would give an error and stop.
I have now tested the patch against Mesa git master. It makes no difference. RAM consumption still skyrockets and the application then crashes.
EDIT: to sum up:
1) git master + patch => bug present
2) git master - patch + revert-commit => bug fixed
(In reply to roland@rptd.ch from comment #41) > EDIT: to sum up: > 1) GIT master + patch => bug present > 2) GIT master - patch + revert-commit => bug fixed If you don't have a way for developers to reproduce the issue themselves, it's likely to go unfixed.
Sorry, but this is rude. I waited for an answer on my result of testing the patch (as requested) and that's what I get as answer? I can continue trying to find a reproducible task but then please say that you acknowledged my finding so I can continue in a helpful way.
(In reply to roland@rptd.ch from comment #43)
> Sorry, but this is rude. I waited for an answer on my result of testing the
> patch (as requested) and that's what I get as answer? I can continue trying
> to find a reproducible task but then please say that you acknowledged my
> finding so I can continue in a helpful way.
Huh? Thanks for testing, but I'm just informing you, now that the stab-in-the-dark attempted fix didn't solve it, that you're probably not going to see this fixed unless you can find a way for developers to reproduce. I'm not sure why you think that's rude. And honestly I've spent soooo much time trying to help you and it's been *painful* at every step. I've done this kind of triage for 10 years and I don't remember a more frustrating experience helping a user. I'm un'Cc'ing myself since I don't work on this driver and I feel that my time is better spent elsewhere. Best of luck.
Now that was really rude. Anyway, I'm interested in a solution, not a discussion, so I've pulled an apitrace of the situation. It does reproduce the problem. It is 60M in size, though; compressed it's 8M. Can I attach such a file here or shall I put it on an external file host?
(In reply to Marek Olšák from comment #21)
> How do I reproduce it?
It looks like the trace file is too large to be attached here, so I've put it on my server: http://rptd.ch/misc/debug/deigde.trace.7z
Replaying that trace shows the problematic behavior. With the commit reverted, the trace loads very fast and memory consumption goes up normally. With the commit not reverted, loading takes a long time and memory consumption jumps by roughly 4G. Frame 0 is maybe a bit tricky to evaluate. Here the game engine does OpenGL capability detection and OpenGL implementation error checks, so you will find various warnings, errors and such, which only serve to figure out whether what the OpenGL implementation claims to support is actually supported. Frame 2279 is where the loading takes place. There is no more detection code at that point, so if anything is wrong in there, it's of interest.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1416.