When run a android unity3D game in a android container (fork from anbox, http://anbox.io). The memory leak very fast. It seems mesa alloc many memory, but not free. If run tested app in android phone, there is no leak. I tested mesa 17.2.4, and 18.0-devel
*** Bug 104883 has been marked as a duplicate of this bug. ***
Could you please attach here some traces of what kind of leaks are you experiencing?
==3734== 173,149 bytes in 2,731 blocks are possibly lost in loss record 26,105 of 26,112 ==3734== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==3734== by 0x11BF7C3C: ralloc_size (ralloc.c:121) ==3734== by 0x11BF80B4: ralloc_array_size (ralloc.c:208) ==3734== by 0x11BF80B4: ralloc_strdup (ralloc.c:353) ==3734== by 0x11BDA987: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, glsl_interface_packing, bool, char const*) (glsl_types.cpp:152) ==3734== by 0x11BDC7E1: glsl_type::get_interface_instance(glsl_struct_field const*, unsigned int, glsl_interface_packing, bool, char const*) (glsl_types.cpp:1145) ==3734== by 0x11B9C91E: construct_interface_instance (builtin_variables.cpp:352) ==3734== by 0x11B9C91E: generate_varyings (builtin_variables.cpp:1406) ==3734== by 0x11B9C91E: _mesa_glsl_initialize_variables(exec_list*, _mesa_glsl_parse_state*) (builtin_variables.cpp:1436) ==3734== by 0x11B65C98: _mesa_ast_to_hir(exec_list*, _mesa_glsl_parse_state*) (ast_to_hir.cpp:127) ==3734== by 0x11BCC9E2: _mesa_glsl_compile_shader (glsl_parser_extras.cpp:2103) ==3734== by 0x11A35F37: _mesa_compile_shader (shaderapi.c:1128) ==3734== by 0x99E97F: gles2_decoder_context_t::decode(void*, unsigned long, IOStream*) (gles2_dec.cpp:321) ==3734== by 0x95C118: RenderThread::main() (RenderThread.cpp:74) ==3734== by 0x97C8A4: emugl::Thread::thread_main(void*) (thread_pthread.cpp:124)
I found _mesa_glsl_compile_shader is called very frequently. which will eventually call into glsl_type::get_interface_instance() and then glsl_type::glsl_type(). The glsl_type::get_interface_instance() declare a stack variable gls_type() in glsl_type(), has a free ralloc_array, ralloc_strdup call which will call malloc. If the program exit. this memory will released _mesa_destroy_shader_compiler we call _mesa_glsl_release_types(). Since this memory allocated is useless(just be cached). at least for the stacked object. It is better to release it very soon. My program needs to run a long time. Those memory get no chance to be released. And the system memory is exhausted.
Created attachment 137292 [details] [review] fix attempt Does this attached patch fix this for you?
Created attachment 137294 [details] [review] fix attempt Modified the fix slightly. I'll be trying alternative approach but this can be used for testing if there are other leaks still reported.
FYI I've tried alternative approach getting rid of stack allocated types (moving to ralloc and using glsl_type itself as context) but there are issues with builtin types declared with DECL_TYPE macro, it seems much bigger change as these types are not allocated using ralloc.
So the previous fix is not correct? Should this be fixed in later mesa release?
(In reply to yanhua from comment #8) > So the previous fix is not correct? Should this be fixed in later mesa > release? It is correct but not very elegant. Ideally the class itself would manage the memory it allocates. You can use this patch for testing.
Created attachment 137353 [details] [review] Simple patch that changes glsl_type to manage its own memory I've made a patch that attempts to simplify the glsl_type memory management and as a bonus also gets rid of the global mutex. Is this patch worth submitting to the list?
(In reply to Simon Hausmann from comment #10) > Created attachment 137353 [details] [review] [review] > Simple patch that changes glsl_type to manage its own memory > > I've made a patch that attempts to simplify the glsl_type memory management > and as a bonus also gets rid of the global mutex. Is this patch worth > submitting to the list? Cool, yeah it looks like it would work, I guess strdup needs to be util_strdup. I thought about this earlier but I wonder if the reason for having the global context is that there are a *lot* of types so now we will have loads of ralloc contexts .. not sure. Please send to mesa-dev for discussion?
(In reply to Tapani Pälli from comment #11) > (In reply to Simon Hausmann from comment #10) > > Created attachment 137353 [details] [review] [review] [review] > > Simple patch that changes glsl_type to manage its own memory > > > > I've made a patch that attempts to simplify the glsl_type memory management > > and as a bonus also gets rid of the global mutex. Is this patch worth > > submitting to the list? > > Cool, yeah it looks like it would work, I guess strdup needs to be > util_strdup. I thought about this earlier but I wonder if the reason for > having the global context is that there are a *lot* of types so now we will > have loads of ralloc contexts .. not sure. Please send to mesa-dev for > discussion? The reason for global context is that we want these glsl_types to live during the lifetime of context until compiler gets destroyed and only destroy them at end. This way all shaders will enjoy using the types and not needing to create the same types all the time. So whatever the fix is, that functionality likely needs to stay.
Simon, I've sent your patch to mesa-dev with some small changes: https://lists.freedesktop.org/archives/mesa-dev/2018-February/185676.html
Testing with Simon's patch,removing ralloc_strdup, it still be found following possible leaking points. ==13823== 40,176 bytes in 1 blocks are possibly lost in loss record 907 of 915 ==13823== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13823== by 0x11BA013C: ralloc_size (ralloc.c:121) ==13823== by 0x11BA021D: rzalloc_size (ralloc.c:153) ==13823== by 0x11BA1331: ra_alloc_reg_set (register_allocate.c:197) ==13823== by 0x11EF1EEC: brw_vec4_alloc_reg_set (brw_vec4_reg_allocate.cpp:116) ==13823== by 0x11E2D128: brw_compiler_create (brw_compiler.c:113) ==13823== by 0x11CDD337: intelInitScreen2 (intel_screen.c:2667) ==13823== by 0x11C6D1B6: driCreateNewScreen2 (dri_util.c:151) ==13823== by 0xFFDDED6: dri3_create_screen (dri3_glx.c:869) ==13823== by 0xFFB1CD0: AllocAndFetchScreenConfigs (glxext.c:820) ==13823== by 0xFFB1CD0: __glXInitialize (glxext.c:946) ==13823== by 0xFFAD811: glXGetFBConfigs (glxcmds.c:1655) ==13823== by 0x1CA398E4: (anonymous namespace)::GlxDisplay::queryConfigs(int, void (*)(void*, EglOS::ConfigInfo const*)
(In reply to Jiancong from comment #14) > Testing with Simon's patch,removing ralloc_strdup, it still be found > following possible leaking points. > > ==13823== 40,176 bytes in 1 blocks are possibly lost in loss record 907 of > 915 > ==13823== at 0x4C2DB8F: malloc (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==13823== by 0x11BA013C: ralloc_size (ralloc.c:121) > ==13823== by 0x11BA021D: rzalloc_size (ralloc.c:153) > ==13823== by 0x11BA1331: ra_alloc_reg_set (register_allocate.c:197) > ==13823== by 0x11EF1EEC: brw_vec4_alloc_reg_set > (brw_vec4_reg_allocate.cpp:116) > ==13823== by 0x11E2D128: brw_compiler_create (brw_compiler.c:113) > ==13823== by 0x11CDD337: intelInitScreen2 (intel_screen.c:2667) > ==13823== by 0x11C6D1B6: driCreateNewScreen2 (dri_util.c:151) > ==13823== by 0xFFDDED6: dri3_create_screen (dri3_glx.c:869) > ==13823== by 0xFFB1CD0: AllocAndFetchScreenConfigs (glxext.c:820) > ==13823== by 0xFFB1CD0: __glXInitialize (glxext.c:946) > ==13823== by 0xFFAD811: glXGetFBConfigs (glxcmds.c:1655) > ==13823== by 0x1CA398E4: (anonymous > namespace)::GlxDisplay::queryConfigs(int, void (*)(void*, EglOS::ConfigInfo > const*) That happens when you forget to glXDestroyContext and glXMakeCurrent to NULL to unbind it so it can actually get freed. It's not a driver bug.
Created attachment 137522 [details] Patched with previous fix, and rerun the valgrind memcheck.
(In reply to Kenneth Graunke from comment #15) > (In reply to Jiancong from comment #14) > > Testing with Simon's patch,removing ralloc_strdup, it still be found > > following possible leaking points. ... > That happens when you forget to glXDestroyContext and glXMakeCurrent to NULL > to unbind it so it can actually get freed. It's not a driver bug. (In reply to Jiancong from comment #16) > Created attachment 137522 [details] > Patched with previous fix, and rerun the valgrind memcheck. Please fix the issue pointed out by Kenneth first, and some of the other application bugs visible in the valgrind output first. Otherwise it's hard to say which of those numerous items in your log are relevant. Alternative, you could kill the valgrind process with a signal that the application doesn't catch, but Valgrind can catch (e.g. SIGHUP or SIGXCPU could be such). That way application cannot mess up the leakage info with bugs it has in its exit path (which are irrelevant for tracking down run-time leakage). Note that for tracking down larger leaks, Valgrind Massif output is often more useful than Valgrind Memcheck output.
Created attachment 137720 [details] The massif out by 4 fours testing case
(In reply to Eero Tamminen from comment #17) > (In reply to Kenneth Graunke from comment #15) > > (In reply to Jiancong from comment #14) > > > Testing with Simon's patch,removing ralloc_strdup, it still be found > > > following possible leaking points. > ... > > That happens when you forget to glXDestroyContext and glXMakeCurrent to NULL > > to unbind it so it can actually get freed. It's not a driver bug. > > > (In reply to Jiancong from comment #16) > > Created attachment 137522 [details] > > Patched with previous fix, and rerun the valgrind memcheck. > > Please fix the issue pointed out by Kenneth first, and some of the other > application bugs visible in the valgrind output first. Otherwise it's hard > to say which of those numerous items in your log are relevant. > > Alternative, you could kill the valgrind process with a signal that the > application doesn't catch, but Valgrind can catch (e.g. SIGHUP or SIGXCPU > could be such). That way application cannot mess up the leakage info with > bugs it has in its exit path (which are irrelevant for tracking down > run-time leakage). > > Note that for tracking down larger leaks, Valgrind Massif output is often > more useful than Valgrind Memcheck output. Thanks for your advice. I've tested it again with massif and killed valgrind process with SIGHUP. After 4 hours testing, And I attached the massif.out file in attachment named . From the file, the #1 memory occupation is ralloc_size. As android app open and exit, the ralloc_size increase dramtically. The #2 memory occupation is eglCreateContext. But the weird is the callstack shows mesa called _swrast_CreateContext finally. In the configuration process, I've disabled swrast compile with "./configure --enable-glx=dri --enable-dri --with-dri-drivers=i915,i965 --with-gallium-drivers=i915 --disable-gallium-llvm". Is there anything wrong in my configuration?
i965 uses swrast functionality internally for legacy OpenGL features - so if you're using OpenGL 1.x-3.0 or GLES 1.x...then you'll see _swrast_CreateContext. It should be cleaned up if you destroy contexts properly. It's weird that you would see that on Android, though, because GLES2-3 don't hit that.
(In reply to Kenneth Graunke from comment #20) > i965 uses swrast functionality internally for legacy OpenGL features - so if > you're using OpenGL 1.x-3.0 or GLES 1.x...then you'll see > _swrast_CreateContext. It should be cleaned up if you destroy contexts > properly. It's weird that you would see that on Android, though, because > GLES2-3 don't hit that. The application use a Google translator to map GLES to GL command first, that maybe the reason.
Created attachment 137858 [details] [review] let's kill some programs If I read the Massif output correctly, does this patch help? This should release shader programs more aggressively and not only at context destroy.
Actually I think my patch in comment #22 is incorrect, we can't be sure there if rendering is finished and we can free the shaders. Correct way to fix these leaks is for application to: --- 8< --- eglMakeCurrent(info.egl_dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT); eglTerminate(info.egl_dpy); --- 8< --- and similar on GLX. Simon's patch landed master, I will close this one as FIXED. Please reopen if there are still leaks that happen when cleanup of the context has been added to the application.
That's OK. With Simon's patch the memory leaks decreased dramastically.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.