Bug 64568

Summary: SIGSEGV src/mesa/main/bufferobj.c:291
Product: Mesa
Reporter: Vinson Lee <vlee>
Component: Mesa core
Assignee: Marek Olšák <maraeo>
Status: RESOLVED FIXED
QA Contact:
Severity: critical
Priority: medium
CC: brianp, jfonseca, maraeo
Version: git
Keywords: regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description Vinson Lee 2013-05-13 22:27:58 UTC
mesa: a16a2d7147865634d68151d681a399f669146ff1 (master)

Run glxgears on softpipe or llvmpipe.

(gdb) bt
#0  0x00007f756787537f in _mesa_reference_buffer_object_ (ctx=0x1b19610, ptr=0x1b540e0, bufObj=0x0) at src/mesa/main/bufferobj.c:291
#1  0x00007f7567874e6a in _mesa_reference_buffer_object (ctx=0x1b19610, ptr=0x1b540e0, bufObj=0x0) at src/mesa/main/bufferobj.h:89
#2  0x00007f7567875c49 in _mesa_free_buffer_objects (ctx=0x1b19610) at src/mesa/main/bufferobj.c:651
#3  0x00007f7567754ca9 in _mesa_free_context_data (ctx=0x1b19610) at src/mesa/main/context.c:1159
#4  0x00007f756783e336 in st_destroy_context (st=0x1b6f670) at src/mesa/state_tracker/st_context.c:315
#5  0x00007f756775174d in st_context_destroy (stctxi=0x1b6f670) at src/mesa/state_tracker/st_manager.c:596
#6  0x00007f7567730acb in XMesaDestroyContext (c=0x1aa7960) at src/gallium/state_trackers/glx/xlib/xm_api.c:937
#7  0x00007f75677345f2 in glXDestroyContext (dpy=0x1a91960, ctx=0x1aa7920) at src/gallium/state_trackers/glx/xlib/glx_api.c:1363
#8  0x0000000000403755 in ?? ()
#9  0x00007f7566b4076d in __libc_start_main (main=0x402aa0, argc=1, ubp_av=0x7ffff2c3bcd8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffff2c3bcc8) at libc-start.c:226
#10 0x00000000004017b9 in ?? ()
#11 0x00007ffff2c3bcc8 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x00007ffff2c3c4cf in ?? ()
#15 0x0000000000000000 in ?? ()
(gdb) frame 0
#0  0x00007f756787537f in _mesa_reference_buffer_object_ (ctx=0x1b19610, ptr=0x1b540e0, bufObj=0x0) at src/mesa/main/bufferobj.c:291
291	      oldObj->RefCount--;

614ee25077b7ffafeb87b22563d01856824fb4bc is the first bad commit
commit 614ee25077b7ffafeb87b22563d01856824fb4bc
Author: Marek Olšák <maraeo@gmail.com>
Date:   Thu May 2 02:38:43 2013 +0200

    st/mesa: initialize all program constants and UBO limits
    
    Also simplify UBO support checking.
    
    NOTE: This is a candidate for the 9.1 branch.
    
    Reviewed-by: Brian Paul <brianp@vmware.com>
Comment 1 Jose Fonseca 2013-05-14 15:21:08 UTC
> Run glxgears on softpipe or llvmpipe.

One needs to exit cleanly, by pressing the Escape key.

This is what valgrind says:

$ valgrind glxgears
==3762== Memcheck, a memory error detector
==3762== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==3762== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==3762== Command: glxgears
==3762== 
==3762== Invalid read of size 8
==3762==    at 0x51DB2B6: _mesa_reference_buffer_object (bufferobj.h:88)
==3762==    by 0x51DC028: _mesa_free_buffer_objects (bufferobj.c:651)
==3762==    by 0x50B0009: _mesa_free_context_data (context.c:1159)
==3762==    by 0x51A373C: st_destroy_context (st_context.c:315)
==3762==    by 0x50AC984: st_context_destroy (st_manager.c:596)
==3762==    by 0x508BD45: XMesaDestroyContext (xm_api.c:937)
==3762==    by 0x508F8B3: glXDestroyContext (glx_api.c:1363)
==3762==    by 0x403754: ??? (in /usr/bin/glxgears)
==3762==    by 0x6DF176C: (below main) (libc-start.c:226)
==3762==  Address 0x8838850 is 0 bytes after a block of size 1,152 alloc'd
==3762==    at 0x4C29E46: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3762==    by 0x51DBEA4: _mesa_init_buffer_objects (bufferobj.c:622)
==3762==    by 0x50AF313: init_attrib_groups (context.c:744)
==3762==    by 0x50AFBBE: _mesa_initialize_context (context.c:1013)
==3762==    by 0x50AFE54: _mesa_create_context (context.c:1113)
==3762==    by 0x51A340F: st_create_context (st_context.c:232)
==3762==    by 0x50ACAF0: st_api_create_context (st_manager.c:640)
==3762==    by 0x508BC7D: XMesaCreateContext (xm_api.c:909)
==3762==    by 0x508F19F: create_context (glx_api.c:1066)
==3762==    by 0x508F28C: glXCreateContext (glx_api.c:1097)
==3762==    by 0x402E21: ??? (in /usr/bin/glxgears)
==3762==    by 0x6DF176C: (below main) (libc-start.c:226)
==3762== 
==3762== Invalid read of size 8
==3762==    at 0x51DB78C: _mesa_reference_buffer_object_ (bufferobj.c:284)
==3762==    by 0x51DB2D5: _mesa_reference_buffer_object (bufferobj.h:89)
==3762==    by 0x51DC028: _mesa_free_buffer_objects (bufferobj.c:651)
==3762==    by 0x50B0009: _mesa_free_context_data (context.c:1159)
==3762==    by 0x51A373C: st_destroy_context (st_context.c:315)
==3762==    by 0x50AC984: st_context_destroy (st_manager.c:596)
==3762==    by 0x508BD45: XMesaDestroyContext (xm_api.c:937)
==3762==    by 0x508F8B3: glXDestroyContext (glx_api.c:1363)
==3762==    by 0x403754: ??? (in /usr/bin/glxgears)
==3762==    by 0x6DF176C: (below main) (libc-start.c:226)
==3762==  Address 0x8838870 is not stack'd, malloc'd or (recently) free'd
==3762== 
==3762== Invalid read of size 8
==3762==    at 0x51DB7A0: _mesa_reference_buffer_object_ (bufferobj.c:287)
==3762==    by 0x51DB2D5: _mesa_reference_buffer_object (bufferobj.h:89)
==3762==    by 0x51DC028: _mesa_free_buffer_objects (bufferobj.c:651)
==3762==    by 0x50B0009: _mesa_free_context_data (context.c:1159)
==3762==    by 0x51A373C: st_destroy_context (st_context.c:315)
==3762==    by 0x50AC984: st_context_destroy (st_manager.c:596)
==3762==    by 0x508BD45: XMesaDestroyContext (xm_api.c:937)
==3762==    by 0x508F8B3: glXDestroyContext (glx_api.c:1363)
==3762==    by 0x403754: ??? (in /usr/bin/glxgears)
==3762==    by 0x6DF176C: (below main) (libc-start.c:226)
==3762==  Address 0x8838870 is not stack'd, malloc'd or (recently) free'd
==3762== 
==3762== Invalid read of size 4
==3762==    at 0x75B2E84: pthread_mutex_lock (pthread_mutex_lock.c:50)
==3762==    by 0x51DB7B2: _mesa_reference_buffer_object_ (bufferobj.c:289)
==3762==    by 0x51DB2D5: _mesa_reference_buffer_object (bufferobj.h:89)
==3762==    by 0x51DC028: _mesa_free_buffer_objects (bufferobj.c:651)
==3762==    by 0x50B0009: _mesa_free_context_data (context.c:1159)
==3762==    by 0x51A373C: st_destroy_context (st_context.c:315)
==3762==    by 0x50AC984: st_context_destroy (st_manager.c:596)
==3762==    by 0x508BD45: XMesaDestroyContext (xm_api.c:937)
==3762==    by 0x508F8B3: glXDestroyContext (glx_api.c:1363)
==3762==    by 0x403754: ??? (in /usr/bin/glxgears)
==3762==    by 0x6DF176C: (below main) (libc-start.c:226)
==3762==  Address 0x260 is not stack'd, malloc'd or (recently) free'd
==3762== 
==3762== 
==3762== Process terminating with default action of signal 11 (SIGSEGV)
==3762==  Access not within mapped region at address 0x260
==3762==    at 0x75B2E84: pthread_mutex_lock (pthread_mutex_lock.c:50)
==3762==    by 0x51DB7B2: _mesa_reference_buffer_object_ (bufferobj.c:289)
==3762==    by 0x51DB2D5: _mesa_reference_buffer_object (bufferobj.h:89)
==3762==    by 0x51DC028: _mesa_free_buffer_objects (bufferobj.c:651)
==3762==    by 0x50B0009: _mesa_free_context_data (context.c:1159)
==3762==    by 0x51A373C: st_destroy_context (st_context.c:315)
==3762==    by 0x50AC984: st_context_destroy (st_manager.c:596)
==3762==    by 0x508BD45: XMesaDestroyContext (xm_api.c:937)
==3762==    by 0x508F8B3: glXDestroyContext (glx_api.c:1363)
==3762==    by 0x403754: ??? (in /usr/bin/glxgears)
==3762==    by 0x6DF176C: (below main) (libc-start.c:226)
==3762==  If you believe this happened as a result of a stack
==3762==  overflow in your program's main thread (unlikely but
==3762==  possible), you can try to increase the size of the
==3762==  main thread stack using the --main-stacksize= flag.
==3762==  The main thread stack size used in this run was 8388608.
==3762== 
==3762== HEAP SUMMARY:
==3762==     in use at exit: 9,178,298 bytes in 7,756 blocks
==3762==   total heap usage: 78,735 allocs, 70,979 frees, 29,436,940 bytes allocated
==3762== 
==3762== LEAK SUMMARY:
==3762==    definitely lost: 6,400 bytes in 1 blocks
==3762==    indirectly lost: 0 bytes in 0 blocks
==3762==      possibly lost: 877,234 bytes in 1,626 blocks
==3762==    still reachable: 8,294,664 bytes in 6,129 blocks
==3762==         suppressed: 0 bytes in 0 blocks
==3762== Rerun with --leak-check=full to see details of leaked memory
==3762== 
==3762== For counts of detected and suppressed errors, rerun with: -v
==3762== ERROR SUMMARY: 5 errors from 4 contexts (suppressed: 2 from 2)
Segmentation fault
Comment 2 Jose Fonseca 2013-05-14 15:27:46 UTC
> 614ee25077b7ffafeb87b22563d01856824fb4bc is the first bad commit
> commit 614ee25077b7ffafeb87b22563d01856824fb4bc
> Author: Marek Olšák <maraeo@gmail.com>
> Date:   Thu May 2 02:38:43 2013 +0200
> 
>     st/mesa: initialize all program constants and UBO limits
>     
>     Also simplify UBO support checking.
>     
>     NOTE: This is a candidate for the 9.1 branch.
>     
>     Reviewed-by: Brian Paul <brianp@vmware.com>

I confirm the same here.


The output of glxinfo changed radically with this patch:

--- /tmp/old.txx        2013-05-14 16:24:41.543837747 +0100
+++ /tmp/new.txx        2013-05-14 16:25:47.448257369 +0100
@@ -126,7 +126,7 @@
     GL_MAX_TEXTURE_LOD_BIAS_EXT = 16
     GL_MAX_DRAW_BUFFERS_ARB = 8
     GL_VERTEX_PROGRAM_ARB:
-        GL_MAX_PROGRAM_INSTRUCTIONS_ARB = 16384
+        GL_MAX_PROGRAM_INSTRUCTIONS_ARB = 1048576
         GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB = 1048576
         GL_MAX_PROGRAM_TEMPORARIES_ARB = 256
         GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB = 256
@@ -134,26 +134,26 @@
         GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB = 32384
         GL_MAX_PROGRAM_ATTRIBS_ARB = 16
         GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB = 32
-        GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB = 1
+        GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB = 16
         GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB = 16
         GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB = 4096
         GL_MAX_PROGRAM_ENV_PARAMETERS_ARB = 256
     GL_FRAGMENT_PROGRAM_ARB:
-        GL_MAX_PROGRAM_INSTRUCTIONS_ARB = 16384
+        GL_MAX_PROGRAM_INSTRUCTIONS_ARB = 1048576
         GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB = 1048576
         GL_MAX_PROGRAM_TEMPORARIES_ARB = 256
         GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB = 256
         GL_MAX_PROGRAM_PARAMETERS_ARB = 32384
         GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB = 32384
-        GL_MAX_PROGRAM_ATTRIBS_ARB = 12
+        GL_MAX_PROGRAM_ATTRIBS_ARB = 32
         GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB = 32
-        GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB = 0
+        GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB = 16
         GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB = 16
         GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB = 4096
         GL_MAX_PROGRAM_ENV_PARAMETERS_ARB = 256
-        GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB = 16384
-        GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB = 16384
-        GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB = 16384
+        GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB = 1048576
+        GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB = 1048576
+        GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB = 1048576
         GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB = 1048576
         GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB = 1048576
         GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB = 1048576

I suspect that this is causing a buffer overflow somewhere.
Comment 3 Jose Fonseca 2013-05-14 15:58:28 UTC
The problem is that st_init_limits is called *after* _mesa_init_buffer_objects.

(gdb) break st_init_limits
Breakpoint 1 at 0x7ffff67e8f6e: file src/mesa/state_tracker/st_extensions.c, line 69.
(gdb) break _mesa_init_buffer_objects
Breakpoint 2 at 0x7ffff68206cb: file src/mesa/main/bufferobj.c, line 610.
(gdb) r
Starting program: /usr/bin/glxinfo -l
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
name of display: :0.0

Breakpoint 2, _mesa_init_buffer_objects (ctx=0x6931f0) at src/mesa/main/bufferobj.c:610
warning: Source file is more recent than executable.
610	   memset(&DummyBufferObject, 0, sizeof(DummyBufferObject));
(gdb) c
Continuing.

Breakpoint 1, st_init_limits (st=0x6e92a0) at src/mesa/state_tracker/st_extensions.c:69
warning: Source file is more recent than executable.
69	   struct pipe_screen *screen = st->pipe->screen;
(gdb) 
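
The ordering problem can be illustrated with a minimal, self-contained sketch. This is not Mesa code: `struct context`, `init_buffer_objects`, `init_limits`, and `entries_out_of_bounds` are hypothetical stand-ins, and the sketch assumes the teardown loop is bounded by the current limit rather than by the allocated size, which is what the valgrind trace in comment 1 suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the Mesa context fields; not the real structs. */
struct binding { void *buffer_object; };

struct context {
    unsigned max_bindings;     /* plays the role of MaxUniformBufferBindings */
    unsigned allocated;        /* number of entries actually allocated */
    struct binding *bindings;  /* plays the role of UniformBufferBindings */
};

/* Core initialisation: sizes the array from the limit as it is right now. */
static int init_buffer_objects(struct context *ctx)
{
    assert(ctx->bindings == NULL);
    ctx->bindings = calloc(ctx->max_bindings, sizeof(*ctx->bindings));
    if (ctx->bindings == NULL)
        return -1;
    ctx->allocated = ctx->max_bindings;
    return 0;
}

/* Driver initialisation running later: raises the limit, array is not resized. */
static void init_limits(struct context *ctx, unsigned per_stage, unsigned stages)
{
    ctx->max_bindings = per_stage * stages;
}

/* Entries that a loop bounded by max_bindings would touch past the allocation. */
static unsigned entries_out_of_bounds(const struct context *ctx)
{
    return ctx->max_bindings > ctx->allocated
         ? ctx->max_bindings - ctx->allocated
         : 0;
}
```

Using the figures quoted elsewhere in this thread (a limit of 36 at allocation time, then 31 blocks for each of three shader stages), the limit becomes 93 and a loop over it would touch 57 entries outside the allocation.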


I'm not sure what the best way of fixing this is, but I'll go ahead and commit a workaround, as this affects all apps:

commit a149f9d4c792455efd46af46093f61a9144451af
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue May 14 16:55:56 2013 +0100

    mesa/st: Workaround fdo bug 64568.
    
    Effectively reverting the problematic hunk of
    commit 614ee25077b7ffafeb87b22563d01856824fb4bc

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index b64d363..982e652 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -279,10 +279,15 @@ void st_init_limits(struct st_context *st)
       st->ctx->Extensions.ARB_uniform_buffer_object = GL_TRUE;
       c->UniformBufferOffsetAlignment =
          screen->get_param(screen, PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT);
+      /* FIXME: _mesa_init_buffer_objects() already has been, and
+       * ctx->UniformBufferBindings allocated, so unfortunately we can't just
+       * change MaxUniformBufferBindings a posteriori. */
+#if 0
       c->MaxCombinedUniformBlocks = c->MaxUniformBufferBindings =
          c->VertexProgram.MaxUniformBlocks +
          c->GeometryProgram.MaxUniformBlocks +
          c->FragmentProgram.MaxUniformBlocks;
+#endif
    }
 }
 


I'm also surprised this didn't affect r600g. I see thousands of regressions with piglit on softpipe/llvmpipe. Does the same not happen with r600g?
Comment 4 Jose Fonseca 2013-05-14 16:09:51 UTC
(In reply to comment #3)
> I'm also surprised this didn't affect r600g. I see thousands of regressions
> with piglit on softpipe/llvmpipe. Does the same not happen with r600g?

I suspect that r600g advertises fewer than 12 per stage (i.e., fewer than 36 in total), so instead of crashing it would merely leak memory.

Anyway, this is as far as I'll go on this issue (my immediate concern was merely to prevent the regression, and I think I addressed it with the temporary workaround).

I'll leave it to you (Marek & Brian) to find a proper fix.

And thanks for reporting, Vinson!
Comment 5 Marek Olšák 2013-05-14 16:17:51 UTC
This should fix it:
http://lists.freedesktop.org/archives/mesa-dev/2013-May/039360.html

It really depends on how many constant buffers you expose. r600 and i965 expose 13 constant buffers (12 uniform buffers) from 2 shader stages. The core Mesa limit is 36 uniform buffers; anything greater than that will crash.

A side note: Both softpipe and llvmpipe expose 31 uniform buffers for each of the vertex, geometry, and fragment shaders. If they don't actually fully implement the geometry shader, they shouldn't expose any limits for it.
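
The arithmetic behind these figures can be written out as a small sketch. The helper names and the `CORE_MAX_BINDINGS` macro are hypothetical and chosen only for illustration; the numbers themselves (36, 12 from 2 stages, 31 from 3 stages) are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Core limit quoted in this thread; the macro name is hypothetical. */
#define CORE_MAX_BINDINGS 36u

/* Sum the uniform-buffer counts advertised for each shader stage. */
static unsigned combined_bindings(const unsigned *per_stage, unsigned stages)
{
    unsigned total = 0;
    assert(per_stage != NULL || stages == 0);
    for (unsigned i = 0; i < stages; i++)
        total += per_stage[i];
    return total;
}

/* Non-zero when the combined count no longer fits the core array. */
static int exceeds_core_limit(unsigned combined)
{
    return combined > CORE_MAX_BINDINGS;
}
```

For r600 and i965 this gives 12 + 12 = 24, which fits within 36. For softpipe and llvmpipe it gives 31 + 31 + 31 = 93, which does not. As a cross-check, the 1,152-byte block reported by valgrind in comment 1 divides evenly by 36 into 32 bytes per entry, which is consistent with an array sized for 36 bindings.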
Comment 6 Vinson Lee 2013-07-04 20:46:29 UTC
mesa: f3bbf65929e395360e5565d08d015977dd5b79fa (master)

commit 15a4b6db2192b0adc05c3dc07cf043316c556f2e
Author: Marek Olšák <maraeo@gmail.com>
Date:   Tue May 14 17:58:32 2013 +0200

    mesa: declare UniformBufferBindings as an array with a static size
    
    Some Gallium drivers were crashing, because the array was not large enough.
    
    v2: clamp the per-shader maximum in st/mesa, then sum them all up
    
    NOTE: This is a candidate for the stable branches.
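
The approach described in this commit message (a statically sized array, with each per-shader maximum clamped before summing) might look roughly like the sketch below. It is an illustration under stated assumptions, not the committed patch: all names are hypothetical, and the clamp value of 12 per stage across 3 stages is assumed only because it reproduces the 36-entry core limit mentioned in comment 5.

```c
#include <assert.h>

#define MAX_STAGES 3
#define MAX_BLOCKS_PER_STAGE 12u  /* assumed clamp value, not taken from the patch */
#define MAX_COMBINED_BINDINGS (MAX_STAGES * MAX_BLOCKS_PER_STAGE)

/* Hypothetical stand-ins for the Mesa context fields; not the real structs. */
struct binding { void *buffer_object; };

struct context {
    unsigned max_bindings;
    /* Static size: the array no longer depends on when limits are set. */
    struct binding bindings[MAX_COMBINED_BINDINGS];
};

static unsigned clamp_max(unsigned value, unsigned max)
{
    return value < max ? value : max;
}

/* Clamp each stage's advertised count first, then sum the clamped values. */
static void init_limits(struct context *ctx, const unsigned advertised[MAX_STAGES])
{
    unsigned total = 0;
    for (int i = 0; i < MAX_STAGES; i++)
        total += clamp_max(advertised[i], MAX_BLOCKS_PER_STAGE);
    assert(total <= MAX_COMBINED_BINDINGS);
    ctx->max_bindings = total;
}
```

Because every term is clamped before the sum, the total can never exceed the array length, regardless of the order in which core and driver initialisation run.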
