Created attachment 140994 [details] Simple reproducer program Description: Some X11/GLX clients can cause a crash of the Xserver in Mesa code. How reproducible: Always Steps to reproduce: 1. Run a Xephyr Xserver wit hglamor enabled: $ Xephyr -glamor :12 2. Save and build the attached reproducer: $ gcc -g gl3.cxx -o gl3 $(pkg-config --cflags --libs x11) -lGL -lGLU (this is based on https://www.khronos.org/opengl/wiki/Tutorial:_OpenGL_3.0_Context_Creation_(GLX) modified to use indirect contexts) 3. Run the reproducer against the Xephyr display: $ DISPLAY=:12 ./gl3 Actual result: Crash of the Xserver in Mesa code because of a NULL pointer dereference (see below). Expected result: No crash Additional data: This is not a new issue and it seems it's been around for quite some time, yet it's fairly rare to trigger because it involves trying to use an indirect GLX context. One noticeable occurrence of this issue is "Slack" being run from "snap": https://bugzilla.redhat.com/1611140 https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1762971 https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/1754693 etc. I posted a (simple) patch that fixes the issue here: https://patchwork.freedesktop.org/series/47617/ But wanted to get a bit further in my investigation and possibly come up with a simple[r] reproducer (see “Steps to reproduce” above). Crash occurs in `st_renderbuffer_delete()` because cts->st in NULL: Thread 1 "Xephyr" received signal SIGSEGV, Segmentation fault. in st_renderbuffer_delete () at state_tracker/st_cb_fbo.c:241 241 pipe_surface_release(st->pipe, &strb->surface_srgb); (gdb) bt #0 0x00007fffe73054d0 in st_renderbuffer_delete (ctx=0x9c6020, rb=0xfc7320) at state_tracker/st_cb_fbo.c:241 #1 0x00007fffe726a931 in _mesa_reference_renderbuffer_ (ptr=ptr@entry=0xfc6fd8, rb=rb@entry=0x0) at main/renderbuffer.c:212 #2 0x00007fffe71f96ca in _mesa_reference_renderbuffer (rb=0x0, ptr=0xfc6fd8) at main/renderbuffer.h:72 #3 _mesa_free_framebuffer_data (fb=fb@entry=0xfc6e90) at main/framebuffer.c:229 #4 0x00007fffe71f971e in _mesa_destroy_framebuffer (fb=0xfc6e90) at main/framebuffer.c:207 #5 0x00007fffe71f97c9 in _mesa_reference_framebuffer_ (ptr=ptr@entry=0xe3d4b0, fb=fb@entry=0x0) at main/framebuffer.c:265 #6 0x00007fffe7160782 in _mesa_reference_framebuffer (fb=0x0, ptr=0xe3d4b0) at main/framebuffer.h:63 #7 _mesa_free_context_data (ctx=ctx@entry=0xe3d3a0) at main/context.c:1326 #8 0x00007fffe7311ef5 in st_destroy_context (st=0xfc0eb0) at state_tracker/st_context.c:653 #9 0x00007fffe74e9cb9 in dri_destroy_context () at dri_context.c:239 #10 0x00007fffe74e8c43 in driDestroyContext (pcp=0x895430) at dri_util.c:524 #11 0x00000000005110c9 in __glXDRIcontextDestroy (baseContext=0x895360) at glxdriswrast.c:132 #12 0x000000000050fe3b in __glXFreeContext (cx=0x895360) at glxext.c:190 #13 ContextGone (cx=0x895360, id=<optimized out>) at glxext.c:82 #14 0x0000000000468f7d in doFreeResource (res=0xfc6bc0, skip=0) at resource.c:880 #15 0x0000000000469be5 in FreeResourceByType (id=<optimized out>, type=<optimized out>, skipFree=<optimized out>) at resource.c:941 #16 0x0000000000514fa1 in __glXDisp_DestroyContext (cl=<optimized out>, pc=0xdd9440 "\225\004\002") at glxcmds.c:437 #17 0x000000000052ebc8 in dispatch_DestroyContext (client=0xadbf70) at vnd_dispatch_stubs.c:86 #18 0x00000000004450e0 in Dispatch () at dispatch.c:478 #19 0x0000000000448fe6 in dix_main (argc=3, argv=0x7fffffffced8, envp=<optimized out>) at main.c:276 #20 0x00007ffff32b024b in __libc_start_main () from /lib64/libc.so.6 #21 0x000000000042ba5a in _start () at ephyrinit.c:51 (gdb) p st $1 = (struct st_context *) 0x0 That context `ctx=0x9c6020` with the `ctx->st` == `NULL` was created by glamor for the screen pixmap: Thread 1 "Xephyr" hit Breakpoint 3, _mesa_init_renderbuffer (rb=0xadc910, name=4294967295) at main/renderbuffer.c:41 41 GET_CURRENT_CONTEXT(ctx); (gdb) bt #0 _mesa_init_renderbuffer (rb=0xadc910, name=4294967295) at main/renderbuffer.c:41 #1 0x00007fffedec21bc in intel_new_renderbuffer (ctx=<optimized out>, name=4294967295) at intel_fbo.c:506 #2 0x00007fffedb1aa3b in _mesa_update_texture_renderbuffer (ctx=0x9c6020, fb=0xadd910, att=0xaddb68) at main/fbobject.c:459 #3 0x00007fffedb1dc90 in set_texture_attachment (layered=0 '\000', layer=0, level=0, texTarget=<optimized out>, texObj=0xab5630, att=0xaddb68, fb=0xadd910, ctx=0x9c6020) at main/fbobject.c:528 #4 _mesa_framebuffer_texture (ctx=0x9c6020, fb=0xadd910, attachment=36064, att=0xaddb68, texObj=0xab5630, textarget=<optimized out>, level=<optimized out>, layer=<optimized out>, layered=<optimized out>) at main/fbobject.c:3516 #5 0x00007fffedb1e017 in framebuffer_texture_with_dims (dims=dims@entry=2, target=<optimized out>, attachment=36064, textarget=3553, texture=<optimized out>, level=0, layer=0, caller=0x7fffee1a084f "glFramebufferTexture2D") at main/fbobject.c:3614 #6 0x00007fffedb1e2d4 in _mesa_FramebufferTexture2D (target=<optimized out>, attachment=<optimized out>, textarget=<optimized out>, texture=<optimized out>, level=<optimized out>) at main/fbobject.c:3652 #7 0x00000000004ae23d in glamor_pixmap_ensure_fb (glamor_priv=glamor_priv@entry=0xa0a760, fbo=fbo@entry=0xad9ea0) at glamor_fbo.c:59 #8 0x00000000004ae5fb in glamor_create_fbo_from_tex (glamor_priv=glamor_priv@entry=0xa0a760, w=w@entry=640, h=h@entry=480, format=format@entry=6408, tex=1, flag=flag@entry=261) at glamor_fbo.c:112 #9 0x00000000004ae64d in glamor_create_fbo (glamor_priv=glamor_priv@entry=0xa0a760, w=w@entry=640, h=h@entry=480, format=format@entry=6408, flag=flag@entry=261) at glamor_fbo.c:166 #10 0x00000000004956c1 in glamor_create_pixmap (screen=0x869fe0, w=640, h=480, depth=24, usage=261) at glamor.c:222 #11 0x0000000000431d20 in ephyr_glamor_create_screen_resources (pScreen=pScreen@entry=0x869fe0) at hostx.c:1623 #12 0x000000000042ce06 in ephyrCreateResources (pScreen=0x869fe0) at ephyr.c:729 #13 0x0000000000448ea9 in dix_main (argc=3, argv=0x7fffffffced8, envp=<optimized out>) at main.c:213 #14 0x00007ffff32b024b in __libc_start_main () from /lib64/libc.so.6 #15 0x000000000042ba5a in _start () at ephyrinit.c:51
Note that I checked with both gdb/breakpoints and valgrind as well, it is not a double free or memory corruption.
Important (sorry I forgot), to reproduce, it requires Ajax' patch: https://patchwork.freedesktop.org/series/47686/ (Otherwise the reproducer dies with a [xcb] “Unknown sequence number while processing queue”)
Created attachment 141001 [details] Simple reproducer program Simpler version of the reproducer which doesn't need any patch in Mesa.
Humm, back to this... Some more observations: 1. The issue does not occur with direct context 2. The issue does not occur if the reproducer does _not_ call `glXMakeCurrent(display, 0, 0)` prior to `glXDestroyContext(display, ctx)` The reason for #2 is because `ContextGone()` in xserver/glx/glxext.c only call `__glXFreeContext()` if `cx->currentClient` is true: ``` 76 ContextGone(__GLXcontext * cx, XID id) 77 { 78 if (!cx) 79 return TRUE; 80 81 if (!cx->currentClient) 82 __glXFreeContext(cx); 83 84 return TRUE; 85 } 86 ``` So in gdb I added a breakpoint in `__glXFreeContext()` and ran the reproducer with a direct and an indirect context, to see the difference, which gives: A. With a `direct` context: Thread 1 "Xwayland" hit Breakpoint 1, __glXFreeContext (cx=0x2acf000) at glxext.c:173 173 if (cx->idExists || cx->currentClient) (gdb) p *cx $1 = {destroy = 0x4faf10 <__glXdirectContextDestroy>, makeCurrent = 0x0, loseCurrent = 0x4faf00 <__glXdirectContextLoseCurrent>, copy = 0x0, wait = 0x0, bindTexImage = 0x0, releaseTexImage = 0x0, next = 0x0, config = 0x2776c90, pGlxScreen = 0x272fb60, currentClient = 0x0, id = 4194308, share_id = 0, idExists = 0 '\000', isDirect = 1 '\001', renderMode = 7168, resetNotificationStrategy = 33377, releaseBehavior = 0, feedbackBuf = 0x0, feedbackBufSize = 0, selectBuf = 0x0, selectBufSize = 0, largeCmdBytesSoFar = 0, largeCmdBytesTotal = 0, largeCmdRequestsSoFar = 0, largeCmdRequestsTotal = 0, largeCmdBuf = 0x0, largeCmdBufSize = 0, drawPriv = 0x0, readPriv = 0x0} B. With an `indirect` context: (gdb) p *cx $2 = {destroy = 0x4f8a60 <__glXDRIcontextDestroy>, makeCurrent = 0x4f8680 <__glXDRIcontextMakeCurrent>, loseCurrent = 0x4f86b0 <__glXDRIcontextLoseCurrent>, copy = 0x4f86d0 <__glXDRIcontextCopy>, wait = 0x0, bindTexImage = 0x4f86f0 <__glXDRIbindTexImage>, releaseTexImage = 0x4f8750 <__glXDRIreleaseTexImage>, next = 0x0, config = 0x2776c90, pGlxScreen = 0x272fb60, currentClient = 0x0, id = 4194308, share_id = 0, idExists = 0 '\000', isDirect = 0 '\000', renderMode = 7168, resetNotificationStrategy = 33377, releaseBehavior = 0, feedbackBuf = 0x0, feedbackBufSize = 0, selectBuf = 0x0, selectBufSize = 0, largeCmdBytesSoFar = 0, largeCmdBytesTotal = 0, largeCmdRequestsSoFar = 0, largeCmdRequestsTotal = 0, largeCmdBuf = 0x0, largeCmdBufSize = 0, drawPriv = 0x0, readPriv = 0x0} So the indirect context calls __glXDRIcontextDestroy() from swrast: (gdb) list *cx->destroy 0x4f8a60 is in __glXDRIcontextDestroy (glxdriswrast.c:132). 127 __glXDRIcontextDestroy(__GLXcontext * baseContext) 128 { 129 __GLXDRIcontext *context = (__GLXDRIcontext *) baseContext; 130 __GLXDRIscreen *screen = (__GLXDRIscreen *) context->base.pGlxScreen; 131 132 (*screen->core->destroyContext) (context->driContext); 133 __glXContextDestroy(&context->base); 134 free(context); 135 } 136 The reason for this is `__glXDisp_CreateContextAttribsARB()` from xserver/glx/createcontext.c which does: ``` 317 */ 318 if (req->isDirect) { 319 ctx = __glXdirectContextCreate(glxScreen, config, shareCtx); 320 err = BadAlloc; 321 } 322 else { 323 ctx = glxScreen->createContext(glxScreen, config, shareCtx, 324 req->numAttribs, (uint32_t *) attribs, 325 &err); 326 } 327 ```
And now I suspect this might be an xserver issue, crating indirect context should fail unless enableIndirectGLX is true... `DoCreateContext()` does that but `__glXDisp_CreateContextAttribsARB()` doesn't
So for the Xserver part, I posted https://patchwork.freedesktop.org/series/49182/ which will avoid the issue by not allowing the indirect conext creation, but chances are this particular issue/crash would still occur in Mesa if indirect contexts are re-enabled in the Xserver some day.
Fixed accepted in xserver. Closing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.