Bug 107508 - Crash in st_renderbuffer_delete()
Summary: Crash in st_renderbuffer_delete()
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-07 08:38 UTC by Olivier Fourdan
Modified: 2019-05-07 05:37 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Simple reproducer program (8.50 KB, text/plain)
2018-08-07 08:38 UTC, Olivier Fourdan
Details
Simple reproducer program (5.04 KB, text/plain)
2018-08-07 11:46 UTC, Olivier Fourdan
Details

Description Olivier Fourdan 2018-08-07 08:38:03 UTC
Created attachment 140994 [details]
Simple reproducer program

Description:

Some X11/GLX clients can cause a crash of the Xserver in Mesa code.

How reproducible:

Always

Steps to reproduce:

1. Run a Xephyr Xserver wit hglamor enabled:

   $ Xephyr -glamor :12

2. Save and build the attached reproducer:

   $ gcc -g gl3.cxx -o gl3 $(pkg-config --cflags --libs  x11) -lGL -lGLU

   (this is based on https://www.khronos.org/opengl/wiki/Tutorial:_OpenGL_3.0_Context_Creation_(GLX) modified to use indirect contexts)

3. Run the reproducer against the Xephyr display:

   $ DISPLAY=:12 ./gl3

Actual result:

Crash of the Xserver in Mesa code because of a NULL pointer dereference (see below).

Expected result:

No crash

Additional data:

This is not a new issue and it seems it's been around for quite some time, yet it's fairly rare to trigger because it involves trying to use an indirect GLX context.

One noticeable occurrence of this issue is "Slack" being run from "snap":

  https://bugzilla.redhat.com/1611140
  https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1762971
  https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/1754693
  etc.

I posted a (simple) patch that fixes the issue here:

https://patchwork.freedesktop.org/series/47617/

But wanted to get a bit further in my investigation and possibly come up with a simple[r] reproducer (see “Steps to reproduce” above).

Crash occurs in `st_renderbuffer_delete()` because cts->st in NULL:

Thread 1 "Xephyr" received signal SIGSEGV, Segmentation fault.
in st_renderbuffer_delete () at state_tracker/st_cb_fbo.c:241
241 pipe_surface_release(st->pipe, &strb->surface_srgb);
(gdb) bt
#0  0x00007fffe73054d0 in st_renderbuffer_delete (ctx=0x9c6020, rb=0xfc7320) at state_tracker/st_cb_fbo.c:241
#1  0x00007fffe726a931 in _mesa_reference_renderbuffer_ (ptr=ptr@entry=0xfc6fd8, rb=rb@entry=0x0) at main/renderbuffer.c:212
#2  0x00007fffe71f96ca in _mesa_reference_renderbuffer (rb=0x0, ptr=0xfc6fd8) at main/renderbuffer.h:72
#3  _mesa_free_framebuffer_data (fb=fb@entry=0xfc6e90) at main/framebuffer.c:229
#4  0x00007fffe71f971e in _mesa_destroy_framebuffer (fb=0xfc6e90) at main/framebuffer.c:207
#5  0x00007fffe71f97c9 in _mesa_reference_framebuffer_ (ptr=ptr@entry=0xe3d4b0, fb=fb@entry=0x0) at main/framebuffer.c:265
#6  0x00007fffe7160782 in _mesa_reference_framebuffer (fb=0x0, ptr=0xe3d4b0) at main/framebuffer.h:63
#7  _mesa_free_context_data (ctx=ctx@entry=0xe3d3a0) at main/context.c:1326
#8  0x00007fffe7311ef5 in st_destroy_context (st=0xfc0eb0) at state_tracker/st_context.c:653
#9  0x00007fffe74e9cb9 in dri_destroy_context () at dri_context.c:239
#10 0x00007fffe74e8c43 in driDestroyContext (pcp=0x895430) at dri_util.c:524
#11 0x00000000005110c9 in __glXDRIcontextDestroy (baseContext=0x895360) at glxdriswrast.c:132
#12 0x000000000050fe3b in __glXFreeContext (cx=0x895360) at glxext.c:190
#13 ContextGone (cx=0x895360, id=<optimized out>) at glxext.c:82
#14 0x0000000000468f7d in doFreeResource (res=0xfc6bc0, skip=0) at resource.c:880
#15 0x0000000000469be5 in FreeResourceByType (id=<optimized out>, type=<optimized out>, skipFree=<optimized out>) at resource.c:941
#16 0x0000000000514fa1 in __glXDisp_DestroyContext (cl=<optimized out>, pc=0xdd9440 "\225\004\002") at glxcmds.c:437
#17 0x000000000052ebc8 in dispatch_DestroyContext (client=0xadbf70) at vnd_dispatch_stubs.c:86
#18 0x00000000004450e0 in Dispatch () at dispatch.c:478
#19 0x0000000000448fe6 in dix_main (argc=3, argv=0x7fffffffced8, envp=<optimized out>) at main.c:276
#20 0x00007ffff32b024b in __libc_start_main () from /lib64/libc.so.6
#21 0x000000000042ba5a in _start () at ephyrinit.c:51
(gdb) p st
$1 = (struct st_context *) 0x0

That context `ctx=0x9c6020` with the `ctx->st` == `NULL` was created by glamor for the screen pixmap:

Thread 1 "Xephyr" hit Breakpoint 3, _mesa_init_renderbuffer (rb=0xadc910, name=4294967295) at main/renderbuffer.c:41
41	   GET_CURRENT_CONTEXT(ctx);
(gdb) bt
#0  _mesa_init_renderbuffer (rb=0xadc910, name=4294967295) at main/renderbuffer.c:41
#1  0x00007fffedec21bc in intel_new_renderbuffer (ctx=<optimized out>, name=4294967295) at intel_fbo.c:506
#2  0x00007fffedb1aa3b in _mesa_update_texture_renderbuffer (ctx=0x9c6020, fb=0xadd910, att=0xaddb68) at main/fbobject.c:459
#3  0x00007fffedb1dc90 in set_texture_attachment (layered=0 '\000', layer=0, level=0, texTarget=<optimized out>, texObj=0xab5630, att=0xaddb68, 
    fb=0xadd910, ctx=0x9c6020) at main/fbobject.c:528
#4  _mesa_framebuffer_texture (ctx=0x9c6020, fb=0xadd910, attachment=36064, att=0xaddb68, texObj=0xab5630, textarget=<optimized out>, 
    level=<optimized out>, layer=<optimized out>, layered=<optimized out>) at main/fbobject.c:3516
#5  0x00007fffedb1e017 in framebuffer_texture_with_dims (dims=dims@entry=2, target=<optimized out>, attachment=36064, textarget=3553, 
    texture=<optimized out>, level=0, layer=0, caller=0x7fffee1a084f "glFramebufferTexture2D") at main/fbobject.c:3614
#6  0x00007fffedb1e2d4 in _mesa_FramebufferTexture2D (target=<optimized out>, attachment=<optimized out>, textarget=<optimized out>, 
    texture=<optimized out>, level=<optimized out>) at main/fbobject.c:3652
#7  0x00000000004ae23d in glamor_pixmap_ensure_fb (glamor_priv=glamor_priv@entry=0xa0a760, fbo=fbo@entry=0xad9ea0) at glamor_fbo.c:59
#8  0x00000000004ae5fb in glamor_create_fbo_from_tex (glamor_priv=glamor_priv@entry=0xa0a760, w=w@entry=640, h=h@entry=480, 
    format=format@entry=6408, tex=1, flag=flag@entry=261) at glamor_fbo.c:112
#9  0x00000000004ae64d in glamor_create_fbo (glamor_priv=glamor_priv@entry=0xa0a760, w=w@entry=640, h=h@entry=480, format=format@entry=6408, 
    flag=flag@entry=261) at glamor_fbo.c:166
#10 0x00000000004956c1 in glamor_create_pixmap (screen=0x869fe0, w=640, h=480, depth=24, usage=261) at glamor.c:222
#11 0x0000000000431d20 in ephyr_glamor_create_screen_resources (pScreen=pScreen@entry=0x869fe0) at hostx.c:1623
#12 0x000000000042ce06 in ephyrCreateResources (pScreen=0x869fe0) at ephyr.c:729
#13 0x0000000000448ea9 in dix_main (argc=3, argv=0x7fffffffced8, envp=<optimized out>) at main.c:213
#14 0x00007ffff32b024b in __libc_start_main () from /lib64/libc.so.6
#15 0x000000000042ba5a in _start () at ephyrinit.c:51
Comment 1 Olivier Fourdan 2018-08-07 08:43:00 UTC
Note that I checked with both gdb/breakpoints and valgrind as well, it is not a double free or memory corruption.
Comment 2 Olivier Fourdan 2018-08-07 08:59:34 UTC
Important (sorry I forgot), to reproduce, it requires Ajax' patch:

https://patchwork.freedesktop.org/series/47686/

(Otherwise the reproducer dies with a [xcb] “Unknown sequence number while processing queue”)
Comment 3 Olivier Fourdan 2018-08-07 11:46:39 UTC
Created attachment 141001 [details]
Simple reproducer program

Simpler version of the reproducer which doesn't need any patch in Mesa.
Comment 4 Olivier Fourdan 2018-09-05 12:33:32 UTC
Humm, back to this...

Some more observations:

1. The issue does not occur with direct context
2. The issue does not occur if the reproducer does _not_ call `glXMakeCurrent(display, 0, 0)` prior to `glXDestroyContext(display, ctx)`

The reason for #2 is because `ContextGone()` in xserver/glx/glxext.c only call `__glXFreeContext()` if `cx->currentClient` is true:

```
 76 ContextGone(__GLXcontext * cx, XID id)
 77 {
 78     if (!cx)
 79         return TRUE;
 80 
 81     if (!cx->currentClient)
 82         __glXFreeContext(cx);
 83 
 84     return TRUE;
 85 }
 86 
```

So in gdb I added a breakpoint in `__glXFreeContext()` and ran the reproducer with a direct and an indirect context, to see the difference, which gives:

A. With a `direct` context:

Thread 1 "Xwayland" hit Breakpoint 1, __glXFreeContext (cx=0x2acf000) at glxext.c:173
173	    if (cx->idExists || cx->currentClient)
(gdb) p *cx
$1 = {destroy = 0x4faf10 <__glXdirectContextDestroy>, makeCurrent = 0x0, loseCurrent = 0x4faf00 <__glXdirectContextLoseCurrent>, copy = 0x0, 
  wait = 0x0, bindTexImage = 0x0, releaseTexImage = 0x0, next = 0x0, config = 0x2776c90, pGlxScreen = 0x272fb60, currentClient = 0x0, 
  id = 4194308, share_id = 0, idExists = 0 '\000', isDirect = 1 '\001', renderMode = 7168, resetNotificationStrategy = 33377, 
  releaseBehavior = 0, feedbackBuf = 0x0, feedbackBufSize = 0, selectBuf = 0x0, selectBufSize = 0, largeCmdBytesSoFar = 0, 
  largeCmdBytesTotal = 0, largeCmdRequestsSoFar = 0, largeCmdRequestsTotal = 0, largeCmdBuf = 0x0, largeCmdBufSize = 0, drawPriv = 0x0, 
  readPriv = 0x0}


B. With an `indirect` context:

(gdb) p *cx
$2 = {destroy = 0x4f8a60 <__glXDRIcontextDestroy>, makeCurrent = 0x4f8680 <__glXDRIcontextMakeCurrent>, 
  loseCurrent = 0x4f86b0 <__glXDRIcontextLoseCurrent>, copy = 0x4f86d0 <__glXDRIcontextCopy>, wait = 0x0, 
  bindTexImage = 0x4f86f0 <__glXDRIbindTexImage>, releaseTexImage = 0x4f8750 <__glXDRIreleaseTexImage>, next = 0x0, config = 0x2776c90, 
  pGlxScreen = 0x272fb60, currentClient = 0x0, id = 4194308, share_id = 0, idExists = 0 '\000', isDirect = 0 '\000', renderMode = 7168, 
  resetNotificationStrategy = 33377, releaseBehavior = 0, feedbackBuf = 0x0, feedbackBufSize = 0, selectBuf = 0x0, selectBufSize = 0, 
  largeCmdBytesSoFar = 0, largeCmdBytesTotal = 0, largeCmdRequestsSoFar = 0, largeCmdRequestsTotal = 0, largeCmdBuf = 0x0, largeCmdBufSize = 0, 
  drawPriv = 0x0, readPriv = 0x0}

So the indirect context calls __glXDRIcontextDestroy() from swrast:

(gdb) list *cx->destroy
0x4f8a60 is in __glXDRIcontextDestroy (glxdriswrast.c:132).
127	__glXDRIcontextDestroy(__GLXcontext * baseContext)
128	{
129	    __GLXDRIcontext *context = (__GLXDRIcontext *) baseContext;
130	    __GLXDRIscreen *screen = (__GLXDRIscreen *) context->base.pGlxScreen;
131	
132	    (*screen->core->destroyContext) (context->driContext);
133	    __glXContextDestroy(&context->base);
134	    free(context);
135	}
136

The reason for this is `__glXDisp_CreateContextAttribsARB()` from xserver/glx/createcontext.c which does:

```
317      */
318     if (req->isDirect) {
319         ctx = __glXdirectContextCreate(glxScreen, config, shareCtx);
320         err = BadAlloc;
321     }
322     else {
323         ctx = glxScreen->createContext(glxScreen, config, shareCtx,
324                                        req->numAttribs, (uint32_t *) attribs,
325                                        &err);
326     }
327 
```
Comment 5 Olivier Fourdan 2018-09-05 13:00:13 UTC
And now I suspect this might be an xserver issue, crating indirect context should fail unless enableIndirectGLX is true... 

`DoCreateContext()` does that but  `__glXDisp_CreateContextAttribsARB()` doesn't
Comment 6 Olivier Fourdan 2018-09-05 13:23:49 UTC
So for the Xserver part, I posted https://patchwork.freedesktop.org/series/49182/ which will avoid the issue by not allowing the indirect conext creation, but chances are this particular issue/crash would still occur in Mesa if indirect contexts are re-enabled in the Xserver some day.
Comment 7 Timothy Arceri 2019-05-07 05:37:11 UTC
Fixed accepted in xserver. Closing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.