Bug 44919

Summary: Wayland clients segfault
Product: Mesa Reporter: Scott Moreau <oreaus>
Component: Mesa core Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Scott Moreau 2012-01-18 19:42:05 UTC
I have been testing Wayland on r300g, and things worked reasonably well until now. I get a segfault when starting any weston client except simple-egl. Here is the backtrace with mesa, cairo and weston built with -O0:

$ gdb ./clients/weston-desktop-shell 
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/scott/src/wayland/weston/clients/weston-desktop-shell...done.
(gdb) run
Starting program: /home/scott/src/wayland/weston/clients/weston-desktop-shell 
[Thread debugging using libthread_db enabled]
XDG_RUNTIME_DIR not set, falling back to .

Program received signal SIGSEGV, Segmentation fault.
0x00d9488d in st_framebuffer_validate (stfb=0x184b700, st=0x80e11d8) at state_tracker/st_manager.c:186
186	   int32_t new_stamp = p_atomic_read(&stfb->iface->stamp);
(gdb) bt
#0  0x00d9488d in st_framebuffer_validate (stfb=0x184b700, st=0x80e11d8) at state_tracker/st_manager.c:186
#1  0x00d958a3 in st_api_make_current (stapi=0x1802620, stctxi=0x80e11d8, stdrawi=0x0, streadi=0x0) at state_tracker/st_manager.c:731
#2  0x00d35ac6 in dri_make_current (cPriv=0x8073800, driDrawPriv=0x0, driReadPriv=0x0) at dri_context.c:216
#3  0x00d30bfc in driBindContext (pcp=0x8073800, pdp=0x0, prp=0x0) at ../../../../src/mesa/drivers/dri/common/dri_util.c:330
#4  0x00149efa in dri2_make_current (drv=0x8064910, disp=0x8063c38, dsurf=0x0, rsurf=0x0, ctx=0x8073858) at egl_dri2.c:818
#5  0x0013eb9b in eglMakeCurrent (dpy=0x8063c38, draw=0x0, read=0x0, ctx=0x8073858) at eglapi.c:502
#6  0x0028e4d2 in _egl_make_current_surfaceless (ctx=0x81bc548) at cairo-egl-context.c:127
#7  0x0028e5b9 in cairo_egl_device_create (dpy=0x8063c38, egl=0x8073858) at cairo-egl-context.c:160
#8  0x08050fe6 in init_egl (d=0x805ef00) at window.c:2787
#9  0x08051362 in display_create (argc=0xbffff350, argv=0xbffff354, option_entries=0x0) at window.c:2891
#10 0x0804c1f6 in main (argc=1, argv=0xbffff3f4) at desktop-shell.c:669
(gdb) bt full
#0  0x00d9488d in st_framebuffer_validate (stfb=0x184b700, st=0x80e11d8) at state_tracker/st_manager.c:186
        textures = {0x1842ff4, 0xbffff058, 0xd943dc, 0xbffff0ac, 0x184b700, 0x5f03, 0x184b700}
        width = 3221221512
        height = 25474816
        i = 3221221548
        changed = 0 '\000'
        new_stamp = 14242151
#1  0x00d958a3 in st_api_make_current (stapi=0x1802620, stctxi=0x80e11d8, stdrawi=0x0, streadi=0x0) at state_tracker/st_manager.c:731
        st = 0x80e11d8
        stdraw = 0x184b700
        stread = 0x184b700
        ret = 8 '\b'
#2  0x00d35ac6 in dri_make_current (cPriv=0x8073800, driDrawPriv=0x0, driReadPriv=0x0) at dri_context.c:216
        ctx = 0x80720c0
        draw = 0x0
        read = 0x0
        old_st = 0x80e11d8
#3  0x00d30bfc in driBindContext (pcp=0x8073800, pdp=0x0, prp=0x0) at ../../../../src/mesa/drivers/dri/common/dri_util.c:330
No locals.
#4  0x00149efa in dri2_make_current (drv=0x8064910, disp=0x8063c38, dsurf=0x0, rsurf=0x0, ctx=0x8073858) at egl_dri2.c:818
        dri2_drv = 0x8064910
        dri2_dpy = 0x8064b28
        dri2_dsurf = 0x0
        dri2_rsurf = 0x0
        dri2_ctx = 0x8073858
        old_ctx = 0x8073858
        old_dsurf = 0x0
        old_rsurf = 0x0
        ddraw = 0x0
        rdraw = 0x0
        cctx = 0x8073800
        __PRETTY_FUNCTION__ = "dri2_make_current"
#5  0x0013eb9b in eglMakeCurrent (dpy=0x8063c38, draw=0x0, read=0x0, ctx=0x8073858) at eglapi.c:502
        disp = 0x8063c38
        context = 0x8073858
        draw_surf = 0x0
        read_surf = 0x0
        drv = 0x8064910
        ret = 0
        __FUNCTION__ = "eglMakeCurrent"
#6  0x0028e4d2 in _egl_make_current_surfaceless (ctx=0x81bc548) at cairo-egl-context.c:127
        extensions = 0x80644a0 "EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image EGL_KHR_surfaceless_gles1 EGL_KHR_surfaceless_gles2 EGL_KHR_surfacele"...
#7  0x0028e5b9 in cairo_egl_device_create (dpy=0x8063c38, egl=0x8073858) at cairo-egl-context.c:160
        ctx = 0x81bc548
        status = 134690904
        attribs = {12375, 1, 12374, 1, 12344}
        config = 0x123c40
        numConfigs = -1073745240
#8  0x08050fe6 in init_egl (d=0x805ef00) at window.c:2787
        major = 1
        minor = 4
        n = 1
        argb_cfg_attribs = {12339, 6, 12324, 1, 12323, 1, 12322, 1, 12321, 1, 12325, 1, 12352, 8, 12344}
        rgb_cfg_attribs = {12339, 6, 12324, 1, 12323, 1, 12322, 1, 12321, 0, 12325, 1, 12352, 8, 12344}
#9  0x08051362 in display_create (argc=0xbffff350, argv=0xbffff354, option_entries=0x0) at window.c:2891
        d = 0x805ef00
        context = 0x805e990
        xkb_option_group = 0x805e9c0
        error = 0xbffff348
#10 0x0804c1f6 in main (argc=1, argv=0xbffff3f4) at desktop-shell.c:669
        desktop = {display = 0x0, shell = 0x0, unlock_dialog = 0x0, unlock_task = {run = 0x804be34 <unlock_dialog_finish>, link = {prev = 0x0, 
              next = 0x0}}, outputs = {prev = 0xbffff31c, next = 0xbffff31c}}
        config_file = 0x76e324 ""
        output = 0xbffff348
(gdb) q
A debugging session is active.

	Inferior 1 [process 24323] will be killed.

Quit anyway? (y or n) y


After fiddling around a bit, I found a good mesa commit and bisected to arrive at the following:

c87247f6a8c5505fea3fa29dac372f9f5316a118 is the first bad commit
commit c87247f6a8c5505fea3fa29dac372f9f5316a118
Author: Brian Paul <brianp@vmware.com>
Date:   Fri Jan 6 12:42:40 2012 -0700

    mesa: remove gl_framebuffer:_DepthBuffer, _StencilBuffer fields
    
    These were used by swrast to make a combined depth+stencil buffer look
    like separate depth and stencil buffers.  But that's no longer needed
    after rewriting the depth/stencil code in swrast.
    
    Reviewed-by: Eric Anholt <eric@anholt.net>


I double-checked: the previous commit works, while this one causes the issue. The system is 32-bit x86 with an RV350, running kernel 2.6.38. Please let me know if any further information or testing is needed.
Comment 1 Scott Moreau 2012-01-18 19:47:46 UTC
Additionally, I've built mesa with the following configuration:

--with-egl-platforms=wayland,drm,x11 --disable-gallium-egl --with-dri-drivers="" --enable-gles1 --enable-gles2 --with-gallium-drivers=r300,swrast --enable-shared-glapi --enable-gbm

I've also tried --enable-gallium-egl with the same result, though the backtrace then went through the gallium paths.
Comment 2 Michel Dänzer 2012-01-19 01:22:32 UTC
It's hard to see how that commit could break anything. Have you made sure everything was rebuilt to match the new layout of struct gl_framebuffer, e.g. with make clean?
Comment 3 Scott Moreau 2012-01-19 04:18:56 UTC
Yes, I have a script that builds the entire stack from wayland to mesa,
cairo, weston and everything in between. For each component it does git
reset --hard origin/master as well as git clean -fdx and installs to a
nonstandard prefix. When I first found this bug, I removed the prefix and
built the entire stack fresh. I can reliably reproduce the issue or not by
toggling between the bad and previous commits respectively.
Comment 4 Damien Grassart 2012-01-20 02:51:07 UTC
Hi, I can confirm this issue also happens with a r600 card. Here's an example of my backtrace from the weston-desktop-shell client crashing:

#0  0x00007ffff4601fba in st_framebuffer_validate.isra.3 () from /home/damien/lib/dri/r600_dri.so
#1  0x00007ffff4603469 in st_api_make_current () from /home/damien/lib/dri/r600_dri.so
#2  0x00007ffff45bbe8f in driBindContext () from /home/damien/lib/dri/r600_dri.so
#3  0x00007ffff71bde90 in dri2_make_current () from /home/damien/lib/libEGL.so.1
#4  0x00007ffff71b6159 in eglMakeCurrent () from /home/damien/lib/libEGL.so.1
#5  0x00007ffff771e58d in cairo_egl_device_create () from /home/damien/lib/libcairo.so.2
#6  0x0000000000409545 in init_egl (d=0x620630) at window.c:2822
#7  display_create (argc=0x7fffffffde0c, argv=0x7fffffffde00, option_entries=<optimized out>) at window.c:2926
#8  0x00000000004040a6 in main (argc=1, argv=0x7fffffffdf58) at desktop-shell.c:672

When I build mesa from commit 21b28d520ff218d165e86aa71dbd02050a3aa0cd (just before the first bad commit), then it works fine.
Comment 5 Ran Benita 2012-01-22 01:46:17 UTC
I can also confirm the bad commit, with a different codebase than Wayland (but exactly the same mesa backtrace). I use nouveau.
Comment 6 Scott Moreau 2012-01-22 02:20:56 UTC
It might be useful if you posted the backtrace, said what program you're
running, and gave more details about your system.
Comment 7 Scott Moreau 2012-01-22 05:24:04 UTC
Another report from irc:

<stfacc> hi, whenever I try to run any wayland client I get a segfault
<stfacc> here is the bt http://dpaste.com/691450/
<stfacc> this happens for all clients using cairo (simple-egl works for example)

bt paste contents:

#0  st_framebuffer_validate (stfb=0x7ffff0acbce0, st=<optimized out>) at state_tracker/st_manager.c:186
#1  0x00007fffefc40e68 in st_api_make_current (stapi=<optimized out>, stctxi=0x7a36e0, stdrawi=<optimized out>, 
    streadi=<optimized out>) at state_tracker/st_manager.c:731
#2  0x00007fffefc0238f in driBindContext (pcp=<optimized out>, pdp=<optimized out>, prp=<optimized out>)
    at ../../../../src/mesa/drivers/dri/common/dri_util.c:330
#3  0x00007ffff55ac670 in dri2_make_current (drv=0x623120, disp=0x6223a0, dsurf=0x0, rsurf=0x0, ctx=0x62b3c0)
    at egl_dri2.c:818
#4  0x00007ffff55a5829 in eglMakeCurrent (dpy=0x6223a0, draw=0x0, read=0x0, ctx=0x62b3c0) at eglapi.c:502
#5  0x00007ffff61effcd in _egl_make_current_surfaceless (ctx=<optimized out>) at cairo-egl-context.c:127
#6  cairo_egl_device_create (dpy=0x6223a0, egl=0x62b3c0) at cairo-egl-context.c:160
#7  0x00000000004093f7 in init_egl (d=0x61d200) at window.c:2822
#8  display_create (argc=0x7fffffffdb1c, argv=0x7fffffffdb10, option_entries=<optimized out>) at window.c:2926
#9  0x0000000000404767 in main (argc=1, argv=0x7fffffffdc38) at gears.c:373
Comment 8 Ran Benita 2012-01-23 17:21:56 UTC
Sorry, here are some more details.

ran@ran:~$ uname -sr
Linux 3.2.1-1-ARCH
ran@ran:~$ lspci | grep nVi
01:00.0 VGA compatible controller: nVidia Corporation G94 [GeForce 9600 GT] (rev a1)
ran@ran:~$ glxinfo | grep nouveau -A3
OpenGL vendor string: nouveau
OpenGL renderer string: Gallium 0.4 on NV94
OpenGL version string: 2.1 Mesa 8.0-devel (git-c25e5300)
OpenGL shading language version string: 1.20

Mesa config:
--with-dri-drivers= --with-gallium-drivers=nouveau
--with-egl-platforms=drm,x11 --enable-gallium-egl
--enable-shared-dricore --enable-shared-glapi --enable-egl
--enable-gles2 --enable-glx-tls --enable-xcb --enable-texture-float

And the backtrace:
Core was generated by `./test_terminal'.
Program terminated with signal 11, Segmentation fault.
#0  st_framebuffer_validate (stfb=0x7f89888e1e60, st=<optimized out>) at state_tracker/st_manager.c:186
186	   int32_t new_stamp = p_atomic_read(&stfb->iface->stamp);
(gdb) bt
#0  st_framebuffer_validate (stfb=0x7f89888e1e60, st=<optimized out>) at state_tracker/st_manager.c:186
#1  0x00007f8987a5ca28 in st_api_make_current (stapi=<optimized out>, stctxi=0x1588910, stdrawi=<optimized out>, streadi=<optimized out>)
    at state_tracker/st_manager.c:731
#2  0x00007f89879b47cf in driBindContext (pcp=<optimized out>, pdp=<optimized out>, prp=<optimized out>)
    at ../../../../src/mesa/drivers/dri/common/dri_util.c:330
#3  0x00007f898c1aba60 in dri2_make_current (drv=0x14a4a70, disp=0x149eb20, dsurf=0x0, rsurf=0x0, ctx=0x14a5690) at egl_dri2.c:818
#4  0x00007f898c1a4d39 in eglMakeCurrent (dpy=0x149eb20, draw=0x0, read=0x0, ctx=0x14a5690) at eglapi.c:502
#5  0x00000000004065b2 in context_use (ctx=0x149c700) at src/output_context.c:589
#6  0x0000000000405206 in compositor_use (comp=0x146cf50) at src/output.c:936
#7  0x00000000004039e0 in setup_app (app=0x7fff094f6440) at tests/test_terminal.c:224
#8  0x0000000000403b98 in main (argc=1, argv=0x7fff094f6588) at tests/test_terminal.c:273

This only happens if eglMakeCurrent is called twice, which is the case in my program and in wayland also (e.g. there's a call to eglMakeCurrent followed by a call to cairo_egl_device_create, which also calls eglMakeCurrent).

Since we use the surfaceless extension the first call to st_manager.c:st_api_make_current uses an incomplete buffer as a dummy (I think?), so then:

(gdb) print stfb == &IncompleteFramebuffer 
$11 = 1

In the next call, the following check at st_manager.c:730:
if (stdraw && stread) {
passes but:

(gdb) print stfb->iface
$28 = (struct st_framebuffer_iface *) 0x0

So st_framebuffer_validate dereferences a null iface pointer at st_manager.c:186. I'm not familiar with mesa, so I can't help with a (correct) patch.
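
To illustrate the analysis in comment 8, here is a minimal, self-contained sketch in C of the failure and of one possible guard. All names in it (fb_iface, framebuffer, incomplete_fb, read_stamp, validate_guarded) are hypothetical stand-ins invented for this example. It is not Mesa's code, and the committed fix may well work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the Mesa structures named in
 * the backtrace. The real definitions contain many more fields. */
struct fb_iface {
    int32_t stamp;              /* bumped by the window system on change */
};

struct framebuffer {
    struct fb_iface *iface;     /* NULL for the surfaceless placeholder */
    int32_t stamp;              /* stamp seen at the last validation */
};

/* Models the shared placeholder that is bound when no surface is given. */
static struct framebuffer incomplete_fb = { NULL, 0 };

/* Models the read at st_manager.c:186. It is only safe when iface is
 * non-NULL, which is the precondition the crash violated. */
static int32_t read_stamp(const struct framebuffer *fb)
{
    assert(fb->iface != NULL);
    return fb->iface->stamp;
}

/* Guarded validation (an assumption, not the actual patch): returns 1 if
 * the framebuffer was validated and 0 if it was skipped because it has no
 * window-system interface. */
static int validate_guarded(struct framebuffer *fb)
{
    if (fb == NULL || fb->iface == NULL)
        return 0;
    fb->stamp = read_stamp(fb);
    return 1;
}
```

Calling validate_guarded(&incomplete_fb) skips the placeholder instead of reading through its null iface pointer, while a framebuffer with a real interface is validated as before.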
Comment 9 Alex Deucher 2012-01-24 06:15:48 UTC
Possible fix:
http://lists.freedesktop.org/archives/mesa-dev/2012-January/018029.html
Comment 10 Scott Moreau 2012-01-24 07:56:34 UTC
(In reply to comment #9)
> Possible fix:
> http://lists.freedesktop.org/archives/mesa-dev/2012-January/018029.html

I tested this patch and it solves the issue with weston clients here on r300g. Thanks Alex.
Comment 11 Benjamin Franzke 2012-01-25 01:23:46 UTC
The tested patch is committed as 36fb83e4a868e047521b3d5e0edc4d7a77a96aaf, closing.
