Created attachment 122625
program to reproduce the bug
I've been trying to catch a flaky bug that shows up in the following scenario:
- Create useful_window
- Create dummy_window
- Create glxcontext
- glXMakeCurrent(dpy, dummy_window, glxcontext)
- get opengl info
- glXMakeCurrent(dpy, None, NULL)
- XDestroyWindow(dpy, dummy_window)
- glXMakeCurrent(dpy, useful_window, glxcontext)
- render stuff
For the record, I've been debugging this because it is roughly what Qt5 does when creating a new OpenGL widget (it creates a dummy window just to get the OpenGL version).
Using the attached C program I could reproduce it most of the time when running with LIBGL_ALWAYS_INDIRECT=1, mesa 11.1.2 and Xephyr as the X11 server.
After using gdb and printing a lot of debug info, in summary this is what happens:
- glXMakeCurrent(dpy, dummy_window, glxcontext) creates dri_drawable_dummy
- glXMakeCurrent(dpy, None, NULL) does nothing relevant in this case
- XDestroyWindow(dpy, dummy_window) destroys the dri_drawable_dummy
- glXMakeCurrent(dpy, useful_window, glxcontext) creates dri_drawable_useful, which sometimes happens to have the same address as dri_drawable_dummy (which should be ok)
- st_api_make_current calls st_framebuffer_reuse_or_create
- st_framebuffer_reuse_or_create sees that st->ctx->WinSysDrawBuffer->iface == stfbi and then just reuses the existing fb
- then when reusing this framebuffer the stamp contains the value that was set when the context was bound to the dummy_window, and nothing is rendered
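The sequence above can be modelled outside Mesa with a toy program. This is only a sketch: `struct drawable`, `struct framebuffer`, `reuse_or_create` and `replay_address_reuse` are hypothetical names, not Mesa's API, and the two drawables are given distinct stamps purely so that the stale value is observable. A single static slot stands in for the allocator returning the same address twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dri_drawable. */
struct drawable {
    unsigned stamp;
};

/* Hypothetical stand-in for the cached framebuffer: it identifies its
 * drawable by raw pointer only and remembers the stamp it saw. */
struct framebuffer {
    const struct drawable *iface;
    unsigned cached_stamp;
    bool valid;
};

/* Reuse the framebuffer when the pointer matches, otherwise rebuild it.
 * Returns true when the existing framebuffer was reused. */
bool reuse_or_create(struct framebuffer *fb, const struct drawable *d)
{
    assert(fb != NULL && d != NULL);
    if (fb->valid && fb->iface == d)
        return true;              /* reused: cached_stamp is left untouched */
    fb->iface = d;
    fb->cached_stamp = d->stamp;
    fb->valid = true;
    return false;
}

/* Replays the scenario. Returns true when the framebuffer was reused and
 * still carries the stamp taken from the destroyed drawable. */
bool replay_address_reuse(void)
{
    static struct drawable slot;  /* same address for both drawables */
    struct framebuffer fb = {0};

    slot.stamp = 1;               /* dummy drawable */
    reuse_or_create(&fb, &slot);  /* context bound to dummy_window */

    slot.stamp = 2;               /* dummy destroyed, useful drawable created here */
    bool reused = reuse_or_create(&fb, &slot);

    return reused && fb.cached_stamp != slot.stamp;
}
```

Calling `replay_address_reuse()` returns true in this model: the pointer comparison alone cannot tell the new drawable from the destroyed one, so the framebuffer keeps state that belongs to the dummy window.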
Please be aware that this bug is not consistent, as it depends on the second dri_drawable being allocated at the same address as the destroyed one. As I got tired of waiting for the bug to happen on each run, I created a patch that reproduces the exact behaviour (basically by making the dri_drawable static).
I'm also attaching the Xephyr output with my debug messages. Sorry for the mess, but it basically contains the functions being executed and their parameters, indented.
I'm willing to fix this bug myself, but I need some advice as I'm stuck here. I appreciate any help.
Created attachment 122626
BE CAREFUL - patch that makes the bug consistent
Created attachment 122627
Xephyr output with lots of printf's from a single run of attached program
Just to add that I could also reproduce the bug with direct rendering. Of course, this was with the patch that forces the second dri_drawable to be allocated at the same address as the first one.
(In reply to Guilherme from comment #3)
> Just to add that I could also reproduce the bug with direct rendering. Of
> course, with the patch that forces the second dri_drawable to be allocated
> on the same address of the first one
Ok, I was wrong. With direct rendering the bug cannot happen because the dri_drawable destruction is postponed and is done only after the second dri_drawable is created.
But then I raise the question: should the fix be made in the Xorg server or in Mesa? As this is an implementation detail, I think the fix should be in Mesa, probably by finding a way to make sure that a dri_drawable will not be freed before the framebuffer associated with it.
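One way to picture the "do not free the dri_drawable before its framebuffer" idea is reference counting. The following is a minimal sketch under that assumption, not a proposed Mesa patch: all names (`rc_drawable`, `rc_framebuffer`, `rc_framebuffer_bind`, `replay_with_refcount`) are hypothetical. Because the framebuffer holds its own reference, the dummy drawable's memory stays allocated after the window is destroyed, so a newly created drawable cannot land at the same address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted stand-in for a dri_drawable. */
struct rc_drawable {
    int refcount;
    unsigned stamp;
};

struct rc_drawable *rc_drawable_create(unsigned stamp)
{
    struct rc_drawable *d = malloc(sizeof *d);
    assert(d != NULL);
    d->refcount = 1;              /* reference owned by the creator */
    d->stamp = stamp;
    return d;
}

void rc_drawable_ref(struct rc_drawable *d) { d->refcount++; }

void rc_drawable_unref(struct rc_drawable *d)
{
    if (--d->refcount == 0)
        free(d);
}

/* Hypothetical framebuffer that owns a reference to its drawable. */
struct rc_framebuffer {
    struct rc_drawable *iface;
    unsigned cached_stamp;
};

void rc_framebuffer_bind(struct rc_framebuffer *fb, struct rc_drawable *d)
{
    if (fb->iface == d)
        return;                   /* pointer match now implies the same live object */
    rc_drawable_ref(d);
    if (fb->iface != NULL)
        rc_drawable_unref(fb->iface);
    fb->iface = d;
    fb->cached_stamp = d->stamp;
}

/* Replays the scenario; returns true when the second drawable got a
 * distinct address and the framebuffer picked up its stamp. */
bool replay_with_refcount(void)
{
    struct rc_framebuffer fb = { NULL, 0 };

    struct rc_drawable *dummy = rc_drawable_create(1);
    rc_framebuffer_bind(&fb, dummy);
    rc_drawable_unref(dummy);     /* window destroyed; fb's reference keeps it alive */

    struct rc_drawable *useful = rc_drawable_create(2);
    bool distinct = (useful != dummy);  /* dummy is still allocated here */

    rc_framebuffer_bind(&fb, useful);   /* drops the last reference to dummy */
    bool fresh = (fb.cached_stamp == useful->stamp);

    rc_drawable_unref(fb.iface);  /* framebuffer's reference */
    rc_drawable_unref(useful);    /* creator's reference */
    return distinct && fresh;
}
```

In this model `replay_with_refcount()` returns true. Whether Mesa's drawable and framebuffer lifetimes can actually be tied together this way is the open question raised above.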
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1003.