Bug 65030 - parallel indirect GLX causes server crash (context switch bug)
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Ext/GLX
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Adam Jackson
QA Contact: Xorg Project Team
Reported: 2013-05-27 09:26 UTC by Pierre Ossman
Modified: 2014-04-10 15:50 UTC

Attachments
simple test case (1.56 KB, text/plain)
2013-05-28 14:12 UTC, Pierre Ossman

Description Pierre Ossman 2013-05-27 09:26:11 UTC
Something is very, very wrong with the context switching code in the server side of GLX. If you torture the code with multiple OpenGL clients, it will very quickly end up in a situation where the dispatch table is NULL, and a segfault soon follows.

There used to be a safety net in place, but it was removed because someone figured it wasn't needed:

http://cgit.freedesktop.org/xorg/xserver/commit/?id=b0c665ac0fe6840dda581e4d0d0b76c703d62a7b

I've patched back that safety net here locally to avoid the crash. But that of course doesn't fix the underlying context switching bug.


I initially observed this in a TigerVNC Xvnc based on xorg-server 1.14. But I can also reliably crash my Fedora 18 workstation, which is running 1.13.3.


My provocateur has been piglit:

LIBGL_ALWAYS_INDIRECT=1 ./piglit-run.py tests/all.tests results/all.results
Comment 1 Pierre Ossman 2013-05-28 14:12:09 UTC
Created attachment 79898
simple test case

I've managed to figure out the sequence of events that triggers this, and made a simple test case that will reliably crash the server.


The problem is in DrawableGone() in glx/glxext.c. When it clears out a drawable that is attached to an indirect context, it makes sure that context isn't current. It also clears out the X server's current context, if it happens to be the one we're dealing with.

The problem is that Mesa has no distinction between "current for this process/thread" and "current for some X11 client" like the X server does. So it calls back through glapi and clears the dispatch table, believing it is clearing the context for the current thread. That might be true in many cases, but it could also be clearing the context for an X client we were servicing some time ago.

So the fix is probably to always call __glXFlushContextCache() when clearing any context, because the underlying DRI driver will most likely mess around in various ways with the active context.
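
The sequence described above can be modelled with a small standalone sketch. This is not the xserver or Mesa code: every name in it (Context, driver_make_current, force_current, drawable_gone, scenario and so on) is hypothetical and only stands in for the roles described in this comment. It assumes the server skips the driver's make-current call when the requested context matches its cached "last context", and that the driver keeps a single dispatch pointer for the whole thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names are real xserver or Mesa symbols. */
typedef struct { int id; } Context;
typedef int (*DispatchFn)(void);

static int draw_ok(void) { return 1; }

/* Models the driver's single per-thread dispatch table. */
static DispatchFn driver_dispatch = NULL;
/* Models the server's cache of the last context it made current. */
static Context *last_context = NULL;

/* Stand-in for the role of __glXFlushContextCache(). */
static void flush_context_cache(void) { last_context = NULL; }

/* The driver only knows "current for this thread", not "for this client". */
static void driver_make_current(Context *ctx)
{
    driver_dispatch = ctx ? draw_ok : NULL;
}

/* Server side: skip the driver call when the cache says ctx is current. */
static void force_current(Context *ctx)
{
    assert(ctx != NULL);
    if (last_context == ctx)
        return;
    driver_make_current(ctx);
    last_context = ctx;
}

/* Returns 1 on success, 0 where a real server would hit a NULL dispatch. */
static int handle_request(Context *ctx)
{
    force_current(ctx);
    return driver_dispatch ? driver_dispatch() : 0;
}

/* Teardown of a drawable attached to `victim`. */
static void drawable_gone(Context *victim, int always_flush)
{
    driver_make_current(NULL);          /* clears dispatch for the thread */
    if (always_flush || last_context == victim)
        flush_context_cache();          /* old behaviour: only if it matches */
}

/* Client B renders, client A's drawable goes away, client B renders again. */
int scenario(int always_flush)
{
    Context a = { 1 }, b = { 2 };

    driver_dispatch = NULL;
    last_context = NULL;

    handle_request(&a);
    handle_request(&b);                 /* cache now points at b */
    drawable_gone(&a, always_flush);    /* dispatch cleared behind b's back */
    return handle_request(&b);
}
```

In this model, scenario(0) returns 0: the cache still claims client B's context is current, so the make-current call is skipped and the request meets a NULL dispatch table. scenario(1), which flushes the cache unconditionally as proposed above, returns 1 because the next request re-establishes the context.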
Comment 2 Pierre Ossman 2013-09-23 13:20:09 UTC
Nothing? I'd figure a bug where a client can easily crash the X server would get more attention. :/
Comment 3 Chris Wilson 2013-09-23 14:17:52 UTC
diff --git a/glx/glapi.c b/glx/glapi.c
index ad7329e..29cfb9b 100644
--- a/glx/glapi.c
+++ b/glx/glapi.c
@@ -171,8 +171,11 @@ _glapi_set_dispatch(struct _glapi_table *dispatch)
     _glthread_SetTSD(&_gl_DispatchTSD, (void *) dispatch);
     _glapi_Dispatch = dispatch;
 #else /*THREADS*/
-        _glapi_Dispatch = dispatch;
+    _glapi_Dispatch = dispatch;
 #endif /*THREADS*/
+
+    if (dispatch == 0)
+       __glXFlushContextCache();
 }
Comment 4 Adam Jackson 2013-09-30 16:20:33 UTC
Patch series posted:

http://lists.freedesktop.org/archives/xorg-devel/2013-September/037957.html
Comment 5 Adam Jackson 2014-04-10 15:50:19 UTC
A different version of that series was eventually applied; this is fixed in at least 1.15 and later.

