Bug 41415

Summary: Attempting to share resources between a direct and indirect context causes server to crash.
Product: xorg
Reporter: Rufus Hamade <rufus>
Component: Server/General
Assignee: Adam Jackson <ajax>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: high
CC: jeremyhu
Version: git
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard: 2011BRB_Reviewed
Attachments:
  Test program to illustrate the problem (flags: none)

Description Rufus Hamade 2011-10-03 03:01:27 UTC
The problem occurred while testing the mesa/xdemo/manywin application. When this application runs out of memory while trying to create a new (direct) context, it falls back to indirect rendering. (Or at least it does with our GL library.) This causes the X server to crash.

You can reproduce the problem easily by applying the following patch to manywin:
--- /home/rufus/code/mesa-demos-8.0.1/src/xdemos/manywin.c	2010-07-07 18:57:16.000000000 +0100
+++ manywin.c	2011-09-30 14:03:38.900588574 +0100
@@ -138,7 +138,7 @@
    else {
       /* share textures & dlists with 0th context */
       printf("sharing\n");
-      ctx = glXCreateContext(dpy, visinfo, Heads[0].Context, True);
+      ctx = glXCreateContext(dpy, visinfo, Heads[0].Context, False);
    }
    if (!ctx) {
       Error(displayName, "Couldn't create GLX context");
---
and then running ./manywin 2.

The bug occurs in glxcmds.c:DoCreateContext, where the server tries to create a new DRI context using a non-DRI shareglxc.

I am currently working on a fix. I still need to test whether trying to create a direct context with an indirect shareglxc fails in a similar way. That said, attempting to share resources between direct and indirect contexts seems like a bad thing to do in the first place.
Comment 1 Michel Dänzer 2011-10-03 03:46:36 UTC
Please attach the Xorg.0.log file.

What does 'our GL library' refer to?
Comment 2 Rufus Hamade 2011-10-03 04:03:27 UTC
I'll attach a proper Xorg log when I can reproduce this at home with the latest server. I should also be able to attach a proposed fix.

>> What does 'our GL library' refer to?
PowerVR/SGX.
Comment 3 Jeremy Huddleston Sequoia 2011-10-03 20:26:42 UTC
Increasing priority since this is a server crash.

Is this a regression?
Comment 4 Rufus Hamade 2011-10-04 01:25:56 UTC
Created attachment 51919
Test program to illustrate the problem

This program (based on manywin.c) illustrates the problem.

Compile using something like:
 gcc -o heterogenous heterogenous.c -lGL

To crash the server:
 heterogenous 1

To segfault inside the DRM library:
 heterogenous 2

heterogenous 3 and heterogenous 4 currently work.
Comment 5 Rufus Hamade 2011-10-04 01:28:37 UTC
(In reply to comment #3)
> Is this a regression?

Don't think so.  I discovered it in XServer 1.7 (Ubuntu 10.04).  I haven't yet confirmed it in latest, but having a look at the code it doesn't appear that anything has changed.
Comment 6 Jeremy Huddleston Sequoia 2011-10-04 11:40:15 UTC
(In reply to comment #5)
> (In reply to comment #3)
> > Is this a regression?
> 
> Don't think so.  I discovered it in XServer 1.7 (Ubuntu 10.04).  I haven't yet
> confirmed it in latest, but having a look at the code it doesn't appear that
> anything has changed.

Ok, thanks.  Not marking as a regression or adding to the 1.11 or 1.12 tracker, but leaving as critical since this is a server crash.
Comment 7 Adam Jackson 2014-04-10 15:07:35 UTC
This doesn't crash for me on xserver 1.16 and Mesa 10.0.4.  Mode '1' correctly throws BadMatch for the second context.  Mode '2' succeeds, at least against Xephyr+llvmpipe, but the second context ends up actually being indirect.
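The two outcomes Adam reports amount to a small policy: one direct/indirect combination is rejected with BadMatch, and the other is quietly downgraded to an indirect context. The sketch below models that policy in isolation. It is hypothetical and is not the server's DoCreateContext: the enum, struct ctx_info and resolve_share are invented for illustration, and the assumption that mode 1 means "direct share context, indirect new context" (and mode 2 the reverse) is inferred from the original manywin patch and the Mesa commit quoted later, not from the attached program.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes; a real X server reports protocol errors such
 * as BadMatch rather than returning this enum. */
enum share_result {
    SHARE_OK,              /* create the context as requested */
    SHARE_FORCED_INDIRECT, /* create it, but as an indirect context */
    SHARE_BAD_MATCH        /* reject the request */
};

/* Hypothetical stand-in for a per-context record. */
struct ctx_info {
    bool is_direct;
};

/* Decide how to handle a create-context request that names a share
 * context (share == NULL means no sharing). *is_direct is in/out: it
 * may be cleared to force the new context to be indirect. */
static enum share_result resolve_share(const struct ctx_info *share,
                                       bool *is_direct)
{
    if (share == NULL)
        return SHARE_OK;

    /* Direct share context, indirect new context: reject the request
     * instead of handing a non-DRI object to DRI code, which is the
     * crash described in this report. */
    if (share->is_direct && !*is_direct)
        return SHARE_BAD_MATCH;

    /* Indirect share context, direct new context: downgrade the new
     * context to indirect so both live on the same side. */
    if (!share->is_direct && *is_direct) {
        *is_direct = false;
        return SHARE_FORCED_INDIRECT;
    }

    return SHARE_OK;
}
```

Same-kind requests pass through unchanged; only the mixed combinations are rejected or downgraded.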
Comment 8 debguy 2014-11-28 05:42:20 UTC
I'm sorry, can you be more specific?

Do you have an all-in-silicon GL implementation, and if so, which version? Or a Chinese NVIDIA/ATI card? Or are you using Mesa totally emulated (not a bad idea; an old GL card can never run newer rendering programs)?

-----------------------------
It seems to me that GL hardware drivers get mighty complicated once you reach the "are multiple open contexts fully supported" department.

I remember that years ago, on my all-in-silicon GL card, it was easy and had nothing to do with X or Mesa.

The documentation was clear: each one had to run in a separate thread, and atomic locking was required to share data between threads.

You might need to read up on this in depth.

Is there some benefit you'll get from being able to run multiple GL windows?
Comment 9 debguy 2014-11-28 05:50:35 UTC
So you might have a problem with thread locking and sharing.

Also, the other thing I remember (having done it before) is that when using the window manager to open the graphics context (GC), a special option (other than bits per pixel) needed to be set, i.e., it was not the same as opening the first GC for the first GL window. But that might not be the case anymore, or with software renderers.
Comment 10 Adam Jackson 2018-06-12 19:19:02 UTC
Pretty sure this was fixed in Mesa by:

commit c4a8c54c3bb31547cba57702ffea99293afef522
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Tue Dec 6 12:19:39 2011 -0800

    glx: Don't create a shared context if the other context isn't the same kind
    
    Each of the DRI, DRI2, and DRISW backends contain code like the
    following in their create-context routine:
    
       if (shareList) {
          pcp_shared = (struct dri2_context *) shareList;
          shared = pcp_shared->driContext;
       }
    
    This assumes that the glx_context *shareList is actually the correct
    derived type.  However, if shareList was created as an
    indirect-rendering context, it will not be the expected type.  As a
    result, shared will contain garbage.  This garbage will be passed to
    the driver, and the driver will probably segfault.  This can be
    observed with the following GLX code:
    
        ctx0 = glXCreateContext(dpy, visinfo, NULL, False);
        ctx1 = glXCreateContext(dpy, visinfo, ctx0, True);
    
    Create-context is the only case where this occurs.  In all other cases
    where a context is passed to the backend, it is the 'this' pointer
    (i.e., we got to the backend by calling something from ctx->vtable).
    
    To work around this, check that the shareList->vtable->destroy method
    is the same as the destroy method of the expected type.  We could also
    check that shareList->vtable matches the expected vtable, or add a "tag"
    to glx_context to identify the derived type.
    
    NOTE: This is a candidate for the 7.11 branch.
    
    Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Reviewed-by: Eric Anholt <eric@anholt.net>
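The guard described in this commit can be illustrated with a self-contained sketch. Only the ideas (a glx_context base type, a derived dri2_context holding a driContext, and comparing shareList->vtable->destroy before downcasting) come from the commit message. The trimmed vtable, the indirect_context type, and the function names dri2_destroy_context, indirect_create_context and dri2_create_context are hypothetical simplifications, not Mesa's actual code.

```c
#include <stdlib.h>

struct glx_context;

/* Hypothetical, trimmed-down vtable: only the destroy hook is modeled. */
struct glx_context_vtable {
    void (*destroy)(struct glx_context *ctx);
};

/* Base type shared by every backend. */
struct glx_context {
    const struct glx_context_vtable *vtable;
};

/* Derived types. The base must be the first member so that a
 * struct glx_context * can be cast back to the derived type. */
struct dri2_context {
    struct glx_context base;
    void *driContext;     /* hypothetical stand-in for a driver handle */
};

struct indirect_context {
    struct glx_context base;
    unsigned int xid;     /* hypothetical server-side context ID */
};

static void dri2_destroy_context(struct glx_context *ctx) { free(ctx); }
static void indirect_destroy_context(struct glx_context *ctx) { free(ctx); }

static const struct glx_context_vtable dri2_vtable = { dri2_destroy_context };
static const struct glx_context_vtable indirect_vtable = { indirect_destroy_context };

static struct glx_context *indirect_create_context(void)
{
    struct indirect_context *ic = calloc(1, sizeof *ic);
    if (ic == NULL)
        return NULL;
    ic->base.vtable = &indirect_vtable;
    ic->xid = 0x1234;     /* arbitrary value for the sketch */
    return &ic->base;
}

/* Create a "direct" context, optionally sharing with shareList.
 * Returns NULL if shareList is not the same kind of context. */
static struct glx_context *dri2_create_context(struct glx_context *shareList)
{
    void *shared = NULL;

    if (shareList != NULL) {
        /* Identify the derived type by its destroy method before
         * downcasting. Without this check, an indirect_context would be
         * reinterpreted as a dri2_context and its bytes passed on as if
         * they were a driver handle. */
        if (shareList->vtable->destroy != dri2_destroy_context)
            return NULL;
        shared = ((struct dri2_context *)shareList)->driContext;
    }

    struct dri2_context *pcp = calloc(1, sizeof *pcp);
    if (pcp == NULL)
        return NULL;
    pcp->base.vtable = &dri2_vtable;
    /* A real backend would ask the driver for a new context sharing
     * with `shared`; here the handle is reused to show the link. */
    pcp->driContext = shared != NULL ? shared : (void *)pcp;
    return &pcp->base;
}
```

In this sketch, comparing shareList->vtable against &dri2_vtable, or adding an explicit type tag (the alternatives the commit also mentions), would reject the same mismatched share contexts.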
