Bug 26394 - Server sometimes crashes when closing OpenGL programs
Summary: Server sometimes crashes when closing OpenGL programs
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Ext/DRI (show other bugs)
Version: unspecified
Hardware: x86 (IA32) All
: high critical
Assignee: Kristian Høgsberg
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
: 27326 27406 27473 28380 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-02-02 10:47 UTC by Patrik Olsson
Modified: 2011-10-16 23:53 UTC (History)
16 users (show)

See Also:
i915 platform:
i915 features:


Attachments
GDB debug log (1.43 KB, text/plain)
2010-02-02 10:47 UTC, Patrik Olsson
no flags Details
Reproduces the crash (2.13 KB, text/x-csrc)
2010-03-14 14:52 UTC, Patrik Olsson
no flags Details
glxinfo output (16.46 KB, text/plain)
2010-03-19 08:15 UTC, Patrik Olsson
no flags Details
Make DRI2DestroyDrawable more robust if pDraw is invalid (580 bytes, patch)
2010-04-02 12:07 UTC, Jesse Barnes
no flags Details | Splinter Review
Make sure corresponding X drawable exists before trying to use it (1.43 KB, patch)
2010-04-05 16:08 UTC, Jesse Barnes
no flags Details | Splinter Review
Alternate fix (1.36 KB, patch)
2010-04-06 14:32 UTC, Kristian Høgsberg
no flags Details | Splinter Review
DRI2: Track DRI2 drawables as resources, not privates (9.59 KB, patch)
2010-04-14 07:49 UTC, Kristian Høgsberg
no flags Details | Splinter Review
glx: Drop DestroyWindow hook (2.49 KB, patch)
2010-04-14 07:52 UTC, Kristian Høgsberg
no flags Details | Splinter Review
DRI2: Track DRI2 drawables as resources, not privates (9.59 KB, patch)
2010-04-14 07:54 UTC, Kristian Høgsberg
no flags Details | Splinter Review
glx: Drop DestroyWindow hook (2.49 KB, patch)
2010-04-14 07:54 UTC, Kristian Høgsberg
no flags Details | Splinter Review
DRI2: Track DRI2 drawables as resources, not privates (9.59 KB, patch)
2010-04-14 08:02 UTC, Kristian Høgsberg
no flags Details | Splinter Review
glx: Drop DestroyWindow hook (2.49 KB, patch)
2010-04-14 08:03 UTC, Kristian Høgsberg
no flags Details | Splinter Review
DRI2-drawable-id-check-and -free-resource.patch (1.10 KB, patch)
2010-04-20 19:09 UTC, Li Peng
no flags Details | Splinter Review
Also handle AIGLX (1.93 KB, patch)
2010-04-21 00:30 UTC, Michel Dänzer
no flags Details | Splinter Review

Description Patrik Olsson 2010-02-02 10:47:29 UTC
Created attachment 33011 [details]
GDB debug log

In about 1 of 5 times that I close OpenGL programs (that I have written), the X server crashes. It doesn't seem to happen when I kill or "^C" them, though. It is likely the programs are misbehaving (for example, I currently don't clean up all X and GLX resources before exit, and maybe I mix both new and old GLX api). I don't have the problem with other OpenGL programs I have tested such as glxgears or glxinfo. I any case, I don't think the server should crash, no matter whether programs/clients are misbehaving or not. Also, I should mention that in previous versions of the server, I didn't have the problem.

I attached to the server process with gdb to get a backtrace. The debug log (with prompts) is attached. pScreen in #2 was optimized out so I got it from #3 instead.

This is my theory:

pDraw at #3 is probably a freed pointer. Thus its pScreen field is undefined/arbitrary and in this case was 0x1 which made 'privates' in #0 a pointer a bit higher (because implicitly, #2 will find 'privates' by adding the field offset to the pScreen pointer, see the source code at that part). Thus in the end it uses an invalid pointer of value 0x151 for 'privates'.

I think the problem is here (#4 in backtrace, __glXDRIdrawableDestroy in glx/glxdri2.c):

    /* If the X window was destroyed, the dri DestroyWindow hook will
     * aready have taken care of this, so only call if pDraw isn't NULL. */
    if (drawable->pDraw != NULL)
	DRI2DestroyDrawable(drawable->pDraw);

It assumes that drawable->pDraw has been set to NULL if it is already freed. However, wherever it is freed this pointer is not set to NULL afterwards (at least that's what I suspect). I don't know where I can find the code of the "dri DestroyWindow hook", though.

I use xserver version 2:1.7.4-2 in Debian. My original report at bugs.debian.org is here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567677
Comment 1 Patrik Olsson 2010-03-02 14:21:56 UTC
Update:

I first made sure that my application only used GLX >=1.3 API. This would still crash. I then made sure it only used GLX <1.3, and now I have no crashes anymore. I'm planning to write a simple program that reproduces the crash/bug.
Comment 2 Patrik Olsson 2010-03-14 14:52:44 UTC
Created attachment 34041 [details]
Reproduces the crash

Here is a program that should crash the server eventually (just run it a few times, it might not happen every time). Also, it seems to happen more often if the program is exited the normal way (e.g. the window close button) than if it is killed (with some signal).

I think it is interesting that the server seems to *not* crash if the make context current call is removed.

Another thing, I now use version 2:1.7.5-1 in Debian.

Now please, someone reply. Currently, the only thing I've got for sending in this report is spam to my email address.
Comment 3 Kristian Høgsberg 2010-03-19 06:41:41 UTC
Is your program using indirect rendering?
Comment 4 Patrik Olsson 2010-03-19 08:15:53 UTC
Created attachment 34244 [details]
glxinfo output

glxinfo says "direct rendering: Yes".

Adding this piece of code just after the context is created in the example program:

printf ("Direct: %s\n", (glXIsDirect (dpy, glxcontext) == True) ? "yes" : "no");

Gives output "yes" in both the cases when it does not crash and the cases it
does crash.

I have attached the output of glxinfo too.
Comment 5 Jason D. Clinton 2010-03-23 13:28:02 UTC
Exiting any Clutter 1.2 application causes the same crash. I have the same environment as the reporter but newer:

xserver-xorg-core: 2:1.7.5.902-1
xserver-xorg-video-intel: 2:2.10.903-1
linux-image-2.6.33-2-amd64: 2.6.33-1~experimental.4
Comment 6 Patrik Olsson 2010-03-25 09:23:07 UTC
I have tested running the program (attached before, with modification to print "Direct: yes/no") on my other computer (same server version, but software rasterizer). The program crashes, but not the server. The program outputs the following:

failed to create drawable
Direct: yes
Segmentation fault

Happens also when I run remote programs (it also says "Direct: yes" in that case, which makes no sense, I think).

Top of backtrace:

#0  0xb77d305f in XCreateDrawable (psc=0x80aa2f8, xDrawable=56623137, 
    drawable=56623137, modes=0x80b4480) at drisw_glx.c:89
        gcvalues = {function = 3, plane_mask = 3450467868, 
          foreground = 3540936910, background = 1915168984, 
          line_width = -928079311, line_style = -2102111902, 
          cap_style = 1901201503, join_style = -1946909669, 
          fill_style = 504982920, fill_rule = -1981880903, 
          arc_mode = 742396610, tile = 3411661115, stipple = 3456143762, 
          ts_x_origin = 2135257643, ts_y_origin = -1073749304, 
          font = 3221217988, subwindow_mode = -1073749324, 
          graphics_exposures = 0, clip_x_origin = 63, clip_y_origin = 1, 
          clip_mask = 3221218224, dash_offset = 1870652674, 
          dashes = -48 '\320'}
        visTemp = {visual = 0xbfffe494, visualid = 0, screen = 0, 
          depth = 134938528, class = 56623137, red_mask = 3221218296, 
          green_mask = 3078446832, blue_mask = 56623137, 
          colormap_size = 134934184, bits_per_rgb = -1073748968}
        num_visuals = 0
#1  driCreateDrawable (psc=0x80aa2f8, xDrawable=56623137, drawable=56623137, 
    modes=0x80b4480) at drisw_glx.c:321
        pdraw = <value optimized out>
        swrast = <value optimized out>
#2  0xb77b175e in FetchDRIDrawable (dpy=<value optimized out>, 
    glxDrawable=56623137, gc=0x80aeea8) at glxcurrent.c:294
        priv = <value optimized out>
        pdraw = <value optimized out>
        psc = <value optimized out>
#3  0xb77b1ab6 in MakeContextCurrent (dpy=0x8060a30, draw=56623137, 
    read=56623137, gc=0x80aeea8) at glxcurrent.c:370
        pdraw = 0xb7fff697
        pread = <value optimized out>
        reply = {type = 14 '\016', unused = 0 '\000', sequenceNumber = 0, 
          length = 0, contextTag = 134521636, pad2 = 1, pad3 = 3078236324, 
          pad4 = 134934184, pad5 = 3221218996, pad6 = 134899480}
        oldGC = 0xb77f77e0
        opcode = 151 '\227'
        oldOpcode = <value optimized out>
        bindReturnValue = <value optimized out>
        state = <value optimized out>
#4  0x08048dd9 in configure ()
...
Comment 7 Patrik Olsson 2010-03-25 10:36:32 UTC
In my previous comment I meant to say "run the program remotely" instead of "run remote programs".
Comment 8 Julien Cristau 2010-03-26 07:00:24 UTC
*** Bug 27326 has been marked as a duplicate of this bug. ***
Comment 9 Emmanuele Bassi (:ebassi) 2010-03-30 04:07:34 UTC
from bug 27326 - this issue is also affecting Clutter 1.2, since we switched to GLX 1.3 API.
Comment 10 roberth 2010-04-01 08:58:28 UTC
Here is a clearer backtrace of the crash in case it helps.

Program received signal SIGSEGV, Segmentation fault.
0x0807fdae in privateExists (privates=0x14f, key=0xffa0fc) at ../../dix/privates.c:79
79	    return *key && *privates &&
(gdb) bt full
#0  0x0807fdae in privateExists (privates=0x14f, key=0xffa0fc) at ../../dix/privates.c:79
No locals.
#1  0x0807ffd1 in dixLookupPrivate (privates=0x14f, key=0xffa0fc) at ../../dix/privates.c:162
        ptr = 0x4f0
#2  0x00ff6bf9 in DRI2GetScreen (pScreen=0xffffffff) at ../../../../hw/xfree86/dri2/dri2.c:78
No locals.
#3  0x00ff731b in DRI2DestroyDrawable (pDraw=0xbae9110) at ../../../../hw/xfree86/dri2/dri2.c:344
        ds = 0xa25c568
        pPriv = 0x0
        pWin = 0xffa100
        pPixmap = 0xbae9490
#4  0x0056c7f9 in __glXDRIdrawableDestroy (drawable=0xbae93c0) at ../../glx/glxdri2.c:110
        private = 0xbae93c0
        core = 0x129bd40
#5  0x0055cd31 in DrawableGone (glxPriv=0xbae93c0, xid=67108869) at ../../glx/glxext.c:163
        c = 0x0
#6  0x08094ab3 in FreeClientResources (client=0xb9ec3a8) at ../../dix/resource.c:806
        rtype = 54
        head = 0xb9ec47c
        resources = 0xb9ec468
        this = 0xbae9a10
        j = 5
#7  0x0808fa43 in CloseDownClient (client=0xb9ec3a8) at ../../dix/dispatch.c:3631
        really_close_down = 1
#8  0x08087bc8 in Dispatch () at ../../dix/dispatch.c:424
        clientReady = 0xb8feac8
        result = -1
        client = 0xb9ec3a8
        nready = 0
        icheck = 0x82594fc
        start_tick = 720
#9  0x08066d33 in main (argc=8, argv=0xbfd8efd4, envp=0xbfd8eff8) at ../../dix/main.c:285
        i = 1
        alwaysCheckForInput = {0, 1}
Comment 11 Bryce Harrington 2010-04-01 11:28:21 UTC
#2  0x00ff6bf9 in DRI2GetScreen (pScreen=0xffffffff)

That doesn't look good.

The backtraces in comments #6 and #10 are not congruent, are these actually the same crash?  #10 matches the original reported trace so perhaps is the one to focus on.
Comment 12 Patrik Olsson 2010-04-01 12:15:18 UTC
(In reply to comment #11)
> 
> The backtraces in comments #6 and #10 are not congruent, are these actually the
> same crash?  #10 matches the original reported trace so perhaps is the one to
> focus on.
> 

The back trace in comment #6 is not from the server, but the previously attached example program that reproduces the bug. When I run it on my laptop with Intel chip the server crashes, but when I run it on my desktop with software rasterizer the server survives but the program crashes.
Comment 13 roberth 2010-04-01 12:31:39 UTC
Applying this commit to debian/ubuntu's xserver 1.7.6 fixes the server segfault when closing clutter apps (with clutter 1.2.4) for me, but the problem still remains with the testcase on this bug. Both cases have identical backtraces to the one in comment 10, and forcing swrast with LIBGL_ALWAYS_SOFTWARE=1 gives an identical backtrace and experience as comment #6 for the test case on comment #2. I should mention I am using intel as well as is everyone I have seen reporting this so far.

http://cgit.freedesktop.org/xorg/xserver/commit/?id=fa5103a02bd509e4a102afdad2ab26cb22210367

dri2: No need to blit from front on DRI2GetBuffers if they're just being reused.
It can be quite an expensive operation, so we're better off not doing
it unless it's totally required.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 14 Jason Smith 2010-04-02 08:18:07 UTC
The fix suggested in the previous comment does not resolve the crashes in clutter 1.2.4 here. Sorry :/
Comment 15 Jesse Barnes 2010-04-02 10:53:08 UTC
Can you try this mesa fix?

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index c4b5cb9..27a700d 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -164,7 +164,7 @@ dri2DestroyDrawable(__GLXDRIdrawable * pdraw)
    const __DRIcoreExtension *core = pdraw->psc->core;
 
    (*core->destroyDrawable) (pdraw->driDrawable);
-   DRI2DestroyDrawable(pdraw->psc->dpy, pdraw->xDrawable);
+   DRI2DestroyDrawable(pdraw->psc->dpy, pdraw->drawable);
    Xfree(pdraw);
 }
Comment 16 Jesse Barnes 2010-04-02 12:07:40 UTC
Created attachment 34628 [details] [review]
Make DRI2DestroyDrawable more robust if pDraw is invalid

If pDraw is invalid (i.e. we already freed the priv) don't try to use it to look up the screen.

I think this is just a workaround... any comments Kristian?
Comment 17 Jesse Barnes 2010-04-02 12:08:01 UTC
*** Bug 27406 has been marked as a duplicate of this bug. ***
Comment 18 roberth 2010-04-02 16:24:21 UTC
(In reply to comment #16)
> Created an attachment (id=34628) [details]
> Make DRI2DestroyDrawable more robust if pDraw is invalid
> 
> If pDraw is invalid (i.e. we already freed the priv) don't try to use it to
> look up the screen.
> 
> I think this is just a workaround... any comments Kristian?

With this patch in a ubuntu lucid environment (xserver 1.7.6, intel 2.9.1+backports, mesa 7.7.1) it still happens, but the probability of occurance is greatly reduced instead of being guaranteed at every closure of quadrapassel. With xorg-edgers packages (mesa 7.9, intel 2.11) on lucid as well as this xserver patch it completely goes away though.
Comment 19 Michel Dänzer 2010-04-03 03:36:42 UTC
I don't see(In reply to comment #16)
> Created an attachment (id=34628) [details]
> Make DRI2DestroyDrawable more robust if pDraw is invalid
> 
> If pDraw is invalid (i.e. we already freed the priv) don't try to use it to
> look up the screen.

I don't see how that can work - if the WindowRec pointed to by pDraw was already freed, DRI2GetDrawable() can't be relied upon.

See e.g. commits 7b6400a1b8d2f228fcbedf17c30a7e3924e4dd2a ('glx: Fix drawable private leak on destroy'), 2075d4bf9e53b8baef0b919da6c44771220cd4a5 ('glx: If a destroyed window is bound to the current context, make it not current.') and 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 ('glx: Clean up more thoroughly if the drawable of a current context goes away.') for how similar issues were dealt with previously.

BTW, it may not be immediately related to this problem, but I suspect commit 711e26466ae04ae93ff4c48d377d83d68a6320e9 ('DRI2: handle drawable destruction properly at DRI2SwapComplete time') can't work properly for windows - it tries to defer the drawable destruction, but that's not possible for windows.
Comment 20 Jesse Barnes 2010-04-05 10:20:10 UTC
Yeah, I know it's just a bad workaround.  It makes the crashes much less frequent but definitely doesn't address the real problem.  Thanks for the heads up on the window destruction too; I think more of this code needs to use resource IDs and lookups for tracking rather than the drawable pointers.  Eric made a change like this to the intel driver...

I wonder if there's a better way of tracking the GLX drawable vs X window drawable handles in both the Mesa and X code; it seems like we've been confusing them a lot lately.
Comment 21 Jesse Barnes 2010-04-05 16:08:31 UTC
Created attachment 34687 [details] [review]
Make sure corresponding X drawable exists before trying to use it

Looked over this some more with Kristian today.  It looks like the root of the problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a stale corresponding X drawable, which was destroyed in the GLX DestroyWindow hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).

That'll make the X drawable go away, but we'll be left with a pDraw pointing into freed space, probably crashing when we get to __glXdrawableDestroy for the actual GLX drawable in question.

This patch does a lookup on the X drawable ID corresponding to the GLX drawable at destruction time, and ignores it if it no longer exists.  Kristian put together an alternate approach as well but it needs a little work: http://people.freedesktop.org/~krh/glx-mess.patch.

And per the last comment, DRI2 priv destruction also needs some fixing; I think it's now safe to always free the priv and clear the private wherever we call DRI2FreeDrawable, since we check everywhere we need to whether the priv still exists and so shouldn't crash.
Comment 22 Li Peng 2010-04-05 22:49:33 UTC
(In reply to comment #21)
> Created an attachment (id=34687) [details]
> Make sure corresponding X drawable exists before trying to use it
> 
> Looked over this some more with Kristian today.  It looks like the root of the
> problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).
> 
> That'll make the X drawable go away, but we'll be left with a pDraw pointing
> into freed space, probably crashing when we get to __glXdrawableDestroy for the
> actual GLX drawable in question.
> 
> This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> at destruction time, and ignores it if it no longer exists.  Kristian put
> together an alternate approach as well but it needs a little work:
> http://people.freedesktop.org/~krh/glx-mess.patch.
> 
> And per the last comment, DRI2 priv destruction also needs some fixing; I think
> it's now safe to always free the priv and clear the private wherever we call
> DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> exists and so shouldn't crash.

Jesse, I tested your patch and it fixed the X crash issue in MeeGo. Thank you.
Comment 23 Julien Cristau 2010-04-06 02:11:20 UTC
*** Bug 27473 has been marked as a duplicate of this bug. ***
Comment 24 Michel Dänzer 2010-04-06 04:53:53 UTC
(In reply to comment #21)
> Looked over this some more with Kristian today.  It looks like the root of the
> problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).

glxDestroyWindow() itself doesn't destroy the DrawableRec, but as I've pointed out over and over again, the destruction of a window indeed cannot be deferred.

> That'll make the X drawable go away, but we'll be left with a pDraw pointing
> into freed space, probably crashing when we get to __glXdrawableDestroy for the
> actual GLX drawable in question.

The purpose of glxDestroyWindow() was to clean up such dangling references. Apparently it's missing some of them (now).

> This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> at destruction time, and ignores it if it no longer exists.  Kristian put
> together an alternate approach as well but it needs a little work:
> http://people.freedesktop.org/~krh/glx-mess.patch.

Yeah, when we discussed this kind of issue earlier, Kristian was against using IDs everywhere, and I think there are good arguments for not doing that.

> And per the last comment, DRI2 priv destruction also needs some fixing; I think
> it's now safe to always free the priv and clear the private wherever we call
> DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> exists and so shouldn't crash.

If you're referring to the last paragraph of comment #19, I'm not sure that'll do. If there are outstanding swaps referencing the DrawableRec of a window being destroyed, those swaps will need to be cancelled or whatever is necessary to clean up the dangling references.
Comment 25 Jesse Barnes 2010-04-06 08:48:43 UTC
(In reply to comment #24)
> (In reply to comment #21)
> > Looked over this some more with Kristian today.  It looks like the root of the
> > problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> > stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> > hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).
> 
> glxDestroyWindow() itself doesn't destroy the DrawableRec, but as I've pointed
> out over and over again, the destruction of a window indeed cannot be deferred.

Right, thus this patch.

> > That'll make the X drawable go away, but we'll be left with a pDraw pointing
> > into freed space, probably crashing when we get to __glXdrawableDestroy for the
> > actual GLX drawable in question.
> 
> The purpose of glxDestroyWindow() was to clean up such dangling references.
> Apparently it's missing some of them (now).
> 
> > This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> > at destruction time, and ignores it if it no longer exists.  Kristian put
> > together an alternate approach as well but it needs a little work:
> > http://people.freedesktop.org/~krh/glx-mess.patch.
> 
> Yeah, when we discussed this kind of issue earlier, Kristian was against using
> IDs everywhere, and I think there are good arguments for not doing that.

Such as?  Do you have ideas for an alternate approach?  Or do you like Kristian's proposed patch better?

> 
> > And per the last comment, DRI2 priv destruction also needs some fixing; I think
> > it's now safe to always free the priv and clear the private wherever we call
> > DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> > exists and so shouldn't crash.
> 
> If you're referring to the last paragraph of comment #19, I'm not sure that'll
> do. If there are outstanding swaps referencing the DrawableRec of a window
> being destroyed, those swaps will need to be cancelled or whatever is necessary
> to clean up the dangling references.

I don't know how to handle this yet, I'll look at it some more...
Comment 26 Michel Dänzer 2010-04-06 09:35:42 UTC
(In reply to comment #25)
> Such as?  Do you have ideas for an alternate approach?  Or do you like
> Kristian's proposed patch better?

I'll leave the argument to Kristian, as he was previously arguing against patches of mine using ID lookup, but I thought the alternative approach should be clear by now: Clean up all references to the DrawableRec of a window being destroyed.
Comment 27 Jesse Barnes 2010-04-06 09:43:24 UTC
(In reply to comment #26)
> (In reply to comment #25)
> > Such as?  Do you have ideas for an alternate approach?  Or do you like
> > Kristian's proposed patch better?
> 
> I'll leave the argument to Kristian, as he was previously arguing against
> patches of mine using ID lookup, but I thought the alternative approach should
> be clear by now: Clean up all references to the DrawableRec of a window being
> destroyed.

Ok I'll chat with Kristian.  The main problem with the cleanup approach is that it's a bit more invasive than the small patch here, and seems to violate some of the existing GLX layering; I was hoping you had something clever to make the impact small.  I agree that we'll add some lookup overhead, I'm just not sure how much impact that will have.
Comment 28 roberth 2010-04-06 10:19:11 UTC
Not to derail, but I just wanted to say that the patch in comment #21 did fix all 4 machines I've tested in a ubuntu lucid environment. Thank you!

There's a xserver package here for anyone else experiencing this on lucid and wanting to test it. The publisher is being a little slow today so it may not show up for a bit.

https://edge.launchpad.net/~sarvatt/+archive/bugfixes
Comment 29 Kristian Høgsberg 2010-04-06 13:01:29 UTC
(In reply to comment #24)
> (In reply to comment #21)
> > Looked over this some more with Kristian today.  It looks like the root of the
> > problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> > stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> > hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).u
> 
> glxDestroyWindow() itself doesn't destroy the DrawableRec, but as I've pointed
> out over and over again, the destruction of a window indeed cannot be deferred.
> 
> > That'll make the X drawable go away, but we'll be left with a pDraw pointing
> > into freed space, probably crashing when we get to __glXdrawableDestroy for the
> > actual GLX drawable in question.
> 
> The purpose of glxDestroyWindow() was to clean up such dangling references.
> Apparently it's missing some of them (now).

It (120286aef59dabdb7c9fa762e08457e5cc8ec3a6) always missed GLX windows created using glXCreateWindow.  Those have a different XID than the X window.  For the DestroyWindow hook to work correctly, we have to store the __GLXdrawable as a private on the window, look it up in the hook and then call FreeResource on the glx drawable XID.

> > This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> > at destruction time, and ignores it if it no longer exists.  Kristian put
> > together an alternate approach as well but it needs a little work:
> > http://people.freedesktop.org/~krh/glx-mess.patch.
> 
> Yeah, when we discussed this kind of issue earlier, Kristian was against using
> IDs everywhere, and I think there are good arguments for not doing that.

The idea was mostly to avoid the lookup overhead - not that it's a too expensive operation, but since we already looked up the drawable pointer once we might as well hold on to it.  The other concern is robustness in case of reused XIDs.

> > And per the last comment, DRI2 priv destruction also needs some fixing; I think
> > it's now safe to always free the priv and clear the private wherever we call
> > DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> > exists and so shouldn't crash.
> 
> If you're referring to the last paragraph of comment #19, I'm not sure that'll
> do. If there are outstanding swaps referencing the DrawableRec of a window
> being destroyed, those swaps will need to be cancelled or whatever is necessary
> to clean up the dangling references.

Before 711e26466ae04ae93ff4c48d377d83d68a6320e9, the callback mechanism was designed so that the we would hold on to the DRI2 private and pass it back in the DRI2SwapComplete callback.  While we can't prevent a window from being destroyed, we can make the DRI2 private ref-counted and keep it alive long enough that any outstanding flips finish.  If a window goes away while we have a flip pending, we can NULL the DrawablePtr in the DRI2 private. When the DRI2SwapComplete eventually happens, the DRI2 private is still valid, and will allow us to reliably discover that the window is gone.  That mechanism worked well enough I think, but the problem was that we didn't always check to see if the drawable was gone in the callback.  Jesse is saying that that's now fixed an we can roll back 711e26466ae04ae93ff4c48d377d83d68a6320e9.
Comment 30 Kristian Høgsberg 2010-04-06 14:32:39 UTC
Created attachment 34730 [details] [review]
Alternate fix

To sum up, what I'd recommend is that we pick up Francisco's patch:

  http://lists.freedesktop.org/archives/xorg-devel/2010-February/005525.html

and add the patch attached here on top to properly destroy GLX drawables in the DestroyWindow hook, whether they were created through the compatibility mechanism or glXCreateWindow().  Compared to Jesses patch, this doesn't have issues with reused XID (not sure that it's a big issue though).

If we go with Jesses patch we should be able to just drop the DestroyWindow hook, ie revert 120286aef59dabdb7c9fa762e08457e5cc8ec3a6.
Comment 31 Jesse Barnes 2010-04-12 13:17:16 UTC
Reassigning to Kristian as he is pushing the fix.
Comment 32 Kristian Høgsberg 2010-04-14 07:49:44 UTC
Created attachment 35001 [details] [review]
DRI2: Track DRI2 drawables as resources, not privates

The main motivation here is to have the resource system clean up the
DRI2 drawable automatically so glx doesn't have to.  Right now, the
glx drawable resource must be destroyed before the X drawable, so that
calling DRI2DestroyDrawable doesn't crash.  By making the DRI2
drawable a resource, GLX doesn't have to worry about that and the
resource destruction order becomes irrelevant.

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 33 Kristian Høgsberg 2010-04-14 07:52:18 UTC
Created attachment 35002 [details] [review]
glx: Drop DestroyWindow hook

Now that glx doesn't call DRI2DestroyDrawable anymore, we don't need to
force a specific resource destruction order in the DestroyWindow hook.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 34 Kristian Høgsberg 2010-04-14 07:54:12 UTC
Created attachment 35003 [details] [review]
DRI2: Track DRI2 drawables as resources, not privates

The main motivation here is to have the resource system clean up the
DRI2 drawable automatically so glx doesn't have to.  Right now, the
glx drawable resource must be destroyed before the X drawable, so that
calling DRI2DestroyDrawable doesn't crash.  By making the DRI2
drawable a resource, GLX doesn't have to worry about that and the
resource destruction order becomes irrelevant.

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 35 Kristian Høgsberg 2010-04-14 07:54:25 UTC
Created attachment 35004 [details] [review]
glx: Drop DestroyWindow hook

Now that glx doesn't call DRI2DestroyDrawable anymore, we don't need to
force a specific resource destruction order in the DestroyWindow hook.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 36 Kristian Høgsberg 2010-04-14 08:02:50 UTC
Created attachment 35005 [details] [review]
DRI2: Track DRI2 drawables as resources, not privates

The main motivation here is to have the resource system clean up the
DRI2 drawable automatically so glx doesn't have to.  Right now, the
glx drawable resource must be destroyed before the X drawable, so that
calling DRI2DestroyDrawable doesn't crash.  By making the DRI2
drawable a resource, GLX doesn't have to worry about that and the
resource destruction order becomes irrelevant.

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 37 Kristian Høgsberg 2010-04-14 08:03:01 UTC
Created attachment 35006 [details] [review]
glx: Drop DestroyWindow hook

Now that glx doesn't call DRI2DestroyDrawable anymore, we don't need to
force a specific resource destruction order in the DestroyWindow hook.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 38 Kristian Høgsberg 2010-04-14 08:09:45 UTC
Attachment 35005 [details] and attachment 35006 [details] [review] are the latest proposed fix.  They've been posted to xorg-devel for review, blocking on keithp to review and commit at this point.
Comment 39 Kristian Høgsberg 2010-04-19 06:34:51 UTC
The patches are now on xserver master, please give that a try and close the bug if it fixes the problem for you.
Comment 40 Li Peng 2010-04-20 19:05:16 UTC
Kristian:

I have a question about the patch "DRI2: Track DRI2 drawables as resources, not privates", it calls AddResource() in DRI2CreateDrawable() to add DRI2 drawable as resources, but where we free them, should we call FreeResourceByType() to free the DRI2 drawable in ProcDRI2DestroyDrawable() ?

Also in my test in MeeGo. I find that in ProcDRI2CreateDrawable(), sometimes the value of "pDrawable->id" is 0, which is different with stuff->drawable. that would cause some problem. because we will get an old DRI2 drawable from DRI2GetDrawable() that attached to the zero value of "pDrawable->id". I think we need to add some check to make sure "pDrawable->id" is synced with "stuff->drawable".

I attached a draft patch that addressing above two points, please review.
Comment 41 Li Peng 2010-04-20 19:09:46 UTC
Created attachment 35191 [details] [review]
DRI2-drawable-id-check-and -free-resource.patch
Comment 42 Li Peng 2010-04-20 19:12:57 UTC
with the below 4 commits in xserver, I don't see crash anymore.

glx: Drop DestroyWindow hook
DRI2: Track DRI2 drawables as resources, not privates
glx: Let the resource system destroy pixmaps
glx: Track GLX 1.3 style GLX drawables under their X drawable ID as well
Comment 43 Michel Dänzer 2010-04-21 00:30:41 UTC
Created attachment 35195 [details] [review]
Also handle AIGLX

This patch I posted to the xorg-devel list also handles the DRI2CreateDrawable() call in __glXDRIscreenCreateDrawable(). However, I'm not sure it's valid to just assign the pixmap id field like this.

See bug 27767 for one possible symptom of this problem.
Comment 44 Chris Wilson 2010-06-07 02:56:52 UTC
*** Bug 28380 has been marked as a duplicate of this bug. ***
Comment 45 Jeremy Huddleston Sequoia 2011-10-16 23:53:19 UTC
Closing based on comments.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.