Bug 30089 - glean fails with GLXBadContextTag error with indirect rendering
Summary: glean fails with GLXBadContextTag error with indirect rendering
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-08 07:54 UTC by Jon Turney
Modified: 2014-01-09 15:41 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Patch used to produce the debug output shown (3.21 KB, patch)
2010-09-08 08:00 UTC, Jon Turney
no flags Details | Splinter Review
Don't delete the current context of a drawable when drawable is deleted (1.51 KB, patch)
2010-09-21 07:30 UTC, Jon Turney
no flags Details | Splinter Review
Small testcase to demonstrate GLXBadContextTag error when window is deleted with a current context (1.94 KB, text/plain)
2010-09-24 08:11 UTC, Jon Turney
no flags Details
Small testcase to demonstrate GLXBadContextTag error when window is deleted with a current context (1.97 KB, text/plain)
2014-01-09 15:05 UTC, Jon Turney
no flags Details

Description Jon Turney 2010-09-08 07:54:00 UTC
testing with git master xserver and mesa, mesa configured --with-dri-drivers=swrast so only swrast dri driver is built and installed.

$ LIBGL_ALWAYS_INDIRECT=1 ./glean -r results -o --quick -t "api2 basic"
api2:  NOTE rgba8, db, z24, s8, win+pmap, id 33
        Test skipped/non-applicable

X Error of failed request:  GLXBadContextTag
  Major opcode of failed request:  153 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  34
  Current serial number in output stream:  34

Note that running the tests individually succeeds, so it's appears to be the sequence of them which causes a failure.

$ LIBGL_ALWAYS_INDIRECT=1 ./glean -r results -o --quick -t api2
$ LIBGL_ALWAYS_INDIRECT=1 ./glean -r results -o --quick -t basic
Comment 1 Jon Turney 2010-09-08 08:00:19 UTC
Created attachment 38555 [details] [review]
Patch used to produce the debug output shown

As observed in bug #29977, this seems to be related to commit 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3.  That commit makes the X server invalidate a context tag when a drawable on which it is current is deleted.  But the client still has that tag and a subsequent glXMakeCurrent protocol request send the old context tag along with the the new context ID, which the X server now thinks is invalid:

Added some debug output (patch attached) to the X server to demonstrate what's happening:

DoMakeCurrent: oldtag 0
DoMakeCurrent: context a296748
AddCurrentContext: Context a296748 was allocated tag 1 (grew table)
DoMakeCurrent: replying with tag 1
__glXLookupContextByTag: tag 1 (max 1) found context a296748
__glXLookupContextByTag: tag 1 (max 1) found context a296748
DrawableGone: drawable a5e6540 XID 400002
DrawableGone: clearing tag 1, was context a296748
DoMakeCurrent: oldtag 1
__glXLookupContextByTag: tag 1 (max 1) found context 0
Comment 2 Jon Turney 2010-09-21 06:21:21 UTC
(cont'd from bug #29977)
(In reply to comment #12)
> (In reply to comment #11)
> > I still have this problem (see bug bug #30089), [...]
> 
> So why aren't you following up there rather than here?

It seems I made a mistake :-)

> > Trying to understand the original problem a bit more, with this change
> > reverted, I've tried to reproduce the compiz crash mentioned in the change
> > description on i915, but have had no luck so far.  The relevant bug seems to be
> > #21132, but that doesn't give a great deal of detail.  Do I need to do anything
> > more than 'compiz --replace' to see the bug?
> 
> AFAIR that should be it. You may need to force indirect rendering via compiz
> --indirect-rendering or by setting the environment variable
> LIBGL_ALWAYS_INDIRECT=1 for compiz. Or maybe it's only reproducible with DRI1.

Yes, I was using LIBGL_ALWAYS_INDIRECT. The current intel driver from git seems to require DRI2 to start, so I'm not sure how to test with DRI1 only.
 
> > Also the patch attached there differs from that which has been pushed in that
> > it doesn't include the tag invalidation code which causes my problem.
> 
> The patch from bug 21132 is commit 2075d4bf9e53b8baef0b919da6c44771220cd4a5,
> commit 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 is a followup fix.

You are quite correct, my mistake.  But I'm afraid that makes me even less confident I have the correct reproduction steps.
 
> > From the point of view of simply reading the code, I have some difficulty
> > understanding how old tags in cl->currentContexts can have an affect after the
> > client is restarted (surely a new client structure is allocated for a new
> > instance of the client?)
> 
> Unfortunately I don't remember the details, but maybe the crash didn't happen
> due to the new client starting up but due the old one shutting down.
> 
> 
> As a shot in the dark, does removing the
> 
>         if (!c->idExists) {
>         __glXFreeContext(c);
>         }
> 
> block fix your problem?

No, but this doesn't surprise me much.  We never get around to trying to use the context, because the tag the client thinks refers to it is invalid, hence the GLXBadContextTag error.

As I see it, the problem with commit 3020b1d4 is the following sequence of client actions:

Create Drawable 1
Create Context A
Make Context A current
Delete Drawable 1
Create Drawable 2
Create context B
Make Context B current

The last step fails GLXBadContextTag, as the 'previously current' tag for A sent with the make curent request for B is no longer valid, as it's been invalidated when drawable 1 was deleted.
Comment 3 Jon Turney 2010-09-21 07:30:08 UTC
Created attachment 38845 [details] [review]
Don't delete the current context of a drawable when drawable is deleted

Attached an example patch to fix

I'm not sure how to proceed though: I don't know how to demonstrate that this patch doesn't regress the compiz problem the original commit aims to fix.
Comment 4 Michel Dänzer 2010-09-22 03:05:53 UTC
(In reply to comment #2)
> (cont'd from bug #29977)
> (In reply to comment #12)
> > (In reply to comment #11)
> > As a shot in the dark, does removing the
> > 
> >         if (!c->idExists) {
> >         __glXFreeContext(c);
> >         }
> > 
> > block fix your problem?
> 
> No, but this doesn't surprise me much.  We never get around to trying to use
> the context, because the tag the client thinks refers to it is invalid, hence
> the GLXBadContextTag error.

However, looking at your patch, I don't understand how the rest of the code it removes has anything to do with deleting the context...


> As I see it, the problem with commit 3020b1d4 is the following sequence of
> client actions:
> 
> Create Drawable 1
> Create Context A
> Make Context A current
> Delete Drawable 1
> Create Drawable 2
> Create context B
> Make Context B current
> 
> The last step fails GLXBadContextTag, as the 'previously current' tag for A
> sent with the make curent request for B is no longer valid, as it's been
> invalidated when drawable 1 was deleted.

FWIW, the commit in question didn't intend to 'invalidate' the context but just to make it not current. But maybe that's the same as far as this is concerned.

I'm afraid I won't be able to help much more with finding a solution for this. I guess reverting the commit and seeing what happens might be an option.
Comment 5 Kristian Høgsberg 2010-09-23 06:25:40 UTC
Does this mesa patch help?

diff --git a/src/glx/glxcmds.c b/src/glx/glxcmds.c
index 4f7e84e..cb474ac 100644
--- a/src/glx/glxcmds.c
+++ b/src/glx/glxcmds.c
@@ -368,6 +368,7 @@ DestroyContext(Display * dpy, GLXContext ctx)
       if (!gc->imported)
         glx_send_destroy_context(dpy, gc->xid);
       gc->xid = None;
+      gc->currentContextTag = 0;
       __glXUnlock();
       return;
    }
Comment 6 Kristian Høgsberg 2010-09-23 08:08:35 UTC
(In reply to comment #5)
> Does this mesa patch help?
> 
> diff --git a/src/glx/glxcmds.c b/src/glx/glxcmds.c
> index 4f7e84e..cb474ac 100644
> --- a/src/glx/glxcmds.c
> +++ b/src/glx/glxcmds.c
> @@ -368,6 +368,7 @@ DestroyContext(Display * dpy, GLXContext ctx)
>        if (!gc->imported)
>          glx_send_destroy_context(dpy, gc->xid);
>        gc->xid = None;
> +      gc->currentContextTag = 0;
>        __glXUnlock();
>        return;
>     }

Just tested this and it works.  However,  I went back and looked at older libGL versions, and this isn't actually a regression in libGL.  If we go with this change, we're essentially changing the protocol between X and libGL.  The GLX spec isn't very clear on this point, but the old libGL behaviour I'd guess that the context should stay current even if the window goes away, but any other operation than making it current with another drawable or making another context current should generate GLXBadDrawable.  I think...
Comment 7 Jon Turney 2010-09-24 08:05:51 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Does this mesa patch help?
> > 
> > diff --git a/src/glx/glxcmds.c b/src/glx/glxcmds.c
> > index 4f7e84e..cb474ac 100644
> > --- a/src/glx/glxcmds.c
> > +++ b/src/glx/glxcmds.c
> > @@ -368,6 +368,7 @@ DestroyContext(Display * dpy, GLXContext ctx)
> >        if (!gc->imported)
> >          glx_send_destroy_context(dpy, gc->xid);
> >        gc->xid = None;
> > +      gc->currentContextTag = 0;
> >        __glXUnlock();
> >        return;
> >     }
> 
> Just tested this and it works.

Yes, thank you.

Thanks for your explainations here and on-list. I'd completely failed to note this deferred delete was going on.  I begin to understand why you dislike GLX so much :-)

> However,  I went back and looked at older libGL
> versions, and this isn't actually a regression in libGL.

Preumably it's ther server side change in 3020b1d4 which makes this behaviour give an error, as the server is now marks the stale tag as invalid.

> If we go with this
> change, we're essentially changing the protocol between X and libGL.  The GLX
> spec isn't very clear on this point, but the old libGL behaviour I'd guess that
> the context should stay current even if the window goes away, but any other
> operation than making it current with another drawable or making another
> context current should generate GLXBadDrawable.  I think...

I agree that the specification is slient on this point.
Comment 8 Jon Turney 2010-09-24 08:11:52 UTC
Created attachment 38936 [details]
Small testcase to demonstrate GLXBadContextTag error when window is deleted with a current context

> Create Drawable 1
> Create Context A
> Make Context A current
> Delete Drawable 1
> Create Drawable 2
> Create context B
> Make Context B current

So, I'd failed to note that there were actually deferred context deletes happening with the glean test, but attached is a small test case (which doesn't delete contexts) which demonstrates that the above sequence of actions does cause a GLXBadContextTag error, even with your patch.

I don't know if you'd consider this a separate issue or not...
Comment 9 Jon Turney 2011-03-26 09:52:04 UTC
The original issue is still present with mesa 7.10.1 and X server 1.10.0

(In reply to comment #6)
> Just tested this and it works.  However,  I went back and looked at older libGL
> versions, and this isn't actually a regression in libGL.  If we go with this
> change, we're essentially changing the protocol between X and libGL.  The GLX
> spec isn't very clear on this point, but the old libGL behaviour I'd guess that
> the context should stay current even if the window goes away, but any other
> operation than making it current with another drawable or making another
> context current should generate GLXBadDrawable.  I think...

I think I agree with this analysis.  Unfortunately, the server side change in 3020b1d4 has already changed the protocol in the sense that it has reduced the scope in which a context tag is valid.

However, since it looks like context tag handling is going to be rewritten a bit, hopefully this will work in 1.11 :-)
Comment 10 Jon Turney 2014-01-09 15:05:51 UTC
Created attachment 91760 [details]
Small testcase to demonstrate GLXBadContextTag error when window is deleted with a current context
Comment 11 Jon Turney 2014-01-09 15:41:57 UTC
Was fixed around 1.11


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.