Bug 29977 - Some glean tests fail with GLXBadContextTag errors with software rendering
Summary: Some glean tests fail with GLXBadContextTag errors with software rendering
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-02 08:01 UTC by Jon Turney
Modified: 2010-09-15 06:54 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
xscope log excerpt (1.88 KB, text/plain)
2010-09-06 06:20 UTC, Jon Turney
no flags Details
Patch to fix indirect context tag handling (1.30 KB, patch)
2010-09-07 04:13 UTC, Jon Turney
no flags Details | Splinter Review
Patch to always update the gl dispatch table when a direct swrast context is made current (1.05 KB, patch)
2010-09-08 05:06 UTC, Jon Turney
no flags Details | Splinter Review

Description Jon Turney 2010-09-02 08:01:47 UTC
Some glean tests fail with GLXBadContextTag errors when run with mesa swrast renderer, with either direct or indirect rendering.

e.g. with git master xserver and mesa, mesa configured --with-dri-drivers=swrast so only swrast dri driver is built and installed.

$ ./glean -r results -o --quick -t makeCurrent
X Error of failed request:  GLXBadContextTag
  Major opcode of failed request:  153 (GLX)
  Minor opcode of failed request:  1 (X_GLXRender)
  Serial number of failed request:  37
  Current serial number in output stream:  38

These tests run file when mesa is configured --with-driver=xlib, perhaps unsurprisingly.
Comment 1 Jon Turney 2010-09-02 08:04:13 UTC
This seems to be related to this xserver commit

commit 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3
Author: Michel Dänzer <daenzer@vmware.com>
Date:   Tue Jun 23 16:45:40 2009 +0200

    glx: Clean up more thoroughly if the drawable of a current context goes away.

    Fixes crash when restarting compiz, due to cl->currentContexts[x] being stale.

Reverting that commit, glean tests run successfully.
Comment 2 Michel Dänzer 2010-09-03 07:59:24 UTC
(In reply to comment #2)
> Some glean tests fail with GLXBadContextTag errors when run with mesa swrast
> renderer, with either direct or indirect rendering.
> 
> e.g. with git master xserver and mesa, mesa configured
> --with-dri-drivers=swrast so only swrast dri driver is built and installed.
> 
> $ ./glean -r results -o --quick -t makeCurrent
> X Error of failed request:  GLXBadContextTag
>   Major opcode of failed request:  153 (GLX)
>   Minor opcode of failed request:  1 (X_GLXRender)
>   Serial number of failed request:  37
>   Current serial number in output stream:  38

Weird that this error would occur with direct rendering.


(In reply to comment #1)
> This seems to be related to this xserver commit
> 
> commit 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3
> Author: Michel Dänzer <daenzer@vmware.com>
> Date:   Tue Jun 23 16:45:40 2009 +0200
> 
>     glx: Clean up more thoroughly if the drawable of a current context goes
> away.
> 
>     Fixes crash when restarting compiz, due to cl->currentContexts[x] being
> stale.
> 
> Reverting that commit, glean tests run successfully.

I don't think simply reverting it is a good solution, as that would probably reintroduce the crashes mentioned in the commit log. Apparently the code introduced there does something wrong though, but I'm not sure offhand what or how to do it better. Kristian, any ideas?
Comment 3 Jon Turney 2010-09-03 08:15:38 UTC
(In reply to comment #2)
> (In reply to comment #2)
> I don't think simply reverting it is a good solution, as that would probably
> reintroduce the crashes mentioned in the commit log. Apparently the code
> introduced there does something wrong though, but I'm not sure offhand what or
> how to do it better. Kristian, any ideas?

At least one thing that seemed to be going wrong when I attempted to debug this for indirect rendering is that this commit invalidates the context tag when a drawable on which it is current is deleted.  A subsequent glXMakeCurrent protocol request sends the old context tag along with the the new context ID, but that old context tag has been made invalid...
Comment 4 Jon Turney 2010-09-06 06:20:05 UTC
Created attachment 38475 [details]
xscope log excerpt

Here's an excerpt from an xscope sesssion which shows the immediate cause of the error:

X server returns BadAccess to a GLX Make Current request (because it believes the context is current elsewhere)

In mesa indirect_glx.c:SendMakeCurrentRequest() doesn't have any error checking, so the error code is misinterpreted as context tag 0x00940005.

The subsequent attempt to use that context tag fails.

Not sure why the X server thinks the context is current elsewhere, but this makes me think my previous analysis is all wrong :-(
Comment 5 Jon Turney 2010-09-07 04:13:49 UTC
Created attachment 38506 [details] [review]
Patch to fix indirect context tag handling

Doing some more debugging for indirect rendering, the sequence of GLX make context current requests send to the server by the glean makeCurrent test is:

context A oldtag 0, returns tag 1
context None oldtag 1
context B oldtag 0, returns tag 1
context C oldtag 0[*] returns tag 2 
context None oldtag 2
context B oldtag 0, returns BadAccess, context B is still current

The line marked * seems to be an error, as the current context tag should be 1, and it's absence causes the server not make context B (which is referred to by tag 1) as no longer current, which causes the subsequent BadAccess error...

Some more debugging on the client side and I find that indirect_unbind_context() always resets the stored context tag to 0, even when we don't send a make current request to the server.

This line seems to have been added in the rewrite commit c491e585, I immediately see that the same thing was done in the previous version.

Attached a patch, which only clears the stored context tag if we actually sent a request to the server to clear the current context.

With this patch, glean makeCurrent test runs successfully with LIBGL_ALWAYS_INDIRECT=1. There is still some problem without it, though.
Comment 6 Kristian Høgsberg 2010-09-07 06:08:10 UTC
(In reply to comment #5)
> Created an attachment (id=38506) [details]
> Patch to fix indirect context tag handling
> 
> Doing some more debugging for indirect rendering, the sequence of GLX make
> context current requests send to the server by the glean makeCurrent test is:
> 
> context A oldtag 0, returns tag 1
> context None oldtag 1
> context B oldtag 0, returns tag 1
> context C oldtag 0[*] returns tag 2 
> context None oldtag 2
> context B oldtag 0, returns BadAccess, context B is still current
> 
> The line marked * seems to be an error, as the current context tag should be 1,
> and it's absence causes the server not make context B (which is referred to by
> tag 1) as no longer current, which causes the subsequent BadAccess error...
> 
> Some more debugging on the client side and I find that
> indirect_unbind_context() always resets the stored context tag to 0, even when
> we don't send a make current request to the server.
> 
> This line seems to have been added in the rewrite commit c491e585, I
> immediately see that the same thing was done in the previous version.
> 
> Attached a patch, which only clears the stored context tag if we actually sent
> a request to the server to clear the current context.
> 
> With this patch, glean makeCurrent test runs successfully with
> LIBGL_ALWAYS_INDIRECT=1. There is still some problem without it, though.

Jon, thanks for figuring this out.  The condition for clearing currentContextTag was actually right; we need to clear it whenever an indirect context is unbound.  The problem was that I switched the order of calling ->bind and ->unbind around, and ->bind needs the currentContextTag that ->unbind clears.

I've added a couple of lines to your patch to also clear the tag in the old context in ->bind when we reassign the tag to a new context.  This gives us the same behaviour as before and clearing the old tag only in the places where we send the request to unbind/rebind it make the code clearer.

(Also, please follow the code convention of the code you're submitting patches to)
Comment 7 Jon Turney 2010-09-08 05:06:55 UTC
Created attachment 38548 [details] [review]
Patch to always update the gl dispatch table when a direct swrast context is made current

But wait, there's more!

(In reply to comment #2)
> Weird that this error would occur with direct rendering.

One of the things that the glean makeCurrent test tests is creating direct and indirect contexts and interleaving them.

The makeCurrent test still fails with direct swrast rendering:

$ LIBGL_ALWAYS_SOFTWARE=1 ./glean.exe -r results -o --quick -t makeCurrent
X Error of failed request:  GLXBadContextTag
  Major opcode of failed request:  153 (GLX)
  Minor opcode of failed request:  108 (X_GLXFinish)
  Serial number of failed request:  69
  Current serial number in output stream:  69

Debugging this a bit, I find that this error occurs because the glFinish is being sent to the Xserver, even thought the current context is direct! 

This suggests that there is a path where the _glapi_set_dispatch() isn't called when context is changed, and looking at swrast.c:dri_make_current() I note that it doesn't do anything if the swrasterizers current context isn't changing.

So presumably if we have a sequence of direct context A, indirect context B, direct context A being made current, the dispatch table isn't set correctly for that last context?

Using the attached patch to always update the dispatch table when a direct swrast context is made current makes this glean test execute successfully, but probably isn't the correct way to fix this problem...
Comment 8 Kristian Høgsberg 2010-09-08 06:07:41 UTC
(In reply to comment #7)
> Created an attachment (id=38548) [details]
> Patch to always update the gl dispatch table when a direct swrast context is
> made current
> 
> But wait, there's more!
> 
> (In reply to comment #2)
> > Weird that this error would occur with direct rendering.
> 
> One of the things that the glean makeCurrent test tests is creating direct and
> indirect contexts and interleaving them.
> 
> The makeCurrent test still fails with direct swrast rendering:
> 
> $ LIBGL_ALWAYS_SOFTWARE=1 ./glean.exe -r results -o --quick -t makeCurrent
> X Error of failed request:  GLXBadContextTag
>   Major opcode of failed request:  153 (GLX)
>   Minor opcode of failed request:  108 (X_GLXFinish)
>   Serial number of failed request:  69
>   Current serial number in output stream:  69
> 
> Debugging this a bit, I find that this error occurs because the glFinish is
> being sent to the Xserver, even thought the current context is direct! 
> 
> This suggests that there is a path where the _glapi_set_dispatch() isn't called
> when context is changed, and looking at swrast.c:dri_make_current() I note that
> it doesn't do anything if the swrasterizers current context isn't changing.
> 
> So presumably if we have a sequence of direct context A, indirect context B,
> direct context A being made current, the dispatch table isn't set correctly for
> that last context?

Hmm, yes, it looks like that's what's happening, good catch.
 
> Using the attached patch to always update the dispatch table when a direct
> swrast context is made current makes this glean test execute successfully, but
> probably isn't the correct way to fix this problem...

I've committed b4bb6680200b5a898583392f4c831c02f41e63f7 which makes sure the dri drivers (dri2 and swrast, at least) unset the current context and dispatch table when they're unbound.

In the future, please open a new bug for new issues.
Comment 9 Jon Turney 2010-09-08 08:25:01 UTC
(In reply to comment #8)
> I've committed b4bb6680200b5a898583392f4c831c02f41e63f7 which makes sure the
> dri drivers (dri2 and swrast, at least) unset the current context and dispatch
> table when they're unbound.

That seems to fix the problem, thanks.

> In the future, please open a new bug for new issues.

I shall remember to do so.
Comment 10 Jon Turney 2010-09-08 08:28:32 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > In the future, please open a new bug for new issues.
> 
> I shall remember to do so.

See bug #30089 for more problems found when trying similar testing.
Comment 11 Jon Turney 2010-09-15 06:27:27 UTC
> --- Comment #2 from Michel Dänzer<michel@daenzer.net>  2010-09-03 15:59:24 BST ---
> (In reply to comment #1)
>> This seems to be related to this xserver commit
>>
>> commit 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3
>> Author: Michel Dänzer<daenzer@vmware.com>
>> Date:   Tue Jun 23 16:45:40 2009 +0200
>>
>>      glx: Clean up more thoroughly if the drawable of a current context goes
>> away.
>>
>>      Fixes crash when restarting compiz, due to cl->currentContexts[x] being
>> stale.
>>
>> Reverting that commit, glean tests run successfully.
> 
> I don't think simply reverting it is a good solution, as that would probably
> reintroduce the crashes mentioned in the commit log. Apparently the code
> introduced there does something wrong though, but I'm not sure offhand what or
> how to do it better. Kristian, any ideas?

Of course, I only mention reverting the change to demonstrate that code has some bearing on the problem, I'm not suggesting reverting it as a solution.

Michel,

I still have this problem (see bug bug #30089), and perhaps you can help me out a bit which understanding the history of this change?

Trying to understand the original problem a bit more, with this change reverted, I've tried to reproduce the compiz crash mentioned in the change description on i915, but have had no luck so far.  The relevant bug seems to be #21132, but that doesn't give a great deal of detail.  Do I need to do anything more than 'compiz --replace' to see the bug?  Also the patch attached there differs from that which has been pushed in that it doesn't include the tag invalidation code which causes my problem.

From the point of view of simply reading the code, I have some difficulty understanding how old tags in cl->currentContexts can have an affect after the client is restarted (surely a new client structure is allocated for a new instance of the client?)
Comment 12 Michel Dänzer 2010-09-15 06:54:39 UTC
(In reply to comment #11)
> I still have this problem (see bug bug #30089), [...]

So why aren't you following up there rather than here?


> Trying to understand the original problem a bit more, with this change
> reverted, I've tried to reproduce the compiz crash mentioned in the change
> description on i915, but have had no luck so far.  The relevant bug seems to be
> #21132, but that doesn't give a great deal of detail.  Do I need to do anything
> more than 'compiz --replace' to see the bug?

AFAIR that should be it. You may need to force indirect rendering via compiz --indirect-rendering or by setting the environment variable LIBGL_ALWAYS_INDIRECT=1 for compiz. Or maybe it's only reproducible with DRI1.

> Also the patch attached there differs from that which has been pushed in that
> it doesn't include the tag invalidation code which causes my problem.

The patch from bug 21132 is commit 2075d4bf9e53b8baef0b919da6c44771220cd4a5, commit 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 is a followup fix.

> From the point of view of simply reading the code, I have some difficulty
> understanding how old tags in cl->currentContexts can have an affect after the
> client is restarted (surely a new client structure is allocated for a new
> instance of the client?)

Unfortunately I don't remember the details, but maybe the crash didn't happen due to the new client starting up but due the old one shutting down.


As a shot in the dark, does removing the

	    if (!c->idExists) {
		__glXFreeContext(c);
	    }

block fix your problem?


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.