Bug 54671

Summary: Wine locks up when running multithreaded applications that touch both OpenGL and X11
Product: XCB Reporter: Erich Hoover <erich.e.hoover>
Component: LibraryAssignee: xcb mailing list dummy <xcb>
Status: RESOLVED FIXED QA Contact: xcb mailing list dummy <xcb>
Severity: normal    
Priority: medium CC: psychon, xangel1
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Hack to work around the problem
Patch to keep Wine applications from locking up
Backtrace of the threads involved in the lockup
Patch to keep Wine applications from locking up
Patch to keep Wine applications from locking up [v2]
xcb-only test case that (hopefully) reproduces this problem
Patch to keep Wine applications from locking up [v2]

Description Erich Hoover 2012-09-08 15:28:05 UTC
Now that Wine is no longer locking around all X calls we're seeing lockups in applications that use both OpenGL (through Direct3D) and X11 (through GDI) simultaneously in separate threads.  It appears that something strange is happening in _XReply that causes the applications to hang indefinitely in xcb_wait_for_reply when both threads happen to use the same sequence number at the same time.

I'm not really an X expert, so I'd really appreciate some help trying to track this down.  On the Wine end we have a bug open (http://bugs.winehq.org/show_bug.cgi?id=31406) to track this issue, but we're pretty sure it's something in xlib or xcb.
Comment 1 Erich Hoover 2012-09-09 20:37:29 UTC
Created attachment 66895 [details] [review]
Hack to work around the problem

Ok, so I've been trying to find enough information about what's going on to provide something useful for fixing this.  Working off of the latest git the problem appears to be solvable using the attached hack and is triggered when Wine calls XCheckIfEvent in combination with OpenGL routines.
Comment 2 Erich Hoover 2012-09-11 17:42:27 UTC
Created attachment 66985 [details] [review]
Patch to keep Wine applications from locking up

Ok, the attached patch seems to fix the problem without killing the framerate of games using OpenGL.  However, I'm concerned that this sets up a ping-pong scenario between the threads and is therefore just obscuring the issue.  Feedback would be greatly appreciated.
Comment 3 Uli Schlachter 2012-09-11 20:46:11 UTC
Since IRC didn't feel helpful: Could you provide backtraces for the deadlock that you are seeing? What exactly happens without this patch?

Also, which versions of libxcb are you using?
Comment 4 Erich Hoover 2012-09-11 21:32:56 UTC
Created attachment 67004 [details]
Backtrace of the threads involved in the lockup

(In reply to comment #3)
> Since IRC didn't feel helpful: Could you provide backtraces for the deadlock
> that you are seeing? What exactly happens without this patch?
> 
> Also, which versions of libxcb are you using?

Sorry, is IRC your preferred mode of communication?  Without the patch we see a lockup in some of Wine's mutexes because the other thread held the mutex and then got stuck in xcb_wait_for_reply (see backtrace).  I'm currently using the libxcb from git.  If it helps then I can add debug symbols to some of the other libraries you see in the backtrace (libx11 and libxrandr).
Comment 5 Uli Schlachter 2012-09-11 22:02:39 UTC
Does it always hang inside of XRRGetCrtcInfo? That function doesn't do anything special and I don't really see why this should indefinitely wait for a reply from the X11 server. Also, unlocking the Xlib display before polling for the reply shouldn't make any difference on this at all. :-/
Comment 6 Erich Hoover 2012-09-11 23:23:06 UTC
(In reply to comment #5)
> Does it always hang inside of XRRGetCrtcInfo? That function doesn't do anything
> special and I don't really see why this should indefinitely wait for a reply
> from the X11 server. 

No, it hangs inside all sorts of different functions.  Sometimes it's just a direct _XReply call (not sure how that happens).  The most common parent (from maybe 10-20 runs) is:
6 0xf6f332d5 in libgl.so.1 (+0x8f2d4) (0x022de488)

> Also, unlocking the Xlib display before polling for the
> reply shouldn't make any difference on this at all. :-/

I'm not sure what you mean here, do you mean in attachment 66895 [details] [review] where I remove the unlock so that it stays locked while it goes through xcb_wait_for_reply?  I suspect what's happening is that one thread is depleting the fd of events, so the other thread hangs inside the poll() virtually indefinitely.  (So if there's a lock around the poll() and recv() then that's not possible)
Comment 7 Erich Hoover 2012-09-12 11:00:44 UTC
Created attachment 67043 [details] [review]
Patch to keep Wine applications from locking up

(In reply to comment #2)
> Created attachment 66985 [details] [review] [review]
> Patch to keep Wine applications from locking up
> ...

Whoops, I attached the wrong patch here (no wonder about the confusion!).  I'm really sorry about that, I'm not sure how I managed to do that.
Comment 8 Uli Schlachter 2012-09-12 11:17:25 UTC
I don't think that patch actually fixes any races, at best it makes them harder to hit, because instead of poll()ing the fd at the same time, the second thread will now just wait for the mutex and then start poll()ing after the first thread is done. This thread then still wouldn't get woken up if there is nothing to be read from the socket, so the same hang can still happen.

Also, this shouldn't be needed. While poll()ing the fd, _xcb_conn_wait will always set c->in.reading to a non-zero value. This means that the next thread that calls _xcb_conn_wait() will sleep in pthread_cond_wait() instead of poll(). (However, it seems like if one thread is only reading and doesn't want to write, another thread could be allowed to read and write at the same time. Could this be the issue at hand? If yes, it should be possible to somehow produce an xcb-only testcase of this...?)
Comment 9 Erich Hoover 2012-09-12 11:28:50 UTC
(In reply to comment #8)
> ... (However, it seems like if one thread is only reading and doesn't want
> to write, another thread could be allowed to read and write at the same time.
> Could this be the issue at hand? If yes, it should be possible to somehow
> produce an xcb-only testcase of this...?)

That seems like a reasonable possibility from the playing around I've done, is there an easy way that I can confirm that this is what's happening?
Comment 10 Uli Schlachter 2012-09-12 11:43:32 UTC
Since you are building your own libxcb anyway, do some printf-debugging: right before the poll() do if(c->in.reading > 1) puts("something is smelly here");
(And double-check that I got the logic right)
(Alternatively use assert())
(Alternatively, check that c->in.reading == 0 before that variable is incremented)
Comment 11 Erich Hoover 2012-09-12 11:53:18 UTC
Created attachment 67048 [details] [review]
Patch to keep Wine applications from locking up [v2]

(In reply to comment #10)
> Since you are building your own libxcb anyway, do some printf-debugging: right
> before the poll() do if(c->in.reading > 1) puts("something is smelly here");
> (And double-check that I got the logic right)
> (Alternatively use assert())
> (Alternatively, check that c->in.reading == 0 before that variable is
> incremented)

Yup, I get a bunch of "c->in.reading != 0, expect funny business!" and then a lockup.  I threw together a quick check so that it would not read if a read is already in progress (attached) and with a small number of tests (3) that seems to be working.
Comment 12 Uli Schlachter 2012-09-12 12:46:05 UTC
Created attachment 67051 [details]
xcb-only test case that (hopefully) reproduces this problem

This test has two threads, one is sending MapWindow requests, the other is sending GetInputFocus requests and waits for the replies. After each request, a "x" or "o" is printed.

The GetInputFocus thread quickly dead-locks in poll().
Comment 13 Uli Schlachter 2012-09-12 12:57:52 UTC
After all, this seems to be a bug in xcb, not in Xlib.

I'm not sure about the idea of the proposed patch. If there are seperate reading and writing threads, the reading thread could stop reading and we end up without any reading thread. In theory this could cause a dead-lock where the server refuses to read until some of the already written data gets handled in the client, but this seems quite unlikely.

For the patch itself: fd.fd is not initialized in the "just writing"-case.
Comment 14 Erich Hoover 2012-09-12 14:41:19 UTC
Created attachment 67053 [details] [review]
Patch to keep Wine applications from locking up [v2]

(In reply to comment #13)
> ...
> I'm not sure about the idea of the proposed patch. If there are seperate
> reading and writing threads, the reading thread could stop reading and we end
> up without any reading thread. In theory this could cause a dead-lock where the
> server refuses to read until some of the already written data gets handled in
> the client, but this seems quite unlikely.

If the reading thread stops reading then wouldn't the writing thread open up to being able to read again?

> For the patch itself: fd.fd is not initialized in the "just writing"-case.

My bad, sorry about that - a corrected version is attached (still works for my test case).
Comment 15 Uli Schlachter 2012-09-30 11:19:59 UTC
Sorry for not using your patch, but there must always be a thread reading from the socket when there is one that writes.

commit 23911a707b8845bff52cd7853fc5d59fb0823cef
Author: Uli Schlachter <psychon@znc.in>
Date:   Mon Sep 24 22:07:51 2012 +0200

    Fix a multi-thread deadlock
    
    This fixes a deadlock which was seen in-the-wild with wine.
    
    It could happen that two threads tried to read from the socket at the same time
    and one of the thread got stuck inside of poll()/select().
    
    The fix works by making sure that the writing thread doesn't steal the reading
    thread's reply.
    
    Debugged-by: Erich Hoover <ehoover@mines.edu>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54671
    Signed-off-by: Uli Schlachter <psychon@znc.in>
Comment 16 Erich Hoover 2012-09-30 15:36:27 UTC
(In reply to comment #15)
> Sorry for not using your patch, but there must always be a thread reading
> from the socket when there is one that writes.
> ...

No problem, I'm just glad to get this fixed!  Would you mind if I contact the Ubuntu folks and ask them to apply this patch to their 1.8.1 package?

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.