commit e6a7b70cdb2ae8b713012839a0a0bbb93817b8ef ("Support multiple independent internal sync handlers") added _XPrivSyncFunction with some asserts that trigger on multithreaded code. This gives reproducible crashes in SyncHandle() when running 'ico -threads 10', as reported in http://lists.freedesktop.org/pipermail/xorg/2009-February/043809.html

It seems to me that checking/modifying dpy->synchandler and dpy->savedsynchandler should be done with the display lock held.
Our multi-threaded app hit the assert: "(dpy->flags & (1L << 3)) != 0". I agree with the report that checking/modifying dpy->synchandler and dpy->savedsynchandler should be done with the display lock held. I added LockDisplay()/UnlockDisplay() to both _XPrivSyncFunction() and _XSetPrivSyncFunction() as below. The lock in _XPrivSyncFunction probably wasn't necessary around _XIDHandler() and _XSeqSyncFunction(), as these both also lock themselves. My only concern is any performance degradation.

static int
_XPrivSyncFunction (Display *dpy)
{
+   LockDisplay(dpy);
    assert(dpy->synchandler == _XPrivSyncFunction);
    assert((dpy->flags & XlibDisplayPrivSync) != 0);
    dpy->synchandler = dpy->savedsynchandler;
    dpy->savedsynchandler = NULL;
    dpy->flags &= ~XlibDisplayPrivSync;
    _XIDHandler(dpy);
    _XSeqSyncFunction(dpy);
+   UnlockDisplay(dpy);
    return 0;
}

void
_XSetPrivSyncFunction(Display *dpy)
{
+   LockDisplay(dpy);
    if (!(dpy->flags & XlibDisplayPrivSync)) {
        dpy->savedsynchandler = dpy->synchandler;
        dpy->synchandler = _XPrivSyncFunction;
        dpy->flags |= XlibDisplayPrivSync;
    }
+   UnlockDisplay(dpy);
}
Adding the xcb list to cc in the hope someone there can comment, since that was broken by a patch for Xlib/XCB.
Jamey and I had a reason why we couldn't add the locks suggested in comment 1. Currently trying to swap in enough Xlib/XCB state to remember that reason. :)
(In reply to comment #3)
> Jamey and I had a reason why we couldn't add the locks suggested in comment 1.
> Currently trying to swap in enough Xlib/XCB state to remember that reason. :)

So, for one thing, _XSetPrivSyncFunction gets called with the Display lock held, so it shouldn't call LockDisplay. However, _XPrivSyncFunction gets called from SyncHandle, which always happens *after* UnlockDisplay, because Xlib hates threads and wants them to suffer.
OK. Jamey and I have a plan to solve this now.

In the XTHREADS case, after you've called XInitThreads, we can hook LockDisplay and UnlockDisplay. We can use that to run _XIDHandler and _XSeqSyncHandler from LockDisplay rather than SyncHandle; we then know that we hold the lock, and thus we can prevent the race. We think it makes sense to do these both from LockDisplay rather than UnlockDisplay, so that you know you have valid sync and a valid XID before you start setting up the request you locked to prepare.

In the !XTHREADS case, or if you haven't called XInitThreads, you don't get to use Xlib from multiple threads, so we can use the logic we have now (with synchandler and savedsynchandler) without any concern about races.
We just pushed the following fix:

commit a6d974dc59f2722b36e2df9d4f07aeee4f83ce43
Author: Jamey Sharp <jamey@minilop.net>
Date:   Thu Apr 15 13:05:08 2010 -0700

    Move XID and sync handling from SyncHandle to LockDisplay to fix races.

    XID and sync handling happened via _XPrivSyncHandler, assigned to
    dpy->synchandler and called from SyncHandle. _XPrivSyncHandler thus ran
    without the Display lock, so manipulating the Display caused races, and
    these races led to assertions in multithreaded code (demonstrated via
    ico).

    In the XTHREADS case, after you've called XInitThreads, we can hook
    LockDisplay and UnlockDisplay. Use that to run _XIDHandler and
    _XSeqSyncHandler from LockDisplay rather than SyncHandle; we then know
    that we hold the lock, and thus we can avoid races. We think it makes
    sense to do these both from LockDisplay rather than UnlockDisplay, so
    that you know you have valid sync and a valid XID before you start
    setting up the request you locked to prepare.

    In the !XTHREADS case, or if you haven't called XInitThreads, you don't
    get to use Xlib from multiple threads, so we can use the logic we have
    now (with synchandler and savedsynchandler) without any concern about
    races.

    This approach gets a bit exciting when the XID and sequence sync
    handlers drop and re-acquire the Display lock. Reacquisition will
    re-run the handlers, but they return immediately unless they have work
    to do, so they can't recurse more than once. In the worst case, if both
    of them have work to do, we can nest the Display lock three deep. In
    the case of the _XIDHandler, we drop the lock to call xcb_generate_id,
    which takes the socket back if it needs to request more XIDs, and
    taking the socket back will reacquire the lock; we take care to avoid
    letting _XIDHandler run again and re-enter XCB from the return_socket
    callback (which causes Very Bad Things, and is Not Allowed).

    Tested with ico (with 1 and 20 threads), and with several test programs
    for XID and sequence sync. Tested with and without XInitThreads(), and
    with and without XCB.

    Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=23192

    Signed-off-by: Jamey Sharp <jamey@minilop.net>
    Signed-off-by: Josh Triplett <josh@freedesktop.org>
Resolving.
> --- Comment #6 from Josh Triplett <josh@freedesktop.org> 2010-04-15 13:44:40 PDT ---
> We just pushed the following fix:
>
> commit a6d974dc59f2722b36e2df9d4f07aeee4f83ce43
> Author: Jamey Sharp <jamey@minilop.net>
> Date: Thu Apr 15 13:05:08 2010 -0700
>
>     Move XID and sync handling from SyncHandle to LockDisplay to fix races.

thanks!
After my comment #1, I too found the problem with the recursive LockDisplay. I managed to fix that issue, but overall my fix only ever worked 95% of the time, so I look forward to testing this out.

One other thing I wondered at the time is to do with the assert((dpy->flags & XlibDisplayPrivSync) != 0). Are bitwise flag modifications guaranteed to be atomic? Not all the other flags are modified with the Display lock held. I.e., could a read/modify/write to one of the other flags make the XlibDisplayPrivSync flag un-thread-safe?
(In reply to comment #9)
> One other thing I wondered at the time is to do with the assert( (dpy->flags &
> XlibDisplayPrivSync) != 0 )
> Are bitwise flag modifications gaurenteed to be atomic???
> Not all the other flags are modified within LockDisplays.
> ie Could a read/modify/write to one of the other flags make the
> XlibDisplayPrivSync flag un-thread-safe?

If dpy->flags is written to outside the lock, then yes, that's probably unsafe, and theoretically ought to be fixed. However, after the commit Josh mentioned, we don't use XlibDisplayPrivSync or savedsynchandler at all in multi-threaded applications, so it is no longer a problem for *this* part of the code. :-)