Bug 23192 - racy access to dpy->synchandler in libX11 >= 1.1.99.2
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/Xlib
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
Keywords: regression
 
Reported: 2009-08-07 03:35 UTC by Julien Cristau
Modified: 2010-04-21 18:58 UTC
CC List: 5 users

Description Julien Cristau 2009-08-07 03:35:40 UTC
commit e6a7b70cdb2ae8b713012839a0a0bbb93817b8ef ("Support multiple independent internal sync handlers") added _XPrivSyncFunction with some asserts that trigger in multithreaded code.  This gives reproducible crashes in SyncHandle() when running 'ico -threads 10', as reported in http://lists.freedesktop.org/pipermail/xorg/2009-February/043809.html
It seems to me that checking/modifying dpy->synchandler and dpy->savedsynchandler should be done with the display lock held.
Comment 1 Steve R 2009-10-14 21:42:32 UTC
Our multi-threaded app hit the assert: "(dpy->flags & (1L << 3)) != 0"

I agree with the report that checking/modifying dpy->synchandler and
dpy->savedsynchandler should be done with the display lock held.

I added LockDisplay()/UnlockDisplay() to both _XPrivSyncFunction() and
_XSetPrivSyncFunction() as below. The lock in _XPrivSyncFunction() probably wasn't necessary around _XIDHandler() and _XSeqSyncFunction(), as these both take the lock themselves.

My only concern is possible performance degradation.

static int
_XPrivSyncFunction (Display *dpy)
{
    +++LockDisplay(dpy);
    assert(dpy->synchandler == _XPrivSyncFunction);
    assert((dpy->flags & XlibDisplayPrivSync) != 0);
    dpy->synchandler = dpy->savedsynchandler;
    dpy->savedsynchandler = NULL;
    dpy->flags &= ~XlibDisplayPrivSync;
    _XIDHandler(dpy);
    _XSeqSyncFunction(dpy);
    +++UnlockDisplay(dpy);
    return 0;
}

void _XSetPrivSyncFunction(Display *dpy)
{
    +++LockDisplay(dpy);
    if (!(dpy->flags & XlibDisplayPrivSync)) {
	dpy->savedsynchandler = dpy->synchandler;
	dpy->synchandler = _XPrivSyncFunction;
	dpy->flags |= XlibDisplayPrivSync;
    }
    +++UnlockDisplay(dpy);
}
Comment 2 Julien Cristau 2010-04-15 10:19:17 UTC
Adding the xcb list to cc in the hope that someone there can comment, since this was broken by a patch for Xlib/XCB.
Comment 3 Josh Triplett 2010-04-15 10:40:35 UTC
Jamey and I had a reason why we couldn't add the locks suggested in comment 1.  Currently trying to swap in enough Xlib/XCB state to remember that reason. :)
Comment 4 Josh Triplett 2010-04-15 10:47:28 UTC
(In reply to comment #3)
> Jamey and I had a reason why we couldn't add the locks suggested in comment 1. 
> Currently trying to swap in enough Xlib/XCB state to remember that reason. :)

So, for one thing, _XSetPrivSyncFunction gets called with the Display lock held, so it shouldn't call LockDisplay.

However, _XPrivSyncFunction gets called from SyncHandle, which always happens *after* UnlockDisplay, because Xlib hates threads and wants them to suffer.
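
For illustration, the usual Xlib request pattern looks roughly like this (paraphrased, not an exact source excerpt):

/* Rough shape of a simple Xlib request function (XNoOp-style). */
int
XNoOp(register Display *dpy)
{
    register xReq *req;

    LockDisplay(dpy);               /* take the Display lock            */
    GetEmptyReq(NoOperation, req);  /* build the request under the lock */
    UnlockDisplay(dpy);             /* the lock is dropped here ...     */
    SyncHandle();                   /* ... so dpy->synchandler, e.g.
                                       _XPrivSyncFunction, runs unlocked */
    return 1;
}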
Comment 5 Josh Triplett 2010-04-15 11:24:51 UTC
OK.  Jamey and I have a plan to solve this now.

In the XTHREADS case, after you've called XInitThreads, we can hook LockDisplay and UnlockDisplay.  We can use that to run _XIDHandler and _XSeqSyncHandler from LockDisplay rather than SyncHandle; we then know that we hold the lock, and thus we can prevent the race.  We think it makes sense to do these both from LockDisplay rather than UnlockDisplay, so that you know you have valid sync and a valid XID before you start setting up the request you locked to prepare.

In the !XTHREADS case, or if you haven't called XInitThreads, you don't get to use Xlib from multiple threads, so we can use the logic we have now (with synchandler and savedsynchandler) without any concern about races.
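
Schematically, the hook looks something like this (names invented for illustration; the real change goes through Xlib's internal lock functions):

/* Illustrative sketch only -- not the actual libX11 code. */
static void
hooked_lock_display(Display *dpy)   /* hypothetical name */
{
    real_lock_display(dpy);         /* hypothetical: acquire the real lock */
    /* With the lock now held, do the XID and sequence-sync work that
       used to happen (unlocked) via dpy->synchandler in SyncHandle(). */
    _XIDHandler(dpy);
    _XSeqSyncFunction(dpy);
}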
Comment 6 Josh Triplett 2010-04-15 13:44:40 UTC
We just pushed the following fix:

commit a6d974dc59f2722b36e2df9d4f07aeee4f83ce43
Author: Jamey Sharp <jamey@minilop.net>
Date:   Thu Apr 15 13:05:08 2010 -0700

    Move XID and sync handling from SyncHandle to LockDisplay to fix races.
    
    XID and sync handling happened via _XPrivSyncHandler, assigned to
    dpy->synchandler and called from SyncHandle.  _XPrivSyncHandler thus ran
    without the Display lock, so manipulating the Display caused races, and
    these races led to assertions in multithreaded code (demonstrated via
    ico).
    
    In the XTHREADS case, after you've called XInitThreads, we can hook
    LockDisplay and UnlockDisplay.  Use that to run _XIDHandler and
    _XSeqSyncHandler from LockDisplay rather than SyncHandle; we then know
    that we hold the lock, and thus we can avoid races.  We think it makes
    sense to do these both from LockDisplay rather than UnlockDisplay, so
    that you know you have valid sync and a valid XID before you start
    setting up the request you locked to prepare.
    
    In the !XTHREADS case, or if you haven't called XInitThreads, you don't
    get to use Xlib from multiple threads, so we can use the logic we have
    now (with synchandler and savedsynchandler) without any concern about
    races.
    
    This approach gets a bit exciting when the XID and sequence sync
    handlers drop and re-acquire the Display lock. Reacquisition will re-run
    the handlers, but they return immediately unless they have work to do,
    so they can't recurse more than once.  In the worst case, if both of
    them have work to do, we can nest the Display lock three deep.  In the
    case of the _XIDHandler, we drop the lock to call xcb_generate_id, which
    takes the socket back if it needs to request more XIDs, and taking the
    socket back will reacquire the lock; we take care to avoid letting
    _XIDHandler run again and re-enter XCB from the return_socket callback
    (which causes Very Bad Things, and is Not Allowed).
    
    Tested with ico (with 1 and 20 threads), and with several test programs
    for XID and sequence sync.  Tested with and without XInitThreads(), and
    with and without XCB.
    
    Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=23192
    
    Signed-off-by: Jamey Sharp <jamey@minilop.net>
    Signed-off-by: Josh Triplett <josh@freedesktop.org>
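
As a purely illustrative aside (not part of the commit, and not the libX11 implementation; all names here are invented), a self-contained sketch of the "handlers return immediately unless they have work to do" pattern described above:

#include <stdbool.h>

struct fake_dpy {
    int  lock_depth;          /* stand-in for the Display lock       */
    bool need_xid;            /* work pending for the XID handler?   */
    bool need_seq_sync;       /* work pending for the sync handler?  */
};

static void fake_xid_handler(struct fake_dpy *dpy);
static void fake_seq_sync(struct fake_dpy *dpy);

/* LockDisplay-style hook: take the lock, then run both handlers. */
static void fake_lock_display(struct fake_dpy *dpy)
{
    dpy->lock_depth++;        /* "acquire" the lock */
    fake_xid_handler(dpy);
    fake_seq_sync(dpy);
}

static void fake_unlock_display(struct fake_dpy *dpy)
{
    dpy->lock_depth--;
}

static void fake_xid_handler(struct fake_dpy *dpy)
{
    if (!dpy->need_xid)       /* nothing to do: bail out immediately, */
        return;               /* which bounds the recursion below     */
    dpy->need_xid = false;    /* clear the flag before dropping the   */
                              /* lock so a nested call does nothing   */
    fake_unlock_display(dpy);
    /* ... ask XCB for a fresh XID range here (may take the socket back) ... */
    fake_lock_display(dpy);   /* re-locking re-runs both handlers,    */
}                             /* but they return at once this time    */

static void fake_seq_sync(struct fake_dpy *dpy)
{
    if (!dpy->need_seq_sync)
        return;
    dpy->need_seq_sync = false;
    /* ... send a round trip to resynchronize sequence numbers ... */
}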
Comment 7 Josh Triplett 2010-04-15 13:45:16 UTC
Resolving.
Comment 8 Julien Cristau 2010-04-15 13:49:48 UTC
> --- Comment #6 from Josh Triplett <josh@freedesktop.org> 2010-04-15 13:44:40 PDT ---
> We just pushed the following fix:
> 
> commit a6d974dc59f2722b36e2df9d4f07aeee4f83ce43
> Author: Jamey Sharp <jamey@minilop.net>
> Date:   Thu Apr 15 13:05:08 2010 -0700
> 
>     Move XID and sync handling from SyncHandle to LockDisplay to fix races.
> 
thanks!
Comment 9 Steve R 2010-04-21 18:30:05 UTC
After my comment #1, I too ran into the problem with the recursive LockDisplay. I managed to fix that issue, but overall my fix only ever worked about 95% of the time, so I look forward to testing this out.

One other thing I wondered about at the time has to do with the assert((dpy->flags & XlibDisplayPrivSync) != 0).
Are bitwise flag modifications guaranteed to be atomic?
Not all of the other flags are modified with the Display lock held.
i.e. could a read/modify/write to one of the other flags make the XlibDisplayPrivSync flag un-thread-safe?
Comment 10 Jamey Sharp 2010-04-21 18:58:39 UTC
(In reply to comment #9)
> One other thing I wondered about at the time has to do with the
> assert((dpy->flags & XlibDisplayPrivSync) != 0).
> Are bitwise flag modifications guaranteed to be atomic?
> Not all of the other flags are modified with the Display lock held.
> i.e. could a read/modify/write to one of the other flags make the
> XlibDisplayPrivSync flag un-thread-safe?

If dpy->flags is written to outside the lock, then yes, that's probably unsafe, and theoretically ought to be fixed.
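
For what it's worth, here is a minimal standalone example (not Xlib code) of the kind of lost update an unlocked read-modify-write on a shared flags word can cause:

/* Standalone demonstration (not Xlib code) of the hazard raised in
 * comment 9: unsynchronized |= / &= on a shared flags word is a
 * read-modify-write, so two threads can silently lose each other's bits. */
#include <pthread.h>
#include <stdio.h>

static unsigned long flags;              /* stand-in for dpy->flags */

static void *toggle_bit(void *arg)
{
    unsigned long bit = (unsigned long)(size_t)arg;
    for (int i = 0; i < 1000000; i++) {
        flags &= ~bit;                   /* clear our bit ...              */
        flags |= bit;                    /* ... and set it again: each is  */
    }                                    /* a non-atomic read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, toggle_bit, (void *)(size_t)1);
    pthread_create(&b, NULL, toggle_bit, (void *)(size_t)2);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Without a lock around each modification, one thread's final write
     * can be based on a stale read and drop the other thread's bit. */
    printf("flags = %#lx (both bits set would be 0x3)\n", flags);
    return 0;
}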

However, after the commit Josh mentioned, we don't use XlibDisplayPrivSync or savedsynchandler at all in multi-threaded applications, so it is no longer a problem for *this* part of the code. :-)

