We came across a locking problem with the XIM when threading support in Xlib is turned on. Here is what happens: XFilterEvent() - an X events filter for input methods - will lock the display when called. However several calls down the input method handler may call another Xlib function that is normally only called from clients (like XGetWindowAttributes() see backtrace attached). I would suggest to split up XGetWindowAttributes() into an 'internal' function _XGetWindowAttributes() which doesn't do any locking and let the original function just wrap the call to this fucntion with locking. I expect that this is just one example of a class of problems we are going to see with multithreaded clients. (Yes, I know, multitread support in Xlib *is* broken - but this problem exists because the implementors of XIM did not worry about multithreading at all).
Created attachment 728 [details] backtrace to problem described above.
Created attachment 729 [details] [review] proposed fix. The patch separates the offending function in an internal and external part. The internal part is called whenever the function is called from within Xlib/XIM. The code has not been fully tested. Also it is possible that more thorough testing will turn up numerous other cases.
I'm not sure if this is really a release blocker - although apps that require multithreading as well as input methods become more and more common - but to bring this to the attention of the release team I mark it as a blocker. We can remove the blocker later again.
This is only one example of the type of failure that can happen when using multithreaded Qt and input methods. I have seen seen other examples (with different stack traces, particularly with _XRegisterFilterByMask() and _XRegisterFilterByType()) of this behavior. I'm sure the attached patch will fix the reported problem, but there are probably many others lurking, and we might not be able to use the same type of band-aid for all of those cases.
As I've said this may not be the only example. Therefore we should collect all cases and see what we do about them one by one. The real solution would be to replace the threading system in Xlib - but until this happens band aids for every singe occurance are better than nothing. Paul: If you have other examples and their stack traces please create entires to this bugzilla so they can be fixed.
Created attachment 735 [details] [review] proposed patch. This one supersedes the previous patch as it fixes another instance of this class of problems.
I've committed this as discussed in the conf call.
Problem still exists in 6.8.2 and 6.8.1 How about to remove LockDisplay call from XFilterEvent - it shouldn't be there. See: http://lists.freedesktop.org/pipermail/release-wranglers/2004-August/000968.html Deadlock stacktrace looks like: Thread 4 #0 __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0 #1 _L_mutex_lock_33 () from /lib/tls/libpthread.so.0 #2 _XLockDisplay () from /usr/X11R6/lib/libX11.so.6 #3 XInternAtom () from /usr/X11R6/lib/libX11.so.6 #4 _XimFilterPropertyNotify #5 XFilterEvent
Paul Anderson, Alan Coopersmith and I discussed this problem extensively. We agreed that the right solution would probably be the following simple patch: --- FilterEv.c 20 Dec 2001 12:37:37 -0000 1.1.1.4 +++ FilterEv.c 15 Jul 2005 10:13:57 -0000 @@ -88,9 +88,9 @@ if (win == p->window) { if ((mask & p->event_mask) || (ev->type >= p->start_type && ev->type <= p->end_type)) { + UnlockDisplay(ev->xany.display); ret = (*(p->filter))(ev->xany.display, p->window, ev, p->client_data); - UnlockDisplay(ev->xany.display); return(ret); } } Which simply moves the unlock of the display lock before the call to the filter function. When doing this patch parts of the earlier patch should probably be removed.
Created attachment 3090 [details] [review] Patch fixing two bugs in libpixman for BGR images
Egbert: Comment #7 indicates something was committed, presumeably the patch from comment #6. Can you confirm that either way? Also, the patch in comment 9, which is supposed to be a better fix, seems to be an unrelated cairo patch. Is this bug perhaps corrupted from a previous hard drive crash? Can someone update it to reflect current status, and if necessary reattach any corrupt patches? TIA
Yeah, the unrelated cairo patch went in there when the RAID went sour which caused our bugzilla data base to get hosed. I lost a bunch of patches this way: First my laptop harddrive died then I discovered that a lot of patches that I had stuck into bugzilla had disappeared also. I have resurrected (rewritten) the XIM patch and will attach it.
Created attachment 4874 [details] [review] better fix This fix reverts parts of attachment #735 [details] [review] as with both patches we would have no locking at all.
Note for those applying this fix to XFree86 4.3.0 era X builds: The latter 2 hunks of the patch in comment #12 aren't needed, as 4.3.0 already contains that. Reviewing the comments in this report preceding that it would appear the code had been that way originally, then been changed in a future release, and this new patch is reverting it back to the way it was in 4.3.0. The one hunk to FilterEv.c still applies clean though. I'll update this with my results with 6.8.x later also.
Mike, thanks for pointing this out. Indeed in order for the FilterEv patch to work properly the earlier patches need to be reversed. There has been a change in praradigms: first it was assumed that the threat locking needed to happen in FilertEvents() now it is assumed that any call to a function below FilterEvents() should do the locking itself. So we need to undo the changes that call the non-locking versions of some functions. When I originally looked at the problem my feeling was that calling the event filter chain within the locks was done on purpose. After some extensive discussions with Paul Anderson (who ran into a deadlock problem on HP/UX that was not as easy to work around as the others) and with Alan Coopersmith who confirmed that SUN had the change to FilterEv() that we were proposing in their code for years finally convinced that the current implementation was actually not correct.
Could someone commit the patch in comment #12 for Xorg 7.2? Recently we have got problem using XIM with wine >= 0.9.23 (http://bugs.winehq.org/show_bug.cgi?id=6547).
Created attachment 7730 [details] reduced test which causes deadlock in Xim This bug prevents SWI-Prolog/XPCE from running on OS X; a developer (Jan Wielemaker) was kind enough to submit a reduced test case, which I've attached. It opens a small window; if you try to type into the window, the process will deadlock. My understanding is that this code snippet will not cause deadlock on all platforms, but it does so consistently on OS X, with this backtrace: #0 0x9002b8a8 in semaphore_wait_signal_trap () #1 0x900019cc in pthread_mutex_lock () #2 0x9ba85b9c in _XLockDisplay () #3 0x9ba902d8 in XkbGetUpdatedMap () #4 0x9ba903a8 in XkbGetMap () #5 0x9ba8d4f4 in _XkbLoadDpy () #6 0x9ba8ce08 in XkbLookupKeySym () #7 0x9ba8d95c in XLookupString () #8 0x9baba004 in _XimLocalFilter () #9 0x9ba8548c in XFilterEvent () #10 0x00002370 in main (argc=1, argv=0xbffff8c4) at xim1.c:103 Recompiling libX11 with -DXTHREAD_WARN exposes the problem: [...] NextEvent.c line 47: thread a000ed88 got display lock _XReadEvents called in thread a000ed88 _XPushReader called in thread a000ed88, pushing 11003c0 XlibInt.c line 496: thread a000ed88 unlocking display XlibInt.c line 504: thread a000ed88 got display lock thread a000ed88 _XWaitForReadable returning _XPopReader called in thread a000ed88, popping 11003c0 NextEvent.c line 54: thread a000ed88 unlocking display FilterEv.c line 94: thread a000ed88 got display lock FilterEv.c line 106: thread a000ed88 unlocking display NextEvent.c line 47: thread a000ed88 got display lock _XReadEvents called in thread a000ed88 _XPushReader called in thread a000ed88, pushing 11003c0 XlibInt.c line 496: thread a000ed88 unlocking display XlibInt.c line 504: thread a000ed88 got display lock thread a000ed88 _XWaitForReadable returning _XPopReader called in thread a000ed88, popping 11003c0 NextEvent.c line 54: thread a000ed88 unlocking display XNextEvent done FilterEv.c line 94: thread a000ed88 got display lock Xlib ERROR: GetPntMap.c line 98 thread a000ed88: locking display already locked at FilterEv.c line 94
Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Update: The last two hunks of the fix have still not been applied.
I didn't know about this bug report, so we reproduced parts of these fixes independently when Xlib/XCB highlighted the same problem. At this point we have all but the last hunk of attachment #4874 [details] [review] committed, and clearly we should undo calls to _XPutBackEvent as well. Is there any reason why we shouldn't fully revert attachment #735 [details] [review]? Did anyone outside libX11 start using the unlocked versions of XGetWindowAttributes and XPutBackEvent, or could we safely remove them from Xlibint.h again? It's technically an incompatible API/ABI change, but if you use Xlibint.h you deserve what you get...
Egbert?
Another update: The last hunk of the fix has not been applied yet.
Committed to libX11 git head. commit 55782a0a1fe1560f1a9c0ed78bc7f2575c15abcf Author: Stefan Dirsch <sndirsch@suse.de> Date: Sat Nov 22 17:53:06 2008 +0100 Added remaining hunk of Egbert's patch to prevent XIM deadlocks (#1182).
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.