Bug 1182 - Xim deadlocks in certain situations.
Summary: Xim deadlocks in certain situations.
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/Xlib (show other bugs)
Version: 6.8.0
Hardware: All Linux (All)
: high normal
Assignee: Egbert Eich
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 351 1526
  Show dependency treegraph
 
Reported: 2004-08-25 06:41 UTC by Egbert Eich
Modified: 2008-11-22 08:58 UTC (History)
7 users (show)

See Also:
i915 platform:
i915 features:


Attachments
backtrace to problem described above. (1.85 KB, text/plain)
2004-08-25 06:43 UTC, Egbert Eich
no flags Details
proposed fix. (2.73 KB, patch)
2004-08-25 06:47 UTC, Egbert Eich
no flags Details | Splinter Review
proposed patch. (4.50 KB, patch)
2004-08-26 07:52 UTC, Egbert Eich
no flags Details | Splinter Review
better fix (1.91 KB, patch)
2006-03-10 04:32 UTC, Egbert Eich
no flags Details | Splinter Review
reduced test which causes deadlock in Xim (5.62 KB, text/plain)
2006-11-10 11:38 UTC, Ben Byer
no flags Details

Description Egbert Eich 2004-08-25 06:41:06 UTC
We came across a locking problem with the XIM when
threading support in Xlib is turned on.
Here is what happens:
XFilterEvent() - an X events filter for input methods - will
lock the display when called. However several calls down the
input method handler may call another Xlib function that is
normally only called from clients (like XGetWindowAttributes()
see backtrace attached). 
I would suggest to split up XGetWindowAttributes() into an
'internal' function _XGetWindowAttributes() which doesn't
do any locking and let the original function just wrap
the call to this fucntion with locking.
I expect that this is just one example of a class of problems
we are going to see with multithreaded clients.
(Yes, I know, multitread support in Xlib *is* broken - but
this problem exists because the implementors of XIM did
not worry about multithreading at all).
Comment 1 Egbert Eich 2004-08-25 06:43:41 UTC
Created attachment 728 [details]
backtrace to problem described above.
Comment 2 Egbert Eich 2004-08-25 06:47:30 UTC
Created attachment 729 [details] [review]
proposed fix.

The patch separates the offending function in an internal and external part.
The internal part is called whenever the function is called from within
Xlib/XIM.

The code has not been fully tested. Also it is possible that more thorough
testing will turn up numerous other cases.
Comment 3 Egbert Eich 2004-08-25 07:25:12 UTC
I'm not sure if this is really a release blocker - although apps that require
multithreading as well as input methods become more and more common - but to
bring this to the attention of the release team I mark it as a blocker.
We can remove the blocker later again.
Comment 4 Paul Anderson 2004-08-25 22:46:57 UTC
This is only one example of the type of failure that can
happen when using multithreaded Qt and input methods.  I
have seen seen other examples (with different stack traces,
particularly with _XRegisterFilterByMask() and
_XRegisterFilterByType()) of this behavior.  

I'm sure the attached patch will fix the reported problem,
but there are probably many others lurking, and we might not
be able to use the same type of band-aid for all of those
cases.  
Comment 5 Egbert Eich 2004-08-26 04:53:56 UTC
As I've said this may not be the only example.
Therefore we should collect all cases and see what we do about them one by one.
The real solution would be to replace the threading system in Xlib - but until
this happens band aids for every singe occurance are better than nothing.
Paul: If you have other examples and their stack traces please create entires to
this bugzilla so they can be fixed.
Comment 6 Egbert Eich 2004-08-26 07:52:49 UTC
Created attachment 735 [details] [review]
proposed patch.

This one supersedes the previous patch as it fixes another instance of this
class of problems.
Comment 7 Egbert Eich 2004-08-31 04:43:21 UTC
I've committed this as discussed in the conf call.
Comment 8 FreeDesktop Bugzilla Database Corruption Fix User 2005-07-14 23:42:17 UTC
Problem still exists in 6.8.2 and 6.8.1
How about to remove LockDisplay call from XFilterEvent - it shouldn't be there.
See:
http://lists.freedesktop.org/pipermail/release-wranglers/2004-August/000968.html

Deadlock stacktrace looks like:
Thread 4
#0 __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 _L_mutex_lock_33 () from /lib/tls/libpthread.so.0
#2 _XLockDisplay () from /usr/X11R6/lib/libX11.so.6
#3 XInternAtom () from /usr/X11R6/lib/libX11.so.6
#4 _XimFilterPropertyNotify
#5 XFilterEvent

Comment 9 Egbert Eich 2005-07-15 20:14:37 UTC
Paul Anderson, Alan Coopersmith and I discussed this problem extensively. We
agreed that the right solution would probably be the following simple patch:

--- FilterEv.c  20 Dec 2001 12:37:37 -0000      1.1.1.4
+++ FilterEv.c  15 Jul 2005 10:13:57 -0000
@@ -88,9 +88,9 @@
        if (win == p->window) {
            if ((mask & p->event_mask) ||
                (ev->type >= p->start_type && ev->type <= p->end_type)) {
+               UnlockDisplay(ev->xany.display);
                ret = (*(p->filter))(ev->xany.display, p->window, ev,
                                      p->client_data);
-               UnlockDisplay(ev->xany.display);
                return(ret);
            }
        }

Which simply moves the unlock of the display lock before the call to the filter
function.
When doing this patch parts of the earlier patch should probably be removed.
Comment 10 Egbert Eich 2005-07-15 20:15:53 UTC
Created attachment 3090 [details] [review]
Patch fixing two bugs in libpixman for BGR images
Comment 11 Mike A. Harris 2006-03-08 07:36:56 UTC
Egbert:

Comment #7 indicates something was committed, presumeably the patch
from comment #6.  Can you confirm that either way?

Also, the patch in comment 9, which is supposed to be a better fix,
seems to be an unrelated cairo patch.  Is this bug perhaps corrupted
from a previous hard drive crash?

Can someone update it to reflect current status, and if necessary reattach
any corrupt patches?

TIA
Comment 12 Egbert Eich 2006-03-10 04:29:53 UTC
Yeah, the unrelated cairo patch went in there when the RAID went sour which
caused our bugzilla data base to get hosed.
I lost a bunch of patches this way: First my laptop harddrive died then  I
discovered that a lot of patches that I had stuck into bugzilla had disappeared
also.
I have resurrected (rewritten) the XIM patch and will attach it.
Comment 13 Egbert Eich 2006-03-10 04:32:06 UTC
Created attachment 4874 [details] [review]
better fix

This fix reverts parts of attachment #735 [details] [review] as with both patches we would have no
locking at all.
Comment 14 Mike A. Harris 2006-04-25 20:15:46 UTC
Note for those applying this fix to XFree86 4.3.0 era X builds:

The latter 2 hunks of the patch in comment #12 aren't needed, as 4.3.0
already contains that.  Reviewing the comments in this report preceding that
it would appear the code had been that way originally, then been changed in
a future release, and this new patch is reverting it back to the way it
was in 4.3.0.  The one hunk to FilterEv.c still applies clean though.

I'll update this with my results with 6.8.x later also.
Comment 15 Egbert Eich 2006-04-26 01:15:14 UTC
Mike, thanks for pointing this out.
Indeed in order for the FilterEv patch to work properly the earlier patches need
to be reversed.
There has been a change in praradigms: first it was assumed that the threat
locking needed to happen in FilertEvents() now it is assumed that any call to a
function below FilterEvents() should do the locking itself. So we need to undo
the changes that call the non-locking versions of some functions.
When I originally looked at the problem my feeling was that calling the event
filter chain within the locks was done on purpose. After some extensive
discussions with Paul Anderson (who ran into a deadlock problem on HP/UX that
was not as easy to work around as the others) and with Alan Coopersmith who
confirmed that SUN had the change to FilterEv() that we were proposing in their
code for years finally convinced that the current implementation was actually
not correct.
 
Comment 16 Etsushi Kato 2006-11-08 00:25:42 UTC
Could someone commit the patch in comment #12 for Xorg 7.2?

Recently we have got problem using XIM with wine >= 0.9.23
(http://bugs.winehq.org/show_bug.cgi?id=6547).

Comment 17 Ben Byer 2006-11-10 11:38:27 UTC
Created attachment 7730 [details]
reduced test which causes deadlock in Xim

This bug prevents SWI-Prolog/XPCE from running on OS X; a developer (Jan
Wielemaker) was kind enough to submit a reduced test case, which I've attached.
It opens a small window; if you try to type into the window, the process will
deadlock. 

My understanding is that this code snippet will not cause deadlock on all
platforms, but it does so consistently on OS X, with this backtrace:

#0  0x9002b8a8 in semaphore_wait_signal_trap ()
#1  0x900019cc in pthread_mutex_lock ()
#2  0x9ba85b9c in _XLockDisplay ()
#3  0x9ba902d8 in XkbGetUpdatedMap ()
#4  0x9ba903a8 in XkbGetMap ()
#5  0x9ba8d4f4 in _XkbLoadDpy ()
#6  0x9ba8ce08 in XkbLookupKeySym ()
#7  0x9ba8d95c in XLookupString ()
#8  0x9baba004 in _XimLocalFilter ()
#9  0x9ba8548c in XFilterEvent ()
#10 0x00002370 in main (argc=1, argv=0xbffff8c4) at xim1.c:103

Recompiling libX11 with -DXTHREAD_WARN exposes the problem:

[...]
NextEvent.c line 47: thread a000ed88 got display lock
_XReadEvents called in thread a000ed88
_XPushReader called in thread a000ed88, pushing 11003c0
XlibInt.c line 496: thread a000ed88 unlocking display
XlibInt.c line 504: thread a000ed88 got display lock
thread a000ed88 _XWaitForReadable returning
_XPopReader called in thread a000ed88, popping 11003c0
NextEvent.c line 54: thread a000ed88 unlocking display
FilterEv.c line 94: thread a000ed88 got display lock
FilterEv.c line 106: thread a000ed88 unlocking display
NextEvent.c line 47: thread a000ed88 got display lock
_XReadEvents called in thread a000ed88
_XPushReader called in thread a000ed88, pushing 11003c0
XlibInt.c line 496: thread a000ed88 unlocking display
XlibInt.c line 504: thread a000ed88 got display lock
thread a000ed88 _XWaitForReadable returning
_XPopReader called in thread a000ed88, popping 11003c0
NextEvent.c line 54: thread a000ed88 unlocking display
XNextEvent done
FilterEv.c line 94: thread a000ed88 got display lock
Xlib ERROR: GetPntMap.c line 98 thread a000ed88: locking display already locked
at FilterEv.c line 94
Comment 18 Daniel Stone 2007-02-27 01:23:55 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 19 Stefan Dirsch 2007-05-29 15:30:23 UTC
Update: The last two hunks of the fix have still not been applied.
Comment 20 Jamey Sharp 2007-06-11 10:41:08 UTC
I didn't know about this bug report, so we reproduced parts of these fixes independently when Xlib/XCB highlighted the same problem. At this point we have all but the last hunk of attachment #4874 [details] [review] committed, and clearly we should undo calls to _XPutBackEvent as well.

Is there any reason why we shouldn't fully revert attachment #735 [details] [review]? Did anyone outside libX11 start using the unlocked versions of XGetWindowAttributes and XPutBackEvent, or could we safely remove them from Xlibint.h again? It's technically an incompatible API/ABI change, but if you use Xlibint.h you deserve what you get...
Comment 21 Stefan Dirsch 2007-11-04 08:14:50 UTC
Egbert?
Comment 22 Stefan Dirsch 2008-11-22 06:56:59 UTC
Another update: The last hunk of the fix has not been applied yet.
Comment 23 Stefan Dirsch 2008-11-22 08:58:38 UTC
Committed to libX11 git head.

commit 55782a0a1fe1560f1a9c0ed78bc7f2575c15abcf
Author: Stefan Dirsch <sndirsch@suse.de>
Date:   Sat Nov 22 17:53:06 2008 +0100

    Added remaining hunk of Egbert's patch to prevent XIM deadlocks (#1182).


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.