Created attachment 73011 [details] minimal test case demonstrating the problem The Qt project multiple reports of applications hanging after a while on top of Qt 5.0 / xcb, see e.g. https://bugreports.qt-project.org/browse/QTCREATORBUG-8473 . The stack trace always looked similar, e.g.: 0 pthread_cond_wait@@GLIBC_2.3.2 /lib64/libpthread.so.0 0x7f3bb14a18f4 1 _XReply xcb_io.c 595 0x7f3bb08dd8fd 2 _XSeqSyncFunction XlibInt.c 232 0x7f3bb08e0401 3 _XError XlibInt.c 1585 0x7f3bb08dfbbb 4 handle_error xcb_io.c 212 0x7f3bb08dcdf1 5 handle_response xcb_io.c 324 0x7f3bb08dce35 6 _XReply xcb_io.c 647 0x7f3bb08dd9d3 7 _XSeqSyncFunction XlibInt.c 232 0x7f3bb08e0401 8 _XError XlibInt.c 1585 0x7f3bb08dfbbb 9 handle_error xcb_io.c 212 0x7f3bb08dcdf1 10 handle_response xcb_io.c 324 0x7f3bb08dce35 11 _XReply xcb_io.c 647 0x7f3bb08dd9d3 12 _XSeqSyncFunction XlibInt.c 232 0x7f3bb08e0401 13 XCreateWindow Window.c 117 0x7f3bb08db942 ... There's a recursive call of _XError, which is not supported according to the comments in the source code - see XlibInt.c: > However, XError should NOT perform any operations (directly or indirectly) on the DISPLAY. The recursion happens because of following check in _XSeqSyncFunction: > if ((dpy->request - dpy->last_request_read) >= (65535 - BUFSIZE/SIZEOF(xReq))) { > // ... > } . That is, the number of open requests exceeds the buffer limit. Attached is a small test case that should reproducably show the issue. Stack trace from the test case: #0 _XConditionWait (cv=0x1cbea10, mutex=0x1cbeb60) at ../../src/locking.c:353 #1 0x00007febfa7392e6 in _XReply (dpy=0x1cb8070, rep=0x7fffa2ace340, extra=0, discard=1) at ../../src/xcb_io.c:595 #2 0x00007febfa73bd81 in _XSeqSyncFunction (dpy=0x1cb8070) at ../../src/XlibInt.c:232 #3 0x00007febfa73b579 in _XError (dpy=0x1cb8070, rep=<optimized out>) at ../../src/XlibInt.c:1585 #4 0x00007febfa7385d1 in handle_error (dpy=0x1cb8070, err=0x1cc4a20, in_XReply=<optimized out>) at ../../src/xcb_io.c:212 #5 0x00007febfa738615 in handle_response (dpy=0x1cb8070, response=0x1cc4a20, in_XReply=<optimized out>) at ../../src/xcb_io.c:324 #6 0x00007febfa7393c8 in _XReply (dpy=0x1cb8070, rep=0x7fffa2ace520, extra=0, discard=1) at ../../src/xcb_io.c:647 #7 0x00007febfa73bd81 in _XSeqSyncFunction (dpy=0x1cb8070) at ../../src/XlibInt.c:232 #8 0x00007febfa73739a in XCreateWindow (dpy=0x1cb8070, parent=60817410, x=0, y=0, width=10, height=10, borderWidth=0, depth=24, class=1, visual=0x0, valuemask=0, attributes=0x0) at ../../src/Window.c:117 #9 0x0000000000400b70 in main ()
Main bug report in downstream project (Qt): https://bugreports.qt-project.org/browse/QTBUG-29106
Created attachment 74990 [details] [review] Bugfix? At least this patch makes the test case working here. But, I haven't tested it further and don't know howto. (Except from installing Qt5 stuff.)
Do(In reply to comment #2) > Created attachment 74990 [details] [review] [review] > Bugfix? > > At least this patch makes the test case working here. But, I haven't tested > it further and don't know howto. (Except from installing Qt5 stuff.) Doesn't this patch make Xlib lose track of sequence numbers? After all, this GetInputFocus request is meant as a synchronization point to make sure that something close to the current sequence number is known. (However, I don't know much about Xlib internals)
It looks like commit 83e1ba59c48c79f8b0a7e7aa0b9c9cfd84fa403d introduced the problem. The commit unlocks the display when an application-defined error handler is invoked, but re-locking the display requires re-syncing the sequence numbers which requires a remote call. The commit author couldn't remember why the display lock was held while the error function was called - I guess the sequence number re-sync was the reason. Reverting the commit fixes the problem. In the meantime, an application can use this code to work around the problem - it uses a custom extension error handler to cause XError to return before it unlocks/locks the screen: typedef Bool (*WireToErrorType)(Display*, XErrorEvent*, xError*); const int NUM_HANDLERS = 256; WireToErrorType __oldHandlers[NUM_HANDLERS] = {0}; Bool __xErrorHandler( Display* display, XErrorEvent* event, xError* error ) { // Call any previous handler first in case it needs to do real work. auto code = static_cast<int>(event->error_code); if(__oldHandlers[code] != NULL) { __oldHandlers[code](display, event, error); } // Always return false so the error does not get passed to the normal // application defined handler. return False; } void applyXDeadlockFix(Display* display) { for(auto i = 0; i < NUM_HANDLERS; ++i) { XESetWireToError(display, i, &__xErrorHandler); } }
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libx11/issues/25.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.