Bug 59361 - Deadlock in _XReply when recursing through _XSeqSyncFunction
Summary: Deadlock in _XReply when recursing through _XSeqSyncFunction
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/Xlib (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-14 14:30 UTC by Kai Koehne
Modified: 2018-08-10 20:10 UTC (History)
5 users (show)

See Also:
i915 platform:
i915 features:


Attachments
minimal test case demonstrating the problem (1.15 KB, text/plain)
2013-01-14 14:30 UTC, Kai Koehne
no flags Details
Bugfix? (918 bytes, patch)
2013-02-17 16:19 UTC, Daniel Martin
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Kai Koehne 2013-01-14 14:30:58 UTC
Created attachment 73011 [details]
minimal test case demonstrating the problem

The Qt project multiple reports of applications hanging after a while on top of Qt 5.0 / xcb, see e.g. https://bugreports.qt-project.org/browse/QTCREATORBUG-8473 . The stack trace always looked similar, e.g.:

0	pthread_cond_wait@@GLIBC_2.3.2	/lib64/libpthread.so.0		0x7f3bb14a18f4	
1	_XReply	xcb_io.c	595	0x7f3bb08dd8fd	
2	_XSeqSyncFunction	XlibInt.c	232	0x7f3bb08e0401	
3	_XError	XlibInt.c	1585	0x7f3bb08dfbbb	
4	handle_error	xcb_io.c	212	0x7f3bb08dcdf1	
5	handle_response	xcb_io.c	324	0x7f3bb08dce35	
6	_XReply	xcb_io.c	647	0x7f3bb08dd9d3	
7	_XSeqSyncFunction	XlibInt.c	232	0x7f3bb08e0401	
8	_XError	XlibInt.c	1585	0x7f3bb08dfbbb	
9	handle_error	xcb_io.c	212	0x7f3bb08dcdf1	
10	handle_response	xcb_io.c	324	0x7f3bb08dce35	
11	_XReply	xcb_io.c	647	0x7f3bb08dd9d3	
12	_XSeqSyncFunction	XlibInt.c	232	0x7f3bb08e0401	
13	XCreateWindow	Window.c	117	0x7f3bb08db942
...

There's a recursive call of _XError, which is not supported according to the comments in the source code - see XlibInt.c:

>  However, XError should NOT perform any operations (directly or indirectly) on the DISPLAY.


The recursion happens because of following check in _XSeqSyncFunction:

>  if ((dpy->request - dpy->last_request_read) >= (65535 - BUFSIZE/SIZEOF(xReq))) {
>    // ...
>  }

. That is, the number of open requests exceeds the buffer limit.

Attached is a small test case that should reproducably show the issue. Stack trace from the test case:

#0  _XConditionWait (cv=0x1cbea10, mutex=0x1cbeb60) at ../../src/locking.c:353
#1  0x00007febfa7392e6 in _XReply (dpy=0x1cb8070, rep=0x7fffa2ace340, extra=0, discard=1) at ../../src/xcb_io.c:595
#2  0x00007febfa73bd81 in _XSeqSyncFunction (dpy=0x1cb8070) at ../../src/XlibInt.c:232
#3  0x00007febfa73b579 in _XError (dpy=0x1cb8070, rep=<optimized out>) at ../../src/XlibInt.c:1585
#4  0x00007febfa7385d1 in handle_error (dpy=0x1cb8070, err=0x1cc4a20, in_XReply=<optimized out>) at ../../src/xcb_io.c:212
#5  0x00007febfa738615 in handle_response (dpy=0x1cb8070, response=0x1cc4a20, in_XReply=<optimized out>) at ../../src/xcb_io.c:324
#6  0x00007febfa7393c8 in _XReply (dpy=0x1cb8070, rep=0x7fffa2ace520, extra=0, discard=1) at ../../src/xcb_io.c:647
#7  0x00007febfa73bd81 in _XSeqSyncFunction (dpy=0x1cb8070) at ../../src/XlibInt.c:232
#8  0x00007febfa73739a in XCreateWindow (dpy=0x1cb8070, parent=60817410, x=0, y=0, width=10, height=10, borderWidth=0, depth=24, class=1, visual=0x0, valuemask=0, 
    attributes=0x0) at ../../src/Window.c:117
#9  0x0000000000400b70 in main ()
Comment 1 Kai Koehne 2013-01-14 14:33:56 UTC
Main bug report in downstream project (Qt): https://bugreports.qt-project.org/browse/QTBUG-29106
Comment 2 Daniel Martin 2013-02-17 16:19:00 UTC
Created attachment 74990 [details] [review]
Bugfix?

At least this patch makes the test case working here. But, I haven't tested it further and don't know howto. (Except from installing Qt5 stuff.)
Comment 3 Uli Schlachter 2013-02-17 17:10:39 UTC
Do(In reply to comment #2)
> Created attachment 74990 [details] [review] [review]
> Bugfix?
> 
> At least this patch makes the test case working here. But, I haven't tested
> it further and don't know howto. (Except from installing Qt5 stuff.)

Doesn't this patch make Xlib lose track of sequence numbers? After all, this GetInputFocus request is meant as a synchronization point to make sure that something close to the current sequence number is known. (However, I don't know much about Xlib internals)
Comment 4 Peter Horrocks 2014-11-26 22:22:00 UTC
It looks like commit 83e1ba59c48c79f8b0a7e7aa0b9c9cfd84fa403d introduced the problem.

The commit unlocks the display when an application-defined error handler is invoked, but re-locking the display requires re-syncing the sequence numbers which requires a remote call.

The commit author couldn't remember why the display lock was held while the error function was called - I guess the sequence number re-sync was the reason.

Reverting the commit fixes the problem.

In the meantime, an application can use this code to work around the problem - it uses a custom extension error handler to cause XError to return before it unlocks/locks the screen:

typedef Bool (*WireToErrorType)(Display*, XErrorEvent*, xError*);

const int NUM_HANDLERS = 256;
WireToErrorType __oldHandlers[NUM_HANDLERS] = {0};

Bool __xErrorHandler(
        Display*     display,
        XErrorEvent* event,
        xError*      error
        )
{
    // Call any previous handler first in case it needs to do real work.
    auto code = static_cast<int>(event->error_code);
    if(__oldHandlers[code] != NULL)
    {
        __oldHandlers[code](display, event, error);
    }

    // Always return false so the error does not get passed to the normal
    // application defined handler.
    return False;
}

void applyXDeadlockFix(Display* display)
{
    for(auto i = 0; i < NUM_HANDLERS; ++i)
    {
        XESetWireToError(display, i, &__xErrorHandler);
    }
}
Comment 5 GitLab Migration User 2018-08-10 20:10:24 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libx11/issues/25.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.