Bug 20708 - xcb-enabled libx11 deadlocks with multithreading apps.
Status: RESOLVED DUPLICATE of bug 30450
Product: xorg
Classification: Unclassified
Component: Lib/Xlib
Version: git
Hardware: Other All
Importance: high major
Assigned To: Xorg Project Team
QA Contact: Xorg Project Team
Reported: 2009-03-17 06:38 UTC by Thomas Hellström
Modified: 2011-10-03 21:28 UTC (History)
CC List: 5 users

Attachments
Test program. Link with -lpthread -lX11 (4.61 KB, text/x-csrc)
2009-03-17 06:38 UTC, Thomas Hellström
Sample code to trigger the deadlock (1.71 KB, text/x-csrc)
2011-03-13 05:51 UTC, Reinhard Nißl
Proposed patch which fixes Reinhard Nißl's test case for me (1.58 KB, patch)
2011-03-13 07:00 UTC, Uli Schlachter
Sample code to trigger the stall (3.14 KB, text/x-csrc)
2011-03-19 14:40 UTC, Reinhard Nißl

Description Thomas Hellström 2009-03-17 06:38:45 UTC
Created attachment 23963 [details]
Test program. Link with -lpthread -lX11

The attached application, which is typical of multithreaded OpenGL apps but has the OpenGL calls stripped out, deadlocks immediately when started with an XCB-enabled Xlib, but works fine with a non-XCB-enabled Xlib.

Some of the multithreaded Mesa OpenGL demos (glthreads, sharedtex_mt) exhibit the same problem.
Comment 1 Julien Danjou 2009-03-17 06:42:26 UTC
Looks like http://lists.freedesktop.org/archives/xorg/2009-February/043809.html
Comment 2 Reinhard Nißl 2011-03-13 05:51:38 UTC
Created attachment 44413 [details]
Sample code to trigger the deadlock
Comment 3 Reinhard Nißl 2011-03-13 05:54:29 UTC
Hi,

After upgrading to openSUSE 11.4 I see similar deadlocks when using xine-ui. I've attached a much simpler sample programme which triggers the deadlock immediately on my system.

If the programme deadlocks, you'll most likely see only .T on the console. If it doesn't deadlock, you'll get .T.T.T and so on. Depending on the number of CPU cores, the output may also look like ...TTT or similar.

Bye.
Comment 4 Uli Schlachter 2011-03-13 07:00:23 UTC
Created attachment 44414 [details] [review]
Proposed patch which fixes Reinhard Nißl's test case for me

I haven't tested this against Thomas Hellström's test case, but since that one doesn't contain any calls to XLockDisplay(), I think that's another problem (Sorry for stealing this bug report).

Also, sorry for the weird commit message, but I couldn't come up with something better. If anyone has a good idea, feel free to replace the commit message.
Comment 5 Jamey Sharp 2011-03-14 17:32:51 UTC
I hadn't noticed Uli's patch yet, but I committed a patch with the same effect today. Hopefully libX11 master now works for Thomas and Reinhard. Please re-open otherwise. (Sorry for not giving you credit in the commit message, Uli; I would have if I'd seen this first.)
Comment 6 Reinhard Nißl 2011-03-19 14:40:36 UTC
Created attachment 44621 [details]
Sample code to trigger the stall
Comment 7 Reinhard Nißl 2011-03-19 15:00:19 UTC
Hi,

I'm not sure whether the fix is correct, as it triggers a new symptom in xine-ui and in the new sample xcb_test2.c.

The issue now is that one thread stalls, i.e. it waits "indefinitely" for a reply, as Thread 2 does in this backtrace:



(gdb) thread apply all bt

Thread 4 (Thread 0x7fd74409e700 (LWP 23059)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fd744c82970 in _XDisplayLockWait (dpy=0x603070) at locking.c:447
#2  0x00007fd744c82fd9 in _XLockDisplay (dpy=0x603070) at locking.c:462
#3  0x00007fd744c8690f in XPending (dpy=0x603070) at Pending.c:51
#4  0x0000000000400f28 in vdpau_thread3 (param=0x603070) at xcb_test2.c:41
#5  0x00007fd744a37a3f in start_thread (arg=0x7fd74409e700) at pthread_create.c:297
#6  0x00007fd74479767d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fd74389d700 (LWP 23060)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fd744c82970 in _XDisplayLockWait (dpy=0x603070) at locking.c:447
#2  0x00007fd744c82fd9 in _XLockDisplay (dpy=0x603070) at locking.c:462
#3  0x00007fd744c934e2 in XTranslateCoordinates (dpy=0x603070, src_win=33554433, dest_win=33554433, src_x=50, src_y=50, dst_x=0x7fd74389cebc, dst_y=0x7fd74389ceb8, child=0x7fd74389ceb0) at TrCoords.c:45
#4  0x0000000000401036 in vdpau_thread4 (param=0x603070) at xcb_test2.c:65
#5  0x00007fd744a37a3f in start_thread (arg=0x7fd74389d700) at pthread_create.c:297
#6  0x00007fd74479767d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fd74309c700 (LWP 23061)):
#0  0x00007fd74478e503 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fd7444b0c2a in _xcb_conn_wait (c=0x604500, cond=<value optimized out>, vector=0x0, count=0x0) at xcb_conn.c:313
#2  0x00007fd7444b22df in xcb_wait_for_reply (c=0x604500, request=6552, e=0x7fd74309bdd8) at xcb_in.c:378
#3  0x00007fd744c962bd in _XReply (dpy=0x603070, rep=0x7fd74309be20, extra=0, discard=1) at xcb_io.c:541
#4  0x00007fd744c9353d in XTranslateCoordinates (dpy=0x603070, src_win=33554434, dest_win=33554434, src_x=50, src_y=50, dst_x=0x7fd74309bebc, dst_y=0x7fd74309beb8, child=0x7fd74309beb0) at TrCoords.c:51
#5  0x0000000000401145 in vdpau_thread5 (param=0x603070) at xcb_test2.c:90
#6  0x00007fd744a37a3f in start_thread (arg=0x7fd74309c700) at pthread_create.c:297
#7  0x00007fd74479767d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fd74517e700 (LWP 23058)):
#0  0x00007fd74478a0ad in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fd744734138 in _IO_new_file_underflow (fp=0x7fd744a2b6a0) at fileops.c:598
#2  0x00007fd7447351ae in _IO_default_uflow (fp=0x7fd744a2b6a0) at genops.c:440
#3  0x00007fd7447304fd in getchar () at getchar.c:38
#4  0x00000000004012bf in main () at xcb_test2.c:128
(gdb) 



The stall always happens in one of the threads.

The threads do not deadlock, and the programme can go on if events appear. In xine-ui I can, for example, move the mouse over the window or press a key. For the sample programme I've found that switching to a different virtual desktop and back gets it going again. I must admit that I have little experience with X programming, but I think the sample could easily be adjusted so that a mouse move or something similar recovers it.

The output of the sample programme on my Core i7 965 (4 cores / 8 virtual CPUs) looks like this (stock openSUSE 11.4 + https://build.opensuse.org/package/binary?arch=x86_64&filename=xorg-x11-libX11-7.6-53.1.x86_64.rpm&package=xorg-x11-libX11&project=X11%3AXOrg&repository=openSUSE_11.4 as mentioned here: https://bugzilla.novell.com/show_bug.cgi?id=679177#c1):

LTeLeETLLLeTLeETLTLeTLeTLELTeLeTLETLeLLLLeTLETLLTeLeTLELTLeTLeETLeTLLeLETLeeETLLLeeTLETLeeTLELTLeeETLTeLeETeLeTLETLeeTLTLTLLETLLTeLeTLETLTLeTLeETLeLeTLETLeETLeeTLeETLTeLeETLeLeTLELTLeTLeETLeLeTLETLLTeLeTLETLLTeLeTLELTLeLLeLETLeETLeLeTLeETLeTLeTLELTLeLeTLETLeETLeTLeeTLELTeLeTLETLLeLLeTLETLTeLeETLeETLeTLeTLeETLeTLeETLeLeTLELTLeTLeETLTLeLeTLELTTLeTLeTLETLLTLeTLeTLTLETLLLLeLLLeTLETLLTeLLLLLLLLeTLLETLLTeLLeTLLETLLTTe

The output stops when the stall happens and continues when it recovers.

Bye.
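
Editorial sketch: the stall pattern in the backtrace above (Threads 3 and 4 parked in pthread_cond_wait, Thread 2 sitting in poll until something arrives on the connection) can be reproduced without X at all. The code below is a toy model, not libX11 or libxcb code. Every name in it (stall_model, model_lock, reply_waiter, run_stall_model, and so on) is hypothetical, a pipe stands in for the X connection, and it assumes that the polling thread is the one the other threads are waiting on, which the backtrace suggests but does not prove.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Toy model only: every name here is hypothetical, none is libX11/libxcb API. */
struct stall_model {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int locked;    /* 1 while some thread owns the modelled display lock */
    int finished;  /* workers that completed their single modelled call */
    int conn[2];   /* pipe standing in for the X server connection */
};

static void model_lock(struct stall_model *m)
{
    pthread_mutex_lock(&m->mu);
    while (m->locked)                  /* where Threads 3 and 4 are parked */
        pthread_cond_wait(&m->cv, &m->mu);
    m->locked = 1;
    pthread_cond_broadcast(&m->cv);    /* lets the driver see the lock is taken */
    pthread_mutex_unlock(&m->mu);
}

/* Each worker locks exactly once, so the unlock also counts it as finished. */
static void model_unlock(struct stall_model *m)
{
    pthread_mutex_lock(&m->mu);
    m->locked = 0;
    m->finished++;
    pthread_cond_broadcast(&m->cv);
    pthread_mutex_unlock(&m->mu);
}

/* Like Thread 2: holds the lock and blocks in poll() until data arrives. */
static void *reply_waiter(void *arg)
{
    struct stall_model *m = arg;
    struct pollfd pfd = { .fd = m->conn[0], .events = POLLIN, .revents = 0 };
    char byte;

    model_lock(m);
    while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
        ;                              /* retry if interrupted by a signal */
    if (read(m->conn[0], &byte, 1) != 1)
        byte = 0;                      /* nothing useful to do in the model */
    model_unlock(m);
    return NULL;
}

/* Like Threads 3 and 4: just needs the lock for a moment. */
static void *lock_waiter(void *arg)
{
    struct stall_model *m = arg;
    model_lock(m);
    model_unlock(m);
    return NULL;
}

/* Returns the number of finished workers after one "event" was delivered and
 * stores the number finished before it. Error paths are abbreviated. */
int run_stall_model(int *done_before_event)
{
    struct stall_model m = { .locked = 0, .finished = 0 };
    pthread_t t[3];
    int i, done;

    pthread_mutex_init(&m.mu, NULL);
    pthread_cond_init(&m.cv, NULL);
    if (pipe(m.conn) != 0)
        return -1;

    if (pthread_create(&t[0], NULL, reply_waiter, &m) != 0)
        return -1;
    pthread_mutex_lock(&m.mu);
    while (!m.locked)                  /* wait until the poller owns the lock */
        pthread_cond_wait(&m.cv, &m.mu);
    pthread_mutex_unlock(&m.mu);

    if (pthread_create(&t[1], NULL, lock_waiter, &m) != 0 ||
        pthread_create(&t[2], NULL, lock_waiter, &m) != 0)
        return -1;

    poll(NULL, 0, 100);                /* 100 ms pause so the waiters can park */
    pthread_mutex_lock(&m.mu);
    *done_before_event = m.finished;   /* nobody can finish while stalled */
    pthread_mutex_unlock(&m.mu);

    if (write(m.conn[1], "e", 1) != 1) /* the "event" that ends the stall */
        return -1;
    for (i = 0; i < 3; i++)
        pthread_join(t[i], NULL);

    done = m.finished;
    close(m.conn[0]);
    close(m.conn[1]);
    pthread_cond_destroy(&m.cv);
    pthread_mutex_destroy(&m.mu);
    return done;
}
```

Called as run_stall_model(&before), the model reports that no worker finishes while the polling thread is blocked and that all three finish once a single byte is written to the pipe. This matches the observation that the programme is stalled rather than deadlocked and resumes as soon as an event appears. It says nothing about why the real reply fails to arrive.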
Comment 8 Sven Gothel 2011-06-28 00:11:12 UTC
We have experienced the same erroneous multithreading behavior in conjunction with OpenGL and Xlib; see
<https://jogamp.org/bugzilla/show_bug.cgi?id=502>.

I have also tried the latest git versions of libX11 and libxcb, with no improvement.

Using the old libX11 1.3 and libxcb 1.5 works well though.

IMHO bug 30450 is a duplicate of this one.
Comment 9 Jeremy Huddleston 2011-10-03 20:53:02 UTC

*** This bug has been marked as a duplicate of bug 30450 ***
Comment 10 Nick Bowler 2011-10-03 21:28:01 UTC
Please move the CC list when you close a bug that's still relevant as a
duplicate of another so that the rest of us don't get cut out of the loop.

Thanks.