Copy from mailing list post: I was tracking a nasty bug in the cinelerra.org software. The cinelerra GUI hangs when running locally. When I ssh to localhost and run cinelerra via X11 forwarding, the problem disappears. Tracking reveals that this bug is visible only if libX11 is compiled with xcb enabled; after rebuilding libX11 with xcb disabled, the problem disappears. The problem is described here: http://bugs.cinelerra.org/show_bug.cgi?id=383 The interesting gdb backtrace is here: http://rafb.net/p/T6Hh8b60.html I'm using PLD/Linux with the latest software possible (the whole of xorg is at the latest releases). The other reporter (see cinelerra bugzilla) was using Arch/Linux.
Created attachment 8283: copy of http://rafb.net/p/T6Hh8b60.html. Copy of the backtrace from a pastebin-like site.
No longer happens for me.

xorg-xserver-server-1.4-5.i686
xorg-driver-video-ati-6.7.195-2.i686
Mesa-libGL-7.0.1-2.i686
libxcb-1.0-4.i686
xorg-lib-libX11-1.1.3-1.i686

cinelerra is in the same version as before.
Per the original poster's report, I'm resolving this bug as not reproducible (any more). Thanks for the followup! Calling it "INVALID" seems harsh and wrong, but that's the closest resolution I can find in Bugzilla.
This is very definitely still a problem for multiple people on many different platforms. I personally have been trying to track down a bug in Houdini (www.sidefx.com) which causes hangs and deadlocks, and it looks to be caused by this xlib-xcb hang. Simply upgrading my xlib from the regular version to the xcb version started the hanging. At the URL I posted there is a bunch more analysis about this bug.
Some further comments: Not using threads with X causes this problem to disappear (this is probably unsurprising). Houdini, the 3D software package available for free download at www.sidefx.com, triggers this bug. You can vary Houdini's use of threads at runtime by using the environment variable HOUDINI_ENABLE_LINUX_THREADED_UI. I am able to help with all manner of debugging, but I just don't know enough about libx11 and XCB to do this all myself.
Jamey and I just announced a set of changes to XCB and Xlib/XCB which, among other things, should address all the outstanding synchronization problems that we know of. Could anyone experiencing this bug please build XCB and Xlib with the patches found at http://lists.freedesktop.org/archives/xcb/2008-March/003347.html and retest?
Are the patches published here: http://lists.freedesktop.org/archives/xcb/2008-March/003347.html already applied?
(In reply to comment #6)
> Jamey and I just announced a set of changes to XCB and Xlib/XCB which, among
> other things, should address all the outstanding synchronization problems that
> we know of. Could anyone experiencing this bug please build XCB and Xlib with
> the patches found at
> http://lists.freedesktop.org/archives/xcb/2008-March/003347.html and retest?

We still hit this even with the latest xcb and xlib releases (which include the patches as far as I can tell). The only fix I've found is disabling xcb support in xlib. If there are any other patches available, I can test them out, no problem. Let me know if you need any more information. We primarily hit this with Shake (a commercial 2D compositing package) and Houdini, as mentioned above.
(In reply to comment #8)
> (In reply to comment #6)
> > Jamey and I just announced a set of changes to XCB and Xlib/XCB which, among
> > other things, should address all the outstanding synchronization problems that
> > we know of.
>
> We still hit this even with the latest xcb and xlib releases (which include the
> patches as far as I can tell). The only fix I've found is disabling xcb support
> in xlib. If there are any other patches available, I can test them out, no
> problem. Let me know if you need any more information.
>
> We primarily hit this with Shake (commercial 2D compositing package) and
> Houdini as mentioned above.

Although you believe you had the socket-handoff version when you last tested, I can't help hoping the socket-handoff work fixed the problem for you, because I don't know of any hangs we haven't fixed already. So could you confirm that you can still reproduce the bug? I tried to download Houdini to test it myself, but a 200MB download of entirely closed-source software is just too much. So if you can still reproduce the problem, would you start with "thread apply all bt full" in gdb, to show us what was going on when it hung? Preferably, have debug symbols for libX11 and libxcb installed first.
Since I last asked for testing (and nobody responded), we've released a much-improved version of libX11 with regards to multi-threaded applications. There's one known issue (#30450), but I don't know whether it affects this particular application. Please re-test with Xlib 1.3.4/1.4.0 or newer.
I'm using the latest releases of xserver, libs, etc. from freedesktop, and cinelerra works fine.
Well, let's try closing this bug again and see if anybody who can reproduce it with current library versions shows up again. Thanks Arkadiusz. If you do want to re-open this bug: Please make sure you attach the output of "thread apply all bt" from gdb, like Arkadiusz did. There's some general documentation on doing that at http://wiki.debian.org/HowToGetABacktrace .
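For anyone who wants to script the backtrace collection requested above, here is a minimal sketch. It assumes gdb is installed, that you are allowed to attach to the hung process, and that debug symbols for libX11 and libxcb are available. The helper names `gdb_backtrace_command` and `collect_backtrace` are made up for illustration; only the gdb options `-p`, `-batch` and `-ex` and the "thread apply all bt" command come from gdb itself.

```python
import subprocess


def gdb_backtrace_command(pid, full=True):
    """Build a gdb command line that attaches to `pid`, dumps every thread's
    backtrace, and exits. Hypothetical helper, not part of gdb or Xlib."""
    bt = "thread apply all bt full" if full else "thread apply all bt"
    # -batch: run the given commands and exit; -ex: execute one gdb command
    return ["gdb", "-p", str(pid), "-batch", "-ex", bt]


def collect_backtrace(pid, full=True):
    """Run gdb against the hung process and return whatever it printed."""
    result = subprocess.run(
        gdb_backtrace_command(pid, full),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero if attaching is not permitted
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    # Usage: python backtrace.py <pid of the hung application>
    print(collect_backtrace(int(sys.argv[1])))
```

The output can then be attached to the bug as a text file.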
(In reply to comment #10)
> Since I last asked for testing (and nobody responded), we've released a
> much-improved version of libX11 with regards to multi-threaded applications.
> There's one known issue (#30450), but I don't know whether it affects this
> particular application. Please re-test with Xlib 1.3.4/1.4.0 or newer.

How and where can I obtain this improved version of libX11? We are using Suse Linux Enterprise Server 11 SP1. Any help or hint on how to get this integrated is welcome.
Tried it out today under Suse Linux Enterprise Server (SLES) 11 SP1. Installed xcb-1.7 and X11-1.4.4, which needed xcb-proto-1.6 and xproto-7.0.22. Same effect as with plain SLES11 SP1. The glthreads example from http://www.opensource.apple.com/source/X11server/X11server-85/mesa/Mesa-7.2/progs/xdemos/glthreads.c still locks up when not using the -l option.
This isn't an xcb bug, but rather a bug in libX11. gdb says:

(gdb) thread apply all bt

Thread 2 (Thread 0x7ffff3bf2700 (LWP 32070)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ffff7ae6795 in _XReply (dpy=0x605070, rep=0x7ffff3bf1c90, extra=0, discard=0) at xcb_io.c:623
#2  0x00007ffff788a712 in ?? () from /usr/lib/x86_64-linux-gnu/libGL.so.1
#3  0x00007ffff788866c in ?? () from /usr/lib/x86_64-linux-gnu/libGL.so.1
#4  0x00007ffff4ef8987 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#5  0x00007ffff4ef9705 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#6  0x00007ffff4ef7c95 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#7  0x0000000000401b11 in draw_loop (wt=0x6033c0) at glthreads.c:186
#8  0x0000000000402198 in thread_function (p=0x6033c0) at glthreads.c:384
#9  0x00007ffff7633b40 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007ffff737e36d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fd7720 (LWP 32067)):
#0  0x00007ffff7373723 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007ffff7094fa2 in _xcb_conn_wait (c=0x606780, cond=<optimized out>, vector=0x0, count=0x0) at xcb_conn.c:369
#2  0x00007ffff70966ef in xcb_wait_for_event (c=0x606780) at xcb_in.c:522
#3  0x00007ffff7ae6328 in _XReadEvents (dpy=0x605070) at xcb_io.c:400
#4  0x00007ffff7ad5688 in XNextEvent (dpy=0x605070, event=0x7fffffffe0c0) at NextEvent.c:50
#5  0x0000000000401cae in event_loop (dpy=0x605070) at glthreads.c:238
#6  0x000000000040278d in main (argc=3, argv=0x7fffffffe2c8) at glthreads.c:527

Now if we look at xcb_io.c, shortly before line 623:

/* FIXME: That event might be after this reply,
 * and might never even come--or there might be
 * multiple threads trying to get events.
 */

I think we are hitting the "might never come"-case and I also think that there are other bug reports which hit that case, so this might be a dupe now.

"Patch" which "fixes" this issue for me is attached, but I guess there is a reason why Xlib tries to do what it is doing, so just removing this code won't help. (What is that reason? How badly would stuff break? Which stuff would break? Could that other stuff be fixed so that this code can be removed?)

BTW this hangs in libxcb if I use the libxcb version from debian instead of the current git version, so there really was a bug in xcb, but that one definitely was already fixed.
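To make the "might never come" case concrete, here is a toy model of the pattern in the backtrace: one thread blocked reading events, a second thread waiting on a condition variable until the first one has handled an event. This is a sketch under assumptions, not libX11 code. `ToyDisplay`, `read_events`, `wait_for_reply` and the field names are invented for illustration, and the timeout exists only so the demo terminates (the hang in the backtrace is a wait with no timeout).

```python
import threading


class ToyDisplay:
    """Toy stand-in for an Xlib Display. Hypothetical; not libX11's real structure."""

    def __init__(self):
        self.lock = threading.Lock()
        self.event_notify = threading.Condition(self.lock)
        self.event_waiter = False                   # a thread is blocked reading events
        self.waiter_registered = threading.Event()  # demo aid: reader has announced itself
        self.event_arrived = threading.Event()      # stands in for the server sending an event


def read_events(dpy):
    """Event-loop thread (Thread 1 in the backtrace): blocks until an event shows up."""
    with dpy.lock:
        dpy.event_waiter = True
    dpy.waiter_registered.set()
    dpy.event_arrived.wait()            # blocks for as long as no event is sent
    with dpy.lock:
        dpy.event_waiter = False
        dpy.event_notify.notify_all()   # only now can the reply thread continue


def wait_for_reply(dpy, timeout):
    """Reply thread (Thread 2): waits until no thread is blocked reading events.

    Returns True if it could proceed, False if the wait timed out.
    """
    with dpy.lock:
        return dpy.event_notify.wait_for(lambda: not dpy.event_waiter, timeout=timeout)


def demo(event_comes, timeout):
    dpy = ToyDisplay()
    reader = threading.Thread(target=read_events, args=(dpy,), daemon=True)
    reader.start()
    dpy.waiter_registered.wait()        # make sure the reader is blocked first
    if event_comes:
        dpy.event_arrived.set()
    got_reply = wait_for_reply(dpy, timeout)
    dpy.event_arrived.set()             # release the reader so the demo can exit
    reader.join()
    return got_reply


if __name__ == "__main__":
    print("event arrives:", demo(event_comes=True, timeout=5.0))
    print("event never arrives:", demo(event_comes=False, timeout=0.2))
```

In the second call nothing ever wakes the reply thread, which is the situation the two stuck threads above appear to be in: the drawing thread cannot finish its request until the event loop sees an event, and no event is on its way.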
Created attachment 53778: Just removed the code which caused problems and everything started working
(In reply to comment #16)
> Created attachment 53778
> Just removed the code which caused problems and everything started working

Didn't work for me. The X server didn't even show up anymore, while it did before removing the suggested lines. As said before, I'm trying this under SLES 11 SP1; maybe this is the reason why it does not work for me.
Re-did my test today with the latest and greatest openSuse 12.1 and libX11-1.4.4, with the same bad results as stated in my previous reply.
Fixed with the patch for bug 40372, which hasn't been released in an official version of libxcb yet.