[copy of a mail to the xcb list]

Hi,

The following hang was discovered by Darren Salt:

Thread 7 (process 7297):
#0  0x00002ad88209f756 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00002ad8847b699e in _xcb_conn_wait (c=0xf195f0, cond=0x43805df4, vector=0x0, count=0xffffffffffffffff) at xcb_conn.c:296
#2  0x00002ad8847b8405 in xcb_wait_for_reply (c=0xf195f0, request=623, e=0x43805e88) at xcb_in.c:344
#3  0x00002ad881540e7b in _XReply (dpy=0xf0b600, rep=0x43805ed0, extra=0, discard=1) at ../../src/xcb_io.c:364
#4  0x00002ad8815358da in XSync (dpy=0xf0b600, discard=0) at ../../src/Sync.c:48
<snip>

Thread 16 (process 7285):
#0  0x00002ad88209f756 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00002ad8847b684b in _xcb_lock_io (c=0xf195f0) at xcb_conn.c:279
#2  0x00002ad8847b69ac in _xcb_conn_wait (c=0xf195f0, cond=<value optimized out>, vector=0x0, count=0x0) at xcb_conn.c:320
#3  0x00002ad8847b8405 in xcb_wait_for_reply (c=0xf195f0, request=621, e=0x7fff2ac5b638) at xcb_in.c:344
#4  0x00002ad881540e7b in _XReply (dpy=0xf0b600, rep=0x7fff2ac5b680, extra=0, discard=1) at ../../src/xcb_io.c:364
#5  0x00002ad881536e84 in XTranslateCoordinates (dpy=0xf0b600, src_win=39845891, dest_win=77, src_x=0, src_y=0, dst_x=0x7fff2ac5b854, dst_y=0x7fff2ac5b850, child=0x7fff2ac5b848) at ../../src/TrCoords.c:53
<snip>

Concretely, the situation looks like this:

288 int _xcb_conn_wait(xcb_connection_t *c, pthread_cond_t *cond, struct iovec **vector, int *count)
289 {
290     int ret;
291     fd_set rfds, wfds;
292
293     /* If the thing I should be doing is already being done, wait for it. */
294     if(count ? c->out.writing : c->in.reading)
295     {
296         pthread_cond_wait(cond, &c->iolock);   // <--- Thread 16
297         return 1;
298     }
299
300     FD_ZERO(&rfds);
301     FD_SET(c->fd, &rfds);
302     ++c->in.reading;
303
304     FD_ZERO(&wfds);
305     if(count)
306     {
307         FD_SET(c->fd, &wfds);
308         ++c->out.writing;
309     }
310
311     _xcb_unlock_io(c);
312     do {
313         ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
314     } while (ret == -1 && errno == EINTR);
315     if (ret < 0)
316     {
317         _xcb_conn_shutdown(c);
318         ret = 0;
319     }
320     _xcb_lock_io(c);   // <--- Thread 7

What happens: Thread 7 is running normally (c->xlib.lock == 0) and waits
a bit at line 313. Meanwhile thread 16 is scheduled (c->xlib.lock == 1)
and waits at line 296 for thread 7 to complete its operation. When
thread 7 reaches line 320 it can't take the lock, because
c->xlib.lock == 1 and c->xlib.thread != pthread_self() ...

272 void _xcb_lock_io(xcb_connection_t *c)
273 {
274     pthread_mutex_lock(&c->iolock);
275     while(c->xlib.lock)
276     {
277         if(pthread_equal(c->xlib.thread, pthread_self()))
278             break;
279         pthread_cond_wait(&c->xlib.cond, &c->iolock);
280     }
281 }

So the next question was why this can happen at all. Let's take a look
at _XReply:

<snip>
355     /* Internals of UnlockDisplay done by hand here, so that we can
356        insert_pending_request *after* we _XPutXCBBuffer, but before we
357        unlock the display. */
358     _XPutXCBBuffer(dpy);
359     current = insert_pending_request(dpy);
360     if(!dpy->lock || dpy->lock->locking_level == 0)
361         xcb_xlib_unlock(dpy->xcb->connection);   // <--- XXX
362     if(dpy->xcb->lock_fns.unlock_display)
363         dpy->xcb->lock_fns.unlock_display(dpy);
364     reply = xcb_wait_for_reply(c, current->sequence, &error);
365     LockDisplay(dpy);

Line 361 must have been executed in thread 7 (I could not check this,
but it seems to be the only explanation), so c->xlib.lock became 0
before xcb_wait_for_reply was called. However, thread 16 had
dpy->lock->locking_level == 1 (this time verified with gdb and a
coredump), so the "lock" wasn't released, which caused part of the
trouble.

I have no idea where the actual bug is, but I see something like three
possible conditions which would avoid this:

- xlib.lock has to be released before calling xcb_wait_for_reply
- xlib.lock must not be released before calling xcb_wait_for_reply
- xcb has to deal with that situation internally

Hopefully you can follow my thoughts and have some nice ideas to fix
this =)

Christoph

PS: Hints to reproduce the issue (note that I didn't try personally):

libx11-6 1.1.3-1, libxcb* 1.0-3 (Debian)
gxine dev, xine-lib 1.2 dev; gxine built --without-xcb
vdr 1.4.5, vdr-xine 0.7.9 dev (local builds)
Command: ./src/gxine vdr://tmp/vdr-xine/stream#demux:mpeg_pes
vdr tuned to BBC News 24 (which is 16:9)

http://zap.tartarus.org/~ds/gxine-0.5.900-dev.tar.bz2
http://zap.tartarus.org/~ds/xine-lib-1.1.90hg.tar.bz2
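For anyone who wants to poke at the interaction without setting up gxine and vdr, the hang described in the mail can probably be modelled outside of XCB. What follows is only a rough sketch and not code from libxcb or libX11: `struct model_conn`, `reader`, `waiter`, `deadline_ms` and `run_model` are names made up for this illustration, the fields merely mimic `c->xlib.lock`, `c->xlib.thread` and `c->in.reading`, and the blocking waits are replaced by `pthread_cond_timedwait` so that the program reports the stall instead of hanging. One thread plays the part of the reader coming back from select() and re-taking the I/O lock; the other still owns the Xlib lock and waits for that reader to finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical, stripped-down model of the connection state; not libxcb's struct. */
struct model_conn {
    pthread_mutex_t iolock;
    pthread_cond_t  xlib_cond;   /* would be signalled when xlib_lock drops to 0 */
    pthread_cond_t  read_cond;   /* would be signalled when the reader finishes */
    int             xlib_lock;   /* models c->xlib.lock */
    pthread_t       xlib_thread; /* models c->xlib.thread */
    int             reading;     /* models c->in.reading */
};

static struct model_conn c = {
    .iolock    = PTHREAD_MUTEX_INITIALIZER,
    .xlib_cond = PTHREAD_COND_INITIALIZER,
    .read_cond = PTHREAD_COND_INITIALIZER,
};

static int reader_stuck, waiter_stuck;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Plays the thread returning from select(): same loop shape as _xcb_lock_io,
 * but with a timeout so the model terminates. */
static void *reader(void *arg)
{
    struct timespec ts = deadline_ms(300);
    int rc = 0;
    (void)arg;

    pthread_mutex_lock(&c.iolock);
    while (c.xlib_lock && !pthread_equal(c.xlib_thread, pthread_self())
           && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&c.xlib_cond, &c.iolock, &ts);
    if (rc == ETIMEDOUT) {
        reader_stuck = 1;                 /* never got past the Xlib lock */
    } else {
        c.reading = 0;                    /* finished: wake whoever waits for us */
        pthread_cond_broadcast(&c.read_cond);
    }
    pthread_mutex_unlock(&c.iolock);
    return NULL;
}

/* Plays the thread that still owns the Xlib lock and waits for the reader. */
static void *waiter(void *arg)
{
    struct timespec ts = deadline_ms(300);
    int rc = 0;
    (void)arg;

    pthread_mutex_lock(&c.iolock);
    while (c.reading && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&c.read_cond, &c.iolock, &ts);
    if (rc == ETIMEDOUT)
        waiter_stuck = 1;                 /* the reader never completed */
    pthread_mutex_unlock(&c.iolock);
    return NULL;
}

/* Returns 1 if both model threads stalled on each other, 0 otherwise. */
int run_model(void)
{
    pthread_t r, w;
    int rc;

    reader_stuck = waiter_stuck = 0;
    c.reading   = 1;   /* a reader is "inside select()" */
    c.xlib_lock = 1;   /* the Xlib lock was never dropped by its owner */

    rc = pthread_create(&w, NULL, waiter, NULL);
    assert(rc == 0);
    c.xlib_thread = w; /* the waiter is the owner of the Xlib lock */
    rc = pthread_create(&r, NULL, reader, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return reader_stuck && waiter_stuck;
}
```

Built with -pthread and called from a small main(), run_model() should return 1 in this model: the reader cannot pass the Xlib lock check because it is not the owner, and the owner is waiting for exactly that reader. Setting c.xlib_lock to 0 before starting the threads lets the reader through, which corresponds to the first of the three conditions listed in the mail.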
Christoph and I believe this is fixed in current git. See my summary on the XCB list: http://lists.freedesktop.org/archives/xcb/2007-October/003019.html