Bug 16420

Summary: Freeze in _xcb_in_read_block during select()
Product: XCB Reporter: Bryce Harrington <bryce>
Component: Library Assignee: Jamey Sharp <jamey>
Status: RESOLVED NOTOURBUG QA Contact: xcb mailing list dummy <xcb>
Severity: critical    
Priority: high CC: cody-somerville, x
Version: 1.1   
Hardware: x86 (IA32)   
OS: Linux (All)   
URL: https://bugs.edge.launchpad.net/ubuntu/+source/libxcb/+bug/232364
Whiteboard:
Attachments: dbus-launch trace
strace after killing process
lsof output
fd/pid listing

Description Bryce Harrington 2008-06-18 17:32:52 UTC
Forwarding an Ubuntu bug:
https://bugs.edge.launchpad.net/ubuntu/+source/libxcb/+bug/232364

A number of Xubuntu users have been experiencing freezes at startup when dbus-launch is run.  Backtraces indicate the problem always occurs during a select() call in _xcb_in_read_block.  The freezes are intermittent: restarting several times eventually brings the session up.

(gdb) bt
#0 0xb8002424 in __kernel_vsyscall ()
#1 0xb7e8484d in select () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7da309a in _xcb_in_read_block (c=0x80579a8, buf=0x8057040, len=8)
    at xcb_in.c:248
#3 0xb7da2343 in xcb_connect_to_fd (fd=13, auth_info=0xbff1cdf0)
    at xcb_conn.c:133
#4 0xb7da4a51 in xcb_connect (displayname=0x0, screenp=0x0) at xcb_util.c:279
#5 0xb7f43717 in _XConnectXCB () from /usr/lib/libX11.so.6
#6 0xb7f2c029 in XOpenDisplay () from /usr/lib/libX11.so.6
#7 0x0804b3de in x11_init () at dbus-launch-x11.c:218
#8 0x0804abb2 in main (argc=5, argv=0xbff1d5a4) at dbus-launch.c:432
(gdb) quit

strace also shows that the hang is occurring on a select call:

  select(14, [13], NULL, NULL, NULL
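
For context on why this call can hang forever: the last argument to select() is the timeout, and NULL means "wait indefinitely", so the call only returns once fd 13 becomes readable. The sketch below is an illustration only, not XCB code. It uses a local socket pair as a stand-in for the client/X server connection, and the helper name wait_readable and the 0.2 second timeout are made up for the example. The 8 bytes match the len=8 read seen in the backtrace.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if sock becomes readable within timeout_s seconds.

    Passing timeout_s=None would mirror the NULL timeout in the strace
    line and block until data (or EOF) arrives.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


# A connected pair stands in for the client <-> X server connection.
client, server = socket.socketpair()

# The peer has sent nothing, so a bounded wait times out instead of hanging.
silent = wait_readable(client, 0.2)

# Once the peer writes the 8 bytes the client is waiting for,
# the same wait returns promptly.
server.sendall(b"\x00" * 8)
ready = wait_readable(client, 0.2)

client.close()
server.close()
```

With a finite timeout the caller gets a chance to report "no reply from the server" rather than freezing, at the cost of having to pick a timeout value.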
Comment 1 Bryce Harrington 2008-06-18 17:35:41 UTC
Some logs...
Xorg.0.log:  http://launchpadlibrarian.net/15420742/Xorg.0.log
xinitrc:  http://launchpadlibrarian.net/15420770/xinitrc
lsof:  http://launchpadlibrarian.net/15315169/lsof

This patch was tried as a test, but it made no difference:
http://launchpadlibrarian.net/14669590/xcb_in.diff
Comment 2 Cody A.W. Somerville 2008-06-18 17:56:24 UTC
Hi,

 I'm the Xubuntu Team Lead. Please let me know if I can do anything to assist in fixing/testing this bug.

Cheers,
Comment 3 Bryce Harrington 2008-06-20 15:00:13 UTC
Created attachment 17263 [details]
dbus-launch trace

cody-somerville@mercurial:~$  cat /usr/bin/dbus-launch
#!/bin/sh

exec /usr/bin/strace /usr/bin/dbus-launch.real "$@" 2> /tmp/dbus-launch.out
Comment 4 Bryce Harrington 2008-06-20 15:02:45 UTC
Created attachment 17264 [details]
strace after killing process
Comment 5 Bryce Harrington 2008-06-20 15:03:13 UTC
Created attachment 17265 [details]
lsof output
Comment 6 Bryce Harrington 2008-06-20 15:03:52 UTC
Created attachment 17266 [details]
fd/pid listing
Comment 7 Bryce Harrington 2008-06-20 15:10:08 UTC
From the post-kill strace:

[pid  7877] read(20, 0x8056f3c, 4096)   = -1 EAGAIN (Resource temporarily unavailable)
[pid  7877] ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfd17a18) = -1 ENOTTY (Inappropriate ioctl for device)
[pid  7877] select(21, [20], NULL, [20], NULL) = 1 (in [20])
[pid  7877] read(20, "", 4096)          = 0
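
A note on reading this sequence: select() reporting a descriptor as readable, followed by a read() that returns 0, is the usual sign of end-of-file, i.e. the other end closed the connection. Assuming fd 20 is a socket, the pattern can be reproduced with the sketch below. It uses a local socket pair, and the helper name peer_closed is hypothetical.

```python
import select
import socket


def peer_closed(sock, timeout_s):
    """Return True if sock is readable and the read yields zero bytes (EOF)."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return False  # nothing to read yet; the peer may still be alive
    # MSG_PEEK inspects pending data without consuming it.
    data = sock.recv(4096, socket.MSG_PEEK)
    return data == b""


client, server = socket.socketpair()

# Peer is open but silent: select() times out, so no EOF is reported.
while_open = peer_closed(client, 0.1)

# Peer closes: select() marks the socket readable and the read returns
# zero bytes, matching the select/read pair in the trace.
server.close()
after_close = peer_closed(client, 0.1)

client.close()
```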

Comment 8 Jamey Sharp 2009-10-09 11:10:53 UTC
I can't actually believe this was ever an XCB bug. The strace output posted on the launchpad bug shows that it was waiting for the connection setup response from the X server, and if that never arrived, it's hard to imagine how it could be XCB's fault.

I could believe, though, that two instances of dbus-launch somehow deadlocked against each other. Perhaps one calls XGrabServer, then waits for the other one to finish connecting to the X server?

The fix that Ubuntu seems to have settled on, if I'm reading the launchpad bug correctly, is to ensure that there aren't two dbus-launch instances racing each other. That seems plausible to me.
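
As a rough illustration of "ensure that there aren't two instances racing each other", one common approach is an exclusive, non-blocking file lock taken before the launcher touches the X server. This is only a sketch under that assumption and not necessarily what Ubuntu actually did. The lock file name and the helper try_acquire are invented for the example.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Return an fd holding an exclusive lock, or None if it is already held."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


# Hypothetical lock location; a real launcher would use a per-session path.
lock_path = os.path.join(tempfile.mkdtemp(), "session-launch.lock")

first = try_acquire(lock_path)    # first launcher wins the lock
second = try_acquire(lock_path)   # a second launcher is turned away
os.close(first)                   # closing the fd releases the lock
third = try_acquire(lock_path)    # now a new launcher can proceed
os.close(third)
```

A launcher that gets None back could exit or wait and retry, so only one instance at a time would be connecting to the X server.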
