Hi! After upgrading libxcb from 1.7 to 1.8, all X programs started to hang in a busy loop in libxcb. I've created a backtrace demonstrating the situation at [0]. git bisect blames [2]. This has also been reported to the Debian bug tracker at [1].
[0] http://people.debian.org/~christoph/xrandr.gdb
[1] http://bugs.debian.org/659104
[2] 20da10490f8dac75ec9fe1df28cb9e862e171be5
Many thanks for upstreaming the bug.
Hi! The following patch fixes the problem for me (as does reverting from recv to read). I'll see if I can work out with the debian-bsd@lists.debian.org folks why it fails here without the patch. According to ktrace, recv seems to fail (loop) with "Resource temporarily unavailable" (ret=-1, errno=35). http://people.debian.org/~christoph/libxcb.diff
> --- Comment #2 from Christoph Egger <christoph@debian.org> 2012-02-08 11:30:26 PST ---
> Hi!
>
> I the following patch fixes the problem for me (as well does reverting from
> recv to read. I'll see if I can work out why it fails here without the patch
> with debian-bsd@lists.debian.org folks.
>
> recv seems to fail (loop) with "Resource temporarily unavailable" (ret=-1,
> errno=35) according to ktrace
>
> http://people.debian.org/~christoph/libxcb.diff
>
The MSG_WAITALL addition looks very wrong to me. The commit message doesn't explain it, and it's replacing a plain read() which should be equivalent to a recv() with flags == 0, *not* MSG_WAITALL.
I looked up the description of MSG_WAITALL in the recv manpage:
    MSG_WAITALL (since Linux 2.2)
        This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.
This seems entirely wrong for these calls to recv. Passing MSG_WAITALL suggests that we don't want the OS to return until it has filled the entire input buffer, whereas we clearly want the OS to return as soon as it has an X response for us. That would certainly explain the hangs. By all means, remove the MSG_WAITALL flag.
(In reply to comment #4)
> I looked up the description of MSG_WAITALL in the recv manpage:
[...]
> This seems entirely wrong for these calls to recv. Passing MSG_WAITALL
> suggests that we don't want the OS to return until it has filled the entire
> input buffer, whereas we clearly want the OS to return as soon as it has an X
> response for us.
That's only true for _xcb_in_read() (which I totally missed, whoops). read_block() would loop until it has read as much data as was asked for anyway.
> That would certainly explain the hangs.
Well, it doesn't hang, it busy-loops with recv() always returning EAGAIN and this behavior isn't explained yet.
(In reply to comment #5)
> (In reply to comment #4)
> > I looked up the description of MSG_WAITALL in the recv manpage:
> [...]
> > This seems entirely wrong for these calls to recv. Passing MSG_WAITALL
> > suggests that we don't want the OS to return until it has filled the entire
> > input buffer, whereas we clearly want the OS to return as soon as it has an X
> > response for us.
>
> That's only true for _xcb_in_read() (which I totally missed, whoops).
> read_block() would loop until it has read as much data as was asked for anyway.
Fair enough. Does dropping MSG_WAITALL from _xcb_in_read solve the problem, or do programs still hang in read_block()?
> > That would certainly explain the hangs.
>
> Well, it doesn't hang, it busy-loops with recv() always returning EAGAIN and
> this behavior isn't explained yet.
That one seems fairly straightforward as well. We put sockets into non-blocking mode, so any read/recv operation that would block will instead return EAGAIN. MSG_WAITALL tells recv to block until it fills the buffer, so it would make sense that if recv couldn't immediately fill the buffer it would return EAGAIN. It also wouldn't surprise me if Linux effectively ignores MSG_WAITALL on non-blocking sockets (since it doesn't seem like an overly sensible combination), which would explain why this hang didn't occur on Linux.
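To make the explanation above easier to check on a given platform, here is a minimal sketch in C. It is not libxcb code: probe_recv() is a hypothetical helper that uses a Unix-domain socketpair as a stand-in for the X connection, queues a few bytes on one end, and then calls recv() on the other, non-blocking end with the flags under test.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libxcb): queue `avail` bytes on one end
 * of a socketpair, then try to recv() up to `want` bytes from the other,
 * non-blocking end using `flags`. Returns recv()'s result; if the result is
 * negative, the errno value is stored in *err.
 */
ssize_t probe_recv(size_t avail, size_t want, int flags, int *err)
{
    int sv[2];
    char out[64] = {0};
    char in[64];
    ssize_t n;
    int fl;

    *err = 0;
    if (avail > sizeof out || want > sizeof in) {
        *err = EINVAL;
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        *err = errno;
        return -1;
    }

    /* Put the reading end into non-blocking mode. */
    fl = fcntl(sv[0], F_GETFL, 0);
    if (fl < 0 || fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) < 0) {
        *err = errno;
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Make only part of the requested data available. */
    if (avail > 0 && write(sv[1], out, avail) != (ssize_t)avail) {
        *err = errno;
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = recv(sv[0], in, want, flags);
    if (n < 0)
        *err = errno; /* save before close() can overwrite it */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With flags == 0, probe_recv(4, 16, 0, &err) should return the 4 bytes that are queued, and probe_recv(0, 16, 0, &err) should fail with EAGAIN (or EWOULDBLOCK). The interesting case is probe_recv(4, 16, MSG_WAITALL, &err): its result is platform-dependent. A short read would match the guess above about Linux, while -1 with EAGAIN would match the ktrace output reported earlier in this bug.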
(In reply to comment #6)
> Fair enough. Does dropping MSG_WAITALL from _xcb_in_read solve the problem, or
> do programs still hang in read_block()?
Removing the one in _xcb_in_read is absolutely enough to solve the looping for me.
(In reply to comment #5)
> (In reply to comment #4)
> > I looked up the description of MSG_WAITALL in the recv manpage:
> [...]
> > This seems entirely wrong for these calls to recv. Passing MSG_WAITALL
> > suggests that we don't want the OS to return until it has filled the entire
> > input buffer, whereas we clearly want the OS to return as soon as it has an X
> > response for us.
>
> That's only true for _xcb_in_read() (which I totally missed, whoops).
> read_block() would loop until it has read as much data as was asked for anyway.
>
Well, yes, but as you say, read_block() does the looping itself anyway, so the MSG_WAITALL, while not as broken as in _xcb_in_read, is also useless.
(In reply to comment #8)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > I looked up the description of MSG_WAITALL in the recv manpage:
> > [...]
> > > This seems entirely wrong for these calls to recv. Passing MSG_WAITALL
> > > suggests that we don't want the OS to return until it has filled the entire
> > > input buffer, whereas we clearly want the OS to return as soon as it has an X
> > > response for us.
> >
> > That's only true for _xcb_in_read() (which I totally missed, whoops).
> > read_block() would loop until it has read as much data as was asked for anyway.
> >
> well yes, but as you say read_block() does the looping itself anyway, so the
> MSG_WAITALL, while not as broken as in _xcb_in_read, is also useless.
It doesn't do any harm, though, and it allows the OS to do the waiting for us rather than looping every time we get a bit of data. Unless it causes problems, I'd suggest leaving it.
FWIW, fixed in Debian with http://anonscm.debian.org/gitweb/?p=collab-maint/libxcb.git;a=commitdiff;h=2b5bc1d3299510e10a1733e5a3b326232c774b75
The hang also affects Gentoo Prefix on OS X. The patch from christoph/jcristau will make the problem go away.
I just noticed that this bug is still open. However, I'm quite sure that this was fixed in libxcb 1.8.1 with commit 236f914ea7205f5f74e87fcc1b06d87bd0789a7a.