Bug 45776 - all X programs hanging in libxcb on GNU/kFreeBSD
Status: RESOLVED FIXED
Product: XCB
Classification: Unclassified
Component: Library
Version: unspecified
Hardware: x86-64 (AMD64) FreeBSD
Importance: medium normal
Assigned To: xcb mailing list dummy
 
Reported: 2012-02-08 06:06 UTC by Christoph Egger
Modified: 2012-07-14 23:38 UTC

Description Christoph Egger 2012-02-08 06:06:25 UTC
Hi!

After upgrading libxcb from 1.7 to 1.8, all X programs started to hang in a busy loop in libxcb. I've created a backtrace demonstrating the situation at [0]. git bisect blames [2]. This has also been reported to the Debian bug tracker at [1].


[0] http://people.debian.org/~christoph/xrandr.gdb
[1] http://bugs.debian.org/659104
[2] 20da10490f8dac75ec9fe1df28cb9e862e171be5
Comment 1 Cyril Brulebois 2012-02-08 06:13:22 UTC
Many thanks for upstreaming the bug.
Comment 2 Christoph Egger 2012-02-08 11:30:26 UTC
Hi!

The following patch fixes the problem for me (as does reverting from recv to read). I'll see if I can work out with the debian-bsd@lists.debian.org folks why it fails here without the patch.

recv seems to fail (loop) with "Resource temporarily unavailable" (ret=-1, errno=35) according to ktrace

http://people.debian.org/~christoph/libxcb.diff
Comment 3 Julien Cristau 2012-02-08 11:43:43 UTC
> --- Comment #2 from Christoph Egger <christoph@debian.org> 2012-02-08 11:30:26 PST ---
> Hi!
> 
> The following patch fixes the problem for me (as does reverting from
> recv to read). I'll see if I can work out with the
> debian-bsd@lists.debian.org folks why it fails here without the patch.
> 
> recv seems to fail (loop) with "Resource temporarily unavailable" (ret=-1,
> errno=35) according to ktrace
> 
> http://people.debian.org/~christoph/libxcb.diff
> 
The MSG_WAITALL addition looks very wrong to me.  The commit message
doesn't explain it, and it's replacing a plain read() which should be
equivalent to a recv() with flags == 0, *not* MSG_WAITALL.
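
To make the equivalence concrete, here is a minimal sketch (not taken from libxcb; both helper names are hypothetical). POSIX specifies that read() on a socket behaves like recv() with no flags set, so a read() to recv() conversion that is meant to preserve behavior would pass 0 as the flags argument:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read from a connected socket with plain read(). */
ssize_t read_plain(int fd, void *buf, size_t len)
{
    return read(fd, buf, len);
}

/* Hypothetical helper: the recv() call that matches read() on a socket.
 * Note flags == 0, not MSG_WAITALL. */
ssize_t recv_no_flags(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, 0);
}
```

Both helpers return as soon as some data is available, even if it is less than len bytes.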
Comment 4 Josh Triplett 2012-02-08 11:50:10 UTC
I looked up the description of MSG_WAITALL in the recv manpage:

MSG_WAITALL (since Linux 2.2)
       This  flag  requests that the operation block until the full
       request is satisfied.  However, the call may still return less
       data than requested if a signal is caught, an error or disconnect
       occurs, or the next data to be received is of a different type
       than that returned.

This seems entirely wrong for these calls to recv.  Passing MSG_WAITALL suggests that we don't want the OS to return until it has filled the entire input buffer, whereas we clearly want the OS to return as soon as it has an X response for us.  That would certainly explain the hangs.

By all means, remove the MSG_WAITALL flag.
Comment 5 Uli Schlachter 2012-02-08 12:05:44 UTC
(In reply to comment #4)
> I looked up the description of MSG_WAITALL in the recv manpage:
[...]
> This seems entirely wrong for these calls to recv.  Passing MSG_WAITALL
> suggests that we don't want the OS to return until it has filled the entire
> input buffer, whereas we clearly want the OS to return as soon as it has an X
> response for us. 

That's only true for _xcb_in_read() (which I totally missed, whoops). read_block() would loop until it has read as much data as was asked for anyway.

> That would certainly explain the hangs.

Well, it doesn't hang, it busy-loops with recv() always returning EAGAIN and this behavior isn't explained yet.
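
For readers unfamiliar with the pattern, a loop of the kind read_block() is described as performing might look roughly like the sketch below. This is an illustration under assumptions, not the libxcb source: the name read_exactly, its return convention, and the use of poll() to wait on EAGAIN are all hypothetical choices.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not the libxcb source: read exactly len bytes.
 * Returns len on success, 0 if the peer closed the connection early,
 * -1 on error. Works on blocking and non-blocking sockets. */
ssize_t read_exactly(int fd, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        /* flags == 0: take whatever is available right now. */
        ssize_t ret = recv(fd, (char *)buf + done, len - done, 0);

        if (ret > 0) {
            done += (size_t)ret;
        } else if (ret == 0) {
            return 0;                    /* peer closed the connection */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Nothing queued yet: sleep until the socket is readable
             * instead of spinning on recv(). */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int r;
            do {
                r = poll(&pfd, 1, -1);
            } while (r == -1 && errno == EINTR);
            if (r == -1)
                return -1;
        } else if (errno != EINTR) {
            return -1;                   /* real error */
        }
    }
    return (ssize_t)len;
}
```

Because the loop itself keeps going until len bytes have arrived, it does not depend on MSG_WAITALL to assemble a full block.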
Comment 6 Josh Triplett 2012-02-08 12:37:30 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > I looked up the description of MSG_WAITALL in the recv manpage:
> [...]
> > This seems entirely wrong for these calls to recv.  Passing MSG_WAITALL
> > suggests that we don't want the OS to return until it has filled the entire
> > input buffer, whereas we clearly want the OS to return as soon as it has an X
> > response for us. 
> 
> That's only true for _xcb_in_read() (which I totally missed, whoops).
> read_block() would loop until it has read as much data as was asked for anyway.

Fair enough.  Does dropping MSG_WAITALL from _xcb_in_read solve the problem, or do programs still hang in read_block()?

> > That would certainly explain the hangs.
> 
> Well, it doesn't hang, it busy-loops with recv() always returning EAGAIN and
> this behavior isn't explained yet.

That one seems fairly straightforward as well.  We put sockets into non-blocking mode, so any read/recv operation that would block will instead return EAGAIN.  MSG_WAITALL tells recv to block until it fills the buffer, so it would make sense that if recv couldn't immediately fill the buffer it would return EAGAIN.

It also wouldn't surprise me if Linux effectively ignores MSG_WAITALL on non-blocking sockets (since it doesn't seem like an overly sensible combination), which would explain why this hang didn't occur on Linux.
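
A small probe along the following lines could be used to check that theory on a given platform. It is only a sketch: set_nonblocking and probe_waitall are hypothetical names, and the result is expected to be platform-dependent. If the peer has queued fewer bytes than requested, an OS that honors MSG_WAITALL strictly on a non-blocking socket would return -1 with EAGAIN, while one that effectively ignores the flag would return a short read.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical probe: ask for `want` bytes with MSG_WAITALL.
 * Returns recv()'s result; *err holds errno if that result is -1,
 * and 0 otherwise. */
ssize_t probe_waitall(int fd, void *buf, size_t want, int *err)
{
    ssize_t ret = recv(fd, buf, want, MSG_WAITALL);
    *err = (ret == -1) ? errno : 0;
    return ret;
}
```

To try it, create a socketpair(), call set_nonblocking() on the reading end, send 4 bytes from the other end, and call probe_waitall() with a 16-byte buffer. If the result is -1 with EAGAIN even though poll() reports the socket as readable, a caller that retries immediately will spin, which matches the busy loop reported in this bug.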
Comment 7 Christoph Egger 2012-02-08 13:00:01 UTC
(In reply to comment #6)
> Fair enough.  Does dropping MSG_WAITALL from _xcb_in_read solve the problem, or
> do programs still hang in read_block()?

Removing the one in _xcb_in_read is absolutely enough to solve the looping for me.
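
For illustration, the pattern that remains once the flag is dropped is roughly the following. This is a sketch under assumptions, not the actual _xcb_in_read code; the name read_available and its return convention are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not the libxcb source: fetch whatever is queued on
 * a socket, up to cap bytes. Returns the number of bytes read (> 0),
 * 0 if a non-blocking socket has nothing available right now,
 * -1 on error, -2 if the peer closed the connection. */
ssize_t read_available(int fd, void *buf, size_t cap)
{
    ssize_t ret = recv(fd, buf, cap, 0);   /* flags == 0, no MSG_WAITALL */

    if (ret > 0)
        return ret;                        /* partial fills are fine */
    if (ret == 0)
        return -2;                         /* orderly shutdown by peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                          /* try again after poll() */
    return -1;
}
```

With flags set to 0, a buffer larger than the pending data simply yields a short read, so the caller can process the bytes that did arrive.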
Comment 8 Julien Cristau 2012-02-08 13:06:34 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > I looked up the description of MSG_WAITALL in the recv manpage:
> [...]
> > This seems entirely wrong for these calls to recv.  Passing MSG_WAITALL
> > suggests that we don't want the OS to return until it has filled the entire
> > input buffer, whereas we clearly want the OS to return as soon as it has an X
> > response for us. 
> 
> That's only true for _xcb_in_read() (which I totally missed, whoops).
> read_block() would loop until it has read as much data as was asked for anyway.
> 
well yes, but as you say read_block() does the looping itself anyway, so the MSG_WAITALL, while not as broken as in _xcb_in_read, is also useless.
Comment 9 Josh Triplett 2012-02-08 13:37:53 UTC
(In reply to comment #8)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > I looked up the description of MSG_WAITALL in the recv manpage:
> > [...]
> > > This seems entirely wrong for these calls to recv.  Passing MSG_WAITALL
> > > suggests that we don't want the OS to return until it has filled the entire
> > > input buffer, whereas we clearly want the OS to return as soon as it has an X
> > > response for us. 
> > 
> > That's only true for _xcb_in_read() (which I totally missed, whoops).
> > read_block() would loop until it has read as much data as was asked for anyway.
> > 
> well yes, but as you say read_block() does the looping itself anyway, so the
> MSG_WAITALL, while not as broken as in _xcb_in_read, is also useless.

It doesn't do any harm, though, and it allows the OS to do the waiting for us rather than looping every time we get a bit of data.  Unless it causes problems, I'd suggest leaving it.
Comment 11 Chí-Thanh Christopher Nguyễn 2012-03-03 04:24:25 UTC
The hang also affects Gentoo Prefix on OS X.
The patch from christoph/jcristau will make the problem go away.
Comment 12 Uli Schlachter 2012-07-14 23:38:05 UTC
I just noticed that this bug is still open. However, I'm quite sure that this was fixed in libxcb 1.8.1 with commit 236f914ea7205f5f74e87fcc1b06d87bd0789a7a.