There is an infinite loop in dbus/dbus-spawn.c, lines 311-317 (in both the stable 1.6 branch and master):

      /* If we couldn't reap the child then kill it, and
       * try again
       */
      if (ret == 0)
        kill (sitter->sitter_pid, SIGKILL);

    again:
      if (ret == 0)
        ret = waitpid (sitter->sitter_pid, &status, 0);

      if (ret < 0)
        {
          if (errno == EINTR)
            goto again;
          else if (errno == ECHILD)

Once waitpid() returns -1 with errno set to EINTR, dbus-daemon enters an infinite loop.
(In reply to comment #0)
> Once waitpid() returns -1 with errno set to EINTR, dbus-daemon enters an
> infinite loop.

Which platforms does this happen on, in practice?

I think the solution is to set "ret = 0" before "goto again", or to add a sensible EINTR wrapper.
It has happened a few times on my openSUSE 12.3 x86_64 system (in fact, it is using 100% of one of my two cores right now). I also found the same bug filed against an earlier openSUSE version, so it happens to other people as well: https://bugzilla.novell.com/show_bug.cgi?id=782909
Does the patch I'm about to attach solve this for you?
Created attachment 85236
_dbus_babysitter_unref: avoid infinite loop if waitpid() returns EINTR

If waitpid() failed with EINTR, we'd go back for another go, but because ret is nonzero, we'd skip the waitpid() and just keep looping.

Also avoid an unnecessary "goto" in favour of a proper loop, to make it more clearly correct.
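For illustration, the loop form described above might look roughly like the following. This is a minimal sketch, not the attached patch: kill_and_reap() is a hypothetical helper, and the initial non-blocking waitpid() with WNOHANG is an assumption about how ret gets its first value.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the dbus function): kill the child if it has
 * not exited yet, then reap it. Returns 0 once the child has been
 * reaped, -1 on any error other than EINTR. */
int
kill_and_reap (pid_t pid, int *status)
{
  pid_t ret;

  /* Assumed first step: a non-blocking attempt to reap the child.
   * A return value of 0 means the child is still running. */
  ret = waitpid (pid, status, WNOHANG);

  if (ret == 0)
    {
      /* We couldn't reap the child, so kill it and try again. */
      kill (pid, SIGKILL);

      /* Every pass calls waitpid() again, so an EINTR result leads to
       * a real retry instead of re-testing a stale, nonzero ret. */
      do
        ret = waitpid (pid, status, 0);
      while (ret < 0 && errno == EINTR);
    }

  return ret < 0 ? -1 : 0;
}
```

The key difference from the "goto again" version is that the retry condition and the waitpid() call sit in the same loop, so there is no path that jumps back without calling waitpid() again.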
(In reply to comment #3)
> Does the patch I'm about to attach solve this for you?

I applied a fix yesterday, and a new instance of dbus-daemon is running without problems right now, but that doesn't prove anything: this issue happens once every few weeks, so it is almost impossible to tell whether the problem is fixed. Anyway, we know there is a bug there and it has to be fixed.
(In reply to comment #4)
> Created attachment 85236
> _dbus_babysitter_unref: avoid infinite loop if waitpid() returns EINTR
>
> If waitpid() failed with EINTR, we'd go back for another go, but
> because ret is nonzero, we'd skip the waitpid() and just keep looping.
>
> Also avoid an unnecessary "goto" in favour of a proper loop, to make it
> more clearly correct.

Ouch, can't believe we've had that bug for so long (since 2003 looks like).

Reviewed-by: Colin Walters <walters@verbum.org>
(In reply to comment #6)
> Ouch, can't believe we've had that bug for so long (since 2003 looks like).
>
> Reviewed-by: Colin Walters <walters@verbum.org>

Fixed in git for 1.6.14 (which I'll release).
Thanks for the patch. As soon as I am done with more urgent matters, I'll build an openSUSE package with it so it gets more testing.
The fix was integrated into openSUSE Factory.