Summary: | DBus daemon hangup | ||
---|---|---|---|
Product: | dbus | Reporter: | Yannick Lanz <yannick.lanz> |
Component: | core | Assignee: | Havoc Pennington <hp> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | chengwei.yang.cn, yannick.lanz |
Version: | unspecified | Keywords: | patch |
Hardware: | ARM | ||
OS: | Linux (All) | ||
Whiteboard: | review? | ||
i915 platform: | i915 features: | ||
Attachments: |
Log and result
[PATCH] Check EINVAL for accept4()
[PATCH v2] Check EINVAL for accept4() |
(In reply to comment #0)
> socketpair(PF_AX25, SOCK_CLOEXEC|0xbed3a7e0, 3202066396, 0x80000) = -1
> EINVAL (Invalid argument)

Either your strace executable is wrong, or your dbus-daemon is wrong, because we shouldn't be using AX25 sockets (whatever those are). All our calls to socketpair() are for AF_UNIX sockets (and that's hard-coded, so it seems unlikely that it could go wrong).

I think your cross-compilation may have gone wrong: did you perhaps cross-compile such that headers from a different architecture were included?

What was your configure command line for cross-compilation? Which D-Bus version?

> write(2, "Failed to accept a client connec"..., 58Failed to accept a client
> connection: Bad file descriptor

The dbus-daemon is probably busy-looping on a poll() of an invalid fd. In principle it shouldn't be possible for an invalid fd to get into the main loop at all, but perhaps some sort of "can't happen" socket syscall failure might have that effect?
I think this is probably miscompilation.

Hello,

Thanks for your help. I have finally found where the problem is, using GDB. It is in the file "dbus-sysdeps-unix.c", in the function "_dbus_accept".

accept4() is supported by my cross toolchain, and so is the SOCK_CLOEXEC define, but these lines

    /* We assume that if accept4 is available SOCK_CLOEXEC is too */
    client_fd = accept4 (listen_fd, &addr, &addrlen, SOCK_CLOEXEC);
    cloexec_done = client_fd >= 0;

always fail with errno 22 (Invalid argument). I don't know why, but I have replaced this call with accept() and set the variable cloexec_done to 0, like this:

    #ifdef HAVE_ACCEPT4
    /* We assume that if accept4 is available SOCK_CLOEXEC is too */
    //client_fd = accept4 (listen_fd, &addr, &addrlen, SOCK_CLOEXEC);
    //cloexec_done = client_fd >= 0;

    client_fd = accept (listen_fd, &addr, &addrlen);
    cloexec_done = 0;

    //if (client_fd < 0 && errno == ENOSYS)
    #endif
    //{
    //client_fd = accept (listen_fd, &addr, &addrlen);
    //}

For the moment, this modification has not caused any visible problem.

Thanks to all.

(In reply to comment #3)
> accept4() is supported by my cross toolchain, and so is the SOCK_CLOEXEC
> define, but these lines
>
> /* We assume that if accept4 is available SOCK_CLOEXEC is too */
> client_fd = accept4 (listen_fd, &addr, &addrlen, SOCK_CLOEXEC);
> cloexec_done = client_fd >= 0;
>
> always fail with errno 22 (Invalid argument).

So it seems we need a patch that also checks for EINVAL in the errno from accept4(); currently only ENOSYS is checked.

Given the above behavior, we need to drop the comment below.

/* We assume that if accept4 is available SOCK_CLOEXEC is too */

Created attachment 85600 [details] [review]
[PATCH] Check EINVAL for accept4()

Yannick Lanz, could you help to verify this patch? Thank you in advance.

(In reply to comment #4)
> According to the above behavior, we need drop the below comment.
> /* We assume that if accept4 is available SOCK_CLOEXEC is too */

There are two levels of "available" going on here.

One is what's in the libc headers, and will compile successfully: that's the only thing that Autoconf can check. The other is what the running kernel actually supports.

I think that comment would be better phrased as:

    At compile-time, we assume that if accept4() is available in
    libc headers, SOCK_CLOEXEC is too. At runtime, it is still
    not necessarily true that either is supported by the running kernel.

(In reply to comment #5)
> Created attachment 85600 [details] [review]
> [PATCH] Check EINVAL for accept4()

This is a good start (assuming it works for Yannick - I won't apply it until that's confirmed), but something is still wrong here: when the accept4() fails, it shouldn't result in a busy-loop.

I suspect what's going on might be something like this:

* we're watching the listening fd
* when it becomes readable, we try to accept4() on it
* on failure, it's still readable (there's still an incoming connection waiting)
* because select() and poll() are level-triggered, as soon as we go back to the main loop, we notice it's still readable and try to accept4() again
* busy-loop

If the accept operation fails with a "fatal-looking" error, we should probably issue a _dbus_warn() and stop watching the fd.

Unfortunately:

    Linux accept() (and accept4()) passes already-pending network errors on
    the new socket as an error code from accept(). This behavior differs
    from other BSD socket implementations. For reliable operation the
    application should detect the network errors defined for the protocol
    after accept() and treat them like EAGAIN by retrying. In the case of
    TCP/IP, these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET,
    EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.

so we should probably only stop watching the fd on specific errors: from the Linux accept4(2) man page, EBADF, EINVAL, ENOTSOCK, EOPNOTSUPP look like candidates.

> Yannick Lanz, could you help to verify this patch? Thank you in advance.

(In reply to comment #6)
I can't verify it before tonight.

Concerning the incompatibility, I have found a similar thread about udev (https://github.com/gentoo/eudev/issues/7). It seems that kernel 2.6.31 doesn't support accept4, so it's not an ARM toolchain problem but definitely a kernel one, so all architectures are potentially affected?

Concerning the solution, I'm available to test your patch.

(In reply to comment #5)
> Created attachment 85600 [details] [review]
> [PATCH] Check EINVAL for accept4()
>
> Yannick Lanz, could you help to verify this patch? Thank you in advance.

I have successfully patched dbus-1.6.14, cross-compiled it, and run it without problems.

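The error triage proposed in the discussion above could be sketched roughly as follows. This is only an illustration, not code from libdbus: the helper name accept_error_is_fatal is hypothetical, and the list of "stop watching" errors is simply the set of candidates named in the comment (EBADF, EINVAL, ENOTSOCK, EOPNOTSUPP). Note that EOPNOTSUPP also appears in the man page's list of retryable TCP/IP errors, so treating it as fatal is a judgment call, and that EINVAL should only be classified this way after any accept4()-to-accept() fallback has already been tried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper, not part of libdbus: returns true when an errno
 * value from accept()/accept4() suggests that the listening fd itself is
 * unusable, so that retrying from a level-triggered main loop would only
 * spin. The set of errors is the candidate list from the discussion. */
static bool
accept_error_is_fatal (int err)
{
  assert (err != 0);  /* caller must pass a real errno value */

  switch (err)
    {
    case EBADF:       /* not an open file descriptor */
    case EINVAL:      /* not listening, or invalid flags */
    case ENOTSOCK:    /* descriptor is not a socket */
    case EOPNOTSUPP:  /* socket type does not support accept() */
      return true;

    default:
      /* EAGAIN, EINTR, ECONNABORTED, pending network errors and so on:
       * keep watching the fd and try again on the next wakeup. */
      return false;
    }
}
```

A caller would warn and remove the watch when this returns true, and otherwise return to the main loop and let the next wakeup retry.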
(In reply to comment #6)
> (In reply to comment #4)
> > According to the above behavior, we need drop the below comment.
> > /* We assume that if accept4 is available SOCK_CLOEXEC is too */
>
> There are two levels of "available" going on here.
>
> One is what's in the libc headers, and will compile successfully: that's the
> only thing that Autoconf can check. The other is what the running kernel
> actually supports.

Yes, exactly, so I didn't drop the comment in my patch.

> I think that comment would be better phrased as:
>
> At compile-time, we assume that if accept4() is available in
> libc headers, SOCK_CLOEXEC is too. At runtime, it is still
> not necessarily true that either is supported by the running kernel.

Sure, the above comment looks good to me; I'll adopt it in patch v2.

> (In reply to comment #5)
> > Created attachment 85600 [details] [review]
> > [PATCH] Check EINVAL for accept4()
>
> This is a good start (assuming it works for Yannick - I won't apply it until
> that's confirmed), but something is still wrong here: when the accept4()
> fails, it shouldn't result in a busy-loop.

Yes, and neither should a failing accept().

> I suspect what's going on might be something like this:
>
> * we're watching the listening fd
> * when it becomes readable, we try to accept4() on it
> * on failure, it's still readable (there's still an incoming connection
> waiting)
> * because select() and poll() are level-triggered, as soon as we go back to
> the main loop, we notice it's still readable and try to accept4() again
> * busy-loop
>
> If the accept operation fails with a "fatal-looking" error, we should
> probably issue a _dbus_warn() and stop watching the fd.
>
> Unfortunately:
>
> Linux accept() (and accept4()) passes already-pending network errors on
> the new socket as an error code from accept(). This behavior differs
> from other BSD socket implementations. For reliable operation the
> application should detect the network errors defined for the protocol
> after accept() and treat them like EAGAIN by retrying. In the case of
> TCP/IP, these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET,
> EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.
>
> so we should probably only stop watching the fd on specific errors: from the
> Linux accept4(2) man page, EBADF, EINVAL, ENOTSOCK, EOPNOTSUPP look like
> candidates.

That is effectively a fatal error, especially in the common usage scenario where dbus-daemon listens on only one address: if that one address is removed, the daemon becomes useless.

Created attachment 85689 [details] [review]
[PATCH v2] Check EINVAL for accept4()

Applied code comments from Simon.

Fixed in git for 1.6.16, 1.7.6 - thanks |
Created attachment 85321 [details]
Log and result

Hi,

DBus version: 1.6.14
Linux kernel: 2.6.31

I don't know if it's a bug or a bad cross-compilation, but I have compiled DBus for ARM (IM.X25). I first launch the DBus daemon with debug support:

$ dbus-daemon --system

The output is in the attached file. Then I launch this program:

    #include <stdio.h>
    #include <dbus/dbus.h>

    int main()
    {
      DBusError error;
      DBusConnection *conn;

      printf("D-Bus bus testing application\n");
      dbus_error_init (&error);
      conn = dbus_bus_get (DBUS_BUS_SYSTEM, &error);
      if (!conn)
        {
          fprintf (stderr, "%s: %s\n", error.name, error.message);
          return 1;
        }
      return 0;
    }

After that, the test application is blocked and the DBus daemon hangs:

    PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
    586 1 root R 2756 4.8 0 70.0 dbus-daemon --system

When I launch the daemon with strace, I get an infinite loop trying to do a socketpair:

    fcntl64(-1, F_GETFD) = -1 EBADF (Bad file descriptor)
    write(2, "595: ", 5595: ) = 5
    write(2, "[dbus-server-socket.c(203):socke"..., 48[dbus-server-socket.c(203):socket_handle_watch] ) = 48
    write(2, "Failed to accept a client connec"..., 58Failed to accept a client connection: Bad file descriptor ) = 58
    clock_gettime(CLOCK_MONOTONIC, {7887, 312515930}) = 0
    poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
    clock_gettime(CLOCK_MONOTONIC, {7887, 318238546}) = 0
    ewrite(2, "595: ", 5595: ) = 5
    write(2, "[dbus-server-socket.c(181):socke"..., 48[dbus-server-socket.c(181):socket_handle_watch] ) = 48
    write(2, "Handling client connection, flag"..., 38Handling client connection, flags 0x1 ) = 38
    socketpair(PF_AX25, SOCK_CLOEXEC|0xbed3a7e0, 3202066396, 0x80000) = -1 EINVAL (Invalid argument)
    write(2, "595: ", 5595: ) = 5
    write(2, "[dbus-sysdeps-unix.c(1956):_dbus"..., 41[dbus-sysdeps-unix.c(1956):_dbus_accept] ) = 41
    write(2, "client fd -1 accepted\n", 22client fd -1 accepted ) = 22
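
The strace output above shows poll() reporting fd 3 as readable on every pass, which is what level-triggered readiness looks like when the pending event is never consumed. The small sketch below reproduces that effect with a pipe instead of a listening socket (an assumption made purely to keep the example self-contained); the helper name count_ready_polls is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Polls fd for readability `rounds` times without consuming any data and
 * returns how many of those polls reported POLLIN. With level-triggered
 * poll(), unread data keeps the fd "ready" on every call. */
static int
count_ready_polls (int fd, int rounds)
{
  int ready = 0;

  assert (rounds >= 0);

  for (int i = 0; i < rounds; i++)
    {
      struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };

      /* Timeout 0: return immediately with the current readiness. */
      if (poll (&p, 1, 0) == 1 && (p.revents & POLLIN))
        ready++;
    }

  return ready;
}
```

Calling this on the read end of a pipe returns 0 while the pipe is empty and `rounds` once a byte has been written and left unread, which mirrors a main loop that wakes up again and again for a connection it never manages to accept.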