Bug 23194 - Malformed use of poll(2) in the dbus-daemon
Summary: Malformed use of poll(2) in the dbus-daemon
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.5
Hardware: All All
: high critical
Assignee: Simon McVittie
QA Contact: John (J5) Palmieri
URL: http://cgit.freedesktop.org/~smcv/dbu...
Whiteboard:
Keywords: patch
: 33837 (view as bug list)
Depends on: 33126 33336 33342
Blocks: 33337 33474 dbus-1.5
  Show dependency treegraph
 
Reported: 2009-08-07 04:57 UTC by Thiago Macieira
Modified: 2011-06-13 09:45 UTC (History)
6 users (show)

See Also:
i915 platform:
i915 features:


Attachments
DBusLoop: store watches in a hash table keyed by fd (14.81 KB, patch)
2011-01-25 07:58 UTC, Simon McVittie
Details | Splinter Review
bus-test: close fds beyond 2 on startup (2.67 KB, patch)
2011-01-25 07:59 UTC, Simon McVittie
Details | Splinter Review
dbus-mainloop: stylistic fix: don't use link->data directly (1.24 KB, patch)
2011-04-07 10:01 UTC, Simon McVittie
Details | Splinter Review
[3] Rename cull_watches_for_fd() based on Thiago's review (1.05 KB, patch)
2011-06-13 04:51 UTC, Simon McVittie
Details | Splinter Review

Description Thiago Macieira 2009-08-07 04:57:14 UTC
It's possible to make dbus-daemon stop responding at all to any requests, with some effort. It's caused by its buggy use of poll(2).

The manual for poll says:
       EINVAL The nfds value exceeds the RLIMIT_NOFILE value.

The code in dbus-mainloop.c never checks for error conditions returning from poll() / _dbus_poll(), assuming that errors cannot happen. EBADF and EFAULT cannot happen in a properly constructed application, ENOMEM should be self-corrected by the kernel and EINTR is harmless.

However, EINVAL can only be recovered from by trying to poll less file descriptors. It can also be prevented by properly constructing the application, which D-Bus isn't doing.

The reason why nfds got over the limit is because the code is creating one entry for POLLIN and another for POLLOUT. So it's possible to cause the problem to occur by opening more than 512 connections to the bus and making it try to write to most of those.

In the case I ran into, nfds was 1047. That means EINVAL got returned and no file descriptors got acted on. By consequence, the daemon did not deliver any messages nor closed any sockets, so it reentered the same poll(2) call, which failed again. So the daemon was unrecoverable: no new connections were accepted, no messages delivered or accepted. 

Applications with blocking calls started hanging and the desktop became unusable.

The suggested solution for this problem is to create one DBusPollFD per file descriptor, no more.
Comment 1 Simon McVittie 2011-01-11 09:09:14 UTC
(In reply to comment #0)
> The reason why nfds got over the limit is because the code is creating one
> entry for POLLIN and another for POLLOUT. So it's possible to cause the problem
> to occur by opening more than 512 connections to the bus and making it try to
> write to most of those.

I always thought it was pretty weird that DBusConnection has two watches for one fd... unfortunately, it follows from these assumptions:

* DBusWatch's flags are immutable during the watch's lifetime
* we don't want to remove/add a DBusWatch every time whether we want to write changes

This is also irritating if we want to use epoll_wait(2) on Linux, which requires you to watch each fd at most once. Under that regime, we need to adjust the setup for each fd whenever the corresponding read-watch *or* write-watch is added, toggled or removed.
Comment 2 Colin Walters 2011-01-11 10:57:01 UTC
I agree that the poll() code here is busted, however I think there's also a larger issue, which is the desynchronization between the process ulimit for the bus, and its configuration.

Many Linux forks and embeds use 1024 as a per-process default I believe.  However the default bus configuration we ship has:

  parser->limits.max_completed_connections = 2048;

In addition to:
  
  parser->limits.max_incomplete_connections = 64;

For completeness there is also:

  parser->limits.max_connections_per_user = 256;

So:

* Should the system bus instance e.g. ask for more via setrlimit() on startup?
  We'd have to do this before we called setuid(), since on Linux we need CAP_SYS_RESOURCE.
* Should we change the hardcoded defaults here to be below 1024?
Comment 3 Colin Walters 2011-01-13 14:10:16 UTC
See also

https://bugzilla.redhat.com/show_bug.cgi?id=667835
Comment 4 Simon McVittie 2011-01-25 07:44:26 UTC
(In reply to comment #2)
> I agree that the poll() code here is busted, however I think there's also a
> larger issue, which is the desynchronization between the process ulimit for the
> bus, and its configuration.

Let's restrict this bug to the double-polling (for which I now have a patch), and clone it for the ulimit/configuration mismatch.
Comment 5 Simon McVittie 2011-01-25 07:56:45 UTC
(In reply to comment #4)
> clone it for the ulimit/configuration mismatch

Bug #33474.

> The suggested solution for this problem is to create one DBusPollFD per file
> descriptor, no more.

Patch on the way; the branch in my gitweb is based on a merge of all my pending branches, but I think it probably "only" needs the branches from Bug #33342 and Bug #33336, plus Bug #33126 so it will pass the regression tests reliably.

I don't think this is suitable for 1.4 due to its dependencies, but it's my top priority for 1.5.

(Looking at the patch now, I could probably have done this without depending on *all* of Bug #33342, but it requires at least the commits "keep separate lists of watches and timeouts" and "factor out watch_flags_to_poll_events ...", and I think a minimal backport would still be too big for 1.4.)
Comment 6 Simon McVittie 2011-01-25 07:58:04 UTC
Created attachment 42465 [details] [review]
DBusLoop: store watches in a hash table keyed by fd

This means we don't need to build up the watches_for_fds array, because
it's quick to find the watches based on the fd.
    
This also fixes fd.o #23194, because we only put each fd in the
array of DBusPollFD once, with the union of all its watches' flags; we
can no longer get more entries in the array than our number of file
descriptors, which is capped at our RLIMIT_NOFILE, so we can't get
EINVAL from poll(2).
Comment 7 Simon McVittie 2011-01-25 07:59:11 UTC
Created attachment 42466 [details] [review]
bus-test: close fds beyond 2 on startup

If we're going to check for leaks by asserting that we have no open files
beyond fd 2, then we ought to make sure that's the case on startup, in
case our parent process gave us a fd we didn't want (gdb does this, for
instance).

(This is not required to fix this bug, but it was very useful during development, to make the test not fail too early under gdb.)
Comment 8 Colin Walters 2011-01-27 09:36:18 UTC
Review of attachment 42466 [details] [review]:

::: dbus/dbus-message-private.h
@@ +139,3 @@
                                                 va_list          var_args);
 
+void _dbus_close_excess_fds(void);

Would prefer this lived in dbus-sysdeps.h.

::: dbus/dbus-message-util.c
@@ +182,3 @@
+      closedir(d);
+    }
+#endif

This needs a non-#ifdef __linux__ implementation (and should probably fall back if the open of /prof/self/fd fails for some reason too).

This code is definitely scattered around; look at e.g.
dbus-sysdeps-unix.c:3108 where we use sysconf (_SC_OPEN_MAX) and then a loop.  I think it'd be worth at least refactoring that bit to use this common function.

Also I suggest a better name is:

_dbus_close_nonstandard_fds ();

Or possibly

_dbus_close_fds_from (3); ?
Comment 9 Simon McVittie 2011-01-27 09:44:11 UTC
(In reply to comment #8)
> This needs a non-#ifdef __linux__ implementation (and should probably fall back
> if the open of /prof/self/fd fails for some reason too).

For this purpose, it doesn't need a fallback: it's meant to be an exact mirror of the check for leaked fds in the same file. Both are #ifdef __linux__ and only used in the regression tests.

> _dbus_close_nonstandard_fds ();

It'd be reasonable to put that function (and perhaps the leak check too) in sysdeps-util, though, and if placed in that location I agree it should be cross-platform.
Comment 10 Colin Walters 2011-01-31 10:57:15 UTC
Comment on attachment 42465 [details] [review]
DBusLoop: store watches in a hash table keyed by fd

>From 55e1ef617c6d1892dc5b390cf6cb8252ee002a89 Mon Sep 17 00:00:00 2001
>From: Simon McVittie <simon.mcvittie@collabora.co.uk>
>Date: Tue, 25 Jan 2011 15:38:06 +0000
>Subject: [PATCH 1/2] DBusLoop: store watches in a hash table keyed by fd

Reading over this it definitely feels right, but I am not very familiar with the DBusWatch code unfortunately.  I just have a minor style comments:

>+
>+              if (dbus_watch_get_enabled (link->data))

I've been bitten before when using "link->data" style direct dereferences with GList; basically you're trying to refactor some code, and you have to try to remember each time you see link->data what the actual type is.

Can you just hoist it out like you've done elsewhere with:

{
  DBusWatch *watch = link->data;

  if (dbus_watch_get_enabled (watch)) ...

Otherwise if no one else comments and this passes the test suite etc., I think it's probably good to go.
Comment 11 Simon McVittie 2011-01-31 11:11:12 UTC
I can append a fix for the link->data pattern, sure.

Other things that block merging this, from the blocking bugs:

* Bug #33126 needs review: otherwise the tests will fail, because they detect
  a previously undetected memory leak
* Bug #33336 needs review: it provides an invariant that we rely on here
* Bug #33342 needs review: this branch makes use of the refactoring there

Because of Bug #33342, I think this is probably a job for 1.5. I've branched 1.4 and will use that to commit small bugfixes.
Comment 12 Thiago Macieira 2011-02-02 06:58:32 UTC
*** Bug 33837 has been marked as a duplicate of this bug. ***
Comment 13 Simon McVittie 2011-03-10 05:20:10 UTC
Comment on attachment 42466 [details] [review]
bus-test: close fds beyond 2 on startup

I've cloned Bug #35173 for the unimportant patch attached here.

Other than that, I owe Colin a stylistic patch for not using iter->data all over the place, and this bug is also blocked by re-review of Bug #33336 and review of Bug #33342.
Comment 14 Simon McVittie 2011-04-07 10:01:57 UTC
Created attachment 45391 [details] [review]
dbus-mainloop: stylistic fix: don't use link->data directly

This amends Attachment #42465 [details] with Colin's suggested stylistic change. In the git branch whose gitweb is in the URL field, I've squashed it into the previous commit.
Comment 15 Simon McVittie 2011-04-07 10:02:45 UTC
not for 1.4
Comment 16 Thiago Macieira 2011-06-13 03:23:57 UTC
Review of attachment 45391 [details] [review]:

Nothing particular, this just works.
Comment 17 Simon McVittie 2011-06-13 03:28:42 UTC
(In reply to comment #16)
> Review of attachment 45391 [details] [review]:
> 
> Nothing particular, this just works.

Thanks, any thoughts on the bugs that block this one?
Comment 18 Thiago Macieira 2011-06-13 03:55:59 UTC
Review of attachment 42465 [details] [review]:

Simple questions, the patch looks good.

::: dbus/dbus-mainloop.c
@@ +257,3 @@
+  DBusList **watches;
+
+  _dbus_warn ("invalid request, socket fd %d not open\n", fd);

Why is this unconditional warning here?

@@ +268,3 @@
+    }
+
+  _dbus_hash_table_remove_int (loop->watches, fd);

Doesn't this need to free the memory for loop->watches too?

@@ +605,2 @@
           _dbus_wait_for_memory ();
           fds = dbus_new0 (DBusPollFD, loop->watch_count);

Here we allocate memory for loop->watch_count elements, which may be higher than the number of fds.

@@ +844,3 @@
+          watches = _dbus_hash_table_lookup_int (loop->watches, fds[i].fd);
+
+          if (watches == NULL)

Is this condition possible? Is this a case for _dbus_assert?
Comment 19 Simon McVittie 2011-06-13 04:27:15 UTC
(In reply to comment #18)
> Review of attachment 42465 [details] [review]:
> 
> Simple questions, the patch looks good.
> 
> ::: dbus/dbus-mainloop.c
> @@ +257,3 @@
> +  DBusList **watches;
> +
> +  _dbus_warn ("invalid request, socket fd %d not open\n", fd);
> 
> Why is this unconditional warning here?

As discussed on IRC, that function is only called to recover from something that shouldn't happen (it's the fix for Bug #32992, factored out into a function). I've renamed it to cull_watches_for_invalid_fd() as you suggested.

> 
> @@ +268,3 @@
> +    }
> +
> +  _dbus_hash_table_remove_int (loop->watches, fd);
> 
> Doesn't this need to free the memory for loop->watches too?

If you mean "doesn't this need to free the value corresponding to fd?", then no, the hash table has a free-function (free_watch_table_entry) which does that automatically when the key is removed or when the hash table itself is destroyed.

If you mean "shouldn't this free the hash table?", then no, it shouldn't be freed until the last unref of the DBusLoop.

> 
> @@ +605,2 @@
>            _dbus_wait_for_memory ();
>            fds = dbus_new0 (DBusPollFD, loop->watch_count);
> 
> Here we allocate memory for loop->watch_count elements, which may be higher
> than the number of fds.

Right, the important thing is that it's at least the number of fds. You're right that this will typically be using a bit more memory than it needs to; I could use _dbus_hash_table_size() if you think this is significant enough.

Strictly speaking we only actually need enough memory for the number of *enabled* fds, but I don't think the extra complexity of keeping a count of those is justified. Bug #33337 switches to using epoll on modern Linux, at which point it's the kernel's problem (all other OSs get a slow-path using poll, like we do now, but there's infrastructure by which we could plug in a scalable epoll-like interface on other Oss if anyone cares enough).

> @@ +844,3 @@
> +          watches = _dbus_hash_table_lookup_int (loop->watches, fds[i].fd);
> +
> +          if (watches == NULL)
> 
> Is this condition possible? Is this a case for _dbus_assert?

It shouldn't be possible; I'm not sure I want to bet on that though.
Comment 20 Simon McVittie 2011-06-13 04:51:27 UTC
Created attachment 47893 [details] [review]
[3] Rename cull_watches_for_fd() based on Thiago's review

To be squashed into Attachment #42465 [details] for commit, probably.
Comment 21 Simon McVittie 2011-06-13 09:45:55 UTC
Fixed in git for 1.5.6. The refactoring that this required is too big for 1.4, I think.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.