Bug 33336 - remove, invalidate, free DBusWatch (in that order) before closing the socket
Summary: remove, invalidate, free DBusWatch (in that order) before closing the socket
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.5
Hardware: Other All
: high enhancement
Assignee: Simon McVittie
QA Contact: John (J5) Palmieri
URL: http://cgit.freedesktop.org/~smcv/dbu...
Whiteboard: review?
Keywords: patch
Depends on:
Blocks: 23194 33337 33342 dbus-1.5
  Show dependency treegraph
 
Reported: 2011-01-21 07:14 UTC by Simon McVittie
Modified: 2011-06-13 09:46 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Always remove, invalidate and free watches before closing watched sockets (13.57 KB, patch)
2011-01-21 08:59 UTC, Simon McVittie
Details | Splinter Review
Check that watches are removed, then invalidated, then unreffed (1.35 KB, patch)
2011-01-21 08:59 UTC, Simon McVittie
Details | Splinter Review
bus: signal_handler: ignore failure to write, and explain why (1.99 KB, patch)
2011-02-24 06:28 UTC, Simon McVittie
Details | Splinter Review
bus signal_handler: don't use _dbus_warn, and don't pretend to be portable (2.71 KB, patch)
2011-02-24 06:28 UTC, Simon McVittie
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Simon McVittie 2011-01-21 07:14:14 UTC
If we're going to have one DBusPollFD (or equivalent) per socket (which is a bug in its own right, and also a prerequisite for using epoll), we need to be able to map from a socket fd to its DBusWatch(es). It'd help if we could rely on DBusWatch always having this pattern:

- remove the DBusWatch
- invalidate the DBusWatch (set its fd to -1)
- unref it
- *then* close the watched socket

This is probably a job for 1.5 rather than 1.4. I'm working on it.
Comment 1 Simon McVittie 2011-01-21 08:59:37 UTC
Created attachment 42275 [details] [review]
Always remove, invalidate and free watches before closing watched sockets

This should mean we don't get invalid fds in the main loop.

The BSD (kqueue) and Windows code paths are untested, but follow the same
patterns as the tested Linux/generic Unix versions.

DBusTransportSocket was already OK (it called free_watches() before
_dbus_close_socket, and that did the remove, invalidate, unref dance).
Comment 2 Simon McVittie 2011-01-21 08:59:55 UTC
Created attachment 42276 [details] [review]
Check that watches are removed, then invalidated, then unreffed
Comment 3 Will Thompson 2011-02-02 07:57:52 UTC
Review of attachment 42275 [details] [review]:

This basically looks fine.

::: bus/dir-watch-inotify.c
@@ +211,2 @@
       _dbus_watch_unref (watch);
       _dbus_loop_unref (loop);

These four calls (or calls very much like them) show up constantly. They seem sane, but I wonder whether it'd be possible to simplify matters.

::: bus/main.c
@@ +66,3 @@
           {
             _dbus_warn ("Unable to write to reload pipe.\n");
+            close_reload_pipe_write ();

I'm prepared to believe that calling close_reload_pipe() here was unsafe; will the remains of the closing turn out to happen later?

::: dbus/dbus-spawn-win.c
@@ +167,3 @@
+      _dbus_watch_unref (sitter->sitter_watch);
+      sitter->sitter_watch = NULL;
+    }

Does this mean that the watch was never removed on Windows?
Comment 4 Will Thompson 2011-02-02 08:01:41 UTC
Review of attachment 42276 [details] [review]:

This seems reasonable, though I'm a little afraid of asserting here.
Comment 5 Simon McVittie 2011-02-02 08:35:05 UTC
(In reply to comment #3)
> ::: bus/main.c
> @@ +66,3 @@
>            {
>              _dbus_warn ("Unable to write to reload pipe.\n");
> +            close_reload_pipe_write ();
> 
> I'm prepared to believe that calling close_reload_pipe() here was unsafe; will
> the remains of the closing turn out to happen later?

(For the record, the reason close_reload_pipe() is unsafe here is that manipulating our DBusWatch may call functions that aren't async-signal-safe. Hardly anything is guaranteed not to crash in a POSIX signal handler, apart from a few syscalls; in particular, malloc might not work.)

We'll close the read end of the pipe shortly afterwards, when reading 1 byte from it in handle_reload_watch() fails, i.e. when the pipe fd reports POLLHUP and we've drained all the bytes in the buffer.

In practice the effect of the (existing and new) code here is that if you send 4,096 SIGHUP signals before the main loop has a chance to deal with any of them, the bus daemon will start ignoring SIGHUP. I think. This is completely theoretical, but not really ideal. Perhaps signal_handler() should ignore failed writes completely, on the basis that config will be reloaded (4,096 times!) while draining the queue, which is what you wanted, and then the signal handler will presumably be allowed to write to the pipe again.

> ::: dbus/dbus-spawn-win.c
> @@ +167,3 @@
> +      _dbus_watch_unref (sitter->sitter_watch);
> +      sitter->sitter_watch = NULL;
> +    }
> 
> Does this mean that the watch was never removed on Windows?

It was probably removed by destruction of the DBusWatchList, which implicitly removes all of its watches (but might do so too late).
Comment 6 Simon McVittie 2011-02-24 06:27:39 UTC
I've rebased badfd-33336 onto master: I think it's 1.5.0 material. The two patches are the same as seen here.

(In reply to comment #4)
> This seems reasonable, though I'm a little afraid of asserting here.

Assertions are disabled by default (unless you --enable-maintainer-mode), so packaged versions won't have this check anyway. I think it's OK to add on a development branch?

(In reply to comment #5)
> In practice the effect of the (existing and new) code here is that if you send
> 4,096 SIGHUP signals before the main loop has a chance to deal with any of
> them, the bus daemon will start ignoring SIGHUP. I think. This is completely
> theoretical, but not really ideal. Perhaps signal_handler() should ignore
> failed writes completely, on the basis that config will be reloaded (4,096
> times!) while draining the queue, which is what you wanted

(I discovered today that it's actually 65,536 times on modern Linux.)

The branch badfd-33336-hup adds two patches (which I'll attach in a moment) to fix this, and to fix use of non-async-signal-safe _dbus_warn in the signal handler. I consider these additional patches to be optional; they don't block any of the bugs blocked by this one.
Comment 7 Simon McVittie 2011-02-24 06:28:27 UTC
Created attachment 43752 [details] [review]
bus: signal_handler: ignore failure to write, and explain why

See the comment in the source code for rationale.
Comment 8 Simon McVittie 2011-02-24 06:28:53 UTC
Created attachment 43753 [details] [review]
bus signal_handler: don't use _dbus_warn, and don't pretend to be portable

_dbus_warn isn't async-signal-safe, so that's out. We can use write()
instead; it's POSIX but not ISO C, but then again, so are signals.
Accordingly, guard it with DBUS_UNIX.

dbus-sysdeps-util-win doesn't actually implement _dbus_set_signal_handler
anyway, so not compiling this code on non-Unix seems more honest.
Comment 9 Simon McVittie 2011-04-07 09:49:40 UTC
unsuitable for 1.4, IMO; review for 1.5.x would be appreciated
Comment 10 Thiago Macieira 2011-06-13 04:17:43 UTC
Review of attachment 42275 [details] [review]:

I agree with Will: why can't we close the fd when the refcount drops to zero?
Comment 11 Thiago Macieira 2011-06-13 04:18:39 UTC
Review of attachment 42276 [details] [review]:

Looks good, same review.
Comment 12 Thiago Macieira 2011-06-13 04:20:36 UTC
Review of attachment 43752 [details] [review]:

Looks ok.
Comment 13 Thiago Macieira 2011-06-13 04:23:34 UTC
Review of attachment 43753 [details] [review]:

Looks good, just one minor, pedantic comment.

::: bus/main.c
@@ +81,3 @@
+             * This is necessarily Unix-specific, but so are POSIX signals,
+             * so... */
+            static const char *message =

Pedantic: make this static const char message[].
Comment 14 Simon McVittie 2011-06-13 04:48:20 UTC
(In reply to comment #10)
> Review of attachment 42275 [details] [review]:
> 
> I agree with Will: why can't we close the fd when the refcount drops to zero?

I think that's the wrong way round: it's necessary that you stop watching a fd before you close it, but it's not necessarily true that you close a fd when you stop watching it (you could use blocking I/O, or spin on a manual _dbus_poll() - as I think we do in some places).

I suppose we *could* have a helper _dbus_tidy_up_socket (DBusLoop *loop, DBusWatch *watch, int *fd) which removes the watch from the loop if both are non-NULL, invalidates the watch, consumes one ref each (?) to the loop and the watch, closes *fd and sets it to -1 - but that seems rather magic, and a bit of a layering violation. It seems better to "say what you mean".

(In reply to comment #11)
> Review of attachment 42276 [details] [review]:
> 
> Looks good, same review.

Do you mean "why can't this be automatic"?

The remove part: _dbus_watch_invalidate doesn't take a DBusLoop (and it can't - you can use _dbus_watch_invalidate for a connection watch that's being handled by the GLib or Qt main loop), so invalidating a watch can't automatically remove it from main loop integration first. The assertion is basically there because we can't key the hash table in #23194 by fd if the fd has gone away, so we should check for it.

The unref part: modules other than the owner of the fd can also ref it, but the single owner of the fd (who is responsible for closing it) is also responsible for holding a ref that can be used to invalidate it, until after they have done so.

Given that, DBusWatch is probably only refcounted so that it can cope with re-entrancy in which the callback for a watch removes that same watch, e.g. when the other end of the socket hangs up.

(In reply to comment #13)
> ::: bus/main.c
> @@ +81,3 @@
> +             * This is necessarily Unix-specific, but so are POSIX signals,
> +             * so... */
> +            static const char *message =
> 
> Pedantic: make this static const char message[].

Any particular reason? But, OK.
Comment 15 Simon McVittie 2011-06-13 09:46:14 UTC
Fixed in git for 1.5.6.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.