Bug 57952

Summary: regression tests should cover systemd activation
Product: dbus Reporter: Simon McVittie <smcv>
Component: coreAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: D-Bus Maintainers <dbus>
Severity: enhancement    
Priority: medium CC: chengwei.yang.cn, hp, lennart, msniko14, walters
Version: 1.5Keywords: patch
Hardware: All   
OS: All   
Whiteboard: review?
i915 platform: i915 features:
Bug Depends on: 88810    
Bug Blocks: 50199    
Attachments: [Fix] dbus-daemon will crash due to message rejection
Add a regression test for systemd activation

Description Simon McVittie 2012-12-06 17:56:39 UTC
+++ This bug was initially created as a clone of Bug #50199 +++

Bug #50199 is a crash in systemd activation, possibly triggered by, or at least related to, activated services timing out. I've never seen it happen, and presumably neither have any of the other D-Bus maintainers; this means that, in practice, it isn't going to get fixed until or unless we can put together a test case for it.

Bug #50962 was a missing feature that blocked use of systemd activation in user sessions. It is now implemented, and apparently works; but if we break it, "make distcheck" isn't going to catch it. Given that systemd system activation is used in basically every distribution except Ubuntu (it's even in Debian, albeit not used by default), and people are talking fairly seriously about systemd user activation too, it seems like a pretty important thing to test.

In each case we had, and still have, no regression test coverage. Sure, every user of Fedora, Arch, etc. covers the expected/common case every time they reboot - but it's the unexpected/error cases where bugs like #50199 lurk, and right now, we have no systematic testing for those.

Happily, systemd's interaction with dbus is a really easy thing to test - it's just a slightly special system bus client with a well-known name, that receives messages from dbus-daemon - so it's easy to set up whatever situations we want, by having the regression tests pretend to be systemd.

I think someone - based on past history, this probably means me - should write regression tests that use a mock systemd to cover as many code paths as is reasonable. At the very least, we should test the happy path (activation works correctly) and a few of the more obvious failure paths (systemd service doesn't exist, systemd service fails to start, etc.).

This is probably also a good way to find out what is actually going wrong in Bug #50199, and more importantly, confirm that we have fixed it - so I'm setting that as blocked by this.
Comment 1 Simon McVittie 2013-05-27 10:03:28 UTC
Created attachment 79838 [details] [review]
[Fix] dbus-daemon will crash due to message rejection

From: Chengwei Yang <chengwei.yang@intel.com>
Date: Sat, 25 May 2013 18:44:40 +0800

The issue goes in the below situation.

At system early boot up, systemd registers it's dbus service
"org.freedesktop.systemd1" in asynchronous manner since dbus-daemon
hasn't been activated. After dbus-daemon activated, the others may
access services registered to dbus and then dbus will try to activate
the acquired service if there is no owner of it, in addition, if the
owner of service is declaimed as a systemd bus activation service, dbus
will try to activate (in fact, just pending the request) systemd
(org.freedesktop.systemd1) if it's not around here. Otherwise, dbus just
create a new message (signal) and dispatch it to systemd.

The issues caused by the inconsistent way how dbus dispatch the
activation message to systemd1, if systemd is there, the message created
by dbus with connection is NULL, so no security checking in fact because
both systemd and dbus-daemon are running in 'root'. However, if the
orignal request sent by unprivileged process, the message created by
dbus will failed to pass security check due to the original connection
created by unprivileged process.

This some how isn't consistent in the two situations:
1. systemd is already here, the message created by dbus pass security check
2. systemd isn't here, the message created by dbus rejected

The worse situation is it will cause dbus-daemon coredumped later (time
out) because fail to pass assertion (reply_serial != 0), see below backtrace.

[... backtrace deleted ...]

[... more logs deleted ...]

Signed-off-by: Chengwei Yang <chengwei.yang@intel.com>
Comment 2 Simon McVittie 2013-05-27 10:50:50 UTC
Comment on attachment 79838 [details] [review]
[Fix] dbus-daemon will crash due to message rejection

Review of attachment 79838 [details] [review]:
-----------------------------------------------------------------

Your detailed analysis is welcome, but rather long as a commit message. Here is my attempt at something shorter, with even more analysis of why the crash happened. Please confirm whether this is correct and would be OK?

> Subject: when "activating" systemd, handle its special case better
>
> When dbus-daemon receives a request to activate a service before systemd
> has connected to it, it enqueues a fake request to "activate" systemd
> itself (as a way to get a BusPendingActivationEntry to track the
> process of waiting for systemd). When systemd later joins the bus,
> dbus-daemon sends the actual activation message; any future
> activation messages are sent directly to systemd.
>
> In the "pending" code path, the activation messages are currently
> dispatched as though they had been sent by the same process
> that sent the original activation request, which is wrong:
> the bus security policy probably doesn't allow that process
> to talk to systemd directly. They should be dispatched as though
> they had been sent by the dbus-daemon itself (connection == NULL),
> the same as in the non-pending code path.
>
> In the worst case, if the attempt to activate systemd
> timed out, the dbus-daemon would crash with a (fatal) warning,
> because in this special case, activation_message is a signal
> with no serial number, whereas the code to send an error
> reply is expecting a method call with a serial number.

It looks as though try_send_activation_failure() still doesn't handle the (entry->connection == NULL) case. As far as I can tell, it should probably be:

-    if (dbus_connection_get_is_connected (entry->connection))
+    if (entry->connection != NULL && dbus_connection_get_is_connected (entry->connection))

because if the connection is NULL, the message came from dbus-daemon itself (and is a signal, not a method call, so replying to it would be an error).

I think changing BusPendingActivationEntry.connection from "never NULL" to "NULL if adding a fake activation entry for systemd" deserves a comment, and the fact that the activation message is sometimes a signal (although not new) also deserves a comment. Something like this:

struct BusPendingActivationEntry
{
  /* Normally a method call, but if connection is NULL, this is a signal instead. */
  DBusMessage *activation_message;
  /* NULL if this activation entry is for the dbus-daemon itself,
   * waiting for systemd to start. In this case, auto_activation is always TRUE. */
  DBusConnection *connection;

  dbus_bool_t auto_activation;
};
Comment 3 Simon McVittie 2013-05-27 10:53:08 UTC
(In reply to comment #1)
> Created attachment 79838 [details] [review]
> [Fix] dbus-daemon will crash due to message rejection

(In reply to comment #2)
> Review of attachment 79838 [details] [review]

I attached these to the wrong bug. See Bug #50199 for these.
Comment 4 Simon McVittie 2015-01-26 20:44:45 UTC
Created attachment 112848 [details] [review]
Add a regression test for systemd activation

4.5 years after it was implemented, here is the regression test.

---

As written, this depends on Bug #88810, but with a bit of fuzz it should be able to apply without Bug #88810.
Comment 5 Philip Withnall 2015-02-02 12:37:52 UTC
Comment on attachment 112848 [details] [review]
Add a regression test for systemd activation

Review of attachment 112848 [details] [review]:
-----------------------------------------------------------------

::: test/data/systemd-activation/org.freedesktop.systemd1.service
@@ +1,3 @@
> +[D-BUS Service]
> +Name=org.freedesktop.systemd1
> +Exec=/bin/false

Might be worthwhile making this `/bin/false systemd1` for disambiguation purposes.

::: test/sd-activation.c
@@ +1,3 @@
> +/* Unit tests for systemd activation.
> + *
> + * Copyright © 2010-2011 Nokia Corporation

Copyright symbols are broken for me here, but this might just be Bugzilla or Splinter.
Comment 6 Simon McVittie 2015-02-02 19:16:01 UTC
(In reply to Philip Withnall from comment #5)
> Might be worthwhile making this `/bin/false systemd1` for disambiguation
> purposes.

Possibly... but I'd rather have this test do what the real systemd does, and this is what the real systemd does.

> Copyright symbols are broken for me here, but this might just be Bugzilla or
> Splinter.

Yes it is, the actual file is fine (and this is far from the first time I've seen Bugzilla/Splinter interpret UTF-8 as Latin-1).
Comment 7 Simon McVittie 2015-02-03 20:21:32 UTC
Fixed in git for 1.9.8

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.