Bug 84188 - Document kdbus in the D-Bus Specification
Summary: Document kdbus in the D-Bus Specification
Status: RESOLVED WONTFIX
Alias: None
Product: dbus
Classification: Unclassified
Component: core
Version: 1.5
Hardware: All All
Importance: medium enhancement
Assignee: Simon McVittie
QA Contact: D-Bus Maintainers
Reported: 2014-09-22 14:55 UTC by Simon McVittie
Modified: 2016-10-04 22:03 UTC

Attachments
WiP: initial attempt at documenting kdbus in the D-Bus Specification (41.28 KB, patch)
2014-09-22 19:29 UTC, Simon McVittie
WiP v2: initial attempt at documenting kdbus in the D-Bus Specification (41.42 KB, patch)
2014-09-22 19:53 UTC, Simon McVittie
WiP v3: document kdbus in the D-Bus Specification (47.83 KB, patch)
2014-09-30 18:26 UTC, Simon McVittie
spec: in the table of types, classify them (6.65 KB, patch)
2014-10-01 14:43 UTC, Simon McVittie
spec: translate arbitrary limits into something more comprehensible (1.28 KB, patch)
2014-10-01 14:44 UTC, Simon McVittie
WiP v4: document kdbus in the D-Bus Specification (57.11 KB, patch)
2014-10-01 14:44 UTC, Simon McVittie

Description Simon McVittie 2014-09-22 14:55:56 UTC
kdbus needs to reserve a D-Bus message major version number in the D-Bus Specification: it uses "v2" messages with a GVariant payload, whereas stream-based D-Bus uses "v1" with a D-Bus payload. It would also make sense for the D-Bus Specification to describe the "kernel" transport, and the semantic differences between stream-based D-Bus and kdbus.

I'm currently working on a D-Bus Specification patch that does so, partly for the sake of having it documented, but partly as a way to spot any semantic changes that are likely to break existing D-Bus applications.

I know kdbus isn't finished yet, but it seems to have reached a somewhat steady state, and I certainly don't want to find that it has landed in a kernel.org kernel with incompatibilities that prevent it from being used for D-Bus :-)

Cc'ing an assortment of recent/major kdbus contributors for fact-checking and review.
Comment 1 Simon McVittie 2014-09-22 19:29:46 UTC
Created attachment 106690
WiP: initial attempt at documenting kdbus in the D-Bus Specification

---

Comments, corrections welcome.
Comment 2 Simon McVittie 2014-09-22 19:52:55 UTC
Comment on attachment 106690
WiP: initial attempt at documenting kdbus in the D-Bus Specification

Review of attachment 106690:
-----------------------------------------------------------------

::: doc/dbus-specification.xml
@@ +3145,5 @@
> +
> +        <para>
> +          (FIXME: why isn't this more like
> +          kernel:uid=1000,bus=user,endpoint=bus and
> +          kernel:uid=0,bus=system,endpoint=bus?)

(This is between sd-bus, GDBus and libdbus, not a matter for the kernel side)

I think I'd prefer kernel:uid=1000,bus=user,endpoint=bus over kernel:path=/dev/kdbus/1000-user/bus, even if the implementation is in fact just "snprintf() them into a path". endpoint could default to "bus", bus could default to "user" and uid could default to getuid(); then the default session bus address would just be "kernel:;autolaunch:" which is about as short as it possibly can be.
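
As a rough illustration of the defaulting scheme suggested above, here is a minimal sketch. The function name `kernel_address_to_path` and its `default_uid` parameter are hypothetical; the `uid`, `bus`, `endpoint` and `path` keys and the `/dev/kdbus/UID-SUFFIX/ENDPOINT` layout are taken from the discussion. It ignores the %-escaping that D-Bus addresses allow in values.

```python
import os


def kernel_address_to_path(address, default_uid=None):
    """Expand a 'kernel:key=value,...' address into a kdbus device path.

    Hypothetical helper: only illustrates the proposed defaults
    (endpoint=bus, bus=user, uid=getuid()).
    """
    transport, _, params = address.partition(":")
    if transport != "kernel":
        raise ValueError(f"not a kernel: address: {address!r}")
    # Comma-separated key=value pairs; an empty parameter list is allowed.
    options = dict(item.split("=", 1) for item in params.split(",") if item)
    if "path" in options:
        # An explicit path wins over the uid/bus/endpoint shorthand.
        return options["path"]
    uid = options.get("uid")
    if uid is None:
        uid = str(os.getuid() if default_uid is None else default_uid)
    bus = options.get("bus", "user")
    endpoint = options.get("endpoint", "bus")
    return f"/dev/kdbus/{uid}-{bus}/{endpoint}"
```

With these defaults, `kernel_address_to_path("kernel:", default_uid=1000)` and `kernel_address_to_path("kernel:uid=1000,bus=user,endpoint=bus")` both yield `/dev/kdbus/1000-user/bus`.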

I'm not sure why "kernel" and not "kdbus" given that the strings kdbus and KDBUS appear all over the place in the kdbus API.

@@ +3270,5 @@
> +          <!-- FIXME: how to convert existing libraries between
> +          stream-based D-Bus and kdbus without either breaking correct
> +          applications, or introducing vulnerabilities in correct
> +          applications? Can we do this without introducing parallel
> +          bus connections? -->

We discussed this a bit on the dbus and systemd mailing lists a while ago. I think the best anyone could think of at the time was adding a new GBusType (and analogous things, like DBusBusType) that means "connect to the system bus, and I promise I won't respond to method calls on this connection in a way that assumes /etc/dbus/system.d has already been applied by the dbus-daemon".

@@ +3299,5 @@
> +            SASL EXTERNAL on the unix: transport), feature-negotiation is done
> +            via flags fields in that ioctl, and instead of the Hello method
> +            returning the unique bus name, the ioctl returns a numeric
> +            unique ID, which can be converted into a unique bus name by
> +            writing it in ASCII decimal and prepending <literal>:1.</literal>.

The documentation in sd-bus and/or kdbus (sorry, I forget which) claims this is :0.%llu, but it's actually :1.%llu now.
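
The conversion described in the quoted paragraph is small enough to sketch. This assumes the numeric ID is an unsigned 64-bit value (suggested by the `%llu` format above); the function name is hypothetical.

```python
def unique_name_from_id(connection_id):
    """Turn a kdbus numeric unique ID into a D-Bus unique bus name.

    Sketch only: writes the ID in ASCII decimal and prepends ':1.'.
    """
    if not 0 <= connection_id < 2**64:
        raise ValueError("connection ID must fit in an unsigned 64-bit integer")
    return f":1.{connection_id}"
```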

@@ +3348,5 @@
> +            For general D-Bus messages, the kdbus payload-type field must be
> +            set to 0x4442757344427573 (corresponding to "DBusDBus" in ASCII)
> +            to indicate a D-Bus message. Special messages generated by the
> +            kernel use payload type zero instead. All other values are reserved
> +            for non-D-Bus payloads.

I must admit I'm not quite sure why something called "kdbus" supports non-D-Bus payloads other than the special kernel payloads... but maybe the (probably forever) hypothetical D-Bus 2.0 would want a different payload type.
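
For reference, the relationship between the magic number and the ASCII string can be checked with a short sketch. The constant and function names below are illustrative, not taken from the patch; the numeric values are the ones quoted above.

```python
# Payload-type values quoted in the patch text; names are illustrative.
PAYLOAD_KERNEL = 0
PAYLOAD_DBUS = 0x4442757344427573


def classify_payload(payload_type):
    """Classify a kdbus payload-type field as described in the patch."""
    if payload_type == PAYLOAD_DBUS:
        return "dbus"
    if payload_type == PAYLOAD_KERNEL:
        return "kernel"
    return "reserved"


# The D-Bus magic is the ASCII string "DBusDBus" read as a big-endian integer.
assert int.from_bytes(b"DBusDBus", "big") == PAYLOAD_DBUS
```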

@@ +3482,5 @@
> +          <para>
> +            A message with no arguments (empty signature) has an empty
> +            body (the serialization of a GVariant of type (), which is 0
> +            bytes long), but padding to an 8-byte boundary is still added
> +            before the zero-length body.

I haven't actually verified this, but as far as I can see it's true?

@@ +3491,5 @@
> +            padded to any particular alignment boundary.
> +          </para>
> +
> +          <para>
> +            The GVariant serialization format is not documented here.

Obviously that's not ideal. I started, but got confused about the precise details and stopped, and I don't think it's the highest-priority thing to document anyway: it doesn't affect semantics.

@@ +3523,5 @@
> +          </para>
> +
> +          <para>
> +            The general principle is that recipients must treat the
> +            kernel-checked kdbus message metadata as canonical. It is an

I don't know whether sd-bus and the GDBus branch actually do this. They should.

@@ +3538,5 @@
> +            flag for a sent message to contradict the absence or
> +            presence of the NO_REPLY_EXPECTED flag in the actual D-Bus
> +            message. The kernel will disallow replies or impose a timeout
> +            according to the value of the EXPECT_REPLY flag, and does
> +            not interpret the NO_REPLY_EXPECTED flag at all. Similarly,

The fact that the EXPECT_REPLY flag has a sense opposite the NO_REPLY_EXPECTED flag is really starting to annoy me... but I can see why (the NO_REPLY_EXPECTED flag really means "no reply expected even if this is a method call", and the kernel side of kdbus doesn't want/need to know about message types, only unicast-fire-and-forget vs unicast-expecting-reply vs unicast-reply vs broadcast).

I should mention the interaction with message types here, and also the interaction with broadcasts.
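
A minimal sketch of how a library might derive the kdbus-side flag from a D-Bus message, under the reading given above (only a method call without NO_REPLY_EXPECTED expects a reply). The message-type numbers and the NO_REPLY_EXPECTED value come from the D-Bus Specification; the function name is hypothetical and the actual kdbus flag value is deliberately not modelled.

```python
# Message types and header flag as defined by the D-Bus Specification.
METHOD_CALL = 1
METHOD_RETURN = 2
ERROR = 3
SIGNAL = 4
NO_REPLY_EXPECTED = 0x1


def kdbus_expects_reply(message_type, dbus_flags):
    """Return True if the sender should set kdbus EXPECT_REPLY.

    Sketch: replies, errors and signals never expect a reply; a method
    call expects one unless NO_REPLY_EXPECTED is set.
    """
    return message_type == METHOD_CALL and not (dbus_flags & NO_REPLY_EXPECTED)
```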

@@ +3647,5 @@
> +            clients use the KDBUS_CMD_NAME_ACQUIRE ioctl.
> +            Its semantics are similar, but the flags differ, and are
> +            not guaranteed to have corresponding numeric values.
> +            In particular, the DO_NOT_QUEUE flag is replaced by
> +            a QUEUE flag with the opposite sense.

This seems needlessly annoying. I don't not dislike double negatives as much as the next hacker, and I know you don't usually want to queue in practice; but when it seems fairly likely that kdbus will only ever be used to transport D-Bus, I feel as though inverting flags should come with some sort of justification beyond aesthetics.

@@ +3694,5 @@
> +          </para>
> +
> +          <para>
> +            There is no direct equivalent for StartServiceByName
> +            (<xref linkend="bus-messages-start-service-by-name"/>).

... or is there, and I just didn't spot it?

(Yes I know in practice nothing except systemd is likely to support being the user-space creator of a kdbus, and if you know you're talking to systemd then you can use systemd manager calls.)

@@ +3737,5 @@
> +            eavesdroppers. Broadcast messages, and eavesdropped messages,
> +            arrive in a particular consistent order chosen by the kernel,
> +            However, if unicast messages M1, M2 originating in different
> +            processes or threads arrive at their addressed destination in
> +            that order, eavesdroppers might see them in the opposite order.

I think this is true based on looking at the source, but I haven't tried it.

I don't *think* it's going to break real-world applications, but I could see it being a pain for debugging, if dbus-monitor or equivalent says one thing and the behaviour of your application says another.

@@ +3768,5 @@
> +          </para>
> +
> +          <para>
> +            FIXME: how does this interact with architectures which can run
> +            different-endian processes on one kernel (ARM, PowerPC,

I'd be interested to hear people's thoughts about this. (How do syscalls deal with it? Or can these architectures not actually share a kernel between kernel- and opposite-of-kernel-endian processes in practice, and if you want to be other-endian you have to virtualize?)

@@ +3770,5 @@
> +          <para>
> +            FIXME: how does this interact with architectures which can run
> +            different-endian processes on one kernel (ARM, PowerPC,
> +            possibly mips), or with qemu-user running a big-endian ABI
> +            on a little-endian host?

The qemu-user case is easier, because that way qemu-user does syscall translation, and it looks like a same-endianness-as-kernel process from the kernel's point of view.

@@ +3792,5 @@
> +          </para>
> +
> +          <para>
> +            In kdbus, eavesdropping is done by opening a special monitoring
> +            connection, which cannot send messages. Other connections cannot

This is basically what Colin wants dbus-daemon to do, and I approve in principle; we just haven't written the code yet.

@@ +3889,5 @@
> +            used to avoid confusing an old process with ID 1234 with a newer
> +            process that recycled process ID 1234. However, it cannot be
> +            used to distinguish between an unprivileged process, and a
> +            process that replaced that one via exec() (even if the latter
> +            is more privileged due to setuid or similar).

Might be false, but I looked at some kernel source code and it doesn't seem to be reset by exec().

@@ +3915,5 @@
> +            platform support exists and the unix: transport is used,
> +            and any platform-specific credentials for which race-free
> +            platform support exists).
> +            <!-- FIXME: is this metadata from connect-time or from send-time
> +            or a mixture? -->

I still need to check this.
Comment 3 Simon McVittie 2014-09-22 19:53:43 UTC
Created attachment 106691
WiP v2: initial attempt at documenting kdbus in the D-Bus Specification

---

Now with correct <footnote> syntax. No textual changes.
Comment 4 Simon McVittie 2014-09-30 18:26:16 UTC
Created attachment 107146
WiP v3: document kdbus in the D-Bus Specification

---

Now with more writing about access-control.
Comment 5 Simon McVittie 2014-10-01 14:43:53 UTC
Created attachment 107194
spec: in the table of types, classify them

---

This one is ready for merge, regardless of kdbus' status; it makes it completely clear which types are basic, fixed ("trivial" in systemd terminology) and/or containers.
Comment 6 Simon McVittie 2014-10-01 14:44:26 UTC
Created attachment 107195
spec: translate arbitrary limits into something more comprehensible

---

This one is ready to apply too.
Comment 7 Simon McVittie 2014-10-01 14:44:45 UTC
Created attachment 107196
WiP v4: document kdbus in the D-Bus Specification
Comment 8 Lennart Poettering 2014-10-02 15:44:15 UTC
(In reply to Simon McVittie from comment #0)
> kdbus needs to reserve a D-Bus message major version number in the D-Bus
> Specification: it uses "v2" messages with a GVariant payload, whereas
> stream-based D-Bus uses "v1" with a D-Bus payload. It would also make sense
> for the D-Bus Specification to describe the "kernel" transport, and the
> semantic differences between stream-based D-Bus and kdbus.
> 
> I'm currently working on a D-Bus Specification patch that does so, partly
> for the sake of having it documented, but partly as a way to spot any
> semantic changes that are likely to break existing D-Bus applications.

Oh, this would be perfect if I don't have to do the spec work for this! Much appreciated!
Comment 9 Lennart Poettering 2014-10-02 16:09:54 UTC
(In reply to Simon McVittie from comment #2)

> > +
> > +        <para>
> > +          (FIXME: why isn't this more like
> > +          kernel:uid=1000,bus=user,endpoint=bus and
> > +          kernel:uid=0,bus=system,endpoint=bus?)
> 
> (This is between sd-bus, GDBus and libdbus, not a matter for the kernel side)
> 
> I think I'd prefer kernel:uid=1000,bus=user,endpoint=bus over
> kernel:path=/dev/kdbus/1000-user/bus, even if the implementation is in fact
> just "snprintf() them into a path". endpoint could default to "bus", bus
> could default to "user" and uid could default to getuid(); then the default
> session bus address would just be "kernel:;autolaunch:" which is about as
> short as it possibly can be.

The reason why I'd like to leave this as paths is because we actually have a concept of "domains" that can be stacked. Domains are used for giving containers their own set of busses, independent of the host's busses. A container manager can issue an ioctl to generate a new domain, which will create a new subdirectory in /dev/kdbus/, which the manager should then mount to /dev/kdbus in the container. Such a domain subdirectory looks pretty much like /dev/kdbus itself does after boot... Now, with this in mind, we want to keep the option open that programs from the host can freely connect to any domain, if they want to do so, by simply specifying the full paths, including the domain/subdomain/subsubdomain/....

Of course, we could also come up with a syntax for denoting domains, but since they can be stacked you'd then have to allow slashes in them, at which point we can just use the path for the full thing, no? (Or alternatively, allow domain0=foo,domain1=bar,uid=1000,endpoint=waldo to denote /dev/kdbus/foo/bar/1000-waldo/bus, but I am not sure how much I like that.)

Moreover, I think it should be OK for admins to bind mount a kdbus device node somewhere, and then simply point a tool to it, to connect to it. I wouldn't make it so hard to connect to arbitrary paths for this case...

> I'm not sure why "kernel" and not "kdbus" given that the strings kdbus and
> KDBUS appear all over the place in the kdbus API.

No strong opinion on this one. As you prefer... So far we never expose the string "kdbus" in any of the sd-bus APIs (at least to my knowledge), simply because I found it so confusing to have identifiers named "kdbus" inside a project that implements "dbus" anyway...

> @@ +3270,5 @@
> > +          <!-- FIXME: how to convert existing libraries between
> > +          stream-based D-Bus and kdbus without either breaking correct
> > +          applications, or introducing vulnerabilities in correct
> > +          applications? Can we do this without introducing parallel
> > +          bus connections? -->
> 
> We discussed this a bit on the dbus and systemd mailing lists a while ago. I
> think the best anyone could think of at the time was adding a new GBusType
> (and analogous things, like DBusBusType) that means "connect to the system
> bus, and I promise I won't respond to method calls on this connection in a
> way that assumes /etc/dbus/system.d has already been applied by the
> dbus-daemon".

Yes, this is what I would suggest, and I think Ryan is onboard with this too.

> @@ +3348,5 @@
> > +            For general D-Bus messages, the kdbus payload-type field must be
> > +            set to 0x4442757344427573 (corresponding to "DBusDBus" in ASCII)
> > +            to indicate a D-Bus message. Special messages generated by the
> > +            kernel use payload type zero instead. All other values are reserved
> > +            for non-D-Bus payloads.
> 
> I must admit I'm not quite sure why something called "kdbus" supports
> non-D-Bus payloads other than the special kernel payloads... but maybe the
> (probably forever) hypothetical D-Bus 2.0 would want a different payload
> type.

Well, we currently have two payloads: the dbus one, and "0" for the kernel's own control messages (which do not use gvariant marshalling, but simply pass around structs).

The idea is that we want to keep the door a tiny bit open though to allow people to use the kernel infrastructure for a completely different IPC system, if they want to, one day... That said, we should not advertise that ever, at least at this point, because I am actually personally not interested in confusing the landscape with that now.

> @@ +3482,5 @@
> > +          <para>
> > +            A message with no arguments (empty signature) has an empty
> > +            body (the serialization of a GVariant of type (), which is 0
> > +            bytes long), but padding to an 8-byte boundary is still added
> > +            before the zero-length body.
> 
> I haven't actually verified this, but as far as I can see it's true?

messages are not padded, really. 

> 
> @@ +3491,5 @@
> > +            padded to any particular alignment boundary.
> > +          </para>
> > +
> > +          <para>
> > +            The GVariant serialization format is not documented here.
> 
> Obviously that's not ideal. I started, but got confused about the precise
> details and stopped, and I don't think it's the highest-priority thing to
> document anyway: it doesn't affect semantics.

I'd propose to simply include a reference to https://people.gnome.org/~desrt/gvariant-serialisation.pdf which I used as normative reference when implementing sd-bus.
> 
> @@ +3523,5 @@
> > +          </para>
> > +
> > +          <para>
> > +            The general principle is that recipients must treat the
> > +            kernel-checked kdbus message metadata as canonical. It is an
> 
> I don't know whether sd-bus and the GDBus branch actually do this. They
> should.

What precisely do you mean?
> 
> @@ +3538,5 @@
> > +            flag for a sent message to contradict the absence or
> > +            presence of the NO_REPLY_EXPECTED flag in the actual D-Bus
> > +            message. The kernel will disallow replies or impose a timeout
> > +            according to the value of the EXPECT_REPLY flag, and does
> > +            not interpret the NO_REPLY_EXPECTED flag at all. Similarly,
> 
> The fact that the EXPECT_REPLY flag has a sense opposite the
> NO_REPLY_EXPECTED flag is really starting to annoy me... but I can see why
> (the NO_REPLY_EXPECTED flag really means "no reply expected even if this is
> a method call", and the kernel side of kdbus doesn't want/need to know about
> message types, only unicast-fire-and-forget vs unicast-expecting-reply vs
> unicast-reply vs broadcast).
> 
> I should mention the interaction with message types here, and also the
> interaction with broadcasts.

Generally we tried to avoid "inverted" flags, i.e. normalize everything to "positive", unless there is a strong reason to invert it.

> @@ +3647,5 @@
> > +            clients use the KDBUS_CMD_NAME_ACQUIRE ioctl.
> > +            Its semantics are similar, but the flags differ, and are
> > +            not guaranteed to have corresponding numeric values.
> > +            In particular, the DO_NOT_QUEUE flag is replaced by
> > +            a QUEUE flag with the opposite sense.
> 
> This seems needlessly annoying. I don't not dislike double negatives as much
> as the next hacker, and I know you don't usually want to queue in practice;
> but when it seems fairly likely that kdbus will only ever be used to
> transport D-Bus, I feel as though inverting flags should come with some sort
> of justification beyond aesthetics.

Besides the "inverted flags" issue discussed above: We have thought about this for a while, and what mattered most to me is that passing "0" as flags to the API is a suitable default value. And I am pretty strongly of the opinion that when 0 is passed as flags, this should not result in queuing. Queuing should be the exception, not the default. With this in place, in all the places where we invoke the call in systemd we now *always* pass 0 as flags, and never make use of any of the other bits...
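
To make the inversion concrete, here is a sketch of how a compatibility layer might translate RequestName flags into name-acquisition flags with a positive QUEUE bit. The `DBUS_NAME_FLAG_*` values are those of the D-Bus Specification; the `SKETCH_KDBUS_*` names and numeric values are placeholders invented for this example, since the patch text says the real values are not guaranteed to correspond.

```python
# RequestName flags from the D-Bus Specification.
DBUS_NAME_FLAG_ALLOW_REPLACEMENT = 0x1
DBUS_NAME_FLAG_REPLACE_EXISTING = 0x2
DBUS_NAME_FLAG_DO_NOT_QUEUE = 0x4

# Placeholder values for illustration only; NOT the real kdbus constants.
SKETCH_KDBUS_NAME_REPLACE_EXISTING = 1 << 0
SKETCH_KDBUS_NAME_ALLOW_REPLACEMENT = 1 << 1
SKETCH_KDBUS_NAME_QUEUE = 1 << 2


def request_name_flags_to_kdbus(dbus_flags):
    """Translate D-Bus RequestName flags into sketch kdbus flags."""
    kdbus_flags = 0
    if dbus_flags & DBUS_NAME_FLAG_ALLOW_REPLACEMENT:
        kdbus_flags |= SKETCH_KDBUS_NAME_ALLOW_REPLACEMENT
    if dbus_flags & DBUS_NAME_FLAG_REPLACE_EXISTING:
        kdbus_flags |= SKETCH_KDBUS_NAME_REPLACE_EXISTING
    # Inverted sense: queuing is requested only when DO_NOT_QUEUE is absent.
    if not (dbus_flags & DBUS_NAME_FLAG_DO_NOT_QUEUE):
        kdbus_flags |= SKETCH_KDBUS_NAME_QUEUE
    return kdbus_flags
```

Under this mapping, a D-Bus caller that passes only DO_NOT_QUEUE ends up passing 0 on the kdbus side, which matches the "0 is a suitable default" argument above.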
> 
> @@ +3694,5 @@
> > +          </para>
> > +
> > +          <para>
> > +            There is no direct equivalent for StartServiceByName
> > +            (<xref linkend="bus-messages-start-service-by-name"/>).
> 
> ... or is there, and I just didn't spot it?

No, this doesn't exist. But you could fake it by enqueuing a Ping packet. I am not convinced we really need more than that (and explicit calls to systemd itself...).

The bus-proxy we wrote as part of systemd, that translates dbus1 traffic into kdbus, actually converts StartServiceByName() into a Ping().
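
A sketch of that translation, using plain dictionaries to stand in for messages. This is not the bus-proxy's actual code: the dictionary layout, the function name and the choice of object path `/` for the Ping are assumptions made for illustration, and synthesizing the StartServiceByName reply from the Ping reply is not shown.

```python
def translate_start_service(call):
    """Rewrite a StartServiceByName call into a Peer.Ping to the target name.

    Sketch only: 'call' is a dict with destination, path, interface,
    member and args keys. Other calls are returned unchanged.
    """
    is_start = (
        call["destination"] == "org.freedesktop.DBus"
        and call["interface"] == "org.freedesktop.DBus"
        and call["member"] == "StartServiceByName"
    )
    if not is_start:
        return call
    name, _flags = call["args"]  # StartServiceByName takes (name, flags)
    return {
        "destination": name,  # delivering to the name triggers activation
        "path": "/",  # assumed path; any object answers Peer.Ping
        "interface": "org.freedesktop.DBus.Peer",
        "member": "Ping",
        "args": (),
    }
```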

> @@ +3915,5 @@
> > +            platform support exists and the unix: transport is used,
> > +            and any platform-specific credentials for which race-free
> > +            platform support exists).
> > +            <!-- FIXME: is this metadata from connect-time or from send-time
> > +            or a mixture? -->
> 
> I still need to check this.

Message metadata is send-time. Connection metadata is from connection-time and hence cached, possibly out-of-date. This is similar to the difference between SCM_CREDENTIALS (send-time) and SO_PEERCRED (connection-time).
Comment 10 Lennart Poettering 2014-10-02 17:02:39 UTC
Comment on attachment 107194
spec: in the table of types, classify them

Review of attachment 107194:
-----------------------------------------------------------------

Looks good.
Comment 11 Lennart Poettering 2014-10-02 17:03:37 UTC
Comment on attachment 107195
spec: translate arbitrary limits into something more comprehensible

Review of attachment 107195:
-----------------------------------------------------------------

Looks good too!
Comment 12 Lennart Poettering 2014-10-02 17:57:23 UTC
Comment on attachment 107196
WiP v4: document kdbus in the D-Bus Specification

Review of attachment 107196:
-----------------------------------------------------------------

Pretty good already!

::: doc/dbus-specification.xml
@@ +3115,5 @@
> +    <sect2 id="transports-kernel">
> +      <title>Linux kdbus (kernel-assisted D-Bus)</title>
> +
> +      <para>
> +        kdbus is an module for Linux providing kernel acceleration for D-Bus

"a module", not "an module"

@@ +3153,5 @@
> +            </thead>
> +            <tbody>
> +              <row>
> +                <entry>path</entry>
> +                <entry><literal>/dev/kdbus/</literal>UID<literal>-</literal>SUFFIX<literal>/</literal>ENDPOINT</entry>

Probably should mention the concept of "domains" here, that might make the paths longer.

@@ +3198,5 @@
> +          it is as follows:
> +        </para>
> +
> +        <!-- FIXME: talk about kdbus domains and how they interact with
> +        uid namespaces? -->

They don't interact with uid namespaces actually. Note that we called this "domain" instead of "namespace" to make sure that this is not considered just another kind of namespace. Because it isn't really, since the host sees the whole tree of domains, and the container manager always has to bind mount them on top of /dev/kdbus before they become useful.

I'd probably not discuss domains in the dbus spec too much, since they are mostly just an implementation detail to allow implementation of container managers, but normal userspace should not really have to deal with them, except for the exotic case where userspace wants to directly connect to a subdomain's busses... Hence, mention that they exist and might be part of the bus address, but don't get too specific.

(Note that directly connecting to subdomain busses is not too useful in real-life, since most likely the domain is actually used in conjunction with uid or pid namespacing. But if that's done, then the metadata on messages and connections will be suppressed, since we might not be able to translate uids/pids correctly (since the uids/pids might not exist in a peer's namespace).)

@@ +3268,5 @@
> +            message bus, has extensive access-control facilities. In practice,
> +            this access-control language is too complicated to get right
> +            consistently, and many of its features are not actually used.
> +            kdbus has a simpler and more realistic access-control scheme.
> +          </para>

AFAIK we currently don't document the XML policy language in the spec at all, but only in the man page? Sounds weird mentioning that it is different, but not referencing any doc that standardises it?

@@ +3295,5 @@
> +            to the name-owner, but may not implement it themselves
> +            unless they also have OWN access; and lack of privileges
> +            indicates that the entity may not send messages to the name-owner
> +            unless they also have TALK access, but may still observe that
> +            it exists.

There's a "SEE" missing somewhere in the last sentence.

@@ +3425,5 @@
> +          <!-- FIXME: how to convert existing libraries between
> +          stream-based D-Bus and kdbus without either breaking correct
> +          applications, or introducing vulnerabilities in correct
> +          applications? Can we do this without introducing parallel
> +          bus connections? -->

For the system bus we can't. Applications (and the bus libraries they use) must be able to do their own per-interface/per-method/per-path access control, and if they don't we cannot allow them to connect to the system bus. This means apps need to explicitly declare that they are willing to connect to an untrusted bus and take the responsibility for it, which is probably best done by introducing a new bus connection type.

@@ +3445,5 @@
> +            SASL EXTERNAL on the unix: transport), feature-negotiation is done
> +            via flags fields in that ioctl, and instead of the Hello method
> +            returning the unique bus name, the ioctl returns a numeric
> +            unique ID, which can be converted into a unique bus name by
> +            writing it in ASCII decimal and prepending <literal>:1.</literal>.

Might be worth explaining that a connecting client must select which metadata fields it wants for incoming messages.

Also, the flags fields need to be explained in more detail, in particular that one half is "incompatible" flags and the other half "compatible" ones.

@@ +3466,5 @@
> +            numeric values. There are flags for EXPECT_REPLY
> +            (note that this is the inverse of NO_REPLY_EXPECTED
> +            in the D-Bus protocol), for NO_AUTO_START, and for SYNC_REPLY
> +            to perform a synchronous call (which may violate message
> +            ordering, but allows some optimizations).

Maybe be more precise here, and say that only the reply is reordered, but the request is ordered as usual.

@@ +3528,5 @@
> +            messages where it would normally be mandatory.
> +            If present, it must match the least significant 32 bits
> +            of the kdbus cookie_reply field, and the remaining
> +            (more-significant) bits of the cookie_reply field must be zero.
> +            <!-- Does sd-bus enforce this? -->

No we don't. sd-bus treats the serial/cookie as 64bit generally, and exposes it as such in the API. However, it will only use 32bit of it on dbus1 transports, and there's no API to alter the serial/cookie explicitly, hence we should be safe.

Of course, if a library exposes the serial/cookie as a 32bit value in the API, and some other peer sends a message with a serial/cookie that doesn't fit in 32bit we have a problem, but I figure that should be easy enough: simply drop the message. sd-bus won't generate messages like this though.
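
For a library that exposes 32-bit serials, the "simply drop the message" policy could look like the following sketch. The function name is hypothetical; returning `None` stands in for dropping the message. It also rejects zero, since the D-Bus Specification requires serials to be nonzero.

```python
def serial_from_cookie(cookie):
    """Map a 64-bit kdbus cookie to a 32-bit D-Bus serial, or None to drop.

    Sketch: the cookie is usable only if it is nonzero and its upper
    32 bits are all zero.
    """
    if 0 < cookie <= 0xFFFFFFFF:
        return cookie
    return None
```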

@@ +3536,5 @@
> +            The DESTINATION header field is not mandatory in messages sent
> +            through kdbus. If present, it must match the dst_id,
> +            and the KDBUS_ITEM_DST_NAME if present.
> +            <!-- Does sd-bus enforce this? -->
> +          </para>

We always ignore the user header field, and override it with the kernel supplied value, if both are available.

@@ +3543,5 @@
> +            The SENDER header field is not mandatory in messages sent
> +            through kdbus, and senders should not add it. If present,
> +            it must match the src_id.
> +            <!-- Does sd-bus enforce this? -->
> +          </para>

Same here.
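
The "kernel value wins" rule described for DESTINATION and SENDER can be sketched as a simple merge. The dictionary representation and the function name are assumptions for illustration, not sd-bus internals.

```python
def effective_fields(header_fields, kernel_fields):
    """Merge user-supplied header fields with kernel-supplied metadata.

    Sketch: when both sources provide a field, the kernel-supplied value
    overrides the one from the message header.
    """
    merged = dict(header_fields)
    merged.update(kernel_fields)  # kernel metadata takes precedence
    return merged
```

For example, a message whose header claims `SENDER` `:1.99` but whose kernel metadata says `:1.42` would be treated as coming from `:1.42`.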

@@ +3827,5 @@
> +          <para>
> +            Similarly, instead of calling ReleaseName to release a
> +            well-known name (see <xref linkend="bus-messages-release-name"/>),
> +            clients send a special NAME_RELEASE message to the kernel.
> +          </para>

No, there's an ioctl for this (KDBUS_CMD_NAME_RELEASE). No message involved.

@@ +3848,5 @@
> +            In stream-based D-Bus, AddMatch and RemoveMatch can also
> +            control eavesdropping, but in kdbus, they do not; see below
> +            (<xref linkend="transports-kernel-eavesdropping"/>)
> +            for details of how eavesdropping changes in kdbus.
> +          </para>

I figure we need to explain the precise generation of the bloom filters here.

@@ +3862,5 @@
> +            <xref linkend="bus-messages-get-connection-selinux-security-context"/>,
> +            <xref linkend="bus-messages-get-connection-credentials"/>).
> +            Like those methods, it returns the credentials that were
> +            current at the time the connection was opened.
> +            <!-- FIXME: fact-check -->

This is correct.

@@ +4249,5 @@
> +            In kdbus, a recipient can "peek" at the next message in the queue
> +            without de-queuing the message or accepting its attached file
> +            descriptors. In portable stream-based D-Bus, this is not possible.
> +            <!-- FIXME: is that a limitation of the portable subset of the
> +            Unix domain socket API, or of dbus-daemon, or what? -->

Limitation of the socket API really, since there's no way to block until a message is complete, after all this is a stream socket...

But of course, in real-life, the bus library could just read the message from the kernel, and queue it locally...

@@ +4270,5 @@
> +          </para>
> +
> +          <para>
> +            In kdbus, buses are divided into domains, analogous to network
> +            namespaces. In stream-based D-Bus, whether a container has a

Network namespaces are not recursive, but domains are. Hence they don't really compare too closely.
Comment 13 Simon McVittie 2014-10-03 10:55:47 UTC
(In reply to Lennart Poettering from comment #8)
> Oh, this would be perfect if I don't have to do the spec work for this! Much
> appreciated!

You're welcome to use the patches on this bug as a starting point, but I'm unlikely to have time to do all the work on it; and there is the practical concern that every time I write a patch for dbus, there's a reasonable chance that it will sit in Bugzilla without review for a while. So I would really appreciate it if you could propose new wording for the bits where I've been ambiguous or wrong.

I do think that documenting kdbus in a form suitable for the D-Bus Specification should be a prerequisite for considering it to be stable.
Comment 14 Simon McVittie 2014-10-03 11:48:44 UTC
(In reply to Lennart Poettering from comment #9)
> > I think I'd prefer kernel:uid=1000,bus=user,endpoint=bus over
> > kernel:path=/dev/kdbus/1000-user/bus
>
> The reason why I'd like to leave this as paths is because we actually have a
> concept of "domains" that can be stacked. Domains are used for giving
> containers their own set of busses, independent of the host's busses.

That's a fair point.

I'd still be tempted to say that kdbus: should be an alias for kernel:path=/dev/kdbus/${getuid()}-user/bus (with the obvious substitution), so we can have static defaults like --with-dbus-session-bus-connect-address="kdbus:;autolaunch:".
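
As a rough illustration of that substitution (a sketch only: the function name `expand_kdbus_alias` is hypothetical, and the `kdbus:` alias itself is just the proposal above, not something any D-Bus implementation is known to accept):

```python
import os


def expand_kdbus_alias(addresses, uid=None):
    """Expand the proposed "kdbus:" alias in a ';'-separated address list.

    Hypothetical helper: it only models the substitution suggested in
    this comment and leaves every other address untouched.
    """
    if uid is None:
        uid = os.getuid()
    expanded = []
    for address in addresses.split(";"):
        if address == "kdbus:":
            # The "obvious substitution": the caller's uid goes into the path.
            address = f"kernel:path=/dev/kdbus/{uid}-user/bus"
        expanded.append(address)
    return ";".join(expanded)
```

With uid 1000, the static default "kdbus:;autolaunch:" would expand to "kernel:path=/dev/kdbus/1000-user/bus;autolaunch:".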

> > I'm not sure why "kernel" and not "kdbus" given that the strings kdbus and
> > KDBUS appear all over the place in the kdbus API.
> 
> No strong opinion on this one.

Then I'd prefer kdbus, because that seems to be the name of the specific kernel facility we're using. After all, Unix domain sockets are also a kernel feature, and so are TCP sockets :-)

> The idea is that we want to keep the door a tiny bit open though to allow
> people to use the kernel infrastructure for a completely different IPC
> system, if they want to, one day... That said, we should not advertise that
> ever, at least at this point, because I am actually personally not
> interested in confusing the landscape with that now.

OK, I'd be fine with "All other values are reserved" or some such.

> > > +            A message with no arguments (empty signature) has an empty
> > > +            body (the serialization of a GVariant of type (), which is 0
> > > +            bytes long), but padding to an 8-byte boundary is still added
> > > +            before the zero-length body.
> > 
> > I haven't actually verified this, but as far as I can see it's true?
> 
> messages are not padded, really. 

I thought the serialization padding was the same as dbus-1. If we imagine sending two identical little-endian messages where the header fields are 5 bytes of "0x55" excluding the length of the array itself, and the body is 3 bytes of "0x33", then dbus-1 would have:

6c xx xx 01 03 00 00 00    fixed-length header (first 8 bytes)
xx xx xx xx 05 00 00 00    fixed-length header (last 4 bytes), array length
55 55 55 55 55 00 00 00    header fields, padding to 8-byte boundary
33 33 33                   body, *no* padding
6c xx xx 01 03 00 00 00    fixed-length header (first 8)
xx xx xx xx 05 00 00 00    fixed-length header (last 4), array length
55 55 55 55 55 00 00 00    header fields, padding to 8-byte boundary
33 33 33                   body, no padding

whereas a zero-length body would be:

6c xx xx 01 00 00 00 00    fixed-length header (first 8 bytes)
xx xx xx xx 05 00 00 00    fixed-length header (last 4 bytes), array length
55 55 55 55 55 00 00 00    header fields, padding to 8-byte boundary
                           0 bytes of body, no padding
6c xx xx 01 00 00 00 00    fixed-length header (first 8)
xx xx xx xx 05 00 00 00    fixed-length header (last 4), array length
55 55 55 55 55 00 00 00    header fields, padding to 8-byte boundary
                           0 bytes of body, no padding

If that's not true for kdbus, or if it's true but I've explained it poorly in the spec wording, please say what I should have written.
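
To make the dbus-1 layout above concrete, here is a minimal Python sketch that reproduces it. The name `frame_dbus1` is made up for illustration, and, as in the hex dumps, the header-field bytes are opaque placeholders rather than a valid header-field array:

```python
import struct


def pad8(n):
    """Number of zero bytes needed to round n up to a multiple of 8."""
    return -n % 8


def frame_dbus1(header_fields, body, serial=1):
    """Frame a little-endian dbus-1 message as in the dumps above.

    Illustrative only: header_fields is treated as opaque bytes.
    """
    # Fixed-length header (12 bytes): endianness marker 'l' (0x6c),
    # message type, flags, major protocol version, body length, serial.
    fixed = struct.pack("<cBBBII", b"l", 1, 0, 1, len(body), serial)
    # Array length of the header fields, excluding the length word itself.
    array_len = struct.pack("<I", len(header_fields))
    head = fixed + array_len + header_fields
    # The header is padded to an 8-byte boundary; the body is not padded.
    return head + b"\x00" * pad8(len(head)) + body
```

With 5 bytes of header fields and a 3-byte body this yields 27 bytes, so the next message on the stream starts at offset 27; with an empty body it yields 24 bytes.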

> > > +            The general principle is that recipients must treat the
> > > +            kernel-checked kdbus message metadata as canonical. It is an
> > 
> > I don't know whether sd-bus and the GDBus branch actually do this. They
> > should.
> 
> What precisely do you mean?

What I mean is: whenever redundant information is contradictory between the kdbus_msg or kdbus items and the payload's fixed-length header or header fields, implementations (sd-bus, GDBus, libdbus) should believe the kdbus bits, and either reject the message for being contradictory, or ignore the contradictory information in the payload.

For instance, if the kdbus_msg says the cookie is 0x0000000012345678 but bytes 8-11 of the fixed-length header in the payload say the serial number is 0xabcdef12, then the implementations should either reject that message (ignore or reply with error), or consider the serial number to be 0x12345678.
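
A minimal sketch of that rule in Python, assuming the payload's fixed-length header keeps the dbus-1 layout with a 32-bit serial at bytes 8-11 (counting from 0). The function name and the `strict` switch are hypothetical, not taken from sd-bus, GDBus or libdbus:

```python
import struct


def effective_serial(kdbus_cookie, payload, strict=True):
    """Pick the serial to believe when kdbus metadata and payload disagree.

    Assumes a dbus-1 style fixed-length header in the payload.
    """
    if payload[0:1] == b"l":
        endian = "<"
    elif payload[0:1] == b"B":
        endian = ">"
    else:
        raise ValueError("unknown endianness marker")
    (payload_serial,) = struct.unpack_from(endian + "I", payload, 8)
    if kdbus_cookie != payload_serial and strict:
        # Option 1: treat the contradictory message as invalid.
        raise ValueError("payload serial contradicts kdbus cookie")
    # Option 2 (or no contradiction): the kernel-checked value is canonical.
    return kdbus_cookie
```

Either branch is acceptable under the rule above; what matters is that an implementation picks one and never reads the payload copy anywhere else.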

> > I should mention the interaction with message types here, and also the
> > interaction with broadcasts.

Still to be done. What I (or someone) should add here is basically:

If the kdbus metadata says a reply is expected, but it's a broadcast, then that's invalid.

If the kdbus metadata says a reply is expected, but byte 1 of the payload says it's a SIGNAL, METHOD_RETURN or ERROR, then that's also invalid.
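
Those two checks could be sketched like this. The booleans `expect_reply` and `is_broadcast` are stand-ins for whatever the kdbus metadata actually provides, and the function name is hypothetical; the message type numbers are the ones from the D-Bus Specification:

```python
from typing import Optional

# D-Bus message types, as found in byte 1 of the payload.
METHOD_CALL, METHOD_RETURN, ERROR, SIGNAL = 1, 2, 3, 4


def reply_expectation_error(expect_reply: bool, is_broadcast: bool,
                            payload: bytes) -> Optional[str]:
    """Return a reason the message is invalid, or None if it is acceptable."""
    if not expect_reply:
        return None
    if is_broadcast:
        return "reply expected on a broadcast"
    if payload[1] in (METHOD_RETURN, ERROR, SIGNAL):
        return "reply expected for a message type that cannot be replied to"
    return None
```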

> We have thought about
> this for a while, and what mattered most to me is that passing "0" as flags
> to the API is a suitable default value.

I don't think this is necessarily achievable, because it can only be true for flags that had been thought of when the API was invented. open() with O_CLOEXEC should have been the default too, but when open() was first defined, nobody had thought of O_CLOEXEC, which is why it's O_CLOEXEC, not an O_INHERIT with reversed sense.

So the decision point here is, is this a D-Bus flag that you're inheriting into kdbus, or a new flag that happens to correspond with reversed sense to a D-Bus flag?

If you're sure about this trade-off, I'm willing to be convinced, but my gut feeling is "when you're making a better transport for D-Bus, reversing some but not all of the negative flags from D-Bus is needless complication" :-)

> > > +            There is no direct equivalent for StartServiceByName
> > > +            (<xref linkend="bus-messages-start-service-by-name"/>).
> 
> No, this doesn't exist. But you could fake it by enqueuing a Ping packet.

That seems reasonable. In general I prefer activating the Foo service implicitly by sending a message that is actually meaningful for Foo; generic D-Bus bindings obviously can't do that, but Ping is fine for that use-case.

> message metadata is send-time. connection meta data is from connection-time
> and hence cached, possibly out-of-date. This is similar to SO_PEERCRED and
> SCM_CREDS.

Thanks for clarifying.

> "a module", not "an module"

Yeah, it was originally "an out-of-tree module"

> Probably should mention the concept of "domains" here, that might make the
> paths longer.

I'd rather move the rules for composing paths into the text, then - the table formatting that the specification uses already makes the path wrap awkwardly.

> Note that we called this
> "domain" instead of "namespace" to make sure that this is not considered
> just another kind of namespace. Because it isn't really, since the host sees
> the whole tree of domains, and the container manager always has to bind
> mount them on top of /dev/kdbus before they become useful.

Any particular rationale for them working like this, and not like namespaces?

> I'd probably not discuss domains in the dbus spec too much
...
> mention that they exist and might be part of
> the bus address, but don't get too specific.

Reasonable.

> (Note that directly connecting to subdomain busses is not too useful in
> real-life, since most likely the domain is actually used in conjunction with
> uid or pid namespacing. But if that's done, then the metadata on messages
> and connections will be suppressed, since we might not be able to translate
> uids/pids correctly (since the uids/pids might not exist in a peer's
> namespace).)

Right, that's the sort of interaction I was worried about: "not keeping uid/pid namespaces in sync with the domain you're using leads to issues". (See also bind-mounting the system bus's Unix socket into a container, which would probably be A Bad Thing, or bind-mounting the system bus's Unix socket into a chroot, which might do what you want or be a terrible idea or both, because nobody really knows what the creator of a chroot is trying to achieve.)

> AFAIK we currently don't document the XML policy language in the spec at
> all, but only in the man page? Sounds weird mentioning that it is different,
> but not referencing any doc that standardises it?

Yes, it's weird. It's a historical design flaw that the XML policy language is an implementation detail of the reference dbus-daemon, but at the same time, services that we consider to be correct rely on it for their security, and would have instant security flaws if it was taken away.

This patch is partly documentation for kdbus, and partially a way to get my thoughts into order about differences between kdbus and current D-Bus, and the ways in which those differences could break existing code. I started this work thinking that the potentially incompatible jump here was from stream-based D-Bus to kdbus-as-transport, but it's only partly that; it's also partly the jump from dbus-daemon to kdbus-as-message-bus, which would be equally problematic if it was a jump from dbus-daemon to some other implementation of the abstract "message bus" functionality described in the Specification, which does not specify security rules at all.

I think it's worth documenting that the security model is different, and a sketch of the new one (in particular, things it deliberately doesn't do, like parsing payloads), even though we never documented the old one here.

> > +            unless they also have OWN access; and lack of privileges
> > +            indicates that the entity may not send messages to the name-owner
> > +            unless they also have TALK access, but may still observe that
> > +            it exists.
> 
> There's a "SEE" missing somewhere in the last sentence.

What I was trying to get at here was the difference in rules between the "bus" endpoint and other endpoints, which could perhaps be summarized better as: on a "bus", every unique or well-known name implicitly has "SEE access by world" (and because of the additive nature of kdbus permissions, that cannot be revoked).
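
A toy model of that summary, under the assumption (mine, not confirmed by the kdbus documentation) that the access levels are ordered so that OWN implies TALK and TALK implies SEE; the names `Access` and `effective_access` are invented for the sketch:

```python
from enum import IntEnum


class Access(IntEnum):
    # Assumed ordering: each level includes the ones below it.
    NONE = 0
    SEE = 1
    TALK = 2
    OWN = 3


def effective_access(grants, on_bus_endpoint):
    """Combine policy grants additively: they can only raise the level."""
    level = max(grants, default=Access.NONE)
    if on_bus_endpoint:
        # Implicit "SEE access by world" on a "bus" endpoint; because
        # grants are additive, nothing can take it away again.
        level = max(level, Access.SEE)
    return Access(level)
```

In this model a client with no explicit grants gets SEE on a "bus" endpoint and nothing on any other endpoint.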

> Might be worth explaining that a connecting client must select which
> metadata fields it wants for incoming messages.

I think that's an implementation detail of the kdbus ioctls: the conceptual model is that every client sees everything, but as an optimization, not everything is delivered unless the client asks for it?

> Also, the flags fields need to be explained in more detail, in particular
> that one half is "incompatible" flags and the other "compatible" ones.

I think that's also an implementation detail of the kdbus ioctls, but if you or another kdbus expert wanted to propose wording, I wouldn't object.

> Maybe be more precise here, and say that only the reply is reordered, but
> the request is ordered as usual.

Wording welcome. I didn't want to get too far into implementation details here: the important thing, from my point of view, is that the normal "total order" and "causal order" guarantees can be violated by this API.

> No we don't. sd-bus treats the serial/cookie as 64bit generally, and exposes
> it as such in the API. However, it will only use 32bit of it on dbus1
> transports, and there's no API to alter the serial/cookie explicitly, hence
> we should be safe.

I think sd-bus (and other implementations) should either drop messages where the 64-bit kdbus cookie does not equal the 32-bit D-Bus serial in the payload (with high bits zero), or explicitly not provide access to the 32-bit D-Bus serial from the payload for kdbus messages, only the 64-bit kdbus cookie (and on stream-based D-Bus, behave in its APIs as if the 64-bit cookie was the 32-bit D-Bus serial with high bits zero-filled). It's possible that it does this already, I haven't dug into the implementation - but I think the spec needs to insist that implementations do it. Otherwise, we can get reliability or (maybe) even security bugs where an implementation is not consistent about which duplicate field it looks in.

> Of course, if a library exposes the serial/cookie as 32bit value in the API,
> and some other peer sends a message with a serial/cookie that doesn't fit
> in 32bit we have a problem, but I figure that should be easy enough: simply
> drop the message.

> > +            The DESTINATION header field is not mandatory in messages sent
> > +            through kdbus. If present, it must match the dst_id,
> > +            and the KDBUS_ITEM_DST_NAME if present.
> > +            <!-- Does sd-bus enforce this? -->
> > +          </para>
> 
> We always ignore the user header field, and override it with the kernel
> supplied value, if both are available.

Good.

> > +            clients send a special NAME_RELEASE message to the kernel.
> 
> No, there's an ioctl for this (KDBUS_CMD_NAME_RELEASE). No message involved.

Noted.

> I figure we need to explain the precise generation of the bloom filters here.

Wording welcome.

> > +            <xref linkend="bus-messages-get-connection-selinux-security-context"/>,
> > +            <xref linkend="bus-messages-get-connection-credentials"/>).
> > +            Like those methods, it returns the credentials that were
> > +            current at the time the connection was opened.
> > +            <!-- FIXME: fact-check -->
> 
> This is correct.

Thanks.

> > +            In kdbus, a recipient can "peek" at the next message in the queue
> 
> Limitation of the socket API really, since there's no way to block until a
> message is complete, after all this is a stream socket...

Well, in principle it could MSG_PEEK repeatedly, using the header to tell how many bytes ahead it would need to peek to have the whole thing; but my understanding is that that interacts poorly with SCM_RIGHTS.
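
Roughly, the repeated-peek idea would look like the sketch below. It assumes a dbus-1 stream on a Unix socket on a platform that provides MSG_DONTWAIT, the function name is hypothetical, and it deliberately ignores ancillary data, which is exactly the part that goes wrong with SCM_RIGHTS:

```python
import socket
import struct


def peek_whole_message(sock):
    """Return the next complete dbus-1 message without consuming it.

    Returns None if the whole message is not yet in the socket buffer.
    """
    flags = socket.MSG_PEEK | socket.MSG_DONTWAIT
    try:
        # 12-byte fixed-length header plus the 4-byte header-field array length.
        head = sock.recv(16, flags)
    except BlockingIOError:
        return None
    if len(head) < 16:
        return None
    endian = "<" if head[0:1] == b"l" else ">"
    body_len, _serial, fields_len = struct.unpack_from(endian + "III", head, 4)
    # Header fields are padded to an 8-byte boundary; the body is not padded.
    total = 16 + fields_len + (-fields_len % 8) + body_len
    data = sock.recv(total, flags)
    return data if len(data) == total else None
```

A caller would have to poll this until it stops returning None, which is the "no way to block until a message is complete" limitation in practice.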

> But of course, in real-life, the bus library could just read the message
> from the kernel, and queue it locally...

Not without accepting the fds, unfortunately.

> > +            In kdbus, buses are divided into domains, analogous to network
> > +            namespaces. In stream-based D-Bus, whether a container has a
> 
> Network namespaces are not recursive, but domains are. Hence they don't
> really compare too closely.

Noted, better phrasing welcome.
Comment 15 Simon McVittie 2014-10-06 11:32:19 UTC
Comment on attachment 107194 [details] [review]
spec: in the table of types, classify them

(In reply to Lennart Poettering from comment #10)
> Review of attachment 107194 [details] [review]:
> 
> Looks good.

Applied that one for 1.9.2, thanks
Comment 16 Simon McVittie 2014-10-06 11:32:39 UTC
Comment on attachment 107195 [details] [review]
spec: translate arbitrary limits into something more comprehensible

Applied for 1.9.2 too, thanks
Comment 17 Philip Withnall 2016-10-01 12:19:23 UTC
As kdbus is dead now, should this bug be closed?
Comment 18 Lennart Poettering 2016-10-04 22:03:01 UTC
Yes, I think so.

