Bug 100344

Summary: Add restricted, identifiable bus servers for use in containers
Product: dbus Reporter: Simon McVittie <smcv>
Component: coreAssignee: Simon McVittie <smcv>
Status: ASSIGNED --- QA Contact: D-Bus Maintainers <dbus>
Severity: enhancement    
Priority: medium CC: alexl, desrt
Version: git master   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 101898, 101899, 101902, 101903, 103457, 103458, 101257, 101354    
Bug Blocks:    

Description Simon McVittie 2017-03-22 17:53:43 UTC
In app-container projects like Flatpak, Snap, Firejail and AppImage, there's a need for a confined/contained/sandboxed app to be able to reach services on the normal session and system buses. However, unrestricted access to those buses is scary (and in particular, the session bus is not considered to be a security boundary, so various services there have APIs that are arbitrary code execution).

The motivating use cases right now are:

* Portals: Portal authors need to be able to identify whether the
  container is being contacted by an uncontained process (running with
  the user's full privileges), or whether it is being contacted by a
  contained process (in a container created by Flatpak or Snap).

* dconf Currently, a contained app either has full read/write access
  to dconf, or no access. It should have read/write access to its own
  subtree of dconf configuration space, and no access to the rest.

At the moment, Flatpak runs a D-Bus proxy for each app instance that has access to D-Bus, connects to the appropriate bus on the app's behalf, and passes messages through. That proxy is in a container similar to the actual app instance, but not actually the same container; it is trusted to not pass messages through that it shouldn't pass through. Portals identify the contained app by reading the proxy's `/proc/$pid/root/.flatpak-info`, which is Flatpak-specific. Reading through `/proc/$pid/root` like that also has an inherent race
condition: if the process exits at exactly the wrong time, and the process ID space (which is only 15 bits) wraps around such that an uncontained or differently-contained process gets the same process ID allocated for it, then the portal would see the differently-contained process's identity instead. Flatpak is effectively trusting the proxy not to exit at a sufficiently inconvenient time.

Meanwhile, Snap does its sandboxing with AppArmor, on kernels where it is enabled both at compile-time (Ubuntu, openSUSE, Debian, Debian derivatives like Tails) and at runtime (Ubuntu, openSUSE and Tails, but not Debian by default). Ubuntu's kernel has extra AppArmor features that haven't yet gone upstream, some of which provide reliable app identification via LSM labels, which `dbus-daemon` can learn by querying its `AF_UNIX` socket; however, other kernels like the ones in openSUSE and Debian don't have those. The access-control (AppArmor *mediation*) is implemented in upstream dbus-daemon, but again doesn't work portably: the kernel needs to have the necessary non-upstream features available, and they need to have been enabled at compile time and at runtime. The AppArmor rules are also somewhat inflexible: they are fixed at load time, and do not have access to a lot of useful domain-specific information. They're enough for rules like "allow talking to portals A, B and C", but not really enough for dconf's needs.

I've been discussing this with Allison Lortie and Alexander Larsson at the GTK Hackfest, and we've come up with a plan for obsoleting the Flatpak proxy by enhancing dbus-daemon.
Comment 1 Simon McVittie 2017-03-22 22:14:33 UTC
The plan so far:

Identity model
==============

tl;dr:
    o.fd.DBus.Containers.GetContainerInfo(s: bus_name) ->
        s: opaque_container_id,  # opaque, e.g. "47", or "" if uncontained
        s: container_type,       # e.g. "org.flatpak", or "" if uncontained
        s: app_id,               # e.g. "org.gnome.Recipes" under Flatpak
        a{sv}: metadata          # keys, values defined by Flatpak in this case

        # returns "", "", "", {} if the given bus_name is unconfined
        # raises an error if this dbus-daemon doesn't implement containers

    o.fd.DBus.Containers.ContainerRemoved(s: opaque_container_id)
        # Guaranteed not to be emitted until all unique names sharing the
        # container ID have fallen off the bus *and* no more can connect

Each user (uid) has some *uncontained* processes, plus 0 or more *containers*.

The uncontained processes include dbus-daemon itself, the desktop environment, the container managers like Flatpak and Snap, and so on. They have the user's full privileges, and in particular they are allowed to do privileged things on the user's session bus (like running dbus-monitor), and act with the user's full privileges on the system bus.

The containers are Flatpak apps, or Snap apps, or other app-container technologies like Firejail and AppImage, or even a mixture (different app-container technologies can coexist on a single system). They are containers and not "apps", because in principle, you could install com.example.MyApp 1.0, run it, and while it's still running, upgrade to com.example.MyApp 2.0 and run that; you'd have two containers for the same app, perhaps with different permissions.

Each container has an *app identifier*, consisting of a reversed DNS name like org.flatpak or io.snapcraft representing the container technology, and an arbitrary non-empty string whose meaning is defined by the container technology. For Flatpak, that string would be another reversed DNS name like com.example.MyGreatApp; for Snap, as far as I can tell it would look like example-my-great-app.

The container technology can also put arbitrary metadata on the D-Bus representation of a container, again defined and namespaced by the container technology. For instance, Flatpak would use some serialization of the same fields that go in the Flatpak metadata file at the moment.

Finally, the container has an opaque *container identifier* identifying a particular container instance.

We probably want the container identifier to be a string or object path, because "arg0" matching only works on strings; unless we enhance arg0 matching to also work on integer types and use a 64-bit integer.

Container server sockets (endpoints)
====================================

tl;dr:

    o.fd.DBus.Containers.AddContainerServer(
        s: container_type,           # e.g. "org.flatpak"
        s: app_identifier,           # e.g. "org.gnome.Recipes"
        a{sv}: metadata,             # defined by Flatpak in this case
        h: socket_to_accept,         # see below
        h: ready_notification,       # see below
        h: close_notification,       # see below
        a(usos): rules,              # see #Access control
    ) -> s: container_id             # opaque, e.g. "47"

    o.fd.DBus.GetConnectionCredentials(s: bus_name) -> a{sv}
        {
            ... all the current fields ...,
            "ContainerID": <"47">,
            "ContainerType": <"org.flatpak">,
            "ContainerApp": <"org.gnome.Recipes">,
        }

App-container managers like Flatpak and Snap create an AF_UNIX socket inside the container, bind() it to an address that will be made available to the contained processes, and listen(), but don't accept() any new connections. Instead, they fd-pass the new socket to the dbus-daemon by calling a new method, and the dbus-daemon uses a DBusServer to accept() connections after the app-container manager has signalled that it has called both bind() and listen().

Processes inside the container must not be allowed to contact the AF_UNIX socket used by the wider, uncontained system - if they could, the dbus-daemon wouldn't be able to distinguish between them and uncontained processes (except by using LSMs, which we can't rely on, or by using something with a race condition) and we'd be back where we started. Instead, they should have the new socket bind-mounted into their container's XDG_RUNTIME_DIR and connect to that, or have the new socket set as their DBUS_SESSION_BUS_ADDRESS and be prevented from connecting to the uncontained socket in some other way.

The current plan for the app-container to notify the dbus-daemon that it can actually start listening is to fd-pass the read end of a pipe, and close the write end (so the read end polls as readable) when the app-container has called bind() and listen(). This lets the container manager delegate the actual work to something like Flatpak's bubblewrap, so that the listening socket *only* exists inside the container, and has no presence outside. It isn't a D-Bus method call because we don't want to require bubblewrap (which is privileged and sometimes setuid, and operates in terms of simple syscalls rather than using GLib or similar) to implement D-Bus.

Similarly, the current plan for the app-container to notify the dbus-daemon that it should close the DBusServer is to fd-pass the read end of another pipe, and keep the write end open. When the app-container's supervisor process (pid 1 inside the bubblewrap sandbox used by Flatpak) exits, the write end is closed, the pipe polls as readable, and the dbus-daemon responds by closing the DBusServer. In particular this means that the kernel automatically tells us when the supervisor process exits, without the supervisor process needing to do anything D-Bus-specific (again, good for bubblewrap).

This is a lot like *endpoints* in the kdbus proposal, but we think the container manager should create the socket and fd-pass it in, rather than the central infrastructure (kernel in kdbus, dbus-daemon here) creating the socket and telling the container manager which one to use.

Along with the socket, the container manager passes in the container's identity and metadata, and the method returns a unique, opaque identifier for this particular container instance. The basic fields (container technology, technology-specific app ID, container ID) should probably be added to the result of GetConnectionCredentials(), and there should be a new API call (GetContainerInfo(), above) to get all of those plus the arbitrary technology-specific metadata.

When a process from a container connects to the contained server socket, every message that it sends should also have the container instance ID in a new header field. This is OK even though dbus-daemon does not (in general) forbid sender-specified future header fields (Bug #100317), because any dbus-daemon that supported this new feature would guarantee to set that header field correctly, and the existing Flatpak D-Bus proxy already rejects header fields not known to it in messages sent by a contained process.

The reasoning for using the sender's container instance ID (as opposed to the sender's unique name) is for services like dconf to be able to treat multiple unique bus names as belonging to the same equivalence class of contained processes: instead of having to look up the container metadata once per unique name, dconf can look it up once per container instance the first time it sees a new identifier in a header field. For the second and subsequent unique names in the container, dconf can know that the container metadata and permissions are identical to the one it already saw.

Finally, we need a signal for which dconf (or similar) can listen, to tell it that it can discard its cached container metadata for a particular container.

Access control
==============

tl;dr:
      a(usos) array of access-control rules
          u: flags, to include:
              (no flags): contained process may see the given bus name
                  in NameOwnerChanged, GetConnectionCredentials(), etc.
              OBJECT_PATH_IS_SUBTREE: /foo in object path field also matches
                  /foo/bar and /foo/bar/baz (but not /foolish)
              BUS_NAME_IS_SUBTREE: foo.bar in bus name field also matches
                  foo.bar.baz and foo.bar.baz.blah.blah.blah (but not
                  foo.barn)
              CALL_METHODS: contained process can call methods on
                  matching (bus name, object path, interface)
              RECEIVE_BROADCASTS: contained process can receive broadcasts
                  from matching (bus name, object path, interface)
              SEND_UNIX_FDS: method calls can contain file descriptors
              OWN_BUS_NAME: bus may own (or get in the queue for) the given
                  name, or the given names if BUS_NAME_IS_SUBTREE
          s: bus name, or "" for any bus name
              (actually matches any connection that owns or is in the queue
              for the given bus name, as with match rules)
          o: object path, or "/" with OBJECT_PATH_IS_SUBTREE for any
          s: interface, or (most commonly) "" for any interface

    o.fd.Containers.GrantAccess(a(uos): rules)
          Rules are as above, but bus name is implicitly the caller's
          unique name

    o.fd.Containers.RevokeAccess(a(uos): rules)
          Rules are as for GrantAccess. If a rule was added more than
          once, each revocation removes one of the identical copies.

Messages from containers to the outside world will be mediated by a new access control mechanism, in parallel with dbus-daemon's current support for firewall-style rules in the XML bus configuration, AppArmor mediation, and SELinux mediation. A message would only be allowed through if the XML configuration, the new container access control mechanism, and the LSM (if any) all agree it should be allowed.

By default, processes in a container can send broadcast signals, and send method calls and unicast signals to other processes in the same container. They can also receive method calls and unicast signals from outside the container (so that interfaces like `org.freedesktop.Application` can work), and send exactly one reply to each of those method calls. They cannot own bus names, communicate with other containers, or receive broadcasts.

Obviously, that's not going to be enough for a lot of contained apps, so we need a way to add more access. I'm intending this to be purely additive (start by denying everything except what is always allowed, then add new rules), not a mixture of adding and removing access like the current XML policy language.

There are two ways we've identified for rules to be added:

* The container manager can pass a list of rules into the dbus-daemon
  at the time it attaches the contained server socket, and they'll be
  allowed. The obvious example is that an `org.freedesktop.Application`
  needs to be allowed to own its own bus name. Flatpak apps'
  implicit permission to talk to portals, and Flatpak metadata
  like org.gnome.SessionManager=talk, could also be added this way.

* System or session services that are specifically designed to be used by
  untrusted clients, like the version of dconf that Allison is working
  on, can opt-in to having contained apps allowed to talk to them
  (effectively making them a generalization of Flatpak portals).
  The simplest such request, for something like a portal,
  is "allow connections from any container to contact this service"; but
  for dconf, we want to go a bit finer-grained, with all containers
  allowed to contact a single well-known rendezvous object path, and each
  container allowed to contact an additional object path subtree that is
  allocated by dconf on-demand for that app. Probably these rules won't
  contain the bus name; instead, the unique name of the caller is
  used, and the BUS_NAME_IS_SUBTREE flag is not allowed.

Initially, many contained apps would work in the first way (and in
particular sockets=session-bus would add a rule that allows everything),
while over time we'll probably want to head towards recommending more use
of the second.

Flatpak's SEE (which was itself borrowed from kdbus) is a rule with neither CALL_METHODS nor RECEIVE_BROADCASTS.

Flatpak's TALK (also borrowed from kdbus) is a rule with CALL_METHODS and RECEIVE_BROADCASTS.

Flatpak's OWN (ditto) is a call with CALL_METHODS, RECEIVE_BROADCASTS and OWN_BUS_NAME.

If container A owns a bus name, container B can be given the ability to contact container A by being given CALL_METHODS and RECEIVE_BROADCASTS access to that name. (TODO: do A or B or both also need a CONTACT_OTHER_CONTAINER flag to be allowed to do this, or is bus-name-based control enough?)

Access to org.freedesktop.DBus is special-cased. In general, "query" methods like GetNameOwner() are always allowed, but might return error if the container is not allowed to know the answer (we will have to be careful to return the same error whether the service exists or not), while privileged methods like o.fd.DBus.UpdateActivationEnvironment() are unconditionally denied. There are probably other cases (see Flatpak's proxy source code) but I can't think of them right now.

TODO: Flatpak's proxy has a special case where if a peer outside the container or in another container calls a method on a contained service, the contained service automatically gets SEE access to the unique name (but possibly only the unique name?) of the peer outside the container, so that the contained service can use NameOwnerChanged to track when it goes away. Do we also want this?
Comment 2 Simon McVittie 2017-03-23 16:34:53 UTC
(In reply to Simon McVittie from comment #1)
> We probably want the container identifier to be a string or object path,
> because "arg0" matching only works on strings; unless we enhance arg0
> matching to also work on integer types and use a 64-bit integer.

It occurs to me that D-Bus already has a way to hand out opaque identifiers: use an object path. We could use "/c47" or "/org/freedesktop/DBus/Containers/47" or some such.

>     o.fd.DBus.Containers.ContainerRemoved(s: opaque_container_id)
>         # Guaranteed not to be emitted until all unique names sharing the
>         # container ID have fallen off the bus *and* no more can connect

This has the usual change-notification/state-recovery race condition. Clients like dconf would have to either bind to this globally (no use of arg0 matching), or bind to it with an arg0 match and then call some method on the path, or some method that took the path as an argument, to verify that it was still there.

Allison, which of those would you prefer to do in dconf?
Comment 3 Simon McVittie 2017-03-23 16:45:23 UTC
(In reply to Simon McVittie from comment #1)
> Messages from containers to the outside world will be mediated by a new
> access control mechanism

The justification for why we want this:

Services like dconf send broadcast signals. We don't necessarily want those signals to go to every contained app, because if they do, they will leak information about what other apps are installed/active (good for fingerprinting, bad for privacy).

Fixing that at the service side is quite hard: the service would essentially have to reimplement the whole match rule and broadcast mechanism, and send multiple unicasts. Allison quite reasonably doesn't want to implement that in dconf. At some point she looked into adding kdbus-style multicast, with an array of recipients in a message header, but that turns out to be harder than it sounds. So we want to continue to send broadcasts, and have the dbus-daemon mediate who can receive them.

Having introduced an access control mechanism for receiving broadcasts, we might as well also use it to mediate sending method calls.

We will also want a way for the container manager to tell the dbus-daemon which services the contained app is allowed to access even if those services are not themselves container-aware - if we don't, then we can't replicate Flatpak's current functionality. This might as well be the same mechanism that's used by services that "opt in" to being container-accessible.

We don't want to use a separate "untrusted bus" because if we did that, we'd lose guaranteed ordering, the global namespace for bus names, and other nice properties of the system and session buses.
Comment 4 Simon McVittie 2017-03-24 15:54:41 UTC
(In reply to Simon McVittie from comment #1)
>     o.fd.DBus.Containers.AddContainerServer(
>         s: container_type,           # e.g. "org.flatpak"
>         s: app_identifier,           # e.g. "org.gnome.Recipes"
>         a{sv}: metadata,             # defined by Flatpak in this case
>         h: socket_to_accept,         # see below
>         h: ready_notification,       # see below
>         h: close_notification,       # see below
>         a(usos): rules,              # see #Access control
>     ) -> s: container_id             # opaque, e.g. "47"

Resource limiting
=================

We probably want an extra a{sv} argument for extending this API.

A maximum number of connections to this server is example we discussed at the hackfest for something to put in that a{sv}. If absent, there could be some sensible default configurable as a limit in system.conf and session.conf (perhaps default_connections_per_container_server). It would also be capped at max_connections_per_container_server, another configurable limit (we'd use the caller-specified value or the maximum, whichever is lower).

Each container server should cost its initiator one "slot" in their connections-per-uid quota, to represent the fd used by the server itself (or possibly two slots, to account for the close notification). Each connection to the container server should also cost one "slot".

This would also obsolete Bug #82346 by giving us a coherent story for how contained processes are prevented from DoS'ing the bus.
Comment 5 Simon McVittie 2017-03-24 15:58:09 UTC
The ready_notification pipe could potentially move into the a{sv}; in that case, if absent, we would assume the socket is already ready to accept().

The close_notification pipe could potentially move into the a{sv}; in that case, if absent, we would assume the socket is valid until the caller's unique name falls off the bus.

We discussed whether keys in the a{sv} should be treated as critical (dbus-daemon rejects an a{sv} with unknown keys) or non-critical (dbus-daemon silently accepts and ignores unknown keys) or whether we should have two maps, without really coming to a conclusion. Thinking about it some more, I think they should be treated as critical, and we should offer capability discovery so callers can know what can be in the map. We already have to have capability discovery anyway, so callers can know which flags they can use in their access control rules.

We probably want either D-Bus properties with GetAll(), or getter methods, on the Containers interface for:

* which keys are supported in the extension a{sv}
* which flags are supported in the access control rules

It might be finally time to teach dbus-daemon to implement o.fd.Properties for this purpose.
Comment 6 Simon McVittie 2017-03-24 15:58:29 UTC
(In reply to Simon McVittie from comment #4)
> This would also obsolete Bug #82346

and Bug #81469
Comment 7 Simon McVittie 2017-03-24 16:10:58 UTC
(In reply to Simon McVittie from comment #4)
> >         a(usos): rules,              # see #Access control

In fact... these could move into the extensibility a{sv} too!

That would give us an obvious intermediate step where we have the private sockets but they do not actually restrict, which in turn would give us reliable identification for Flatpak's proxies. I like this from the point of view of combining "any prefix of the patch series should be something we could merge" with "I shouldn't have to write a single giant commit".

Another thing we might want in the extensibility a{sv}:

VirtualizeMachineId: string

If given, rewrite all successful responses to org.freedesktop.Peer.GetMachineId() so they contain this string instead. (This ties in to proposals I've seen for containers to use a virtualized machine ID per container.)

or maybe:

HideMachineId: boolean

If given and true, all attempts to call org.freedesktop.Peer.GetMachineId() will fail with "access denied".
Comment 8 Philip Withnall 2017-03-24 16:16:07 UTC
(In reply to Simon McVittie from comment #1)
> Identity model
> ==============
>
> Each user (uid) has some *uncontained* processes, plus 0 or more
> *containers*.

Are you sure you want to use the term ‘containers’ here? It could get confused with ‘container technology’ in the sense that ‘flatpak is a container’, ‘firejail is a container’, etc. How about ‘app instances’ or ‘app containers’?

> unless we enhance arg0
> matching to also work on integer types and use a 64-bit integer.

That sounds like a generally good idea anyway. Although the GDBus API for arg0 matching at the moment is quite gchar* specific, so this might not be trivial. So yes, perhaps just add it if it’s actually needed for container IDs.

(In reply to Simon McVittie from comment #5)
> We discussed whether keys in the a{sv} should be treated as critical
> (dbus-daemon rejects an a{sv} with unknown keys) or non-critical
> (dbus-daemon silently accepts and ignores unknown keys) or whether we should
> have two maps, without really coming to a conclusion. Thinking about it some
> more, I think they should be treated as critical, and we should offer
> capability discovery so callers can know what can be in the map. We already
> have to have capability discovery anyway, so callers can know which flags
> they can use in their access control rules.
> 
> We probably want either D-Bus properties with GetAll(), or getter methods,
> on the Containers interface for:
> 
> * which keys are supported in the extension a{sv}
> * which flags are supported in the access control rules

Making the keys critical and exposing the supported set of keys in a property or getter seems sensible. It avoids the horrible potential future which we are experiencing with the header fields (bug #100317). I’m on the fence about o.fd.D.Properties vs getter methods. It seems a little bit overkill to implement o.fd.D.Properties just for two read-only properties.
Comment 9 Simon McVittie 2017-03-24 16:20:07 UTC
(In reply to Simon McVittie from comment #1)
> Access to org.freedesktop.DBus is special-cased. In general, "query" methods
> like GetNameOwner() are always allowed, but might return error if the
> container is not allowed to know the answer (we will have to be careful to
> return the same error whether the service exists or not), while privileged
> methods like o.fd.DBus.UpdateActivationEnvironment() are unconditionally
> denied. There are probably other cases (see Flatpak's proxy source code) but
> I can't think of them right now.

Hello: allowed.

RequestName, ReleaseName, ListQueuedOwners: contained caller must be allowed to own the given name. If not, they get an error; this must be checked before checking whether the name is already owned.

AddMatch, RemoveMatch: always allowed, but eavesdropping is forbidden, and broadcasts from outside the app-bundle only arrive if they match the match-rule and also the ACL.

ListNames: reply is filtered to only contain names that the caller is allowed to see.

ListActivatableNames: reply is filtered to only contain names that the caller is allowed to see.

NameHasOwner, GetNameOwner, NameOwnerChanged, StartServiceByName, GetConnectionWhatever: contained caller must be allowed to see the name.

UpdateActivationEnvironment: always forbidden

GetId: always forbidden until we find an app that uses it for anything :-)

Monitoring.BecomeMonitor: always forbidden

Stats.*: always forbidden
Comment 10 Simon McVittie 2017-03-24 16:27:24 UTC
(In reply to Philip Withnall from comment #8)
> Are you sure you want to use the term ‘containers’ here? It could get
> confused with ‘container technology’ in the sense that ‘flatpak is a
> container’, ‘firejail is a container’, etc.

Well, they aren't. :-P Flatpak isn't a container, it's something that runs/creates/manages containers... in the same way that qemu isn't a virtual machine and Microsoft Word isn't a document.

I do get your point though, and my initial writeup had "container instance [ID]" until I decided that might be too verbose. Perhaps I should go back to that.

> > unless we enhance arg0
> > matching to also work on integer types and use a 64-bit integer.
> 
> That sounds like a generally good idea anyway. Although the GDBus API for
> arg0 matching at the moment is quite gchar* specific, so this might not be
> trivial. So yes, perhaps just add it if it’s actually needed for container
> IDs.

I'm leaning towards using an object path, because we keep telling users of D-Bus that this is how they should identify their opaque objects.

> Making the keys critical and exposing the supported set of keys in a
> property or getter seems sensible. It avoids the horrible potential future
> which we are experiencing with the header fields (bug #100317).

Right.

> I’m on the
> fence about o.fd.D.Properties vs getter methods. It seems a little bit
> overkill to implement o.fd.D.Properties just for two read-only properties.

I do see your point, but I think this might be a JFDI situation (much like when we added support for making the dbus-daemon implement more than one interface). I don't expect it to be all that difficult, and it seems silly that we strongly recommend Properties to D-Bus API designers but aren't willing/able to use it in our own APIs.
Comment 11 Simon McVittie 2017-04-05 16:54:27 UTC
(In reply to Simon McVittie from comment #1)
>     o.fd.DBus.Containers.GetContainerInfo(s: bus_name) -> ...

Better as GetConnectionContainer(), consistent with GetConnectionEverythingElse().
Comment 12 Simon McVittie 2017-04-06 11:48:47 UTC
(In reply to Simon McVittie from comment #1)
> We probably want the container identifier to be a string or object path,
> because "arg0" matching only works on strings

I think an object path would make sense, then dconf would use arg0path if desired.

(In reply to Simon McVittie from comment #11)
> (In reply to Simon McVittie from comment #1)
> >     o.fd.DBus.Containers.GetContainerInfo(s: bus_name) -> ...
> 
> Better as GetConnectionContainer(), consistent with
> GetConnectionEverythingElse().

That way, we could also have GetContainerInstanceInfo(o: path) -> ... if we wanted to.

(In reply to Simon McVittie from comment #2)
> >     o.fd.DBus.Containers.ContainerRemoved(s: opaque_container_id)
> >         # Guaranteed not to be emitted until all unique names sharing the
> >         # container ID have fallen off the bus *and* no more can connect
> 
> This has the usual change-notification/state-recovery race condition.
> Clients like dconf would have to either bind to this globally (no use of
> arg0 matching), or bind to it with an arg0 match and then call some method
> on the path, or some method that took the path as an argument, to verify
> that it was still there.

Actually no, but it's a bit subtle: if dconf had this sequence

* receive message from contained client :1.23 with CONTAINER_INSTANCE='/c47'
  in its message header
* add match for ContainerRemoved with arg0path='/c47'
* call GetContainerInstanceInfo('/c47') or GetConnectionContainer(':1.23')

then it could infer that success from the method call implies that the container has not yet been removed, whereas an error "no such container instance" or "no such name" from the method call implies that the container-instance has gone away (because we guarantee that the container-instance lives as long as the DBusServer or the longest-lived connection to it, whichever is longer).
Comment 13 Simon McVittie 2017-06-09 17:32:09 UTC
(In reply to Simon McVittie from comment #1)
> App-container managers like Flatpak and Snap create an AF_UNIX socket inside
> the container, bind() it to an address that will be made available to the
> contained processes, and listen(), but don't accept() any new connections.
> Instead, they fd-pass the new socket to the dbus-daemon by calling a new
> method, and the dbus-daemon uses a DBusServer to accept() connections after
> the app-container manager has signalled that it has called both bind() and
> listen().

Unfortunately, this is really annoying to do safely, even without the annoyance of getting bwrap to tell us the socket is ready to be accept()ed without it having to have a persistent D-Bus connection.

We probably want AddContainerServer() to be considered an unprivileged operation, because Flatpak will want to use it on the system bus, and runs as an ordinary user.

However, if the caller is responsible for bind()ing and listen()ing, the only way the dbus-daemon can tell that a malicious caller hasn't done that is when accept() raises EINVAL. Non-listening AF_UNIX sockets poll as readable (at least on Linux 4.9), so current dbus-daemon will busy-loop trying to accept() repeatedly from a socket that will never work.

We can resolve that by having a concept of errno values that are fatal to the DBusServer, and forcibly disconnecting the DBusServer when accept() raises one of those... but, ugh.

I'm probably going to make this Linux-specific for the moment, because Flatpak and Snap both are, and this is all really subtle even without accounting for non-Linux.
Comment 14 Simon McVittie 2017-06-20 11:15:36 UTC
(In reply to Simon McVittie from comment #13)
> (In reply to Simon McVittie from comment #1)
> > App-container managers like Flatpak and Snap create an AF_UNIX socket inside
> > the container, bind() it to an address that will be made available to the
> > contained processes, and listen(), but don't accept() any new connections.
> 
> Unfortunately, this is really annoying to do safely, even without the
> annoyance of getting bwrap to tell us the socket is ready to be accept()ed
> without it having to have a persistent D-Bus connection.

For Bug #101354, I think I'm going to do the lowest-common-denominator thing: the dbus-daemon is responsible for listening on a socket in some suitable location (/run/dbus/containers if started as uid 0, or $XDG_RUNTIME_DIR/dbus-1/containers if set, or the feature just doesn't work), and return the resulting path as a bytestring and address as a string.

We can use the "named arguments" dict to do the fd-passing trick that Allison proposed, as a separate add-on; in that case we'd probably want to return an empty bytestring. We could still return a non-empty address, because DBusServer knows how to do that.
Comment 15 Simon McVittie 2017-07-21 13:46:07 UTC
desrt, pwithnall: Let's use this bug for general conceptual design, Bug #101354 for the specifics of the minimal implementation, and separate bugs for things beyond the minimum (message filtering, fd-passing cleverness) to try to stop one bug becoming monstrously large.

System bus, metadata and GetConnectionCredentials
=================================================

In <https://bugs.freedesktop.org/show_bug.cgi?id=101354#c51> I worried about the scope and trustworthiness of the metadata when on the system bus.

At the moment, a container instance has metadata like this (GVariant notation):

    (
        /* type (container technology) */
        "org.flatpak",
        /* name (some sort of app ID, in this case a Flatpak app ID) */
        "org.gnome.Weather",
        /* metadata (a{sv} defined by, in this case, Flatpak) */
        {
            "System Bus Policy": <{
                "org.freedesktop.Geoclue2": "talk",
            }>,
            "Application": <{
                "name": <"org.gnome.Weather">,
                "runtime": <"runtime/org.gnome.Platform/x86_64/3.24">,
            }>,
            ... variant translations of the rest of the .flatpak-info file ...
        },
    )

of which only the type and name are copied into GetConnectionCredentials.

On the session bus, this is absolutely fine: if you have the technical ability to connect to a session bus and create a container instance, then you already have arbitrary code execution under the session bus owner's uid (e.g. by telling their systemd --user instance to run whatever you want it to). Therefore there is no point in distrusting a container manager.

However, let's suppose Alice and Bob share a system. Alice is running something that claims to be Flatpak, which launched a container instance /.../c1 on the *system* bus, claiming that it is for org.gnome.Weather. The assertion "this server is for org.flatpak/org.gnome.Weather" is not necessarily really true: the only thing we can say for sure is "one of Alice's processes, that is not in a container, claims this server is for org.flatpak/org.gnome.Weather".

Suppose a client communicates with a hypothetical future version of Geoclue2 that has been enhanced to use this API. Geoclue2 calls GetConnectionCredentials on it, which says the client has Alice's uid and is in Alice's container instance /.../c1. This seems fine: Geoclue2 can reasonably assume that it is some app-container that Alice wants to treat as being the Flatpak app org.gnome.Weather. Geoclue2 must give it privileges equal to or less than those that it would give to a connection from Alice that is not in any container. This might include using a Geoclue /agent/ to ask Alice, via her UI session: "GNOME Weather wants to see your location. Allow? y/n".

Now suppose Geoclue2 gets a connection from a second client, which is Bob's uid but is somehow coming from /.../c1. This would have to be one of Bob's processes that has somehow connected to the wrong server, either by being tricked or deliberately. What is going on here, and what should Geoclue2 do? Should it ask Alice, via her UI session, because this is one of her container instances? Or should it ask Bob, via his UI session? If it asks Bob, then it would be misleading to say "GNOME Weather wants to see your location", because neither Geoclue2 nor Bob knows for sure that this is really GNOME Weather: we only have Alice's word for it. It might actually be some other app trying to spy on Bob (com.facebook?), and trying to trick Bob with Alice's help.

Note that the fd-passing cleverness that is missing from Bug #101354 would not help us here: Alice created /.../c1, so if Alice has control over its bind() call, she is free to put it somewhere that will be accessible by Bob's maybe-malicious process, for example an abstract AF_UNIX socket.

One possible answer is that the tuple identifying /.../c1 should additionally contain the uid that initiated /.../c1, like this:

    (
        uint32 1000, /* initiating uid = Alice */
        "org.flatpak", /* Alice claims this is a Flatpak */
        "org.gnome.Weather", /* Alice claims this is Weather */
        { ... },
    )

or even, as Philip suggested, the complete GetConnectionCredentials() result for the initiator (container manager), like this:

    (
        {
            "UnixUserID": <uint32 1000>, /* initiating uid = Alice */
            "ProcessID": <uint32 42>,
            "LinuxSecurityLabel": <b"/usr/bin/flatpak (enforce)">,
        },
        "org.flatpak", /* Alice claims this is a Flatpak */
        "org.gnome.Weather", /* Alice claims this is Weather */
        { ... },
    )

... and then either Geoclue2 or Bob's UI agent would compare the initiator with the connection being considered, and if there is a mismatch, say "an unknown app" or reject the request entirely? If we went this route, we would probably have to omit the type and uid from GetConnectionCredentials(), because in this world-view they are no use (and potentially actively harmful) when not used in conjunction with the container instance initiator's identity, and putting the container instance initiator's identity in GetConnectionCredentials() seems easy to confuse with the actual connection's identity.

Another possible answer is that the dbus-daemon should not let Bob connect to Alice's server at all, because Alice "owns" that server (this might seem arbitrary, but the concept of a server being "owned" by a uid is exactly how the session bus has always worked). That is what is implemented in Bug #101354 out of conservatism/paranoia - we can add a named parameter to make it less strict later on if we want one, but we can't make the default more strict while remaining compatible. However, if we want to be able to use this mechanism for things that are less app-oriented than Flatpak (privileged Snaps? Docker containers?) then we might indeed want to open it up to other uids later, at which point we will need a way to distinguish between "Alice claims this is io.docker/my:awesome:app but you maybe shouldn't believe her" and "root claims this is io.docker/my:awesome:app and you can be confident that it's true".

A third possible answer is that the flag that makes the server accept connections from other uids should be a privileged action (only available to root or messagebus), analogous to the default behaviour of the allow_other mount option for FUSE filesystems.

A fourth possibility is to say that creating one of these servers is a privileged action (only root or the bus owner can do it), and if Flatpak apps want to use system services like Geoclue or NetworkManager, they must do so via a portal on the session bus (which presents its own "allow or not?" UI, re-submits the request to Geoclue as though it had originated outside the container if allowed, and then acts as a Geoclue agent that will silently allow the resubmitted request instead of letting the normal agent re-prompt, or something). However, this requires special code to be written for every system service that the apps will interact with; notably, Flatpak's bus proxy does not require that.

Do you have any cleverer ideas about this?
Comment 16 Simon McVittie 2017-07-24 14:36:01 UTC
(In reply to Simon McVittie from comment #13)
> (In reply to Simon McVittie from comment #1)
> > the dbus-daemon uses a DBusServer to accept() connections after
> > the app-container manager has signalled that it has called both bind() and
> > listen()
> 
> Unfortunately, this is really annoying to do safely, even without the
> annoyance of getting bwrap to tell us the socket is ready to be accept()ed
> without it having to have a persistent D-Bus connection.

This feature is now Bug #101898.

> However, if the caller is responsible for bind()ing and listen()ing, the
> only way the dbus-daemon can tell that a malicious caller hasn't done that
> is when accept() raises EINVAL.

Actually it looks like we can getsockopt() for SO_ACCEPTCONN, at least on Linux. As a safety-catch we probably still want to make sure the dbus-daemon won't busy-loop on an invalid server fd, though.
Comment 17 Philip Withnall 2017-08-01 11:26:35 UTC
(In reply to Simon McVittie from comment #15)
> One possible answer is that the tuple identifying /.../c1 should
> additionally contain the uid that initiated /.../c1, like this:

...

> or even, as Philip suggested, the complete GetConnectionCredentials() result
> for the initiator (container manager), like this:

I would suggest that’s unconditionally better than just including the uid (as above).

> Another possible answer is that the dbus-daemon should not let Bob connect
> to Alice's server at all, because Alice "owns" that server (this might seem
> arbitrary, but the concept of a server being "owned" by a uid is exactly how
> the session bus has always worked). ...

I think we should go with this strict approach to start with, *along with* exposing the initiator credentials.

Being even more strict, do we want to bind app-container instances to session IDs rather than user IDs? One could imagine a variant of this situation where Bob is actually Alice in another session. While this isn’t a security problem, it would be mighty confusing if the geoclue agent dialogue popped up in the wrong graphical session.

(I realise this is a major change to the branch, but I think it’s worth discussing and I can’t remember it being discussed before.)

> A third possible answer is that the flag that makes the server accept
> connections from other uids should be a privileged action (only available to
> root or messagebus), analogous to the default behaviour of the allow_other
> mount option for FUSE filesystems.

That seems unnecessarily restrictive, although I think we should probably wait until a use case arrives from snap (etc.) which requires loosening of the currently-implemented initiator UID check.

> A fourth possibility is to say that creating one of these servers is a
> privileged action (only root or the bus owner can do it), and if Flatpak
> apps want to use system services like Geoclue or NetworkManager, they must
> do so via a portal on the session bus (which presents its own "allow or
> not?" UI, re-submits the request to Geoclue as though it had originated
> outside the container if allowed, and then acts as a Geoclue agent that will
> silently allow the resubmitted request instead of letting the normal agent
> re-prompt, or something). However, this requires special code to be written
> for every system service that the apps will interact with; notably,
> Flatpak's bus proxy does not require that.

Yuck no.
Comment 18 Simon McVittie 2017-08-02 10:45:32 UTC
(In reply to Philip Withnall from comment #17)
> Being even more strict, do we want to bind app-container instances to
> session IDs rather than user IDs?

Short term: no, for two reasons:

* Login sessions are not, in practice, a security boundary. They can ptrace each other, they can write configuration files in $HOME that (in practice) the other login session is going to trust, they have the same AppArmor profiles in effect, and so on. So there seems little point.

* dbus has no concept of what a login session is, other than for X11 autolaunching (which pretends it is synonymous with "X11 display", which is of course false). We don't ask systemd-logind or ConsoleKit or anything similar what is going on with login sessions, we don't look at the Linux audit-session ID, and so on.

Longer term: we can always add it as a new connection credential alongside the uid, pid and LSM label, and services can pay attention to it if they want to. We could also have an opt-in mechanism (perhaps RestrictByLoginSession: b) to sidestep the reasoning about backwards compat that led me to restrict by uid by default.

> One could imagine a variant of this
> situation where Bob is actually Alice in another session. While this isn’t a
> security problem, it would be mighty confusing if the geoclue agent dialogue
> popped up in the wrong graphical session.

If this is only for information and not a security boundary, then rummaging in /proc/$pid is acceptable, because the execve() attack described in Bug #83499 and the pid-reuse attack described in the same bug are not applicable between cooperating and mutually trusting processes.

> > A third possible answer is that the flag that makes the server accept
> > connections from other uids should be a privileged action (only available to
> > root or messagebus), analogous to the default behaviour of the allow_other
> > mount option for FUSE filesystems.
> 
> That seems unnecessarily restrictive, although I think we should probably
> wait until a use case arrives from snap (etc.) which requires loosening of
> the currently-implemented initiator UID check.

Sure. At the time we add it, we can decide whether it's a privileged action or not. All we need to know at this stage is that it's implementable.

> > A fourth possibility is [...] if Flatpak
> > apps want to use system services like Geoclue or NetworkManager, they must
> > do so via a portal on the session bus
> 
> Yuck no.

I am inclined to agree.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.