101903 – Containers (#100344): allow container to live longer than container manager

Bug 101903 - Containers (#100344): allow container to live longer than container manager

Summary: Containers (#100344): allow container to live longer than container manager

Status:	RESOLVED MOVED

Alias:	None

Product:	dbus
Classification:	Unclassified
Component:	core (show other bugs)
Version:	git master
Hardware:	All All

Importance:	medium enhancement
Assignee:	Simon McVittie
QA Contact:	D-Bus Maintainers

URL:
Whiteboard:
Keywords:

Depends on:	101354
Blocks:	100344
	Show dependency tree / graph

Reported:	2017-07-24 20:14 UTC by Simon McVittie
Modified:	2018-10-12 21:31 UTC (History)
CC List:	5 users (show)

See Also:
i915 platform:
i915 features:

Attachments

Description Simon McVittie 2017-07-24 20:14:21 UTC

+++ This bug was initially created as a clone of Bug #100344 +++

Following on from Bug #101354, we want to let short-lived container manager helper processes create a container with a lifetime and then get out of the way. In particular, Flatpak does not have a "supervisor" process once the sandboxed app is running, except for bwrap (which is minimal, sometimes setuid, and does not use D-Bus) and the flatpak-dbus-proxy (which, long term, we want to remove).

In Bug #101354, the container would stop listening when its creator falls off the bus. This is not suitable for Flatpak long-term.

The design we had at the GTK hackfest was to fd-pass in the read end of a pipe. Until it polls readable, the container server keeps listening. When it polls readable, the container server stops.

This can go in the named parameters a{sv}.

Comment 1 James Henstridge 2018-01-04 08:33:23 UTC

From the snapd point of view, it would be useful to have a mode of operation where the app container remains active until it is explicitly deleted (possibly by some other D-Bus connection).

While the snapd process is usually running, it's lifecycle is not tied to the the runtime of confined applications.  It's designed so it can be restarted (e.g. when it is upgraded) without interrupting anything else.  This is done by having state stored in the file system rather than trying to pass resources across generations of the daemon directly.

So tying the lifetime to a pipe file descriptor rather than a D-Bus connection isn't much better.  In contrast, we can easily store the container ID for a future StopInstance call (possibly made by a future generation of the daemon).

Comment 2 Simon McVittie 2018-01-04 14:49:48 UTC

(In reply to James Henstridge from comment #1)
> From the snapd point of view, it would be useful to have a mode of operation
> where the app container remains active until it is explicitly deleted
> (possibly by some other D-Bus connection).

Straw man: a named parameter { "KeepListeningAfterInitiatorExit": <true> }?

I want the default to be "when the initiator dies, the server stops listening" so that in simple uses of the API, we know the server will eventually get cleaned up (they're a limited resource, particularly on the system bus, so avoiding accidental or deliberate denial of service is desirable).

If snapd explicitly passes some parameter to say "I promise I will clean this up eventually", then it's clear whose bug it is if it fails to do so :-)

(The cleanup action currently taken when the container manager dies is the equivalent of StopListening(), not StopInstance() - the server stops, but existing connections stay alive.)

> This is done by having state stored in the file system rather than trying to
> pass resources across generations of the daemon directly.

In /run so it won't persist between boots, I hope?

If you're doing this, and you have arranged things so that you can be confident that a crashed or otherwise inoperative snapd will restart reasonably soon and will detect app exit in a timely way, then I think it's OK to opt-in to taking responsibility for cleaning up the per-app servers.

Comment 3 James Henstridge 2018-01-08 12:39:15 UTC

I think defaulting to guaranteed cleanup is fine.  I agree that if snapd wants to enable manual cleanup it can ask for it.

We are already using /run to store some runtime data (mainly to do with the mount namespaces for sandboxed apps), so the path of least resistance would be to store app container IDs there too.

Comment 4 David Herrmann 2018-01-12 11:42:36 UTC

(In reply to James Henstridge from comment #1)
> So tying the lifetime to a pipe file descriptor rather than a D-Bus
> connection isn't much better.  In contrast, we can easily store the
> container ID for a future StopInstance call (possibly made by a future
> generation of the daemon).

So you're saying your apps have no "managing parent process" that survives the lifetime of the App?

In general, I like the proposal from Simon. It is very similar to kernel APIs: You *want* to return a file-descriptor that defines the lifetime of an object you created. But unlike kernel-APIs, we cannot create our own FDs. Hence, we simply pass one in, chosen by the client (always make sure to pass them in, rather than return them from the daemon, so they're accounted on the right uid by the kernel).

Note that you could just "leak" (clear O_CLOEXEC) this FD into your client App, and thus naturally limit its lifetime to the App. The only problem would be if the client App iterates /proc/self/fd/ and closes anything it doesn't care about. That sounds highly unlikely, though, since even glibc is allowed to open FDs behind your back.

Comment 5 Alexander Larsson 2018-01-12 14:15:05 UTC

Leaking an fd means your app dies if it double-forks though, or if for some other reason the initial process dies.

Comment 6 Alexander Larsson 2018-01-12 14:15:41 UTC

Well, the "sandbox" dies.

Comment 7 David Herrmann 2018-01-12 14:33:22 UTC

(In reply to Alexander Larsson from comment #5)
> Leaking an fd means your app dies if it double-forks though, or if for some
> other reason the initial process dies.

You would leak recursively, unless you take precaution and close them after fork(). Not saying this is nice, just saying it is possible.

I still don't see why it is a good idea to have an app outlive its babysitter? You introduce here a feature that allows pinning resources arbitrarily in dbus-daemon, with no way of automatic cleanup. The sandboxed app segfaults (or its manager) and you leak the resources in dbus-daemon until reboot. Nobody likes SYS-V-IPC for behavior like this.

By requiring an FD, you also require a process to be alive, and thus you pin any related resources and get natural responsibility and accounting.

By not requiring an FD, you might get dangling resources in dbus-daemon with not real-world connection whatsoever. What if all related processes to that App died, and systemd recycles the UID? What if someone exploits a sandbox bug to DoS dbus-daemon and exceed resources?

Sure, if nothing has bugs and nothing fails, pinning resources in dbus-daemon without an FD is fine. But... really?

I will not oppose this feature, but I really think it is a bad idea.

Comment 8 Simon McVittie 2018-03-02 14:44:28 UTC

For the session bus, Snap would presumably need to have a per-user "agent" (the same design pattern used in BlueZ and polkit) that is responsible for announcing itself to snapd, receiving requests from snapd, creating container instances (sandboxes), and reporting the addresses of those container instances back to snapd - otherwise Snap would be violating the design principle that more-privileged services (snapd) shouldn't be direct clients of less-privileged services (dbus-daemon).

If it needs that *anyway*, would it perhaps make sense for that agent to be responsible for creating *all* container instances used by snap apps (at least those that run as an ordinary user account)? That would avoid David's concern by having the lifetime of the container instance be tied to that agent.

For snaps that operate at system level (which is not a concept that exists in Flatpak) snapd might still need to create the container instances directly though.

(In reply to David Herrmann from comment #7)
> I still don't see why it is a good idea to have an app outlive its
> babysitter?

Per Comment #1, snapd is designed to be restartable, unlike dbus-daemon itself. It's a system-wide privileged singleton that manages every snap app, not a per-container-instance babysitter (because it needs to be highly privileged anyway, to be able to exercise CAP_MAC_ADMIN to install its own AppArmor profiles).

On the system bus, we could maybe make this an operation that is only allowed to "privileged" connections (those with uid 0 or the bus's own uid), because snapd is presumably uid 0 anyway?

> You introduce here a feature that allows pinning resources
> arbitrarily in dbus-daemon, with no way of automatic cleanup.

If it's clear that snapd is responsible for that pinning, and there is a way to clean it up manually (which there is - StopInstance() and StopListening() are already implemented), then I think I'm OK with "privileged" connections being able to do this. After all, those are the connections that have write access to /usr/bin and /etc, and can overwrite dbus-daemon or its configuration...

> By not requiring an FD, you might get dangling resources in dbus-daemon with
> not real-world connection whatsoever. What if all related processes to that
> App died, and systemd recycles the UID?

In the snapd case I think the initiator uid would be snapd's uid, not the app's.

> What if someone exploits a sandbox
> bug to DoS dbus-daemon and exceed resources?

Flatpak doesn't attempt to protect availability anyway, only confidentiality and integrity, and I doubt Snap tries to protect availability either. There are so many things that need to be locked down or limited to prevent local DoS by a process with arbitrary code execution and "normal desktop app" expectations that it's quite an intractable problem space.

A lot of the design of Containers1 is based on optional named parameters (like Python kwargs) with feature-discovery support, so if the KeepListeningAfterInitiatorExit named parameter turns out to have been an awful idea, we can remove it and the only thing it'll break will be container managers that don't have a fallback path for not being able to use it - so at worst, removing it would break snapd. If Flatpak never tries to use it then Flatpak would be unaffected by its removal.

Comment 9 Simon McVittie 2018-03-02 15:33:54 UTC

(In reply to Simon McVittie from comment #8)
> For the session bus, Snap would presumably need to have a per-user "agent"
> (the same design pattern used in BlueZ and polkit) that is responsible for
> announcing itself to snapd, receiving requests from snapd, creating
> container instances (sandboxes), and reporting the addresses of those
> container instances back to snapd - otherwise Snap would be violating the
> design principle that more-privileged services (snapd) shouldn't be direct
> clients of less-privileged services (dbus-daemon).

Let me rephrase that, since that was really unclear:

... otherwise Snap would be violating the design principle that system-level services (snapd) shouldn't be clients of per-user services (dbus-daemon --session).

Comment 10 GitLab Migration User 2018-10-12 21:31:50 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/dbus/dbus/issues/186.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.