Bug 75566 - RFE: make /run/user/$UID tmpfs of its own
Summary: RFE: make /run/user/$UID tmpfs of its own
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
 
Reported: 2014-02-27 09:32 UTC by Michael Stapelberg
Modified: 2014-03-06 01:25 UTC

Description Michael Stapelberg 2014-02-27 09:32:51 UTC
In http://bugs.debian.org/739574, a user is concerned that the size limits of all his tmpfs mounts, added together, exceed the available RAM. Specifically, he has 4G of RAM in the machine, and the following tmpfs mounts:

tmpfs           1,9G     0    1,9G   0% /dev/shm
tmpfs           1,9G  464K    1,9G   1% /run
tmpfs           1,9G     0    1,9G   0% /sys/fs/cgroup
tmpfs           5,0M     0    5,0M   0% /run/lock
tmpfs           100M     0    100M   0% /run/user
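
To make the arithmetic behind the concern concrete, here is a minimal sketch that sums the Size column of the listing above and compares it with the 4G of RAM mentioned in the report. The helper names (`parse_size`, `total_tmpfs_limit`) are hypothetical, and the sketch assumes the sizes use power-of-1024 suffixes with a comma as the decimal separator, as in the quoted output.

```python
import re

# Size suffixes as printed in the listing (assumed to be powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# The tmpfs lines quoted in the report.
DF_LINES = """\
tmpfs           1,9G     0    1,9G   0% /dev/shm
tmpfs           1,9G  464K    1,9G   1% /run
tmpfs           1,9G     0    1,9G   0% /sys/fs/cgroup
tmpfs           5,0M     0    5,0M   0% /run/lock
tmpfs           100M     0    100M   0% /run/user
"""


def parse_size(text):
    """Convert a size such as '1,9G' or '464K' to bytes."""
    match = re.fullmatch(r"(\d+(?:[.,]\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number = float(match.group(1).replace(",", "."))
    return int(number * UNITS.get(match.group(2), 1))


def total_tmpfs_limit(df_text):
    """Sum the Size column (second field) of every tmpfs line."""
    total = 0
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "tmpfs":
            total += parse_size(fields[1])
    return total


ram = 4 * 1024**3  # 4G of RAM, as stated in the report
limit_sum = total_tmpfs_limit(DF_LINES)
print(f"sum of limits: {limit_sum / 1024**3:.2f}G, RAM: {ram / 1024**3:.2f}G")
```

The limits add up to roughly 5.8G, which is more than the installed RAM. The Used column shows that almost none of it is actually occupied.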

Now, specifically for /sys/fs/cgroup, which contains no files and only seems to be used for mounting cgroups in subdirectories, I suppose we could use a small-ish size= parameter?

What’s the story with regard to /dev/shm and /run? I see that on (older?) Ubuntu, /dev/shm points to /run/shm. Why don’t we use the same setup in systemd? Is there any benefit in having two separate tmpfs mounts?

And is the concern about RAM exhaustion actually a real one or are we missing something? (Personally, I think it’s unlikely that some process will entirely exhaust more than one tmpfs mount, but it _could_ happen, right?)
Comment 1 Lennart Poettering 2014-02-27 21:57:40 UTC
A tmpfs doesn't take up any memory if it is empty. Write access to /sys/fs/cgroup is restricted: it is not world-writable, and only root can write there. If you are concerned about the resources root might consume, then, well, there are bigger fish to fry.

The tmpfs size= parameter is a limit that is useful for world-writable (or at least user-writable) mounts, and otherwise not too interesting.

/dev/shm/ is world-writable. /run is not (well, with the exception of /run/user, but we are working on mounting /run/user/$UID as a tmpfs that is lifecycle-tracked by logind). Hence /dev/shm really should come from a different pool than /run. Hence you really should mount things separately. And that's what systemd does. If you don't mount them separately you basically invite unprivileged users to fill up /run, thus preventing system services from working.

/dev/shm and /tmp are security problems since they know no per-user quota (and also are a shared namespace). That means that a user can fuck with another user on the same system. This has been brought up a couple of times to the kernel guys, there have been patches to add quota to tmpfs, but so far nobody really cared too much to put enough pressure on it to push it upstream for good.

There's no point in turning /run/lock into a tmpfs of its own. It's a moronic interface anyway, and should be phased out. Also, it should be considered one user of /run like any other. 

Summary:

a) the size= param is just a limit, as long as nobody uses the memory it doesn't matter. It doesn't allocate anything when you pick a larger value, it just puts a limit on allocation within it.
b) /dev/shm and /run on the same tmpfs is really dumb, actively makes things less secure
c) adding size= to /sys/fs/cgroup doesn't hurt, but doesn't bring benefit either
d) we should introduce separate tmpfs mounts for /run/user/$UID, this would actively make things more secure
e) /dev/shm and /tmp are simply badly designed since no quota is applied and it is a shared namespace
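
Point a) can be checked on a running system. The following is a minimal sketch, assuming a Linux machine where the listed mount points exist; the helper name `usage` is hypothetical. It uses `os.statvfs` to read the configured limit and the space currently in use, and the two are typically far apart for a mostly empty tmpfs.

```python
import os


def usage(path):
    """Return (limit_bytes, used_bytes) for the filesystem holding `path`."""
    st = os.statvfs(path)
    limit = st.f_blocks * st.f_frsize                 # total size, i.e. the size= limit on a tmpfs
    used = (st.f_blocks - st.f_bfree) * st.f_frsize   # blocks actually occupied
    return limit, used


# Mount points taken from the discussion; any missing one is skipped.
for mount in ("/dev/shm", "/run", "/sys/fs/cgroup"):
    if not os.path.isdir(mount):
        continue
    try:
        limit, used = usage(mount)
    except OSError as exc:
        print(f"{mount}: cannot stat ({exc})")
        continue
    print(f"{mount}: limit {limit} bytes, used {used} bytes")
```

Only the "used" figure corresponds to memory (or swap) that is actually consumed; the limit is just the ceiling.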

I'll now rename this bug to request item d), if that's OK with you, since that is the only thing that would really be a benefit, I think.
Comment 2 Michael Stapelberg 2014-02-27 22:21:41 UTC
(In reply to comment #1)
> c) adding size= to /sys/fs/cgroup doesn't hurt, but doesn't bring benefit
> either
Agreed. I’d argue we should still do it, just to make the mount list more reassuring to users such as the original reporter of this bug :).

> d) we should introduce separate tmpfs mounts for /run/user/$UID, this would
> actively make things more secure

> I'll now rename this bug to request item d), if that's OK with you, since
> that is the only thing that would really be a benefit i think.
Sounds good, and thank you for the explanation!
Comment 3 Lennart Poettering 2014-02-27 23:12:55 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > c) adding size= to /sys/fs/cgroup doesn't hurt, but doesn't bring benefit
> > either
> Agreed. I’d argue we should still do it, just to make the mount list more
> assuring to users such as the original reporter of this bug :).

Well, it's still snake oil. And suggests there was something to make more secure while there isn't really. I'd always go for simplicity instead of trying to fix assumed security holes that don't actually exist...
Comment 4 tiposchi 2014-02-28 07:12:34 UTC
I don't think that being writable only by root makes it completely safe. Bugs don't really depend on uid, and buggy software (running as root for perfectly valid reasons) could crash an entire machine, while this wouldn't really be the case if only on-disk filesystems are filled up.
Comment 5 Lennart Poettering 2014-02-28 13:15:30 UTC
(In reply to comment #4)
> I don't think that being writable only by root makes it completely safe.
> Bugs don't really depend on uid, and a buggy software (running as root for
> perfectly valid reasons) could crash an entire machine, while this wouldn't
> really be the case if only on disk filesystems are filled up.

The mount has a default size limit anyway, and it is at half the installed RAM. That should be pretty OK as a safety measure.

We are not arguing here whether there should be a limit on /sys/fs/cgroup or not, because there always and unconditionally is. We are arguing what it should be. And I see little reason to change the kernel default here.
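
As a rough illustration of the kernel default mentioned above (half the RAM, not counting swap), the limit can be estimated from the `MemTotal` line of `/proc/meminfo`. This is a minimal sketch: the function name and the sample values are hypothetical, and `MemTotal` is only an approximation of the figure the kernel uses internally.

```python
def default_tmpfs_limit_kib(meminfo_text):
    """Estimate the default tmpfs size limit in KiB: half of MemTotal."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:  <number> kB"
            return int(line.split()[1]) // 2
    raise ValueError("no MemTotal line found")


# Hypothetical sample in /proc/meminfo format: 4 GiB of RAM, 2 GiB of swap.
SAMPLE = "MemTotal:        4194304 kB\nSwapTotal:       2097152 kB\n"

# Swap is deliberately ignored, matching the default described above.
print(default_tmpfs_limit_kib(SAMPLE))
```

On a real system the same function can be fed the contents of `/proc/meminfo`.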
Comment 6 tiposchi 2014-02-28 13:38:45 UTC
Well, several tmpfs mounts, all defaulting to half the RAM, sum up to more than the RAM.
Comment 7 Lennart Poettering 2014-02-28 13:42:13 UTC
(In reply to comment #6)
> Well several tmpfs all defaulting at half the ram, sum up to more than the
> ram.

If you want to protect yourself from a privileged program that is buggy in such a way that it actually goes through all the tmpfs mounts it can find and fills them *all* up to the end, then you really have bigger problems to solve...

Also, note that tmpfs is swappable but swap is not included in the default size selection. Hence what you write is not even true...
Comment 8 Lennart Poettering 2014-03-06 01:25:43 UTC
The per-UID tmpfs runtime dir is now implemented in git.
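
For readers who want to picture what a per-UID runtime directory mount might look like, here is a minimal sketch that only builds the mount command and does not run it. It is not taken from the systemd commit: the function names and the default size of 10% are assumptions for illustration. The options used (`mode=`, `uid=`, `gid=`, `size=`) are standard tmpfs mount options, and `size=` accepts a percentage of physical RAM.

```python
def runtime_dir_mount_options(uid, gid, size):
    """Build tmpfs options for a private, size-limited per-user mount."""
    # mode=0700 makes the directory accessible to its owner only;
    # uid=/gid= set the owner of the mount's root directory.
    return f"mode=0700,uid={uid},gid={gid},size={size}"


def runtime_dir_mount_command(uid, gid, size="10%"):
    """Return the argv of a mount(8) call for /run/user/<uid> (not executed)."""
    target = f"/run/user/{uid}"
    options = runtime_dir_mount_options(uid, gid, size)
    return ["mount", "-t", "tmpfs", "-o", options, "tmpfs", target]


# Hypothetical user with UID and GID 1000.
print(" ".join(runtime_dir_mount_command(1000, 1000)))
```

Because each user gets a separate mount with its own limit, one user filling up their runtime directory no longer affects /run or other users, which is the benefit described in item d).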

