Bug 68232 - Cleaning up of /tmp directories PrivateTmp=yes from PID 1 might be *very* slow if the private directory contains many many files on slow disks thus blocking PID1 from continuing
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
Reported: 2013-08-17 21:16 UTC by Petr Ročkai
Modified: 2013-09-17 15:28 UTC

Description Petr Ročkai 2013-08-17 21:16:44 UTC
systemd got stuck on me (or dbus did, I can't really tell) and I had to kill -TERM it. Now it's stuck in a loop doing something with temporary files, presumably in /var/tmp (which has ~200k directory entries). An excerpt from strace -p 1:

unlinkat(19, "nix-ssh.xEovPs", AT_REMOVEDIR) = 0
openat(19, "nix-ssh.1mtYv4", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 28
fstat(28, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
fcntl(28, F_GETFL)                      = 0xf8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC)
getdents(28, /* 2 entries */, 32768)    = 48
getdents(28, /* 0 entries */, 32768)    = 0
close(28)                               = 0

This goes on and on, and systemd won't do anything else. I'm nuking those directories from a shell now, in the hope that it'll get unstuck once they are gone...
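
The syscall pattern in the trace (open a subdirectory, list it, close it, remove it) is what a recursive directory removal typically looks like. A minimal sketch of such a loop follows, assuming plain POSIX calls; the helper name remove_tree_at is hypothetical and this is not systemd's actual code. It shows why the cost grows with the number of entries: every subdirectory needs its own open/read/close/unlink round trip.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: remove everything below the directory `name`
 * (resolved relative to `parent_fd`), then the directory itself.
 * Returns 0 on success, -1 on the first error. */
static int remove_tree_at(int parent_fd, const char *name) {
    int fd = openat(parent_fd, name,
                    O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    DIR *d = fdopendir(fd);     /* takes ownership of fd on success */
    if (!d) {
        close(fd);
        return -1;
    }

    int r = 0;
    struct dirent *de;
    while (r == 0 && (de = readdir(d))) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        struct stat st;
        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) < 0)
            r = -1;
        else if (S_ISDIR(st.st_mode))
            r = remove_tree_at(dirfd(d), de->d_name);   /* recurse */
        else
            r = unlinkat(dirfd(d), de->d_name, 0);      /* plain file */
    }
    closedir(d);

    /* The directory is empty now, so it can be removed itself. */
    if (r == 0)
        r = unlinkat(parent_fd, name, AT_REMOVEDIR);
    return r;
}
```

Calling remove_tree_at(AT_FDCWD, "/some/dir") from a single-threaded event loop blocks that loop until the whole tree is gone, which matches the behaviour described here.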
Comment 1 Petr Ročkai 2013-08-17 22:35:43 UTC
The loop is still running and I have checked that fd 19 points to /tmp/systemd-private-0BjFH0/tmp ... I am cleaning that one now as well. Don't know where the content of that came from, but it seems to mirror either /tmp or /var/tmp.
Comment 2 Petr Ročkai 2013-08-17 22:41:12 UTC
Wrong. It's most likely a private /tmp for a service which left behind 2.5M temporary directories. While the behaviour of that service is unfortunate, this probably shouldn't knock systemd out of service for hours...
Comment 3 Petr Ročkai 2013-08-18 09:37:26 UTC
It's been well over 12 hours and the cleanup is still running. Also, I can't create new sessions on the machine, presumably because session creation is waiting for the synchronous cleanup process.
Comment 4 Lennart Poettering 2013-09-12 17:12:56 UTC
Which binary is that strace of? systemd-tmpfiles? This certainly doesn't look like systemd/PID 1 itself?
Comment 5 Petr Ročkai 2013-09-12 17:52:05 UTC
Sorry, you missed the window while I still had enough in my head to answer questions by about three weeks. But you can easily reproduce the bug yourself: just create a unit with a private /tmp, have it fill that up with lots of directories, and SIGTERM systemd. You should be able to see what's going wrong. Whether it's PID 1 or not is largely irrelevant if the process locks out session creation.
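
A possible way to produce the "lots of directories" part of that recipe is sketched below. It is an assumption about how one might reproduce the state, not something taken from the report: the helper name fill_with_dirs and the "junk." prefix are hypothetical, and the unit that runs it with PrivateTmp=yes is left to the reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: create `count` empty directories named
 * "<base>/junk.XXXXXX" (the X's are replaced by mkdtemp()).
 * Returns the number of directories actually created. */
static long fill_with_dirs(const char *base, long count) {
    char path[4096];
    long made = 0;

    for (long i = 0; i < count; i++) {
        int n = snprintf(path, sizeof path, "%s/junk.XXXXXX", base);
        if (n < 0 || (size_t) n >= sizeof path)
            break;              /* base path too long */
        if (!mkdtemp(path))
            break;              /* e.g. base missing or disk full */
        made++;
    }
    return made;
}
```

A test service could call fill_with_dirs("/tmp", 2500000) to approximate the 2.5M directories mentioned in Comment 2; a much smaller count is enough to observe the effect on a slow disk.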
Comment 6 Lennart Poettering 2013-09-12 18:39:02 UTC
Ah, hmm, I think I get it now.

If PrivateTmp is used and the private tmp dir is really really full of little files, then we will try to delete them all synchronously, stopping everything else in PID 1 for that time, and that might take ages. And that sucks hard.

Not sure what we can do about this... We could fork off the clean-up routine so that PID 1 is unaffected by a slow disk. But, brrr.
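
The "fork off the clean-up" idea could look roughly like the sketch below. This only illustrates the approach floated in this comment and is not taken from the fix linked in Comment 8; the helper name remove_tree_in_child is hypothetical, and delegating to rm(1) is a simplification chosen to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start deleting `path` in a child process and
 * return at once. Returns the child's PID, or -1 if fork() failed. */
static pid_t remove_tree_in_child(const char *path) {
    pid_t pid = fork();
    if (pid != 0)
        return pid;     /* parent, or fork failure: do not wait here */

    /* Child: hand the slow work to rm(1), assumed to be in PATH. */
    execlp("rm", "rm", "-rf", "--", path, (char *) NULL);
    _exit(EXIT_FAILURE);        /* only reached if exec failed */
}
```

The caller returns immediately and keeps processing other events while the child works through the directory. It still has to reap the child eventually, for example with waitpid() from its SIGCHLD handling.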
Comment 7 Zbigniew Jedrzejewski-Szmek 2013-09-13 00:18:58 UTC
*** Bug 68217 has been marked as a duplicate of this bug. ***
Comment 8 Zbigniew Jedrzejewski-Szmek 2013-09-17 15:28:33 UTC
Fixed in http://cgit.freedesktop.org/systemd/systemd/commit/?id=f485606.

