Bug 68232

Summary: Cleaning up PrivateTmp=yes /tmp directories from PID 1 might be *very* slow if the private directory contains very many files on slow disks, thus blocking PID 1 from continuing
Product: systemd
Reporter: Petr Ročkai <me>
Component: general
Assignee: systemd-bugs
Status: RESOLVED FIXED
QA Contact: systemd-bugs
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Whiteboard:

Description Petr Ročkai 2013-08-17 21:16:44 UTC
systemd got stuck on me (or dbus did, can't really tell) and I had to kill -TERM it. Now it's stuck in a loop doing something with temporary files, presumably in /var/tmp (which has ~200k directory entries). An excerpt from strace -p 1:

unlinkat(19, "nix-ssh.xEovPs", AT_REMOVEDIR) = 0
openat(19, "nix-ssh.1mtYv4", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 28
fstat(28, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
fcntl(28, F_GETFL)                      = 0xf8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC)
getdents(28, /* 2 entries */, 32768)    = 48
getdents(28, /* 0 entries */, 32768)    = 0
close(28)                               = 0

This goes on and on and on and systemd won't do anything else; I'm nuking those directories from a shell now, in the hope that it'll get unstuck when they are gone...
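
For readers unfamiliar with the syscall pattern in the excerpt: each entry is opened relative to the parent directory fd, listed, closed, and then removed with unlinkat(). A minimal Python sketch of that kind of synchronous, fd-relative removal loop follows. It is an illustration only, not systemd's code; the function name `remove_children` and the constant `DIR_FLAGS` are made up here, and O_NOATIME and O_NONBLOCK from the trace are left out for simplicity.

```python
import os

# Subset of the open flags visible in the strace excerpt (assumption: the
# omitted O_NOATIME/O_NONBLOCK do not matter for this illustration).
DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW | os.O_CLOEXEC


def remove_children(parent_fd: int) -> int:
    """Synchronously remove every entry below parent_fd; return how many were removed."""
    removed = 0
    # Snapshot the listing first so the directory is not modified while iterating.
    with os.scandir(parent_fd) as it:
        entries = [(e.name, e.is_dir(follow_symlinks=False)) for e in it]
    for name, is_dir in entries:
        if is_dir:
            child_fd = os.open(name, DIR_FLAGS, dir_fd=parent_fd)  # openat()
            try:
                removed += remove_children(child_fd)               # getdents() on the child
            finally:
                os.close(child_fd)                                 # close()
            os.rmdir(name, dir_fd=parent_fd)                       # unlinkat(..., AT_REMOVEDIR)
        else:
            os.unlink(name, dir_fd=parent_fd)                      # unlinkat(..., 0)
        removed += 1
    return removed
```

Every directory costs several syscalls and the caller does nothing else until the loop finishes, which is why millions of entries on a slow disk can occupy the calling process for hours.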
Comment 1 Petr Ročkai 2013-08-17 22:35:43 UTC
The loop is still running and I have checked that fd 19 points to /tmp/systemd-private-0BjFH0/tmp ... I am cleaning that one now as well. Don't know where the content of that came from, but it seems to mirror either /tmp or /var/tmp.
Comment 2 Petr Ročkai 2013-08-17 22:41:12 UTC
Wrong. It's most likely a private /tmp for a service which left behind 2.5M temporary directories. While the behaviour of that service is unfortunate, this probably shouldn't knock systemd out of service for hours...
Comment 3 Petr Ročkai 2013-08-18 09:37:26 UTC
It's been well over 12 hours and the cleanup is still running. Also, I can't create new sessions on the machine, presumably because session creation is waiting for the synchronous cleanup process.
Comment 4 Lennart Poettering 2013-09-12 17:12:56 UTC
Which binary is that strace of? systemd-tmpfiles? This certainly doesn't look like systemd/PID 1 itself?
Comment 5 Petr Ročkai 2013-09-12 17:52:05 UTC
Sorry, you missed the window while I still had enough in my head to answer questions by about three weeks. But you can easily reproduce the bug yourself: create a unit with a private /tmp, have it fill up with lots of directories, and SIGTERM systemd. You should be able to see what's going wrong. Whether it's PID 1 or not is largely irrelevant if the process locks out session creation.
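
A rough sketch of the "fill it up with lots of directories" step, in Python. It assumes the script is started from a test unit that sets PrivateTmp=yes, so that the base directory /tmp resolves to the service's private directory; the helper name `fill_with_dirs` is hypothetical, and the `nix-ssh.` prefix is borrowed from the strace excerpt only to make the result look similar.

```python
import tempfile


def fill_with_dirs(base: str, count: int, prefix: str = "nix-ssh.") -> int:
    """Create `count` empty, uniquely named directories under `base`."""
    for _ in range(count):
        # mkdtemp() appends a random suffix to the prefix, like the names in the trace.
        tempfile.mkdtemp(prefix=prefix, dir=base)
    return count
```

Calling `fill_with_dirs("/tmp", 2_500_000)` from such a service would approximate the 2.5M leftover directories mentioned in comment 2; a much smaller count is advisable for a first try.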
Comment 6 Lennart Poettering 2013-09-12 18:39:02 UTC
Ah, hmm, I think I get it now.

If PrivateTmp is used and the private tmp dir is really really full of little files, then we will try to delete them all synchronously, stopping everything else in PID 1 for that time, and that might take ages. And that sucks hard.

Not sure what we can do about this... We could fork off the clean-up routine so that PID 1 is unaffected by a slow disk. But, brrr.
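
The "fork off the clean-up routine" idea can be sketched roughly as below. This is a Python illustration under stated assumptions, not a description of what the eventual fix does: the function name `remove_tree_in_background` is hypothetical, and a real init process would additionally have to reap the child and handle its failures.

```python
import os
import shutil


def remove_tree_in_background(path: str) -> int:
    """Fork a child that removes `path` recursively; return the child's PID at once."""
    pid = os.fork()
    if pid == 0:
        # Child: do the slow, disk-bound work and exit without running parent cleanup.
        status = 0
        try:
            shutil.rmtree(path)
        except OSError:
            status = 1
        os._exit(status)
    # Parent: returns immediately and stays responsive while the child works.
    return pid
```

The parent can later collect the result with `os.waitpid(pid, 0)`, or be notified through SIGCHLD, instead of blocking for the whole removal.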
Comment 7 Zbigniew Jedrzejewski-Szmek 2013-09-13 00:18:58 UTC
*** Bug 68217 has been marked as a duplicate of this bug. ***
Comment 8 Zbigniew Jedrzejewski-Szmek 2013-09-17 15:28:33 UTC
Fixed in http://cgit.freedesktop.org/systemd/systemd/commit/?id=f485606.
