Bug 68370

Summary: systemd-nspawn -b doesn't cleanup machine slice if systemd inside container
Product: systemd
Reporter: Maksim Melnikau <maxposedon>
Component: general
Assignee: systemd-bugs
Status: RESOLVED FIXED
QA Contact: systemd-bugs
Severity: normal
Priority: medium
CC: brandon, cedric.bosdonnat.ooo, leho, vmlinuz386
Version: unspecified
Hardware: Other
OS: All

Description Maksim Melnikau 2013-08-21 07:08:07 UTC
systemd-nspawn doesn't clean up all machine slices when the container shuts down, so starting the container again fails (see below). After removing the leftover cgroup directory under /sys/fs/cgroup, it starts fine.

m_melnikau-M11xR3 kvms # systemd-nspawn -D /media/gsoho -b
systemd 206 running in system mode. (+PAM +LIBWRAP -AUDIT -SELINUX +IMA -SYSVINIT -LIBCRYPTSETUP -GCRYPT +ACL -XZ)
Detected virtualization 'systemd-nspawn'.
Welcome to Gentoo/Linux!
...
All filesystems unmounted.
Storage is finalized.
Container has been shut down.

m_melnikau-M11xR3 kvms # systemd-nspawn -D /media/gsoho -b
Spawning namespace container on /media/gsoho (console is /dev/pts/4).
Init process in the container running as PID 4706.
Failed to register machine: File exists
Container failed with error code 239.

m_melnikau-M11xR3 kvms # wc -l /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks
0 /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks

m_melnikau-M11xR3 kvms # rmdir /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/

m_melnikau-M11xR3 kvms # wc -l /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks                                 
wc: /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks: No such file or directory

# systemd-nspawn -D /media/gsoho -b                                                                 
...
Welcome to Gentoo/Linux!
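
The manual rmdir workaround above can be sketched as a small helper. This is only a sketch, assuming a cgroup v1 layout like the one in the transcript; `remove_empty_cgroups` is a hypothetical name and not part of systemd. It removes directories deepest first, because a cgroup can only be removed once it has no child cgroups and no attached tasks.

```python
import os


def remove_empty_cgroups(scope_dir):
    """Remove leftover cgroup directories under scope_dir, deepest first.

    Returns the list of directories that could not be removed.
    Hypothetical helper; it mirrors the manual rmdir calls in the report.
    """
    failed = []
    # topdown=False yields children before their parents, so each parent
    # is only attempted after its child directories have been handled.
    for dirpath, _dirnames, _filenames in os.walk(scope_dir, topdown=False):
        try:
            os.rmdir(dirpath)
        except OSError:
            # The directory is still in use, for example because tasks
            # are still attached to the cgroup; leave it in place.
            failed.append(dirpath)
    return failed
```

Run as root, a call such as `remove_empty_cgroups("/sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope")` would correspond to the rmdir commands shown in the transcript. A non-empty return value lists the cgroups that are still busy.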
Comment 1 Harald Hoyer 2013-08-30 08:29:59 UTC
Did this commit http://cgit.freedesktop.org/systemd/systemd/commit/?id=b58b8e11c5f769e3c80d5169fdcc4bd04b882b7d
fix your issue?
Comment 2 Maksim Melnikau 2013-08-30 10:56:23 UTC
(In reply to comment #1)
> Did this commit
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=b58b8e11c5f769e3c80d5169fdcc4bd04b882b7d
> fix your issue?
No, it doesn't.

I applied this patch on the host, on top of systemd-206, and it doesn't change anything.
Comment 3 Maksim Melnikau 2013-08-30 11:11:09 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Did this commit
> > http://cgit.freedesktop.org/systemd/systemd/commit/?id=b58b8e11c5f769e3c80d5169fdcc4bd04b882b7d
> > fix your issue?
> No, it doesn't
> 
> I applied this patch on "host" on top of systemd-206, it doesn't change
> anything.
Hmm, one thing changed. Before, I had to delete one directory:
a) /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/
With the patch, I have to delete two directories:
a) /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-initctl.service
b) /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/
I'm not sure whether this is because of the patch or something else.
Comment 4 Lennart Poettering 2013-09-12 17:10:28 UTC
Hmm, does "systemctl" still list the container's scope unit when the cgroup is still there after nspawn exited?
Comment 5 Maksim Melnikau 2013-09-12 17:22:28 UTC
(In reply to comment #4)
> Hmm, does "systemctl" still list the container's scope unit when the cgroup
> is still there after nspawn exited?
Yes, it does:
# systemctl status machine-gsoho.scope
machine-gsoho.scope - Container gsoho
   Loaded: loaded (/run/systemd/system/machine-gsoho.scope; static)
  Drop-In: /run/systemd/system/machine-gsoho.scope.d
           └─90-Description.conf
   Active: active (running) since Thu 2013-09-12 20:16:36 FET; 1min 23s ago

Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Started Container gsoho.

Moreover, the scope stays active until both leftover cgroup directories have been removed:

# rmdir /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-initctl.service
m_melnikau-M11xR3 kvms # systemctl status machine-gsoho.scope                                                              
machine-gsoho.scope - Container gsoho
   Loaded: loaded (/run/systemd/system/machine-gsoho.scope; static)
  Drop-In: /run/systemd/system/machine-gsoho.scope.d
           └─90-Description.conf
   Active: active (running) since Thu 2013-09-12 20:16:36 FET; 2min 51s ago

Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Started Container gsoho.

# rmdir /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/
m_melnikau-M11xR3 kvms # systemctl status machine-gsoho.scope                                                              
machine-gsoho.scope
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Starting Container gsoho.
Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Started Container gsoho.
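
To see which cgroups keep the scope alive, the `wc -l .../tasks` check from the description can be repeated for the whole subtree. The following is a rough sketch, assuming the cgroup v1 `tasks` file seen in the transcripts; `leftover_cgroups` is a hypothetical helper name.

```python
import os


def leftover_cgroups(scope_dir):
    """Map scope_dir and every cgroup directory below it to its task count.

    Hypothetical helper; it assumes each cgroup v1 directory exposes a
    'tasks' file with one task ID per line.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(scope_dir):
        count = 0
        if "tasks" in filenames:
            with open(os.path.join(dirpath, "tasks")) as f:
                # Count non-empty lines, one per attached task.
                count = sum(1 for line in f if line.strip())
        result[dirpath] = count
    return result
```

If every entry reports zero tasks but the directories still exist, that would match the situation shown above: empty child cgroups such as systemd-journald.service are left behind and the scope unit stays active.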
Comment 6 Lennart Poettering 2013-09-12 17:46:54 UTC
So yeah, I figure you ran into a kernel issue. It appears that the cgroups release agent is not properly called by the kernel in some cases when a cgroup runs empty.
Comment 7 Maksim Melnikau 2013-09-12 18:15:26 UTC
How can I help to fix it?
It's 100% reproducible on my laptop.
Comment 8 Lennart Poettering 2013-09-12 18:34:46 UTC
The kernel folks are working on giving us a much better notifier scheme for this, so that we don't need the release_agent stuff anymore. Alas, that's not done yet and will take some time. In the meantime, it doesn't look like anyone wants to fix release_agent anymore...

My guess is that release_agent gets confused by CLONE_NEWPID in some way...
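
For anyone who wants to check the release_agent setup on an affected host, here is a minimal sketch. It assumes the cgroup v1 convention that `release_agent` lives in the root of a hierarchy and `notify_on_release` in each cgroup; `release_settings` is a hypothetical helper, and it only reads the settings without testing whether the kernel actually invokes the agent.

```python
import os


def release_settings(hierarchy_root, cgroup_dir):
    """Read the release-notification settings relevant to one cgroup.

    hierarchy_root: mount point of the hierarchy, e.g. /sys/fs/cgroup/systemd
    cgroup_dir:     the cgroup to inspect, e.g. the machine scope directory
    Values are None when the corresponding file cannot be read.
    """
    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return None

    return {
        # Path of the program the kernel runs when a cgroup becomes empty.
        "release_agent": read(os.path.join(hierarchy_root, "release_agent")),
        # "1" means the kernel should notify the agent for this cgroup.
        "notify_on_release": read(os.path.join(cgroup_dir, "notify_on_release")),
    }
```

If both values look correct and the empty cgroup still lingers, that would be consistent with the agent not being called, as suspected above.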
Comment 9 Maksim Melnikau 2013-10-07 09:50:47 UTC
I can't reproduce it anymore with systemd-208. Was it a systemd bug?
Comment 11 Leho Kraav (:macmaN :lkraav) 2014-05-06 11:51:16 UTC
I am still getting this on 208.
