68370 – systemd-nspawn -b doesn't cleanup machine slice if systemd inside container

Status:	RESOLVED FIXED

Alias:	None

Product:	systemd
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	systemd-bugs
QA Contact:	systemd-bugs

Reported:	2013-08-21 07:08 UTC by Maksim Melnikau
Modified:	2014-05-06 11:51 UTC
CC List:	4 users

Description Maksim Melnikau 2013-08-21 07:08:07 UTC

systemd-nspawn doesn't clean up all machine slices when the container shuts down, so starting the container again fails (see below). After removing the leftover slice in /sys/fs/cgroup, it starts fine.

m_melnikau-M11xR3 kvms # systemd-nspawn -D /media/gsoho -b
systemd 206 running in system mode. (+PAM +LIBWRAP -AUDIT -SELINUX +IMA -SYSVINIT -LIBCRYPTSETUP -GCRYPT +ACL -XZ)
Detected virtualization 'systemd-nspawn'.
Welcome to Gentoo/Linux!
...
All filesystems unmounted.
Storage is finalized.
Container has been shut down.

m_melnikau-M11xR3 kvms # systemd-nspawn -D /media/gsoho -b
Spawning namespace container on /media/gsoho (console is /dev/pts/4).
Init process in the container running as PID 4706.
Failed to register machine: File exists
Container failed with error code 239.

m_melnikau-M11xR3 kvms # wc -l /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks
0 /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks

m_melnikau-M11xR3 kvms # rmdir /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/

m_melnikau-M11xR3 kvms # wc -l /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks
wc: /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/tasks: No such file or directory

# systemd-nspawn -D /media/gsoho -b
...
Welcome to Gentoo/Linux!
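
The manual workaround above (checking that `tasks` is empty, then removing leftover directories with rmdir) can be sketched in Python. This is only an illustration under assumptions: the function name `find_removable_cgroups` is hypothetical, it assumes a cgroup v1 layout where every cgroup directory contains a `tasks` file, and it only lists candidates rather than deleting anything.

```python
import os

def find_removable_cgroups(scope_path):
    """List cgroup directories at or below scope_path that hold no tasks.

    Directories are returned deepest first, because a cgroup can only be
    removed with rmdir once its child cgroups are gone. A directory is
    listed only if its own `tasks` file is empty and every child
    directory is listed as well.
    """
    removable = []
    removable_set = set()
    # topdown=False visits children before their parents.
    for dirpath, dirnames, filenames in os.walk(scope_path, topdown=False):
        if "tasks" not in filenames:
            continue  # not a cgroup v1 directory (assumption)
        with open(os.path.join(dirpath, "tasks")) as f:
            has_tasks = any(line.strip() for line in f)
        children_ok = all(
            os.path.join(dirpath, d) in removable_set for d in dirnames
        )
        if not has_tasks and children_ok:
            removable.append(dirpath)
            removable_set.add(dirpath)
    return removable
```

On an affected host, one could pass each returned path to `os.rmdir` in order, as root. In the transcript above, removing the single leftover `systemd-journald.service` directory was already enough for the whole scope to disappear.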

Comment 1 Harald Hoyer 2013-08-30 08:29:59 UTC

Did this commit http://cgit.freedesktop.org/systemd/systemd/commit/?id=b58b8e11c5f769e3c80d5169fdcc4bd04b882b7d
fix your issue?

Comment 2 Maksim Melnikau 2013-08-30 10:56:23 UTC

(In reply to comment #1)
> Did this commit
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=b58b8e11c5f769e3c80d5169fdcc4bd04b882b7d
> fix your issue?
No, it doesn't.

I applied this patch on the host, on top of systemd-206; it doesn't change anything.

Comment 3 Maksim Melnikau 2013-08-30 11:11:09 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > Did this commit
> > http://cgit.freedesktop.org/systemd/systemd/commit/?id=b58b8e11c5f769e3c80d5169fdcc4bd04b882b7d
> > fix your issue?
> No, it doesn't
> 
> I applied this patch on "host" on top of systemd-206, it doesn't change
> anything.
Hmm, one thing changed. Before, I had to delete one directory:
a) /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/
With the patch, I had to delete two directories:
a) /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-initctl.service
b) /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/
I'm not sure whether that is because of this patch or something else.

Comment 4 Lennart Poettering 2013-09-12 17:10:28 UTC

Hmm, does "systemctl" still list the container's scope unit when the cgroup is still there after nspawn exited?

Comment 5 Maksim Melnikau 2013-09-12 17:22:28 UTC

(In reply to comment #4)
> Hmm, does "systemctl" still list the container's scope unit when the cgroup
> is still there after nspawn exited?
Yes, it does:
# systemctl status machine-gsoho.scope
machine-gsoho.scope - Container gsoho
   Loaded: loaded (/run/systemd/system/machine-gsoho.scope; static)
  Drop-In: /run/systemd/system/machine-gsoho.scope.d
           └─90-Description.conf
   Active: active (running) since Thu 2013-09-12 20:16:36 FET; 1min 23s ago

Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Started Container gsoho.

And even more:

# rmdir /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-initctl.service
m_melnikau-M11xR3 kvms # systemctl status machine-gsoho.scope
machine-gsoho.scope - Container gsoho
   Loaded: loaded (/run/systemd/system/machine-gsoho.scope; static)
  Drop-In: /run/systemd/system/machine-gsoho.scope.d
           └─90-Description.conf
   Active: active (running) since Thu 2013-09-12 20:16:36 FET; 2min 51s ago

Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Started Container gsoho.

# rmdir /sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope/system.slice/systemd-journald.service/
m_melnikau-M11xR3 kvms # systemctl status machine-gsoho.scope
machine-gsoho.scope
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Starting Container gsoho.
Sep 12 20:16:36 m_melnikau-M11xR3 systemd[1]: Started Container gsoho.
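
The check in this comment (is the scope unit still loaded and active after nspawn has exited?) can be scripted by reading the `Loaded:` and `Active:` lines of the status text. The following is a rough sketch; `parse_unit_state` is a hypothetical helper, and it assumes the output has the same layout as the transcripts above.

```python
import re

def parse_unit_state(status_text):
    """Return (load_state, active_state) taken from `systemctl status` text.

    Each state is the first word after "Loaded:" or "Active:", or None
    if that line is missing from the text.
    """
    loaded = re.search(r"^\s*Loaded:\s+(\S+)", status_text, re.MULTILINE)
    active = re.search(r"^\s*Active:\s+(\S+)", status_text, re.MULTILINE)
    return (
        loaded.group(1) if loaded else None,
        active.group(1) if active else None,
    )
```

Fed the first status output above, this yields `("loaded", "active")`; fed the last one, it yields `("not-found", "inactive")`. A scope that still reports `active` after the container has shut down is the symptom described in this bug.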

Comment 6 Lennart Poettering 2013-09-12 17:46:54 UTC

So yeah, I figure you ran into a kernel issue. It appears that the cgroups release agent is not properly called by the kernel in some cases when a cgroup runs empty.

Comment 7 Maksim Melnikau 2013-09-12 18:15:26 UTC

How can I help to fix it?
It's 100% reproducible on my laptop.

Comment 8 Lennart Poettering 2013-09-12 18:34:46 UTC

The kernel folks are working on giving us a much better notifier scheme for this, so that we don't need the release_agent stuff anymore. Alas, that's not done yet and will take some time. In the meantime, it doesn't look like anyone wants to fix release_agent anymore...

My guess is that release_agent gets confused by CLONE_NEWPID in some way...
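
For anyone debugging this, the release_agent mechanism mentioned here works roughly as follows in cgroup v1: each cgroup has a `notify_on_release` flag, and the root of the hierarchy has a `release_agent` file naming the program the kernel runs when a flagged cgroup becomes empty. A minimal sketch for inspecting both settings follows; `read_release_settings` is a hypothetical helper and the paths passed to it are assumptions about the local layout.

```python
import os

def read_release_settings(hierarchy_root, cgroup_path):
    """Read the cgroup v1 release-notification settings.

    hierarchy_root: mount point of the hierarchy, where `release_agent` lives.
    cgroup_path:    the cgroup directory whose `notify_on_release` flag to read.
    Values are returned as stripped strings, or None if a file is unreadable.
    """
    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return None

    return {
        "release_agent": read(os.path.join(hierarchy_root, "release_agent")),
        "notify_on_release": read(os.path.join(cgroup_path, "notify_on_release")),
    }
```

With the paths from this report, the call would presumably be `read_release_settings("/sys/fs/cgroup/systemd", "/sys/fs/cgroup/systemd/system.slice/machine-gsoho.scope")`. If `notify_on_release` reads `1` and an agent is configured, yet the empty cgroup stays around, that would be consistent with the agent not being invoked.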

Comment 9 Maksim Melnikau 2013-10-07 09:50:47 UTC

I can no longer reproduce it with systemd-208. Was it a systemd bug?

Comment 10 Zbigniew Jedrzejewski-Szmek 2013-11-06 15:19:38 UTC

Should be fixed:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=41f854
http://cgit.freedesktop.org/systemd/systemd/commit/?id=1f0cd8

Comment 11 Leho Kraav (:macmaN :lkraav) 2014-05-06 11:51:16 UTC

I am still getting this on 208.
