Bug 65973 - when a unit file is removed, systemd abandons running processes
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
Reported: 2013-06-20 14:43 UTC by Zbigniew Jedrzejewski-Szmek
Modified: 2019-02-28 10:57 UTC (History)

Description Zbigniew Jedrzejewski-Szmek 2013-06-20 14:43:17 UTC
% systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: inactive (dead) since Thu 2013-06-20 10:35:57 EDT; 10s ago
 Main PID: 1182 (code=exited, status=0/SUCCESS)
   CGroup: name=systemd:/system/libvirtd.service
           ├─1278 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           └─2593 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name schlemiel -S -machine pc-i...

% sudo rpm -e libvirt-daemon --nodeps

% sudo systemctl status libvirtd
libvirtd.service
   Loaded: error (Reason: No such file or directory)
   Active: inactive (dead)

% ps 2593
 2593 ?        Sl    11:26 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name schlemiel -S -machin

So systemd has forgotten that 2593 is a part of a unit... I know that this might be hard to fix, but I think we shouldn't gc a unit until it's really stopped and all processes are gone.
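A minimal sketch of the rule proposed above, in Python rather than systemd's own C, and not systemd's actual garbage-collection logic. It assumes the unit's cgroup is exposed as a directory containing a `cgroup.procs` file; the function names and the directory argument are hypothetical.

```python
from pathlib import Path


def cgroup_pids(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs (empty if the file is missing)."""
    procs = Path(cgroup_dir) / "cgroup.procs"
    try:
        text = procs.read_text()
    except FileNotFoundError:
        return []
    # The file holds one PID per line.
    return [int(token) for token in text.split()]


def may_garbage_collect(active_state, cgroup_dir):
    """Hypothetical rule: forget a unit only once it is stopped AND its cgroup is empty."""
    stopped = active_state in ("inactive", "failed")
    return stopped and not cgroup_pids(cgroup_dir)
```

Under this rule, the libvirtd.service unit from the paste would not be collected: it is inactive, but PIDs 1278 and 2593 are still in its cgroup.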
Comment 1 Lennart Poettering 2013-06-20 16:47:28 UTC
Hmm, my guess is that libvirt moved qemu into its own cgroup and hence outside of the control of systemd. Of course, libvirt really should have killed the VM when it was uninstalled... 

So, not sure we can do anything about this, except filing a bug against libvirt to fix its scriptlets?
Comment 2 Zbigniew Jedrzejewski-Szmek 2013-06-20 17:13:59 UTC
(In reply to comment #1)
> Hmm, my guess is that libvirt moved qemu into its own cgroup and hence
> outside of the control of systemd. Of course, libvirt really should have
> killed the VM when it was uninstalled... 
It seems that everything is still in the same cgroup (this is after restarting libvirtd.service).

   CGroup: name=systemd:/system/libvirtd.service
           ├─ 1278 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           ├─ 2593 /usr/bin/qemu-system-x86_64 -machine accel=kvm ...
           └─26765 /usr/sbin/libvirtd

> So, not sure we can do anything about this, except filing a bug against
> libvirt to fix its scriptlets?
It's on purpose, so that running VMs are not interrupted by libvirtd restarts. Tying the VMs to the libvirtd lifecycle would be a huge drawback. The service file has:
    KillMode=process
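
To illustrate why KillMode=process keeps the VMs alive across a libvirtd restart, here is a simplified Python model of which PIDs a stop request would signal. It is only a sketch: it covers three documented KillMode= values (control-group, process, none), ignores everything else systemd does on stop, and the function name is hypothetical.

```python
def pids_to_signal(kill_mode, main_pid, cgroup_pids):
    """Return the PIDs a stop request would signal under the given KillMode= value."""
    if kill_mode == "control-group":
        # Every process in the unit's cgroup is signalled.
        return sorted(cgroup_pids)
    if kill_mode == "process":
        # Only the main process is signalled; the rest keep running.
        return [main_pid]
    if kill_mode == "none":
        # Nothing is signalled.
        return []
    raise ValueError(f"KillMode value not modelled in this sketch: {kill_mode!r}")


# PIDs taken from the status output above: dnsmasq, qemu, libvirtd.
print(pids_to_signal("process", 26765, [1278, 2593, 26765]))        # [26765]
print(pids_to_signal("control-group", 26765, [1278, 2593, 26765]))  # [1278, 2593, 26765]
```

With KillMode=process, dnsmasq (1278) and qemu (2593) are left untouched when libvirtd is stopped or restarted.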
Comment 3 Kay Sievers 2013-06-20 17:20:44 UTC
(In reply to comment #2)

> > So, not sure we can do anything about this, except filing a bug against
> > libvirt to fix its scriptlets?
> It's on purpose, so that running VM are not interrupted by libvirtd
> restarts. Tying the VMs to libvirtd lifecycle would be a huge drawback.

How? A restart is not an un-install.

I don't see any valid reason to leave the service running after the process
binary is removed from the system.

It really does not sound like a systemd issue.
Comment 4 Zbigniew Jedrzejewski-Szmek 2013-06-20 17:32:56 UTC
(In reply to comment #3)
> (In reply to comment #2)
> 
> > > So, not sure we can do anything about this, except filing a bug against
> > > libvirt to fix its scriptlets?
> > It's on purpose, so that running VM are not interrupted by libvirtd
> > restarts. Tying the VMs to libvirtd lifecycle would be a huge drawback.
> 
> How? A restart is not an un-install.
> 
> I don't see any valid reason to leave the service running after the process
> binary is removed from the system.
> 
> It really does not sound like a systemd issue.
But a user might remove the file at any time... and systemd should do its best to behave gracefully. I think it's an issue with all KillMode=process units.

Let's try:
% systemctl status avahi-daemon.service
avahi-daemon.service - Avahi mDNS/DNS-SD Stack
   Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled)
   Active: active (running) since Thu 2013-06-20 03:17:56 EDT; 10h ago
 Main PID: 641 (avahi-daemon)
   Status: "Server startup complete. Host name is bupkis.local. Local service cookie is 4162726802."
   CGroup: name=systemd:/system/avahi-daemon.service
           ├─641 avahi-daemon: running [bupkis.local]
           └─652 avahi-daemon: chroot helper

% sudo mv /usr/lib/systemd/system/avahi-daemon.service /tmp/
% sudo systemctl daemon-reload
% systemctl status avahi-daemon.service
avahi-daemon.service
   Loaded: error (Reason: No such file or directory)
   Active: active (running) since Thu 2013-06-20 03:17:56 EDT; 10h ago
 Main PID: 641 (avahi-daemon)
   Status: "Server startup complete. Host name is bupkis.local. Local service cookie is 4162726802."

% sudo systemctl stop avahi-daemon.service
Warning: Stopping avahi-daemon.service, but it can still be activated by:
  avahi-daemon.socket

% systemctl status avahi-daemon.service
avahi-daemon.service
   Loaded: error (Reason: No such file or directory)
   Active: inactive (dead) since Thu 2013-06-20 13:30:51 EDT; 33s ago
 Main PID: 641 (code=exited, status=0/SUCCESS)
   Status: "Server startup complete. Host name is bupkis.local. Local service cookie is 4162726802."

...so it seems that systemd has only partial amnesia when a unit file is removed ;)
Comment 5 Lennart Poettering 2013-06-20 17:39:56 UTC
So, if libvirt doesn't move qemu away, then this indeed looks like a bug in systemd.

When you remove the service description file and the service is still running at that time, and you reload systemd, then systemd should show the thing as failed to load, but should still show all its running processes, and it should allow shutting the service down (which would use the default shutting down logic of SIGTERM+SIGKILL given that no other config is available). It wouldn't allow restarting, reloading or starting it, only stopping it, since that is kinda necessary, and that's the only thing we actually can do without service description file.

Zbigniew's pastes suggest that for some reason systemd completely loses track of the processes when it is reloaded. And that definitely shouldn't happen. And I am sure this did work once upon a time, because I remember hacking this up and testing it.
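
The expected behavior described in this comment can be sketched roughly as follows. This is an illustration in Python, not systemd's implementation: the job names, the `job_allowed` and `stop_with_defaults` helpers, and the 90-second default timeout are assumptions made for the example.

```python
import os
import signal
import time


def job_allowed(load_state, job):
    """Return True if `job` still makes sense for a unit in the given load state."""
    if load_state == "loaded":
        return job in {"start", "stop", "restart", "reload"}
    # Unit file missing or unparsable: stopping is the only thing left to do.
    return job == "stop"


def is_alive(pid):
    """Probe a PID with signal 0, which checks existence without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by someone else
    return True


def _send(send, pid, sig):
    """Send a signal, ignoring processes that have already exited."""
    try:
        send(pid, sig)
    except ProcessLookupError:
        pass


def stop_with_defaults(pids, timeout=90.0, poll=0.1, send=os.kill, alive=is_alive):
    """SIGTERM every PID, wait up to `timeout` seconds, then SIGKILL the survivors.

    Returns the PIDs that had to be sent SIGKILL.
    """
    for pid in pids:
        _send(send, pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    remaining = [p for p in pids if alive(p)]
    while remaining and time.monotonic() < deadline:
        time.sleep(poll)
        remaining = [p for p in remaining if alive(p)]
    for pid in remaining:
        _send(send, pid, signal.SIGKILL)
    return remaining
```

The `send` and `alive` parameters exist only so the logic can be exercised without signalling real processes.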
Comment 6 Lennart Poettering 2019-02-28 10:57:52 UTC
I am pretty sure this has long been fixed. If this is reproducible, please file a new bug on github.

