Bug 84110 - rc-local's TimeoutSec=0 causes shutdown to hang if rc.local spawned any daemons
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
Reported: 2014-09-19 22:06 UTC by XANi
Modified: 2014-10-28 02:30 UTC

Description XANi 2014-09-19 22:06:42 UTC
If rc.local contains a forking daemon, in my case it was:

    /usr/sbin/tgtd
    /usr/sbin/tgt-admin -e

reboot/shutdown will hang until that daemon is manually killed.

I've fixed that on my machine by setting TimeoutStopSec to a value greater than 0, but that's probably a bad idea: that 'service' should not be 'stopped' at shutdown at all, as it is supposed to be a one-shot script run at the start of a runlevel.
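
The workaround above can be written as a drop-in rather than by editing the shipped unit. A minimal sketch, assuming the unit is named rc-local.service; the drop-in file name (timeout.conf) is arbitrary and 30s is a placeholder, not a recommended value:

    # /etc/systemd/system/rc-local.service.d/timeout.conf  (hypothetical file name)
    [Service]
    # Overrides only the stop half of TimeoutSec=0; the start timeout stays disabled.
    # 30s is an illustrative value, not a recommendation.
    TimeoutStopSec=30s

After adding the file, `systemctl daemon-reload` makes systemd pick it up. With a finite stop timeout, processes left over from rc.local should receive SIGKILL once the timeout expires instead of blocking shutdown indefinitely.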
Comment 1 Lennart Poettering 2014-10-09 16:33:20 UTC
We have to kill all daemons so that we can properly unmount the various file systems, there isn't really a way around this.

For compatibility with sysv we explicitly turn off the timeout, since that's closer to how this worked on sysvinit (and rc-local is really just about compat here, at least to the point where it doesn't interfere with other concepts).

While we try to stay as compatible with sysvinit as we can, there are limitations, and this is one of them. This specific behaviour of sysvinit actively destabilizes the system on shutdown, hence I don't think this is something we should change (we could change it by setting KillMode=none by default...).

I hope that makes sense,

Sorry!
Comment 2 XANi 2014-10-09 20:39:35 UTC
(In reply to Lennart Poettering from comment #1)

The problem is a bit different and not limited to rc.local (which I admit is definitely not the place to start random daemons). If that daemon (or any other daemon that has no timeout set and "hangs") stops the system from rebooting or halting, a number of situations can arise, all of them worse than killing a random daemon, for example:

* A UPS has 5-10 minutes of battery left. If the system does not unmount in time, power goes off without unmounting or remounting read-only, followed by an fsck on a 1TB partition.
* Someone is logged in remotely, types "reboot" and exits. They can't ssh back in to fix it (because the SSH daemon has already shut down), and not everyone has KVM, so the only option is a hard reboot via remote power or a drive to the machine.
* Type "poweroff" on a laptop and put it into a backpack: dead battery in a few hours.

What I am saying is that in a lot of cases, not being able to reboot the system in a decent time (5-10 minutes) can potentially have much greater consequences than sending SIGKILL to some random daemon.
Comment 3 Lennart Poettering 2014-10-21 18:42:42 UTC
Hanging shutdowns are a real problem, no doubt. But I figure we should probably deal with this in a different way.

Recently I added an overall StartTimeoutSec= setting in system.conf, which is useful for devices to make sure that we don't end up hanging forever on boot. It's probably a good idea to add a similar setting to enforce a time limit on shutdown. If the limit passes we'd go immediately into a more brutal reboot mode, similar to the behaviour of StartTimeoutSec= already.

I added that to the TODO list. Does that make sense?
Comment 4 XANi 2014-10-22 09:23:52 UTC
Yup, it should deal with most cases of where that can be a problem.

There will probably always be edge cases (like DB servers taking 15 minutes to shut down cleanly), but those will be less common than other mistakes, like a developer or package maintainer not explicitly setting a timeout and assuming the daemon will always exit on stop or SIGTERM.
Comment 5 Zbigniew Jedrzejewski-Szmek 2014-10-28 02:30:02 UTC
http://cgit.freedesktop.org/systemd/systemd/commit/?id=f189ab18de.
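
For illustration, the kind of shutdown time limit discussed in comment 3 can be expressed per target in later systemd versions using JobTimeoutSec= and JobTimeoutAction= (documented in systemd.unit(5)). This is a sketch under assumptions: the drop-in file name and the 10min value are placeholders, and this report does not state whether the linked commit implements exactly this mechanism.

    # /etc/systemd/system/reboot.target.d/limit.conf  (hypothetical file name)
    [Unit]
    # If the reboot job has not completed within this time ...
    JobTimeoutSec=10min
    # ... stop waiting for the orderly shutdown and force the reboot
    # (comparable to "systemctl reboot -f").
    JobTimeoutAction=reboot-force

A limit like this bounds the total shutdown time regardless of which unit is stuck, so a single service with its stop timeout disabled can no longer hold up a reboot indefinitely. Services that legitimately need a long time to stop, such as the database case mentioned in comment 4, would need the limit set above their worst-case shutdown time.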

