Downstream in Ubuntu we have gotten several reports of shutdown hangs due to remote file systems trying to unmount when the wifi is already down, and simlar issues. In journals [1][2][3] you see things like systemd[1]: Stopping D-Bus System Message Bus... avahi-daemon[694]: Disconnected from D-Bus, exiting. whoopsie[685]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting. NetworkManager[689]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting. wpa_supplicant[1246]: wlan0: CTRL-EVENT-DISCONNECTED bssid=00:0f:66:57:07:7c reason=3 locally_generated=1 NetworkManager[689]: <info> (wlan0): roamed from BSSID 00:0F:66:57:07:7C (RedKite) to (none) ((none)) kernel: CIFS VFS: Server 192.168.1.93 has not responded in 120 seconds. Reconnecting... systemd[1]: Unmounted <some remote file system path> → this is usually timing out as the network is not reachable any more Stopped target Remote File Systems. systemd[1]: Stopping Remote File Systems. systemd[1]: Stopped target Remote File Systems (Pre). systemd[1]: Stopping Remote File Systems (Pre). systemd[1]: Stopping Network Manager... systemd[1]: Stopped Network Manager. systemd[1]: Stopping D-Bus System Message Bus Socket. The main problem is that there is nothing that prevents D-Bus from stopping very early, way earlier than all of the Type=dbus services. There is an attempt to prevent that as systemd implies "After=dbus.socket" for Type=dbus units, but that doesn't save us: you can't re-start D-Bus after shutting it down and expect things to just work, as it loses all of its state. And even if that would work, it's rather pointless (and slow) to try and shut it down early just to have it reactivate several times during shutdown. So in short, I think dbus.socket doesn't really help us with anything except not having to specify precise startup requirements; but we still need them for shutdown. So what we really need is some way to say "Don't stop dbus.service before any D-Bus client" (i. e. *.service of Type=dbus). We could make dbus.service start before basic.target and stop after basic.target and add DefaultDependencies=no, so that it stops after most stuff; or change the implied After=dbus.socket in systemd to After=dbus.service. But that would also be prone to causing cyclic dependencies, if you e. g. have /var on NFS and need it to start dbus. (I didn't test that, it's just a gut feeling as remote file systems that you need to boot are notoriously susceptible to dependency loops.) Michael Biebl came up with a crazy, but neat idea for a workaround: ExecStop=/bin/true KillMode=none i. e. don't actually stop dbus through the unit, and just let it get killed at the bitter end. But on second thought it's actually quite tempting: Stopping/restarting D-Bus during runtime has never been supported, and always caused your desktop session and half of your services to get killed, so in the distro we went out of our way to prevent maintainer scripts etc. from doing that. So it's at least a nice initial hack which isn't introducing dependency loops. But I figure we should at least discuss a cleaner solution. Of course this will all be obsolete with kdbus, as then dbus always works. (Incidentally, not ever stopping it gets us rather close in that regard :-) ). [1] https://launchpadlibrarian.net/201680859/journal.txt [2] http://paste.ubuntu.com/10711795/ [3] https://launchpadlibrarian.net/200211036/journal.txt
ExecStop=/bin/true was my first idea, but Martin rightfully pointed out, that this doesn't influence KillMode, i.e. we need KillMode=none. With that KillMode setting, I think we can actually drop ExecStop=/bin/true, so what remains is KillMode=none systemd has a final killing spree, before it unmounts the file systems and I don't think dbus needs to do something special on stop? We might have a problem, if /usr is on NFS and (at least on Debian) dbus-daemon being installed in /usr/bin, which would keep the FS busy.
(In reply to Michael Biebl from comment #1) > ExecStop=/bin/true was my first idea, but Martin rightfully pointed out, > that this doesn't influence KillMode, i.e. we need KillMode=none. > With that KillMode setting, I think we can actually drop ExecStop=/bin/true, > so what remains is > > KillMode=none This doesn't actually work in my tests, the ExecStop= is needed (tested with systemd 219). I was quite surprised that this fixed a lot more than just the hangs on unmounting network file systems. I also got reports that other vague shutdown hangs now disappeared (https://launchpad.net/bugs/1427672) and that shutdown hangs with libvirt's virbr0-nic interfaces are now gone as well.
(In reply to Michael Biebl from comment #1) > We might have a problem, if /usr is on NFS and (at least on Debian) > dbus-daemon being installed in /usr/bin, which would keep the FS busy. If dbus-daemon really badly needs to be moved to the rootfs, then it can be... but in Debian, some libraries need to move first before that becomes useful (libcap-ng0, libapparmor1). If this is something we support upstream then I'd like to do it by introducing a ${rootbindir} like systemd has, rather than moving binaries around and patching systemd units etc. to compensate for it like Ubuntu has traditionally done. But I would prefer not to have to support it tbh. (Also, is there any reason to prefer NFS /usr that doesn't apply equally to preferring NFS rootfs?) (In reply to Martin Pitt from comment #0) > So what we really need is some way to say "Don't stop dbus.service before > any D-Bus client" (i. e. *.service of Type=dbus). We could make dbus.service > start before basic.target and stop after basic.target and add > DefaultDependencies=no, so that it stops after most stuff; or change the > implied After=dbus.socket in systemd to After=dbus.service. But that would > also be prone to causing cyclic dependencies, if you e. g. have /var on NFS > and need it to start dbus. (I didn't test that, it's just a gut feeling as > remote file systems that you need to boot are notoriously susceptible to > dependency loops.) The fewer constraints we have to apply to startup, the better, to make it more likely that dependency loops can be avoided in arbitrarily weird situations (NFS /usr and/or /var, or networking infrastructure that needs D-Bus - you can have one or the other but you cannot have both). So I would be OK with giving dbus.service DefaultDependencies=no, but I would not necessarily encourage requiring it to start before basic.target. If we give it DefaultDependencies=no and RequiresMountsFor=/usr /var /proc, but do not order it either before or after basic.target, would that help? Then we'd get: startup: /usr and /var must be mounted before the system dbus-daemon starts shutdown: dbus-daemon must stop before /usr and /var are unmounted Would not having the implicit Conflicts on shutdown.target mean that dbus-daemon would survive the initial shutdown transaction and only be killed when systemd got as far as unmounting the filesystems, or does it schedule everything required for shutdown, including the unmounting, as a single large transaction? (In reply to Michael Biebl from comment #1) > systemd has a final killing spree, before it unmounts the file systems and I > don't think dbus needs to do something special on stop? It does not need to do anything special on stop. Is this killing spree suffiently early that it happens before attempting to unmount /usr?
Hello Simon, (In reply to Simon McVittie from comment #3) > (Also, is there any reason to prefer NFS /usr that doesn't apply equally to > preferring NFS rootfs?) I'm not sure if root on NFS was ever attempted/supported. You'd basically need half an OS in your initramfs then? :-) > The fewer constraints we have to apply to startup, the better, to make it > more likely that dependency loops can be avoided in arbitrarily weird > situations (NFS /usr and/or /var, or networking infrastructure that needs > D-Bus - you can have one or the other but you cannot have both). So I would > be OK with giving dbus.service DefaultDependencies=no, but I would not > necessarily encourage requiring it to start before basic.target. Why not before basic.target? This would make it a basic system service which everyone could rely on. It's already kind of that through dbus.socket? It could lead to a more serialized boot of course, as non-dbus services would have to wait for dbus.service then. > If we give it DefaultDependencies=no and RequiresMountsFor=/usr /var /proc, > but do not order it either before or after basic.target, would that help? We don't need /proc, that's always available. "/usr /var" sounds like a good idea. I'd additionally have After=local-fs.target to ensure that you have a r/w root partition; or RequiresMountsFor=/ /usr /var. But any kind of dependency that we add here (and we have to) will lead to dbus.service being stopped at some time before that dependency/mount gets shut down, and then we are back to the situation that d-bus stops too early. In my experiments it again got stopped before any Type=dbus dependency, just because nothing else was directly depending on it. I have to use Before=basic.target so that it survives long enough during shutdown for all multi-user.target-like services to go down. > Then we'd get: > > startup: /usr and /var must be mounted before the system dbus-daemon starts > > shutdown: dbus-daemon must stop before /usr and /var are unmounted That's required indeed, but not sufficient. We also need to tell it to stop after consumers of dbus.service (which is really the whole point of this exercise). > Would not having the implicit Conflicts on shutdown.target mean that > dbus-daemon would survive the initial shutdown transaction and only be > killed when systemd got as far as unmounting the filesystems Unfortunately not, see above. This only works if you don't specify *any* dependency, but that's obviously not going to work for boot. > (In reply to Michael Biebl from comment #1) > > systemd has a final killing spree, before it unmounts the file systems and I > > don't think dbus needs to do something special on stop? > > It does not need to do anything special on stop. Is this killing spree > suffiently early that it happens before attempting to unmount /usr? It should better be, otherwise every random running process that the user and system left behind would block the unmounting. But I don't see how we can actually get to the point where dbus.service is never actually explicitly stopped; and we don't really need to, as long as stopping it happens sufficiently late. In my initial experiments this seems to work well: DefaultDependencies=no After=local-fs.target RequiresMountsFor=/usr /var Before=basic.target network.target Note that I didn't test this yet with a remote /var (or /usr).
(In reply to Martin Pitt from comment #4) > I'm not sure if root on NFS was ever attempted/supported. You'd basically > need half an OS in your initramfs then? :-) Yes it is/was, with or without an initramfs: https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt https://anonscm.debian.org/cgit/kernel/initramfs-tools.git/tree/scripts/nfs > > The fewer constraints we have to apply to startup, the better [...] > > So I would > > be OK with giving dbus.service DefaultDependencies=no, but I would not > > necessarily encourage requiring it to start before basic.target. > > Why not before basic.target? This would make it a basic system service which > everyone could rely on. It's already kind of that through dbus.socket? It > could lead to a more serialized boot of course, as non-dbus services would > have to wait for dbus.service then. If dbus.service *on a particular system configuration* is After something that is After basic.target, and basic.target is After dbus.service, then we have a cyclic dependency on that system configuration. (This is regardless of whether it works without cyclic dependencies for the relevant distribution's dbus maintainers, who presumably know enough to avoid bizarre boot orderings.) Notable examples of things that dbus might need: * NIS and other sources of users * networking and other non-trivial infrastructure to get /usr Debian and Ubuntu have traditionally run dbus as a rc2.d service, so there is nothing to stop it having such dependencies.
Created attachment 114829 [details] [review] system bus: do not allow stopping the system dbus-daemon There is nothing that prevents D-Bus from stopping very early, way earlier than all of the Type=dbus services. There is an attempt to prevent that as systemd implies "After=dbus.socket" for Type=dbus units, but that doesn't save us: you can't re-start D-Bus after shutting it down and expect things to just work, as it loses all of its state. Putting dbus.service Before basic.target was considered and rejected, because it's a recipe for cyclic dependencies: as soon as dbus.service wants to be After anything that is After basic.target (e.g. NIS and other user databases, or remote filesystems) you get a cycle. dbus-daemon does not need to start early, it only needs to stop late. systemd has a final killing spree before it unmounts the file systems, which should be sufficient to avoid dbus-daemon preventing a separate /usr from being unmounted; this does not consider KillMode. dbus-daemon doesn't need to do anything special during shutdown, so it's OK that it survives until then. Based on a suggestion from Michael Biebl and Martin Pitt. --- How's this? I made the stop command be "echo" instead of "true" so that it leaves a hint in systemctl status if someone tries to stop it by hand. Unfortunately, "systemctl restart dbus" (which was never supported either) will now start a second dbus-daemon in parallel with the first, and in my testing, the second one will get all new connections. Any ideas for how to avoid that? Perhaps it would be better to make the stop command exit nonzero? ... but then we'd log scary messages during a normal shutdown, which is no better really.
(In reply to Simon McVittie from comment #6) > Perhaps it would be better to make the stop command exit > nonzero? Straw man: ExecStop=/bin/sh -c "echo Stopping the system dbus-daemon is not supported. Reboot the system instead.; exit 1" ... which does work, but logs "Unit dbus.service entered failed state" during shutdown. It seems better to avoid "crying wolf" if possible, because it can hide real issues.
(In reply to Simon McVittie from comment #6) > How's this? I made the stop command be "echo" instead of "true" so that it > leaves a hint in systemctl status if someone tries to stop it by hand. Ah, good one. > Unfortunately, "systemctl restart dbus" (which was never supported either) > will now start a second dbus-daemon in parallel with the first, and in my > testing, the second one will get all new connections. Any ideas for how to > avoid that? Perhaps it would be better to make the stop command exit > nonzero? ... but then we'd log scary messages during a normal shutdown, > which is no better really. Right, I don't like making it fail on stop. I don't see anything explicit which would declare "cannot restart"; I haven't tested this (travelling/no real computer), but would something like ConditionPathExists=!/run/dbus/system_bus_socket prevent further starts/restarts? I'm not sure if conditions are evaluated on start only, or on restart too -- my gut feeling is the former, i. e. that won't work. Thanks! BTW, this is being discussed on the upstream ML too: http://lists.freedesktop.org/archives/systemd-devel/2015-April/030070.html (I haven't caught up with that yet)
(In reply to Martin Pitt from comment #8) > I don't see anything explicit > which would declare "cannot restart"; I haven't tested this (travelling/no > real computer), but would something like > > ConditionPathExists=!/run/dbus/system_bus_socket > > prevent further starts/restarts? Good idea, I'll try it. systemd guarantees that /run is a tmpfs, so the "what if I have stale misc lying around in /var/run?" question that we would have had under sysvinit and possibly Upstart does not exist. > BTW, this is being discussed on the upstream ML too: > > http://lists.freedesktop.org/archives/systemd-devel/2015-April/030070.html I am aware, and have replied there.
(In reply to Martin Pitt from comment #8) > ConditionPathExists=!/run/dbus/system_bus_socket That can't be suitable, because dbus.socket creates that filesystem object, so dbus-daemon would never start. Removing --nopidfile and adding ConditionPathExists=!/run/dbus/pid in addition to the patch I posted above (for upstream consumption that would be @DBUS_SYSTEM_PID_FILE@) does work.
(In reply to Simon McVittie from comment #7) > (In reply to Simon McVittie from comment #6) > > Perhaps it would be better to make the stop command exit > > nonzero? > > Straw man: > > ExecStop=/bin/sh -c "echo Stopping the system dbus-daemon is not supported. > Reboot the system instead.; exit 1" > > ... which does work, but logs "Unit dbus.service entered failed state" > during shutdown. It seems better to avoid "crying wolf" if possible, because > it can hide real issues. Spawning a shell (which will burn more cpu cycles then a simple /bin/true), this will also trigger a failure on every shutdown/reboot. Do we really want this?
Please do not apply that ExecStop= thing. You really shouldn't block that. Think about people who boot their system with "emergency" on the kernel cmdline, and thus get a PID 1 plus a shell and nothing else, they should be able to start dbus and stop it as they wish... If at all, use RefuseManualStop=yes on the unit, but I don't like that much either. It's mostly relevant for things like audit where the kernel might stop if the service is shut down. The best way is to fix the few services that really need dbus unconditionally to be around to add After=dbus.service. And all is good.
(In reply to Lennart Poettering from comment #12) > The best way is to fix the few services that really need dbus > unconditionally to be around to add After=dbus.service. And all is good. If we go with that approach, then it would IMHO be cleaner to change the implied "After=dbus.socket" for Type=dbus units to "After=dbus.service", for the non-kdbus case at least.
(In reply to Martin Pitt from comment #13) > (In reply to Lennart Poettering from comment #12) > > The best way is to fix the few services that really need dbus > > unconditionally to be around to add After=dbus.service. And all is good. > > If we go with that approach, then it would IMHO be cleaner to change the > implied "After=dbus.socket" for Type=dbus units to "After=dbus.service", for > the non-kdbus case at least. Nah, I am pretty sure this should be fixed in the few services that need this, rather than adding these otherwise unnecessary synchronization points to all services, including the ones that don't need it.
(In reply to Simon McVittie from comment #6) > Unfortunately, "systemctl restart dbus" (which was never supported either) > will now start a second dbus-daemon in parallel with the first I think that's unacceptable. (In reply to Lennart Poettering from comment #12) > If at all, use RefuseManualStop=yes on the unit, but I don't like that much > either. I don't think this would satisfy the original requirement unless the ExecStop=/bin/true was also included, and you've said that that's undesired. > Please do not apply that ExecStop= thing. You really shouldn't block that. Fair enough. NOTOURBUG then. > The best way is to fix the few services that really need dbus > unconditionally to be around to add After=dbus.service. And all is good. OK. In the case of the original bug report, I think this means wpasupplicant and maybe NetworkManager would have to either turn off exit-on-disconnect (for libdbus and/or GDBus, depending which they use), or be After=dbus.service.
Note that bug 98254 alleviates this a fair bit -- with that, dbus.service can stop later.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.