On a system under load (possibly caused by a runaway service, in which case this bug becomes more or less critical), systemctl fails with various errors:

  Failed to issue method call: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

  Failed to get D-Bus connection: Failed to authenticate in time.

(The first message sometimes appears even though the call actually succeeded.)
Well, if the machine is busy, things time out. That's hardly surprising, is it?
It is quite surprising. Normally, tasks will finish under comparable load. In fact, almost everything (other than systemctl) works fine, albeit very slowly. I don't see a reason why systemctl should stop working under load, considering that it's a relatively important system management command.
Hmm, is this the same case as bug 68232 (https://bugs.freedesktop.org/show_bug.cgi?id=68232)? If so, then this is not surprising. Basically, while PID 1 is cleaning up the private tmp dir, it no longer runs the event loop and therefore no longer dispatches bus requests, which means they time out. This is really bad design on systemd's side. We really shouldn't do potentially unbounded IO from PID 1 that blocks everything else...
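To make the failure mode described above concrete, here is a toy model in Python (not systemd code) of a single-threaded event loop. One blocking IO job, standing in for the private tmp dir cleanup, occupies the loop first; bus requests that arrive meanwhile are only dispatched once it finishes. The `Request` type, the `simulate` function, the 30 s client timeout and the 0.01 s dispatch cost are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    name: str
    sent_at: float   # simulated time (s) at which the client sends the call
    timeout: float   # client-side reply timeout (s), an assumed value


def simulate(blocking_io: float, requests: List[Request],
             dispatch_cost: float = 0.01) -> Dict[str, str]:
    """Toy single-threaded loop: a blocking IO job occupies the loop from t=0
    to t=blocking_io; queued requests are only dispatched afterwards."""
    clock = blocking_io  # nothing is dispatched while the IO job runs
    results: Dict[str, str] = {}
    for req in sorted(requests, key=lambda r: r.sent_at):
        # Wait for the request to arrive (if it has not yet), then handle it.
        clock = max(clock, req.sent_at) + dispatch_cost
        deadline = req.sent_at + req.timeout
        # The request is handled either way; only the reply may come too late.
        results[req.name] = "ok" if clock <= deadline else "timed out"
    return results


if __name__ == "__main__":
    calls = [Request("status", sent_at=1.0, timeout=30.0)]
    print(simulate(5.0, calls))   # {'status': 'ok'}
    print(simulate(40.0, calls))  # {'status': 'timed out'}
```

Note that in the second case the request is still processed, just after the client has given up, which is consistent with the observation that "Did not receive a reply" can appear even though the call succeeded.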
*** This bug has been marked as a duplicate of bug 68232 ***
No, it's not the same as 68232. Any load makes it impossible to talk to systemd. Everything else still works, and you can still create sessions, but any "systemctl" command dies with a D-Bus timeout. This is clearly a separate issue.
In fact, it sometimes works, but it usually fails, depending on how processes get scheduled. That means you need to watch it and restart commands that fail. Really tedious.
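Until the timeout itself is addressed, the watch-and-restart workaround described above can be scripted. The following is a minimal Python sketch, assuming the caller only wants to retry on the two D-Bus error strings quoted in the original report; `run_with_retries`, `DBUS_TIMEOUT_MARKERS`, and the backoff values are hypothetical names and choices, not part of systemctl. Other failures (for example a unit that does not exist) are returned immediately.

```python
import subprocess
import time
from typing import Callable, Sequence, Tuple

# Error strings quoted in the original report; only these trigger a retry.
DBUS_TIMEOUT_MARKERS: Tuple[str, ...] = (
    "Did not receive a reply",
    "Failed to authenticate in time",
)


def run_with_retries(
    argv: Sequence[str],
    attempts: int = 5,
    initial_delay: float = 1.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run argv; on a D-Bus timeout error, retry with exponential backoff."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        result = runner(list(argv), capture_output=True, text=True)
        timed_out = any(m in (result.stderr or "") for m in DBUS_TIMEOUT_MARKERS)
        if result.returncode == 0 or not timed_out or attempt == attempts:
            return result  # success, an unrelated failure, or out of attempts
        sleep(delay)
        delay *= 2  # back off: 1 s, 2 s, 4 s, ...
    raise ValueError("attempts must be at least 1")
```

For example, `run_with_retries(["systemctl", "start", "foo.service"])` retries up to five times. Since the first error can appear even though the call succeeded, retrying is only safe for idempotent operations; blindly retrying something like `systemctl restart` may perform the action twice.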
So if it's not related to systemd getting stuck in tmpdir removal, then it's a bit unexpected. How many units do you have? The output of perf report could help us understand what's going on. Can you start 'perf record -g -p 1' before starting the systemctl call, stop perf with ^C afterwards, and attach the output of 'perf report --sort symbol -g fractal,5'? Please make sure that the debugging symbols are available.
Sigh. I have already said it twice, but maybe it will work the third time? The problem is that there is a timeout. Neither systemd nor systemctl is doing any significant work. If the system is under load, it may take a long while for the right processes to get onto the CPU in the right order. That's all. You can't fix that; it's completely external. All I am asking for is that there be no timeout. If not having a timeout is not an option (say, due to a D-Bus restriction), then making the timeout unrealistically long (say, 8h) might suffice. As things stand, I can hit the timeout under realistic conditions.
(In reply to comment #8) The timeout is quite large already (30s ?). And it is used in various conditions, e.g. during shutdown when dbus is already down. Removing the timeout would mean that things block instead of failing more gracefully. How loaded is your machine when systemctl dies with a timeout?
Load average is somewhere between 20 and 100 when things start to go awry; I'm not sure exactly when, since I don't normally trigger high loads deliberately. But these things do happen. I suspect it may depend more on (un)available physical memory, though. Can the timeout for interactive systemctl differ from other (shutdown-time) timeouts? (By interactive I also mean running from scripts, i.e. not tied to a controlling tty, where this is even more of a problem.) It would be less of a problem if there were another, more direct way than D-Bus to reach systemd, but I didn't find one.
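The question of using a different timeout for interactive or scripted callers than for shutdown-time calls can be sketched as a caller-side policy. The Python below is a hypothetical illustration only: the context names, the `REPLY_TIMEOUTS` values, and `call_with_reply_timeout` are assumptions, not systemctl options, and the 8 h figure is the one suggested earlier in the thread. It also shows the tradeoff raised in the reply to comment #8: a finite timeout fails gracefully (here by returning None), while a timeout of None would block until the reply arrives.

```python
import concurrent.futures
import time
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")

# Hypothetical per-context reply timeouts in seconds; None would mean "wait forever".
REPLY_TIMEOUTS: Dict[str, Optional[float]] = {
    "shutdown": 30.0,           # the bus may already be gone: fail fast
    "interactive": 8 * 3600.0,  # effectively unbounded, as suggested in the thread
    "script": 8 * 3600.0,       # no one is watching to retry by hand
}

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_reply_timeout(
    method: Callable[[], T],
    context: str,
    timeouts: Dict[str, Optional[float]] = REPLY_TIMEOUTS,
) -> Optional[T]:
    """Run method() and wait for its result using the timeout for context.

    Returns None if no reply arrived in time. As with the D-Bus error in the
    report, the method may still complete after the caller has given up.
    """
    future = _executor.submit(method)
    try:
        return future.result(timeout=timeouts[context])
    except concurrent.futures.TimeoutError:
        return None  # graceful failure; the submitted work is not cancelled


if __name__ == "__main__":
    def slow_method() -> str:
        time.sleep(0.2)  # stands in for a daemon that is slow to reply
        return "done"

    short = {"shutdown": 0.05, "interactive": 1.0}
    print(call_with_reply_timeout(slow_method, "shutdown", short))     # None
    print(call_with_reply_timeout(slow_method, "interactive", short))  # done
```

Keeping the timeout per call site rather than global is what would let shutdown paths keep failing fast while interactive and scripted use waits much longer.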
http://cgit.freedesktop.org/systemd/systemd/commit/?id=1f19a5 made the timeout configurable.
I am pretty sure these are not the timeouts I am seeing. My concern is the timeout when systemctl talks to D-Bus, not anything directly related to units. Also, the patch you refer to has a small mistake in its change to man/systemd.mount.xml: the past participle of "set" is "set", not "setted".
Closing all stale bugs with NEEDINFO. Please open a new bug at https://github.com/systemd/issues if the problem still occurs.