Bug 70593

Summary:	random 90 sec delay on shutdown/reboot (A stop job is running for User Manager for 0)
Product:	systemd	Reporter:	Gerardo Exequiel Pozzi <vmlinuz386>
Component:	general	Assignee:	systemd-bugs
Status:	RESOLVED WORKSFORME	QA Contact:	systemd-bugs
Severity:	normal
Priority:	medium	CC:	aklhfex, bergman.martin, bernardo.reino, cju, crrodriguez, cwbell, da_audiophile, equeim, fademind, freedesktop-bugs, hugo, jajcus, jokeyrhyme, keepitsimpleengineer, marti, mister.freeman, pa_collins, radek, ronald.hitz, rvoinea, steven, yogeshm.007, zikas
Version:	unspecified
Hardware:	x86-64 (AMD64)
OS:	Linux (All)
Whiteboard:
i915 platform:		i915 features:
Attachments:	output of console [no delay](systemd.journald.forward_to_console) output of console [delay 90 sec] (systemd.journald.forward_to_console) journalctl logs about this bug Fedora 20 systemd user unit file

Description Gerardo Exequiel Pozzi 2013-10-18 01:40:39 UTC

Created attachment 87807 [details]
output of console [no delay](systemd.journald.forward_to_console)

Just running poweroff/systemctl shutdown/reboot/systemctl reboot, from zsh on an autologin console for root. Sometimes there is no delay, but other times 90 sec delays appears. There is no error displayed about timeout.

systemd[1]: user@0.service stopping timed out. Killing.

Arch Linux x86_64 under qemu-kvm
Linux-3.11.5-1
systemd-208-1

Comment 1 Gerardo Exequiel Pozzi 2013-10-18 01:42:19 UTC

Created attachment 87808 [details]
output of console [delay 90 sec] (systemd.journald.forward_to_console)

Comment 2 Barto 2013-10-18 20:09:54 UTC

you OS is archlinux ?

if you downgrade systemd package to 204 version the bug is still present ?

your bug seems similar to bug 70413

https://bugs.freedesktop.org/show_bug.cgi?id=70413

Comment 3 Barto 2013-10-18 20:17:01 UTC

some archlinux users found a workaround by adding "Before=basic.target" to dbus.service file :

https://bbs.archlinux.org/viewtopic.php?id=170756&p=2

this workaround doesn't work for all users

Comment 4 Gerardo Exequiel Pozzi 2013-10-18 22:17:30 UTC

(In reply to comment #2)
> you OS is archlinux ?
> 

Yes. I am the archiso developer. :)

> if you downgrade systemd package to 204 version the bug is still present ?

No. systemd 204 works like a charm. All these kill-timeouts issues start from 205, and some are solved now.

> 
> your bug seems similar to bug 70413
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=70413

Yes.

Comment 5 Gerardo Exequiel Pozzi 2013-10-18 22:50:50 UTC

(In reply to comment #3)
> some archlinux users found a workaround by adding "Before=basic.target" to
> dbus.service file :
> 
> https://bbs.archlinux.org/viewtopic.php?id=170756&p=2
> 
> this workaround doesn't work for all users

Doing such workaround is wrong. Makes a ordering cycle. Anyway, yes it works. But shutdown process looks abnormal.

See:
systemd[1]: Found ordering cycle on basic.target/stop
systemd[1]: Found dependency on dbus.service/stop
systemd[1]: Found dependency on basic.target/stop
systemd[1]: Breaking ordering cycle by deleting job dbus.service/stop
systemd[1]: Job dbus.service/stop deleted to break ordering cycle starting with basic.target/stop

Comment 6 Arokux 2013-11-06 20:34:38 UTC

*** Bug 70413 has been marked as a duplicate of this bug. ***

Comment 7 Dash 2013-11-26 17:35:56 UTC

I've encountered this issue on an opensuse 13.1 install, which is systemd 208.

I have noticed that this only happens for me whilst I don't have log level set to debug.  As soon as I turn on debug log the delay vanishes (helpful).

I'm not sure if it's related, but when I shutdown any SSH client is not being disconnected.

systemd[1]: user@0.service stopping timed out. Killing.

Comment 8 saladom 2013-12-04 10:34:58 UTC

Created attachment 90211 [details]
journalctl logs about this bug

lolol. you have a same problem of mine.

I installed an Arch Linux 2013.12.01 version. x86-64bit

linux 3.12.2-1
systemd 208-2

You can see the 90 seconds delay in the picture. That is the problem when I reboot/poweroff/shutdown the system. not always, but frequently.

If I command that with a '-f'(forceful), there is no any delay.


Please, fix this bugs.

I tested this bugs in both VMware Player and Virtual Box, but no difference appears.

Comment 9 Zbigniew Jedrzejewski-Szmek 2013-12-25 04:05:23 UTC

This hang should be fixed with http://cgit.freedesktop.org/systemd/systemd/commit/?id=63966da. If not, please reopen.

Comment 10 Gerardo Exequiel Pozzi 2013-12-25 04:28:54 UTC

nope, not fixed.

Comment 11 Zbigniew Jedrzejewski-Szmek 2013-12-25 04:41:42 UTC

(In reply to comment #10)
> nope, not fixed.

Which version are you running?

Comment 12 Gerardo Exequiel Pozzi 2013-12-25 05:34:49 UTC

(In reply to comment #11)
> (In reply to comment #10)
> > nope, not fixed.
> 
> Which version are you running?

system-208-3 [#1] (this Arch Linux pkg release include such patch)

[#1] https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/systemd&id=c1b13c2cb3c728c62fc71b2e6e73fc9fdb6eb5c3

Comment 13 Gerardo Exequiel Pozzi 2014-02-20 22:31:05 UTC

Fixed in 209! Thanks guys :)

Comment 14 cju 2014-04-06 11:20:41 UTC

I'm sorry I have to bring this up again, but ever since updating to 211, this bug reappears again on Arch. The latest upgrade to 212 didn't change things.

Comment 15 Chris Bell 2014-09-08 04:02:20 UTC

Bug still exists in systemd 216. 

Logging out before restart/halt seems to significantly reduce the frequency of occurrence, but it doesn't absolutely prevent it. (From ~ 1 in 3 shutdowns to ~ 1 in 15). This behavior is reproducible on two separate machines, a Fedora 20  laptop with systemd 208, and an Arch box with 216. 

Also, and this might be nothing, I have only had this happen when using text/getty login, rather than a display manager. When I was using graphical as my default target, I never had a delay. Only when using multi-user. Again, both machines - both versions - show the same behavior. As I said, this might just be my dumb luck, but I thought it was worth mentioning.

Comment 16 sam tygier 2014-09-15 16:36:40 UTC

I have seen this today on raspbian jessie (debian for raspberrypi) with systemd 208 
"A stop job is running for User Manager for $BIGNUMBER"
"A stop job is running for Session c2 of user sam"

Comment 17 Bernardo Reino 2014-10-18 19:54:47 UTC

I also have this issue. Running debian/sid with 215-5+b1, no graphical login (I run startx with xfce).

When it hangs I go to tty9 (crash debug shell, always active until systemd is stable) and kill -9 the pid of "systemd --user". The shutdown then continues immediately.

The issue is caused AFAIK by the pam_systemd PAM module, which unfortunately is necessary to avoid creating a mess of seats, sessions, policykits and whatnot.

Not that even disabling lingering (loginctl --disable-linger <user>) does not resolve this issue.

Comment 18 Chris Bell 2014-10-21 17:25:27 UTC

If there isn't a workaround/fix available, is there a way to change the timeout value? Say from 90 seconds to a far more reasonable 15?

Comment 19 Bernardo Reino 2014-10-22 08:10:37 UTC

@Chris,

I added "WorkingDirectory=/" at the end of /usr/lib/systemd/user/systemd-exit.service and the issue has never happened again.

It doesn't make any sense, but hey, maybe it will be fixed some day. Maybe not.

Comment 20 Chris Bell 2014-10-22 14:09:20 UTC

(In reply to Bernardo Reino from comment #19)
> @Chris,
> 
> I added "WorkingDirectory=/" at the end of
> /usr/lib/systemd/user/systemd-exit.service and the issue has never happened
> again.
> 
> It doesn't make any sense, but hey, maybe it will be fixed some day. Maybe
> not.

No, it doesn't. Make sense, that is. I was under the assumption that it was waiting because /var/run/user/<uid> wasn't unmounted, so it gave it 90 sec before simply forcing the unmount. But / surely wouldn't be unmounted at that point, and it wouldn't (shouldn't) be able to force an unmount of /, so there goes that theory....

The big question on my mind is: what _is_ the stop job, exactly? What's it waiting for? Why 90 seconds? Where (in the code) could I start bughunting? That's more than one question, I guess. It's frustrating that it doesn't give a bit more info.

Comment 21 Marti Raudsepp 2014-10-22 14:29:27 UTC

(In reply to Chris Bell from comment #20)
> The big question on my mind is: what _is_ the stop job, exactly? What's it
> waiting for? Why 90 seconds? Where (in the code) could I start bughunting?

I think you'll get better results asking this on the systemd mailing lists than here.

Comment 22 Chris Bell 2014-10-23 15:11:36 UTC

Created attachment 108303 [details]
Fedora 20 systemd user unit file

Stock user@.service unit file provided in Fedora 20 distro.

Comment 23 Chris Bell 2014-10-23 15:12:27 UTC

I'm not sure when, but it seems Fedora has modified their user@.service unit file to address this issue. They changed KillMode=mixed to:

ExecStop=/bin/kill -TERM ${MAINPID}
KillSignal=SIGCONT

It appears it's due to `systemd --user` being killed by daddy systemd before it can clean up properly.

I haven't been able to reproduce the delay since they implemented this workaround. I've attached the unit file, if anyone else wants to try.

I'm not sure when they added it; I just now noticed it. My Fedora 19 server has the old file (the one found in the systemd git repo). I'm going to modify the unit on my Arch box to see if it makes this go away.

Comment 24 Chris Bell 2014-10-23 15:37:05 UTC

Also, the Fedora Project's patch can be found in the F20 branch of the Fedora systemd package git repo: http://pkgs.fedoraproject.org/cgit/systemd.git/tree/0266-Temporary-work-around-for-slow-shutdown-due-to-unter.patch?h=f20

Comment 25 Zbigniew Jedrzejewski-Szmek 2014-11-01 21:23:12 UTC

*** Bug 83838 has been marked as a duplicate of this bug. ***

Comment 26 Javier Jardón 2015-09-07 00:19:39 UTC

Seems this has been fixed in 58ea275a68cd242ad60161bcb7582614d1d89f13, with the introduction of the KillMode=mixed?

Comment 27 Daniel Hahler 2016-01-06 15:58:01 UTC

I get this on Arch Linux (systemd 228-3) when there is a open ssh connection to a remote host.

I've thought that it might be related to OpenSSH's "ControlMaster" setting(s), but it happens also without this.

Arch uses this /usr/lib/systemd/system/user@.service:

[Unit]
Description=User Manager for UID %i
After=systemd-user-sessions.service

[Service]
User=%i
PAMName=systemd-user
Type=notify
ExecStart=-/usr/lib/systemd/systemd --user
Slice=user-%i.slice
KillMode=mixed
Delegate=yes

Comment 28 Mike C 2016-02-19 22:07:49 UTC

This is happening to me on arch linux with systemd 229-2 (and also 229-3 from the testing repo)

If right clicking on the plasma desktop and selecting "Leave" and then shutdown or reboot, the console gives a message "A stop job is running for Session c2 for user...." and counts down the 90 seconds before completing the shutdown. In the journal I see:

Feb 18 20:13:24 home1 systemd[1]: Stopped target Network.
Feb 18 20:14:53 home1 systemd[589]: dbus.service: State 'stop-final-sigterm' timed out. Killing.
Feb 18 20:14:53 home1 systemd[1]: user@1000.service: State 'stop-sigterm' timed out. Killing.
Feb 18 20:14:53 home1 systemd[1]: Stopped User Manager for UID 1000.
Feb 18 20:14:53 home1 systemd[1]: user@1000.service: Unit entered failed state.
Feb 18 20:14:53 home1 systemd[1]: user@1000.service: Failed with result 'timeout'.
Feb 18 20:14:53 home1 systemd[1]: Removed slice User Slice of mike.
Feb 18 20:14:53 home1 systemd[1]: Stopping Login Service...
Feb 18 20:14:53 home1 systemd[1]: Stopping Permit User Sessions...
Feb 18 20:14:53 home1 systemd[1]: Stopped Permit User Sessions.
Feb 18 20:14:53 home1 systemd[1]: Stopped target Remote File Systems.
Feb 18 20:14:53 home1 systemd[1]: Stopped Login Service.
Feb 18 20:14:53 home1 systemd[1]: Stopped target Basic System.

and again from today:

Feb 19 15:45:36 home1 systemd[1]: Stopped Simple Desktop Display Manager.
Feb 19 15:47:01 home1 systemd[599]: dbus.service: State 'stop-final-sigterm' timed out. Killing.
Feb 19 15:47:01 home1 systemd[1]: user@1000.service: State 'stop-sigterm' timed out. Killing.
Feb 19 15:47:01 home1 systemd[1]: Stopped User Manager for UID 1000.
Feb 19 15:47:01 home1 systemd[1]: user@1000.service: Unit entered failed state.
Feb 19 15:47:01 home1 systemd[1]: user@1000.service: Failed with result 'timeout'.
Feb 19 15:47:01 home1 systemd[1]: Removed slice User Slice of mike.
Feb 19 15:47:01 home1 systemd[1]: Stopping Login Service...
Feb 19 15:47:01 home1 systemd[1]: Stopping Permit User Sessions...
Feb 19 15:47:01 home1 systemd[1]: Stopped Permit User Sessions.

Once the shutdown stop job delay has ended the shutdown then completes rapidly.

which seems to be a dbus related shutdown delay with the 90 seconds wait period seen in the timestamps above. I also installed systemd 229-3 from testing but this does not help the issue.

This is likely related: https://lists.freedesktop.org/archives/systemd-devel/2016-January/035619.html

Comment 29 Barto 2016-02-23 19:08:48 UTC

(In reply to Mike C from comment #28)
> This is happening to me on arch linux with systemd 229-2 (and also 229-3
> from the testing repo)
> 
> If right clicking on the plasma desktop and selecting "Leave" and then
> shutdown or reboot, the console gives a message "A stop job is running for
> Session c2 for user...." and counts down the 90 seconds before completing
> the shutdown. 

same bug for me, I use also archlinux, systemd 229-3

Comment 30 Radek Podgorny 2016-02-24 17:51:00 UTC

same for me (also arch). this is not de-related, btw. it happens to me with bare console (x is running, though)...

Comment 31 Cosimo Morelli 2016-03-05 21:44:13 UTC

Same thing is happening to me on Arch with systemd 229-3

Comment 32 Larry Johnson 2016-03-07 23:31:43 UTC

I am also experiencing this (except the session is C1) with systemd 229-3 on two systems both AMD64 abut one Intel and one AMD processor...

Comment 33 John 2016-03-13 15:40:49 UTC

On Arch Linux here as well and for me, downgrading to the 4.3.x series of kernels fixes this.  Bug is persistent  on 4.4.1-4.4.5 and on 4.5rc7.  Can some others affected confirm that downgrading to a 4.3.x series kernel reverses this?

Comment 34 Larry Johnson 2016-03-16 19:25:03 UTC

Now happening with regular consistency and in addition on a third AMD64 Intel system all running graysky's repo-ck kernels, this last one on a Manjaro xfce desktop.

Comment 35 Nino Miletić 2016-03-20 19:42:38 UTC

Also experiencing this the presence of this 90 second hang in Arch (linux 4.4.5-1, systemd 229-3).

I haven't experienced it in Gnome (same laptop), but did in KDE. Not sure if the DE plays any significant role in the occurrence rate, though.

Comment 36 Andreas Hartmetz 2016-03-29 11:50:07 UTC

The problem with KDE seems to be as follows:
kglobalaccel5, a service process that allocates and listens to global keyboard shortcuts, does not terminate at session exit - I'm not sure why.
Anyway, kglobalaccel until very recently *also* did not die when receiving SIGTERM due to trying to catch the signal and not doing it properly, thus getting stuck. Because catching signals properly isn't trivial and really not necessary, I've removed that part.
While debugging the issue of slow shutdown, I noticed the lingering kglobalaccel5 and that killing it with SIGKILL resulted in immediate shutdown.

Comment 37 Simon McVittie 2016-04-11 12:40:10 UTC

Everyone who has reopened this bug/reported it as happening recently: you have seen the same *symptoms*. That does not necessarily mean that you have all encountered the same *bug* as the original reporter, or even the same bug as each other.

(In reply to Mike C from comment #28)
> Feb 18 20:14:53 home1 systemd[589]: dbus.service: State 'stop-final-sigterm'
> timed out. Killing.

This means that either dbus-daemon itself, or a child process that was launched by D-Bus activation, didn't terminate promptly when it received the TERM signal. IMO it is much more likely to be a different child process (some random D-Bus session service), because dbus-daemon's own SIGTERM handling is rather simple.

If you enable the systemd debug shell and use it to run "systemd-cgls" during this 90 second wait, you'll probably see that all processes in the dbus.service cgroup below user@1000.service were terminated by systemd sending the TERM signal, except for some daemon (for example it might be kglobalaccel5 in Comment #36) or a small number of daemons. If so, the cause of your slow shutdown is very likely to be a bug in those daemons - they do not die when sent the TERM signal - similar to the kglobalaccel5 bug described in Comment #36.

Comment 38 Hugo Osvaldo Barrera 2016-04-12 01:10:44 UTC

Thanks for that enlightening comment, Simon. It's pointed me in the right direction of what's causing the issue (rather then wait here for an unrelated bug to get closed).

I've noticed that several of my session processed don't get killed:
https://i.imgur.com/96in2Pg.jpg

How WOULD I have systemd send a SIGTERM to all of them when I attempt to reboot? Manually sending a SIGTERM to the parent one seems to work fine.

Comment 39 Hugo Osvaldo Barrera 2016-04-12 01:44:20 UTC

It turns out non of my session processes are SIGTERM'd. Even things like vim, gvfs, or less are not being killed and result in the 90s delay.

KillUserProcesses in logind.conf might help, but I'm not sure how this interacts if my local session dies and I've left a tmux session on the machine.

Comment 40 mattia.b89 2016-12-31 13:39:52 UTC

I was following this issue since 6 months ago, because I suffered of the same problem.
I am on Arch Linux x86_64.

I comment now to say that I don't encounter this problem for at least two months, likely one of the latest updates has solved it.
This was my experience and I wanted to share it.

Comment 41 Lennart Poettering 2017-10-27 17:15:47 UTC

I am pretty sure this was fixed a while back in current versions. Closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.