Summary: | random 90 sec delay on shutdown/reboot (A stop job is running for User Manager for 0) | ||
---|---|---|---|
Product: | systemd | Reporter: | Gerardo Exequiel Pozzi <vmlinuz386> |
Component: | general | Assignee: | systemd-bugs |
Status: | RESOLVED WORKSFORME | QA Contact: | systemd-bugs |
Severity: | normal | ||
Priority: | medium | CC: | aklhfex, bergman.martin, bernardo.reino, cju, crrodriguez, cwbell, da_audiophile, equeim, fademind, freedesktop-bugs, hugo, jajcus, jokeyrhyme, keepitsimpleengineer, marti, mister.freeman, pa_collins, radek, ronald.hitz, rvoinea, steven, yogeshm.007, zikas |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
output of console [no delay](systemd.journald.forward_to_console)
output of console [delay 90 sec] (systemd.journald.forward_to_console) journalctl logs about this bug Fedora 20 systemd user unit file |
Description
Gerardo Exequiel Pozzi
2013-10-18 01:40:39 UTC
Created attachment 87808 [details]
output of console [delay 90 sec] (systemd.journald.forward_to_console)
you OS is archlinux ? if you downgrade systemd package to 204 version the bug is still present ? your bug seems similar to bug 70413 https://bugs.freedesktop.org/show_bug.cgi?id=70413 some archlinux users found a workaround by adding "Before=basic.target" to dbus.service file : https://bbs.archlinux.org/viewtopic.php?id=170756&p=2 this workaround doesn't work for all users (In reply to comment #2) > you OS is archlinux ? > Yes. I am the archiso developer. :) > if you downgrade systemd package to 204 version the bug is still present ? No. systemd 204 works like a charm. All these kill-timeouts issues start from 205, and some are solved now. > > your bug seems similar to bug 70413 > > https://bugs.freedesktop.org/show_bug.cgi?id=70413 Yes. (In reply to comment #3) > some archlinux users found a workaround by adding "Before=basic.target" to > dbus.service file : > > https://bbs.archlinux.org/viewtopic.php?id=170756&p=2 > > this workaround doesn't work for all users Doing such workaround is wrong. Makes a ordering cycle. Anyway, yes it works. But shutdown process looks abnormal. See: systemd[1]: Found ordering cycle on basic.target/stop systemd[1]: Found dependency on dbus.service/stop systemd[1]: Found dependency on basic.target/stop systemd[1]: Breaking ordering cycle by deleting job dbus.service/stop systemd[1]: Job dbus.service/stop deleted to break ordering cycle starting with basic.target/stop *** Bug 70413 has been marked as a duplicate of this bug. *** I've encountered this issue on an opensuse 13.1 install, which is systemd 208. I have noticed that this only happens for me whilst I don't have log level set to debug. As soon as I turn on debug log the delay vanishes (helpful). I'm not sure if it's related, but when I shutdown any SSH client is not being disconnected. systemd[1]: user@0.service stopping timed out. Killing. Created attachment 90211 [details]
journalctl logs about this bug
lolol. you have a same problem of mine.
I installed an Arch Linux 2013.12.01 version. x86-64bit
linux 3.12.2-1
systemd 208-2
You can see the 90 seconds delay in the picture. That is the problem when I reboot/poweroff/shutdown the system. not always, but frequently.
If I command that with a '-f'(forceful), there is no any delay.
Please, fix this bugs.
I tested this bugs in both VMware Player and Virtual Box, but no difference appears.
This hang should be fixed with http://cgit.freedesktop.org/systemd/systemd/commit/?id=63966da. If not, please reopen. nope, not fixed. (In reply to comment #10) > nope, not fixed. Which version are you running? (In reply to comment #11) > (In reply to comment #10) > > nope, not fixed. > > Which version are you running? system-208-3 [#1] (this Arch Linux pkg release include such patch) [#1] https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/systemd&id=c1b13c2cb3c728c62fc71b2e6e73fc9fdb6eb5c3 Fixed in 209! Thanks guys :) I'm sorry I have to bring this up again, but ever since updating to 211, this bug reappears again on Arch. The latest upgrade to 212 didn't change things. Bug still exists in systemd 216. Logging out before restart/halt seems to significantly reduce the frequency of occurrence, but it doesn't absolutely prevent it. (From ~ 1 in 3 shutdowns to ~ 1 in 15). This behavior is reproducible on two separate machines, a Fedora 20 laptop with systemd 208, and an Arch box with 216. Also, and this might be nothing, I have only had this happen when using text/getty login, rather than a display manager. When I was using graphical as my default target, I never had a delay. Only when using multi-user. Again, both machines - both versions - show the same behavior. As I said, this might just be my dumb luck, but I thought it was worth mentioning. I have seen this today on raspbian jessie (debian for raspberrypi) with systemd 208 "A stop job is running for User Manager for $BIGNUMBER" "A stop job is running for Session c2 of user sam" I also have this issue. Running debian/sid with 215-5+b1, no graphical login (I run startx with xfce). When it hangs I go to tty9 (crash debug shell, always active until systemd is stable) and kill -9 the pid of "systemd --user". The shutdown then continues immediately. The issue is caused AFAIK by the pam_systemd PAM module, which unfortunately is necessary to avoid creating a mess of seats, sessions, policykits and whatnot. Not that even disabling lingering (loginctl --disable-linger <user>) does not resolve this issue. If there isn't a workaround/fix available, is there a way to change the timeout value? Say from 90 seconds to a far more reasonable 15? @Chris, I added "WorkingDirectory=/" at the end of /usr/lib/systemd/user/systemd-exit.service and the issue has never happened again. It doesn't make any sense, but hey, maybe it will be fixed some day. Maybe not. (In reply to Bernardo Reino from comment #19) > @Chris, > > I added "WorkingDirectory=/" at the end of > /usr/lib/systemd/user/systemd-exit.service and the issue has never happened > again. > > It doesn't make any sense, but hey, maybe it will be fixed some day. Maybe > not. No, it doesn't. Make sense, that is. I was under the assumption that it was waiting because /var/run/user/<uid> wasn't unmounted, so it gave it 90 sec before simply forcing the unmount. But / surely wouldn't be unmounted at that point, and it wouldn't (shouldn't) be able to force an unmount of /, so there goes that theory.... The big question on my mind is: what _is_ the stop job, exactly? What's it waiting for? Why 90 seconds? Where (in the code) could I start bughunting? That's more than one question, I guess. It's frustrating that it doesn't give a bit more info. (In reply to Chris Bell from comment #20) > The big question on my mind is: what _is_ the stop job, exactly? What's it > waiting for? Why 90 seconds? Where (in the code) could I start bughunting? I think you'll get better results asking this on the systemd mailing lists than here. Created attachment 108303 [details]
Fedora 20 systemd user unit file
Stock user@.service unit file provided in Fedora 20 distro.
I'm not sure when, but it seems Fedora has modified their user@.service unit file to address this issue. They changed KillMode=mixed to: ExecStop=/bin/kill -TERM ${MAINPID} KillSignal=SIGCONT It appears it's due to `systemd --user` being killed by daddy systemd before it can clean up properly. I haven't been able to reproduce the delay since they implemented this workaround. I've attached the unit file, if anyone else wants to try. I'm not sure when they added it; I just now noticed it. My Fedora 19 server has the old file (the one found in the systemd git repo). I'm going to modify the unit on my Arch box to see if it makes this go away. Also, the Fedora Project's patch can be found in the F20 branch of the Fedora systemd package git repo: http://pkgs.fedoraproject.org/cgit/systemd.git/tree/0266-Temporary-work-around-for-slow-shutdown-due-to-unter.patch?h=f20 *** Bug 83838 has been marked as a duplicate of this bug. *** Seems this has been fixed in 58ea275a68cd242ad60161bcb7582614d1d89f13, with the introduction of the KillMode=mixed? I get this on Arch Linux (systemd 228-3) when there is a open ssh connection to a remote host. I've thought that it might be related to OpenSSH's "ControlMaster" setting(s), but it happens also without this. Arch uses this /usr/lib/systemd/system/user@.service: [Unit] Description=User Manager for UID %i After=systemd-user-sessions.service [Service] User=%i PAMName=systemd-user Type=notify ExecStart=-/usr/lib/systemd/systemd --user Slice=user-%i.slice KillMode=mixed Delegate=yes This is happening to me on arch linux with systemd 229-2 (and also 229-3 from the testing repo) If right clicking on the plasma desktop and selecting "Leave" and then shutdown or reboot, the console gives a message "A stop job is running for Session c2 for user...." and counts down the 90 seconds before completing the shutdown. In the journal I see: Feb 18 20:13:24 home1 systemd[1]: Stopped target Network. Feb 18 20:14:53 home1 systemd[589]: dbus.service: State 'stop-final-sigterm' timed out. Killing. Feb 18 20:14:53 home1 systemd[1]: user@1000.service: State 'stop-sigterm' timed out. Killing. Feb 18 20:14:53 home1 systemd[1]: Stopped User Manager for UID 1000. Feb 18 20:14:53 home1 systemd[1]: user@1000.service: Unit entered failed state. Feb 18 20:14:53 home1 systemd[1]: user@1000.service: Failed with result 'timeout'. Feb 18 20:14:53 home1 systemd[1]: Removed slice User Slice of mike. Feb 18 20:14:53 home1 systemd[1]: Stopping Login Service... Feb 18 20:14:53 home1 systemd[1]: Stopping Permit User Sessions... Feb 18 20:14:53 home1 systemd[1]: Stopped Permit User Sessions. Feb 18 20:14:53 home1 systemd[1]: Stopped target Remote File Systems. Feb 18 20:14:53 home1 systemd[1]: Stopped Login Service. Feb 18 20:14:53 home1 systemd[1]: Stopped target Basic System. and again from today: Feb 19 15:45:36 home1 systemd[1]: Stopped Simple Desktop Display Manager. Feb 19 15:47:01 home1 systemd[599]: dbus.service: State 'stop-final-sigterm' timed out. Killing. Feb 19 15:47:01 home1 systemd[1]: user@1000.service: State 'stop-sigterm' timed out. Killing. Feb 19 15:47:01 home1 systemd[1]: Stopped User Manager for UID 1000. Feb 19 15:47:01 home1 systemd[1]: user@1000.service: Unit entered failed state. Feb 19 15:47:01 home1 systemd[1]: user@1000.service: Failed with result 'timeout'. Feb 19 15:47:01 home1 systemd[1]: Removed slice User Slice of mike. Feb 19 15:47:01 home1 systemd[1]: Stopping Login Service... Feb 19 15:47:01 home1 systemd[1]: Stopping Permit User Sessions... Feb 19 15:47:01 home1 systemd[1]: Stopped Permit User Sessions. Once the shutdown stop job delay has ended the shutdown then completes rapidly. which seems to be a dbus related shutdown delay with the 90 seconds wait period seen in the timestamps above. I also installed systemd 229-3 from testing but this does not help the issue. This is likely related: https://lists.freedesktop.org/archives/systemd-devel/2016-January/035619.html (In reply to Mike C from comment #28) > This is happening to me on arch linux with systemd 229-2 (and also 229-3 > from the testing repo) > > If right clicking on the plasma desktop and selecting "Leave" and then > shutdown or reboot, the console gives a message "A stop job is running for > Session c2 for user...." and counts down the 90 seconds before completing > the shutdown. same bug for me, I use also archlinux, systemd 229-3 same for me (also arch). this is not de-related, btw. it happens to me with bare console (x is running, though)... Same thing is happening to me on Arch with systemd 229-3 I am also experiencing this (except the session is C1) with systemd 229-3 on two systems both AMD64 abut one Intel and one AMD processor... On Arch Linux here as well and for me, downgrading to the 4.3.x series of kernels fixes this. Bug is persistent on 4.4.1-4.4.5 and on 4.5rc7. Can some others affected confirm that downgrading to a 4.3.x series kernel reverses this? Now happening with regular consistency and in addition on a third AMD64 Intel system all running graysky's repo-ck kernels, this last one on a Manjaro xfce desktop. Also experiencing this the presence of this 90 second hang in Arch (linux 4.4.5-1, systemd 229-3). I haven't experienced it in Gnome (same laptop), but did in KDE. Not sure if the DE plays any significant role in the occurrence rate, though. The problem with KDE seems to be as follows: kglobalaccel5, a service process that allocates and listens to global keyboard shortcuts, does not terminate at session exit - I'm not sure why. Anyway, kglobalaccel until very recently *also* did not die when receiving SIGTERM due to trying to catch the signal and not doing it properly, thus getting stuck. Because catching signals properly isn't trivial and really not necessary, I've removed that part. While debugging the issue of slow shutdown, I noticed the lingering kglobalaccel5 and that killing it with SIGKILL resulted in immediate shutdown. Everyone who has reopened this bug/reported it as happening recently: you have seen the same *symptoms*. That does not necessarily mean that you have all encountered the same *bug* as the original reporter, or even the same bug as each other. (In reply to Mike C from comment #28) > Feb 18 20:14:53 home1 systemd[589]: dbus.service: State 'stop-final-sigterm' > timed out. Killing. This means that either dbus-daemon itself, or a child process that was launched by D-Bus activation, didn't terminate promptly when it received the TERM signal. IMO it is much more likely to be a different child process (some random D-Bus session service), because dbus-daemon's own SIGTERM handling is rather simple. If you enable the systemd debug shell and use it to run "systemd-cgls" during this 90 second wait, you'll probably see that all processes in the dbus.service cgroup below user@1000.service were terminated by systemd sending the TERM signal, except for some daemon (for example it might be kglobalaccel5 in Comment #36) or a small number of daemons. If so, the cause of your slow shutdown is very likely to be a bug in those daemons - they do not die when sent the TERM signal - similar to the kglobalaccel5 bug described in Comment #36. Thanks for that enlightening comment, Simon. It's pointed me in the right direction of what's causing the issue (rather then wait here for an unrelated bug to get closed). I've noticed that several of my session processed don't get killed: https://i.imgur.com/96in2Pg.jpg How WOULD I have systemd send a SIGTERM to all of them when I attempt to reboot? Manually sending a SIGTERM to the parent one seems to work fine. It turns out non of my session processes are SIGTERM'd. Even things like vim, gvfs, or less are not being killed and result in the 90s delay. KillUserProcesses in logind.conf might help, but I'm not sure how this interacts if my local session dies and I've left a tmux session on the machine. I was following this issue since 6 months ago, because I suffered of the same problem. I am on Arch Linux x86_64. I comment now to say that I don't encounter this problem for at least two months, likely one of the latest updates has solved it. This was my experience and I wanted to share it. I am pretty sure this was fixed a while back in current versions. Closing. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.