Summary: | Services that BindTo another service should be restarted if systemd restarts the target service when it exits. | ||
---|---|---|---|
Product: | systemd | Reporter: | Chris Paulson-Ellis <chris> |
Component: | general | Assignee: | Lennart Poettering <lennart> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | colin, david.ward |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
schedule JOB_RESTART from SERVICE_AUTO_RESTART state
service: actually delay auto-restart if another job is pending service: fix auto-restart handling in service_stop() service: fix auto-restart handling in service_start() |
Description
Chris Paulson-Ellis
2012-02-01 11:33:38 UTC
Created attachment 56944 [details] [review] schedule JOB_RESTART from SERVICE_AUTO_RESTART state I agree. Services that Require a target service continue to run if the target service automatically restarts. It seems appropriate that services which BindTo a target service should be restarted but ultimately continue to run if the target service automatically restarts. Right now the services are stopped but do not start back. In my case I am trying to replace the 'watchquagga' supervisor by using the automatic restart capabilities of systemd, but this is the piece that doesn't work. The attached patch fixes the behavior for me. When a service is in the 'auto-restart' state, a 'restart' job is now created for it, rather than placing the service into the 'dead' or 'failed' states and then creating a 'start' job for it. As a result, during the dependency checking a 'restart' job will be created for other services that bind to the first one, rather than a 'stop' job. I believe this also means that units specified in OnFailure= will not be launched during an automatic restart, unless the service ultimately gives up on restarting and moves to a 'failed' state. Thanks, patch applied. And sorry again for letting this rot for so long. We had some other patches queued up in that area and I wanted those right first. This would appear to be the commit responsible for breaking OnFailure= handling.... I had an OnFailure= directive in my prefdm.service (which has Restart=always) but even when prefdm fails (and is reported as such), the unit referred to never gets started. Reverting this commit solved that. So while the warning about the OnFailure mentioned here is true, it seems that it is never run at all now.. Note that this is seen when booting the system, and thus starting the service, not restarting it - tho' not sure if that' makes a difference. Reopening for now, but please feel free to close it again if you feel a separate bug is more appropriate. So looking further, it seems that with this patch applied, after systemd gives up restarting the unit it enters an inactive+dead state: Description=Display Manager LoadState=loaded ActiveState=inactive SubState=dead Without the patch applied it is in a failed+failed state: Description=Display Manager LoadState=loaded ActiveState=failed SubState=failed I suspect that this difference is what causes the OnFailure to never kick in. I agree that the OnFailure action should not kick in until the final failure statue is reached.... we just need to ensure it does actually get there :D Now, I'm not sure if the SubState of failed would mean that systemd will try and restart it again later automatically. IMO we don't want this - i.e. it should require admin interaction. So would a: ActiveState=failed SubState=failed-permanently make sense here (if such a substate exists... I've seen the phrase fly past but not looked at it properly). Colin, Is this still happening? How do you create a condition where the service fails and gives up on auto-restarting (so I can try to reproduce it)? David Hi David, As far as I'm aware, this is still an issue although I've not tried a full git master of late. Here are a couple of sample units that should reproduce the problem: [root@jimmy system]# cat my-failing.service [Unit] Description=A Failing Service OnFailure=my-on-failure.service [Service] ExecStart=/bin/false Restart=always RestartSec=0 [root@jimmy system]# cat my-on-failure.service [Unit] Description=An On Failure Service [Service] ExecStart=/bin/touch /tmp/it-failed.txt Type=oneshot RemainAfterExit=yes So I dropped thos services in /lib/systemd/systemd/ and did a daemon-reload: [root@jimmy system]# systemctl status my-failing.service my-on-failure.service my-failing.service - A Failing Service Loaded: loaded (/lib/systemd/system/my-failing.service; static) Active: inactive (dead) CGroup: name=systemd:/system/my-failing.service my-on-failure.service - An On Failure Service Loaded: loaded (/lib/systemd/system/my-on-failure.service; static) Active: inactive (dead) CGroup: name=systemd:/system/my-on-failure.service Both units are loaded, but inactive. Without the patch applied, I start the failing service: [root@jimmy system]# systemctl start my-failing.service [root@jimmy system]# systemctl status my-failing.service my-on-failure.service my-failing.service - A Failing Service Loaded: loaded (/lib/systemd/system/my-failing.service; static) Active: failed (Result: exit-code) since Mon, 23 Apr 2012 08:50:49 +0100; 3min 45s ago Process: 16606 ExecStart=/bin/false (code=exited, status=1/FAILURE) CGroup: name=systemd:/system/my-failing.service my-on-failure.service - An On Failure Service Loaded: loaded (/lib/systemd/system/my-on-failure.service; static) Active: active (exited) since Mon, 23 Apr 2012 08:50:49 +0100; 3min 45s ago Process: 16593 ExecStart=/bin/touch /tmp/it-failed.txt (code=exited, status=0/SUCCESS) CGroup: name=systemd:/system/my-on-failure.service [root@jimmy system]# ls -l /tmp/it-failed.txt -rw-r--r-- 1 root root 0 Apr 23 08:50 /tmp/it-failed.txt So as you can see, all works as intended and the OnFailure unit is activated. With the patch applied however this failure unit never got started. Hope that's enough info to reproduce the problem. I've not tried this with the patch applied but these are copied from the setup we used in production so I think it should be valid. I'll try and reapply the patch and retry if you don't get it immediately working. Hi, I filed this issue and although I haven't checked recently, I have no reason to believe it's resolved. See the mailing list link in the report for a description of how to reproduce (client.service BindTo -> server.service with Restart=always; kill server - client doesn't restart). Chris. @Chris: I thought the patch David wrote and which was committed (see comment 2) solved the original problem you reported? My issue is that it introduces a regression which I've outlined and detailed. Sorry you're right... I misread what David was asking and was assuming that his patch had been reverted, leaving my original issue unresolved. Created attachment 60770 [details] [review] service: actually delay auto-restart if another job is pending Created attachment 60771 [details] [review] service: fix auto-restart handling in service_stop() Created attachment 60772 [details] [review] service: fix auto-restart handling in service_start() Colin, please try the three additional patches I am attaching to this bug, which fix your issue for me, as well as some other ones. Note that the original behavior is restored -- OnFailure units will be triggered after each time that a service fails. There is an RFE for using StartLimitAction to trigger units only after the service gives up trying to restart: https://bugzilla.redhat.com/show_bug.cgi?id=746279 Also note that these patches can also be applied on top of v44; just change the target from src/core/service.c to src/service.c. Can't easily test right now as we're in freeze and I'm not really keen on playing about too much right now with things! However a quick review of the patches seems to suggest they'd do the job and are all sensible (and suitably small too :)) Cheers This bug blocks the removal of the 'watchquagga' daemon supervisor in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=817990 A separate tracking bug has been created in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=817968 Lennart, please review the attached patches when you get a chance. Thanks! Comment on attachment 56944 [details] [review] schedule JOB_RESTART from SERVICE_AUTO_RESTART state This one is already applied. Comment on attachment 60770 [details] [review] service: actually delay auto-restart if another job is pending Applied this one. Comment on attachment 60771 [details] [review] service: fix auto-restart handling in service_stop() Applied Comment on attachment 60772 [details] [review] service: fix auto-restart handling in service_start() Applied. Sorry for not reviewing this more timely! All patches applied! Thanks for your work! |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.