Use case: a service unit like

===============
[Unit]
Description=Test-Unit

[Service]
...
Restart=on-failure
StartLimitBurst=5
StartLimitInterval=10
OnFailure=./alarm-message.sh
FailureAction=none

[Install]
WantedBy=multi-user.target
===============

where ./alarm-message.sh is an admin-written script that sends an e-mail, Jabber, or SMS message to the administrator if the service fails. The idea is to have a directive that sends the alarm message _after_ StartLimitBurst=/StartLimitInterval= are hit, instead of after _each_ failure of the service.
Sorry, I have moved the OnFailure= directive into the right section, [Unit]:

===============
[Unit]
Description=Test-Unit
OnFailure=./alarm-message.sh

[Service]
...
Restart=on-failure
StartLimitBurst=5
StartLimitInterval=10s
FailureAction=none

[Install]
WantedBy=multi-user.target
===============
You can define a unit to do whatever you want, e.g. send an e-mail. (This could be a template unit, so OnFailure=send-email@%n.service can be used to make it generic.) But as you say, there's no nice way to start the unit only when the final failure occurs.
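A minimal sketch of such a template unit. The script path /usr/local/bin/alarm-message.sh and its single-argument interface are assumptions made for illustration; only the name send-email@.service follows the example above.

===============
# send-email@.service (template unit; the script path below is hypothetical)
[Unit]
Description=Send alarm message for %i

[Service]
Type=oneshot
# %i is the instance name, i.e. whatever the failing unit put after the "@"
ExecStart=/usr/local/bin/alarm-message.sh %i
===============

The monitored unit would then reference it from its [Unit] section:

===============
[Unit]
Description=Test-Unit
# %n expands to the full name of this unit
OnFailure=send-email@%n.service
===============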
OnFailure= dependencies should already deliver the required behaviour, no? Or are you saying that OnFailure= doesn't get triggered when the start limit is hit? I'm not sure what precisely this bug report is requesting.
>> The idea is to define the directive to send alarm-message _after_ StartLimitBurst=/StartLimitInterval= are hit, instead of sending alarm-message after _each_ service failure-state.

> Or are you saying that OnFailure= doesn't get triggered when the start limit is hit?

Doc:
======
OnFailure=
A space-separated list of one or more units that are activated when this unit enters the "failed" state.
(http://www.freedesktop.org/software/systemd/man/systemd.unit.html)
=====

So, as I understand it, the OnFailure= action runs _every time_ the service fails. In our test case, the user would get five e-mail alarms (one per service failure). After StartLimitBurst=/StartLimitInterval= are hit, systemd stops trying to restart the service without any alarm for the user/admin.

So I'd like to be able to make systemd send me the alarm message _only after_ StartLimitBurst=/StartLimitInterval= are hit (e.g. "Your service [...] was stopped after 5 restart attempts in 10 seconds. Fix your hands and design your service correctly. :)").
Hmm, I am pretty sure that if you use Restart=on-failure, then OnFailure= is only triggered after the start limit is reached... I think the docs could be improved about this.
> I think the docs could be improved about this.

Possibly so.

So, if StartLimitBurst/StartLimitInterval are not defined, systemd tries to restart the service an unlimited number of times, and with OnFailure=alarm-message.sh the admin will receive an alarm message on every service failure.

And if StartLimitBurst/StartLimitInterval are defined, OnFailure=alarm-message.sh will fire only after the StartLimitBurst/StartLimitInterval limits are reached.

Do I understand it right?
(In reply to Mikhail Kasimov from comment #6)
> > I think the docs could be improved about this.
>
> Possibly, so.
>
> So, if StartLimitBurst\StartLimitInterval are not defined, systemd tries to
> restart service unlimited times, and if OnFailure=alarm-message.sh, admin
> will receive alarm-message on every service failure.
>
> And if StartLimitBurst\StartLimitInterval are defined,
> OnFailure=alarm-message.sh will work off after
> StartLimitBurst\StartLimitInterval limits are reached.
>
> Do I understand it right?

Nope. StartLimitBurst= defaults to 5 and StartLimitInterval= defaults to 10s. Restart= defaults to no. With these settings OnFailure= will be triggered on the first failure, and no restart is attempted.

If you enable Restart=, then OnFailure= will only be triggered after the start limit is hit.

If you disable the start limit, then the service will be restarted into all eternity, and OnFailure= will never be triggered.

That's at least how it should work. If the code behaviour doesn't match this then I'd consider this a bug, and we should fix it.
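The three cases above can be sketched as separate [Service] fragments. The comments restate the intended behaviour as described in this comment, not verified behaviour; using StartLimitInterval=0 to switch the rate limit off is an assumption about how "disable the start limit" would be written.

===============
# Case 1: defaults (Restart=no, StartLimitBurst=5, StartLimitInterval=10s)
# Intended: OnFailure= is triggered on the first failure, no restart is attempted.
[Service]
Restart=no

# Case 2: restarts enabled, start limit in effect
# Intended: OnFailure= is triggered only once the start limit is hit.
[Service]
Restart=on-failure
StartLimitBurst=5
StartLimitInterval=10s

# Case 3: restarts enabled, start limit disabled
# Intended: the service is restarted forever and OnFailure= is never triggered.
[Service]
Restart=on-failure
StartLimitInterval=0
===============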
(In reply to Lennart Poettering from comment #7)
> That's at least how it should work. If the code behaviour doesn't match this
> then I'd consider this a bug, and we should fix it.

It doesn't match.
Is this bug tracker still alive, or has everything moved to https://github.com/systemd/systemd/issues/ according to http://www.freedesktop.org/wiki/Software/systemd/?
Closing all stale bugs with NEEDINFO. Please open a new bug at https://github.com/systemd/issues if the problem still occurs.
Greetings from https://github.com/systemd/systemd/issues/305 (from 2015)...

In light of comment https://bugs.freedesktop.org/show_bug.cgi?id=87799#c8, I think the systemd maintainers should reopen https://github.com/systemd/systemd/issues/305

Thanks!

P.S. Possibly https://github.com/systemd/systemd/issues/8398 is the same kind of problem. Please re-check!