Bug 87799 - RFE: OnFailure version which is used when StartLimitBurst= or StartLimitInterval= are hit
Status: RESOLVED WONTFIX
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
 
Reported: 2014-12-28 15:25 UTC by Mikhail Kasimov
Modified: 2018-03-09 08:16 UTC (History)

Description Mikhail Kasimov 2014-12-28 15:25:30 UTC
Use case: a service unit such as
===============
[Unit]
Description=Test-Unit

[Service]
...
Restart=on-failure
StartLimitBurst=5
StartLimitInterval=10
OnFailure=./alarm-message.sh
FailureAction=none

[Install]
WantedBy=multi-user.target
===============

where ./alarm-message.sh is an admin-written script that sends an email, Jabber, or SMS message to the administrator if the service fails.

The idea is to define a directive that sends the alarm message only _after_ StartLimitBurst=/StartLimitInterval= are hit, instead of after _each_ service failure.
Comment 1 Mikhail Kasimov 2014-12-31 19:28:01 UTC
Sorry, I moved the OnFailure= directive into the right section, [Unit]:

===============
[Unit]
Description=Test-Unit
OnFailure=./alarm-message.sh

[Service]
...
Restart=on-failure
StartLimitBurst=5
StartLimitInterval=10s
FailureAction=none

[Install]
WantedBy=multi-user.target
===============
Comment 2 Zbigniew Jedrzejewski-Szmek 2015-01-06 04:23:46 UTC
You can define a unit to do whatever you want, e.g. send an e-mail. (This could be a template unit, so OnFailure=send-email@%n.service can be used to make it generic.)

But like you say, there's no nice way to start the unit only when the final failure occurs.
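
A minimal sketch of such a template unit, assuming a hypothetical file name send-email@.service and a hypothetical script installed at /usr/local/bin/alarm-message.sh (neither ships with systemd). The monitored unit would refer to it with OnFailure=send-email@%n.service in its [Unit] section; inside the template, %i expands to the instance name, which here is the name of the failed unit:

===============
# /etc/systemd/system/send-email@.service (hypothetical name and location)
[Unit]
Description=Failure notification for %i

[Service]
Type=oneshot
# Hypothetical admin script; it receives the failed unit's name as its argument
ExecStart=/usr/local/bin/alarm-message.sh %i
===============

Type=oneshot fits here because the script runs once and exits.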
Comment 3 Lennart Poettering 2015-02-03 00:50:06 UTC
OnFailure= dependencies should already deliver the required behaviour, no?

Or are you saying that OnFailure= doesn't get triggered when the start limit is hit?

I am not sure what precisely this bug report is requesting.
Comment 4 Mikhail Kasimov 2015-02-03 07:26:50 UTC
>> The idea is to define the directive to send alarm-message _after_ StartLimitBurst=/StartLimitInterval= are hit, instead of sending alarm-message after _each_ service failure-state.

>Or are you saying that OnFailure= doesn't get triggered when the start limit is hit?

Doc:
======
OnFailure=

A space-separated list of one or more units that are activated when this unit enters the "failed" state.

(http://www.freedesktop.org/software/systemd/man/systemd.unit.html)
=====

So, as I understand it, the OnFailure= action runs _every time_ the service fails. In our test case, the user should get five e-mail alarms (one per service failure).

After StartLimitBurst=/StartLimitInterval= are hit, systemd stops trying to restart the service without raising any alarm for the user/admin. So I'd like to be able to make systemd send me an alarm message _only after_ StartLimitBurst=/StartLimitInterval= are hit (e.g. "Your service [...] was stopped after 5 restart attempts in 10 seconds. Fix your hands and design your service correctly. :)" ).
Comment 5 Lennart Poettering 2015-02-03 12:11:26 UTC
Hmm, I am pretty sure that if you use Restart=on-failure, then OnFailure= is only triggered after the start limit is reached... I think the docs could be improved about this.
Comment 6 Mikhail Kasimov 2015-02-03 12:37:37 UTC
> I think the docs could be improved about this.

Possibly so.

So, if StartLimitBurst/StartLimitInterval are not defined, systemd tries to restart the service an unlimited number of times, and with OnFailure=alarm-message.sh the admin will receive an alarm message on every service failure.

And if StartLimitBurst/StartLimitInterval are defined, OnFailure=alarm-message.sh will fire only after the StartLimitBurst/StartLimitInterval limits are reached.

Do I understand it right?
Comment 7 Lennart Poettering 2015-02-03 12:43:20 UTC
(In reply to Mikhail Kasimov from comment #6)
> > I think the docs could be improved about this.
> 
> Possibly, so.
> 
> So, if StartLimitBurst\StartLimitInterval are not defined, systemd tries to
> restart service unlimited times, and if OnFailure=alarm-message.sh, admin
> will receive alarm-message on every service failure.
> 
> And if StartLimitBurst\StartLimitInterval are defined,
> OnFailure=alarm-message.sh will work off after
> StartLimitBurst\StartLimitInterval limits are reached.
> 
> Do I understand it right?

Nope. 

StartLimitBurst= defaults to 5 and StartLimitInterval= defaults to 10s. Restart= defaults to no.

With these settings OnFailure= will be triggered on the first failure, and no restart is attempted.

If you enable Restart=, then OnFailure= will only be triggered after the StartLimit is hit. 

If you disable the StartLimit then the service will be restarted for all eternity, and OnFailure= will never be triggered.

That's at least how it should work. If the code behaviour doesn't match this then I'd consider this a bug, and we should fix it.
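
The three cases above can be written out as [Service] fragments. This is only a sketch of the intended behaviour as described in this comment, not a statement of what the code actually does. The directive names follow the ones used in this report, OnFailure= is assumed to be set in the [Unit] section of each unit, and StartLimitInterval=0 is assumed to be the way the start limit is disabled:

===============
# Case 1: defaults (Restart=no).
# Intended: OnFailure= fires on the first failure; no restart is attempted.
[Service]
Restart=no
StartLimitBurst=5
StartLimitInterval=10s

# Case 2: restarts enabled, default start limit.
# Intended: OnFailure= fires only once the start limit is hit.
[Service]
Restart=on-failure
StartLimitBurst=5
StartLimitInterval=10s

# Case 3: restarts enabled, start limit disabled.
# Intended: the service is restarted forever; OnFailure= never fires.
[Service]
Restart=on-failure
StartLimitInterval=0
===============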
Comment 8 Zbigniew Jedrzejewski-Szmek 2015-02-03 14:18:18 UTC
(In reply to Lennart Poettering from comment #7)
> That's at least how it should work. If the code behaviour doesn't match this
> then I'd consider this a bug, and we should fix it.
It doesn't match.
Comment 9 Mikhail Kasimov 2015-06-20 12:25:35 UTC
Is this bug tracker still alive, or has everything moved to https://github.com/systemd/systemd/issues/ according to http://www.freedesktop.org/wiki/Software/systemd/?
Comment 10 Zbigniew Jedrzejewski-Szmek 2018-03-09 08:00:45 UTC
Closing all stale bugs with NEEDINFO. Please open a new bug at https://github.com/systemd/issues if the problem still occurs.
Comment 11 Mikhail Kasimov 2018-03-09 08:16:56 UTC
Greetings from https://github.com/systemd/systemd/issues/305 (opened in 2015)...

Given comment https://bugs.freedesktop.org/show_bug.cgi?id=87799#c8, I think the systemd maintainers should reopen https://github.com/systemd/systemd/issues/305

Thanks!

P.S. Possibly https://github.com/systemd/systemd/issues/8398 is the same kind of problem. Please re-check!

