Bug 88192 - RFE: WatchdogUnit= program
Status: RESOLVED WONTFIX
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
Duplicates: 89656
Reported: 2015-01-08 06:36 UTC by rektide
Modified: 2019-08-22 11:14 UTC

Description rektide 2015-01-08 06:36:50 UTC
The current watchdog mechanism requires modifying an existing application to add a watchdog liveness-signalling mechanism, and it leaves it to the application to bake in any deliberate failure mechanism it might need (for example, signalling itself to stop sending "alive" beacons).

I'd like to see a mechanism introduced to enable health-monitor checks via external units. In a decoupled form, an application doesn't necessarily have to bake in custom systemd functionality. A webserver might reply as normal to a /ping on loopback, and the healthcheck is just a 'curl -f' test run from a unit. A slightly more sophisticated monitor unit might run 'curl -f -m 0.3 localhost/ping', which also fails if the reply takes longer than 0.3 seconds, and so on up to more elaborate checks.
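As an illustration only, the /ping check could be sketched in Python roughly as follows. The function names `is_healthy` and `main` and the default URL are assumptions made for this example, not anything systemd provides. It approximates 'curl -f -m 0.3' but is not identical to it: the timeout here applies to each socket operation rather than to the whole request, and redirects are followed.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url, timeout=0.3):
    """Return True only if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # 4xx/5xx replies, refused connections, timeouts, and malformed
        # URLs are all treated as "unhealthy".
        return False


def main(argv):
    """Return a process exit code: 0 if healthy, 1 otherwise."""
    # Hypothetical default endpoint, borrowed from the /ping example above.
    url = argv[1] if len(argv) > 1 else "http://localhost/ping"
    return 0 if is_healthy(url) else 1
```

A monitor unit could run this as a script ending in `sys.exit(main(sys.argv))`, so that a failed check shows up as a non-zero exit status.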

I'd love to see systemd better able to mechanize autonomous reflexes for the processes under its control, and making more general-purpose check systems (fire watchdog when X fails) is the #1 thing I think of.

(I have an existing suggestion for inter-unit linkages, in another form, at
https://www.libreoffice.org/bugzilla/show_bug.cgi?id=85709
which suggests EnvironmentUnit=, a directive that uses one task's stdout to supplement a starting unit's existing static Environment/EnvironmentFile/&c values.)
Comment 1 Marc Haber 2015-05-02 15:52:02 UTC
That would indeed be incredibly useful.
Comment 2 Lennart Poettering 2015-05-03 14:30:48 UTC
*** Bug 89656 has been marked as a duplicate of this bug. ***
Comment 3 Zbigniew Jedrzejewski-Szmek 2019-08-22 11:14:21 UTC
It is possible to configure a "watchdog service" that simply exits on failure, and have OnFailure= or another type of error handling restart other units or do arbitrary magic. We don't need to bake this into the state machine of the system manager itself. I'll hence close this. If you think something like this is still needed, it'd be more useful to start a discussion on the mailing list.
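
A minimal sketch of such a "watchdog service" program, under stated assumptions: the helper name `run_until_failure`, its parameters, and the polling interval are hypothetical and chosen only for illustration. It reruns an external check command and returns a failure code the first time the check fails.

```python
import subprocess
import time


def run_until_failure(check_cmd, interval=5.0, max_checks=None):
    """Run check_cmd repeatedly and return 1 as soon as one run fails.

    Returns 0 only if max_checks consecutive runs succeed; with
    max_checks=None the loop continues until a check fails.
    """
    completed = 0
    while max_checks is None or completed < max_checks:
        result = subprocess.run(check_cmd)
        if result.returncode != 0:
            # First failed check: stop polling and report failure.
            return 1
        completed += 1
        time.sleep(interval)
    return 0
```

A wrapper script might end with `sys.exit(run_until_failure(["curl", "-f", "-m", "0.3", "localhost/ping"]))`. If the unit running it (say, a hypothetical web-watchdog.service) sets OnFailure= to a handler unit, the non-zero exit puts the watchdog unit into the failed state and systemd then activates the handler.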

