90273 – RFE: make it easy to plug in monitor tools that are run after completion of specific services and can process their logs (example: sendmail tool that mails service logs)

Status:	NEW

Alias:	None

Product:	systemd
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	medium enhancement
Assignee:	systemd-bugs
QA Contact:	systemd-bugs

Reported:	2015-05-01 17:59 UTC by Marc Haber
Modified:	2015-05-18 19:39 UTC
CC List:	0 users

Description Marc Haber 2015-05-01 17:59:59 UTC

Hi,

please consider making it possible to have the standard output of a systemd.exec unit mailed to a given mail address. This would make using timer units as a drop-in cron replacement much easier, since many cron mechanisms rely on the cron job's output being mailed.

Mail delivery can most easily be accomplished via /usr/lib/sendmail (standard cron doesn't do it any other way), but of course one could also configure the SMTP server, port number, username, password and From: address in systemd's global or unit configuration.

Greetings
Marc

Comment 1 Lennart Poettering 2015-05-03 15:19:30 UTC

Well, distros like Fedora default to logging cron job output, rather than mailing, and I am very sure that's a much better approach. 

I am pretty sure that systemd should not call into sendmail directly.

I think it would make sense to provide a nice way for users/admins to plug something into services that allows sending logs via email after the services have finished running. We already support "journalctl -u <unit>" as a nice way to pull the logs out from a unit; we'd then need a good way to get this invoked each time a service ran, with the right time parameter to only show the bits generated since the service was last started.

Not sure precisely what this could look like, but I think it would be pretty useful to have this. Downstream could then add ready-made tools that use this for sending emails when jobs complete.

Maybe something as simple as using ExecStopPost= for this, with an env var $LOGS_SINCE that is set to the activation time of a service, could suffice. Then, users could use ExecStopPost=/usr/bin/monitor-script foo@bar.com, where the script implements what is needed.
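
A minimal sketch of what such a monitor script might look like, in Python. Everything here is an assumption layered on the idea above: $LOGS_SINCE is only proposed in this comment, its value is assumed to be in a format that "journalctl --since" accepts, and the unit name is assumed to be passed as a second argument (for example via the %n specifier), since the script has no other way to know it. The function names are made up for illustration.

```python
import subprocess
from email.message import EmailMessage

# Path taken from the original report; other systems may use a different one.
SENDMAIL = "/usr/lib/sendmail"


def journalctl_argv(unit, since):
    """Build a journalctl command for one unit's logs from `since` onwards."""
    return ["journalctl", "--no-pager", "-u", unit, "--since", since]


def build_mail(recipient, unit, log_text):
    """Wrap the collected log text in a simple mail message."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Logs for {unit}"
    msg.set_content(log_text or "(no log output)\n")
    return msg


def main(argv, environ):
    """argv = [prog, recipient, unit]; environ is expected to carry LOGS_SINCE."""
    recipient, unit = argv[1], argv[2]
    # Hypothetical variable proposed in this comment, assumed to be set by systemd.
    since = environ["LOGS_SINCE"]
    logs = subprocess.run(
        journalctl_argv(unit, since), capture_output=True, text=True, check=True
    ).stdout
    msg = build_mail(recipient, unit, logs)
    # "-t" asks sendmail to take the recipients from the message headers.
    subprocess.run([SENDMAIL, "-t"], input=msg.as_bytes(), check=True)
```

Under these assumptions a unit could use ExecStopPost=/usr/bin/monitor-script foo@bar.com %n, with the script calling main(sys.argv, os.environ) at the bottom.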

I am taking the liberty to change the bug title, to ask for this.

Comment 2 Zbigniew Jedrzejewski-Szmek 2015-05-04 13:02:07 UTC

I wonder if we should have a generic SYSTEMD_UNIT_INSTANCE=<uuid> field (better name needed), which systemd (PID 1 and the --user instances) would add to all messages about a given service. It would be reset when the service is stopped or restarted. Then journalctl could do something similar to --list-boots, and gather all the intervals between when <uuid> was first and last seen, and then use that to construct a time-range limit. This time limit would then be combined with normal filtering for a unit. This would give generic support for doing
  journalctl --unit=<...> --unit-run=<n>
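
The interval gathering described here could be sketched roughly as follows, in Python. This is only an illustration of the idea, not journalctl code: the input is assumed to be (timestamp, instance id) pairs already extracted from the journal for one unit, the function names are invented, and the meaning of <n> (Python-style indexing, with -1 as the most recent run) is an assumption, since the proposal does not define it.

```python
def run_intervals(entries):
    """Collapse (timestamp, instance_id) pairs into one interval per instance.

    Returns a list of (instance_id, first_seen, last_seen) tuples ordered by
    first_seen, similar in spirit to what --list-boots does for boot ids.
    """
    spans = {}
    for ts, instance in entries:
        if instance in spans:
            first, last = spans[instance]
            spans[instance] = (min(first, ts), max(last, ts))
        else:
            spans[instance] = (ts, ts)
    rows = [(inst, first, last) for inst, (first, last) in spans.items()]
    return sorted(rows, key=lambda row: row[1])


def select_run(intervals, n):
    """Pick one run by index; the time range can then bound a unit filter."""
    return intervals[n]


# Example: two runs of the same unit, with entries arriving out of order.
entries = [(10, "a"), (12, "a"), (30, "b"), (11, "a"), (35, "b")]
intervals = run_intervals(entries)
latest = select_run(intervals, -1)
```

The selected (first_seen, last_seen) pair would then be applied as the time-range limit on top of the usual --unit= filtering.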

Comment 3 Lennart Poettering 2015-05-18 19:39:39 UTC

(In reply to Zbigniew Jedrzejewski-Szmek from comment #2)
> I wonder if we should have a generic SYSTEMD_UNIT_INSTANCE=<uuid> field
> (better name needed), which systemd (PID 1 and the --user instances) would
> add to all messages about a given service. It would be reset when the
> service is stopped or restarted. Then journalctl could do something similar
> to --list-boots, and gather all the intervals between when <uuid> was first
> and last seen, and then use that to construct a time-range limit. This time
> limit would then be combined with normal filtering for a unit. This would
> give generic support for doing
>   journalctl --unit=<...> --unit-run=<n>

Yeah, that sounds like a really good idea to me.

I think a "unit runtime id" would make a ton of sense, to correspond with the boot id of the system, which one could also call a "system runtime id" after all...

Ideally we'd assign one when a unit changes from INACTIVE to ACTIVATING, provide APIs to query it for any running service, and make it unfakeable. Then, we could add "int sd_pid_get_runtime_id(pid_t pid, sd_id128_t *ret)" to sd-login.h to query it.

I think the runtime id should be unique across units, so that it would suffice to use "journalctl _SYSTEMD_UNIT_RUNTIME_ID=..." to match the runtime of one specific service. (i.e. no need to combine this with --unit=)

Of course, the question is how to best attach the id to a service, so that we can read it not only via D-Bus, but also synchronously via direct fs access for use in sd-login.h. One option would be to maintain a set of symlinks in /run/systemd/units that are named after the unit and point to the runtime id (using symlinks here has the benefit that we can easily create them atomically...).
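
To illustrate why symlinks are convenient here, a rough Python sketch of the create-then-rename pattern follows. The directory is a parameter so that the example does not touch /run/systemd/units, the function names are invented, and the temporary-name scheme is an assumption; this is not how systemd implements anything, just the general technique.

```python
import os


def set_runtime_id(directory, unit, runtime_id):
    """Point <directory>/<unit> at runtime_id, replacing any old link atomically."""
    link = os.path.join(directory, unit)
    tmp = f"{link}.tmp.{os.getpid()}"
    # The link target is just the id string, so the symlink is intentionally dangling.
    os.symlink(runtime_id, tmp)
    # Renaming over the old link means readers see either the old id or the new one.
    os.replace(tmp, link)


def get_runtime_id(directory, unit):
    """Read the id back with a single readlink(); None if the unit has no link."""
    try:
        return os.readlink(os.path.join(directory, unit))
    except FileNotFoundError:
        return None
```

A reader needs only one readlink() call and no locking, which is what makes this usable for synchronous lookups from a library.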

Another question is how best to add support for this to the stream protocol spoken between PID 1 and journald without breaking compat (this isn't crucial, but it is still kinda nice not to make live upgrades worse than necessary, in particular since the --user instances of systemd speak the protocol with the system journald too). One option could be to encode this in the same line as the unit name, but separate it with a slash or so (since a slash cannot be part of a unit name). Then, journald would look for the slash; if it is there, it parses out the unit runtime id, otherwise it considers it unset. As long as we restart journald on upgrades we should be fine then...
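
The receiving side of that encoding could look roughly like this, sketched in Python. The exact wire format is an assumption based only on the description above ("unit name, then optionally a slash and the runtime id"), and parse_unit_line is an invented name.

```python
def parse_unit_line(line):
    """Split a 'unit[/runtime-id]' header line into (unit, runtime_id or None).

    A slash cannot occur in a unit name, so the first slash, if present,
    unambiguously starts the runtime id. Lines from an older sender carry
    no slash and yield None for the id.
    """
    unit, _, runtime_id = line.strip().partition("/")
    return unit, (runtime_id or None)
```

An old sender talking to a new journald simply yields an unset id. The reverse case, a new sender talking to an old journald, is the one that restarting journald on upgrades is meant to avoid.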
