77507 – systemctl status exits with failure status for a service that completed successfully

Status:	NEW

Alias:	None

Product:	systemd
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	systemd-bugs
QA Contact:	systemd-bugs

Reported:	2014-04-16 04:12 UTC by Tim Cuthbertson
Modified:	2018-05-23 19:02 UTC
CC List:	2 users

Description Tim Cuthbertson 2014-04-16 04:12:30 UTC

When a service has _successfully_ exited, `systemctl status <unit-name>` returns a nonzero exit code (specifically, 3).

I don't see how a successfully exited unit should constitute a failure - I think it should exit 0.

Steps to reproduce below:
(I'm using --user mode for testing, but the behaviour is the same in system mode)

----

$ cat ~/.config/systemd/user/myapp-service.service
[Service]
ExecStart=/usr/bin/true

$ systemctl --user daemon-reload

$ systemctl --user start myapp-service.service

$ systemctl --user status myapp-service.service; echo "STATUS: $?"
myapp-service.service
   Loaded: loaded (/home/sandbox/.config/systemd/user/myapp-service.service; enabled)
   Active: inactive (dead) since Wed 2014-04-16 13:57:34 EST; 8s ago
  Process: 8486 ExecStart=/usr/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 8486 (code=exited, status=0/SUCCESS)

Apr 16 13:57:34 meep systemd[7899]: Started myapp-service.service.
STATUS: 3


$ systemctl --version
systemd 208
+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ

----


I'm using `systemctl status <all-units-I've-installed>` as a high level check in a deployment script to check that nothing is broken, and I display the output (and fail the deployment) when the result is nonzero. So this behaviour breaks my deployment script now that I have added a routine (timer) service which happens to run quickly.
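
One possible workaround for a check like this, as a sketch only: ask `systemctl is-failed` instead of `status`. According to the man page text quoted in Comment 2, it exits 0 only when at least one of the listed units is in the "failed" state. The function name `deploy_check` is hypothetical, and `--user` is omitted for brevity.

```shell
# Hypothetical helper, not part of systemd: succeed unless a listed unit is "failed".
deploy_check() {
    # `systemctl is-failed` exits 0 when at least one of the units has failed.
    if systemctl --quiet is-failed "$@"; then
        echo "deploy check: at least one unit is in the failed state" >&2
        return 1
    fi
    return 0
}
```

It would be called as `deploy_check myapp-service.service other.service`. Note that a mistyped unit name also makes `is-failed` exit nonzero (see Comment 2), so this sketch would let a typo pass the check silently.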

Comment 1 Zbigniew Jedrzejewski-Szmek 2014-04-17 02:33:39 UTC

systemctl status (and some of the other verbs too) follows LSB semantics... 
http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html has a nice table, and 3 in this case means 'program is not running'. But indeed, 0, meaning 'service is OK' could be considered valid too.
Dunno, a bit of a corner case.
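
For reference, the LSB table for the `status` action can be written down as a small lookup. This is a sketch: the function name `lsb_status_meaning` is made up for illustration, and whether systemctl ever returns anything other than 0 and 3 for `status` is not established in this report.

```shell
# Map an init-script style "status" exit code to its LSB 3.1.1 meaning.
# Hypothetical helper for illustration; not part of systemd.
lsb_status_meaning() {
    case "$1" in
        0) echo "program is running or service is OK" ;;
        1) echo "program is dead and /var/run pid file exists" ;;
        2) echo "program is dead and /var/lock lock file exists" ;;
        3) echo "program is not running" ;;
        4) echo "program or service status is unknown" ;;
        *) echo "reserved or not defined by LSB" ;;
    esac
}
```

Usage would look like `systemctl status myapp-service.service >/dev/null; lsb_status_meaning $?`.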

Comment 2 van.de.bugger 2014-12-18 21:54:53 UTC

I would not agree, it is *not* a corner case. Look at the systemctl man page (http://www.freedesktop.org/software/systemd/man/systemctl.html): at the bottom, in the "Exit status" section, it says:

    On success, 0 is returned, a non-zero failure code otherwise.

That's all. The "status" subsection says nothing about the systemctl exit status at all. If systemctl follows LSB semantics for init scripts, that should at least be documented.

BUT... there is a difference between init scripts and systemctl. A script reports the status of the one service it controls. systemctl allows specifying multiple services on one command line, e.g.:

    systemctl status autofs sshd crond

or even

    systemctl --all status

How are you going to follow LSB semantics in such a case?
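
A caller that needs a per-unit answer can sidestep the question by querying units one at a time. A minimal sketch, with the hypothetical helper name `per_unit_status`:

```shell
# Hypothetical helper: print the `systemctl status` exit code of each unit separately.
per_unit_status() {
    for pus_unit in "$@"; do
        pus_rc=0
        # Discard the human-readable output; only the exit code is of interest here.
        systemctl status "$pus_unit" >/dev/null 2>&1 || pus_rc=$?
        printf '%s: %s\n' "$pus_unit" "$pus_rc"
    done
}
```

For example, `per_unit_status autofs sshd crond` prints one `unit: code` line per unit, at the cost of one systemctl invocation per unit.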

BTW, the exit status problem is not limited to the status command. is-active, is-failed, and is-enabled also suffer from a lack of documentation/specification.

For example, the man page says about is-active:

>    Check whether any of the specified units are active (i.e. running). Returns an exit code 0 if at least one is active, or non-zero otherwise. 

The description of is-failed is very similar:

>    Check whether any of the specified units are in a "failed" state. Returns an exit code 0 if at least one has failed, non-zero otherwise.

It is better than "status", but not enough. Look:

    $ systemctl -q is-active syslog; echo $?
    3

    $ systemctl -q is-active syslg; echo $?
    3

These are two very distinct cases: in the first case the syslog service exists but is not active; in the second case there is no "syslg" service at all. Let us check is-failed then:

    $ systemctl -q is-failed syslog; echo $?
    1

    $ systemctl -q is-failed syslg; echo $?
    1

It matches the current man page ("non-zero otherwise"), but it is inconsistent at least. Why does one command return 3 while the other returns 1 in a similar case?

I do not recommend following LSB semantics for init scripts, because they are neither user-oriented nor complete:

    0. program is running or service is OK
    1. program is dead and /var/run pid file exists
    2. program is dead and /var/lock lock file exists
    3. program is not running

For example, a oneshot service can be active even if its program is not running. If I understand correctly, systemd uses control groups to stop or kill services, so pid files and lock files are not so important now.

I would recommend the following simpler but more universal semantics:

    0. Success or true. 
    1. Success but false.
    > 1. Trouble.

For example:

    $ systemctl is-active xxx

returns 0 if service xxx is active, 1 if the service is known but not active, and some status greater than 1 in case of trouble: command-line error (e.g. unknown option), runtime error (unit file is not readable), etc.
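
The proposed convention can be approximated today with a wrapper. This is a sketch under assumptions: the name `is_active_tristate` is hypothetical, and it assumes that `systemctl show -p LoadState UNIT` prints `LoadState=loaded` for a known unit and something else (such as `LoadState=not-found`) otherwise, which may vary between systemd versions.

```shell
# Hypothetical wrapper: 0 = active, 1 = known but not active, 2 = trouble.
is_active_tristate() {
    # Assumption: `show -p LoadState` prints "LoadState=loaded" for a known unit.
    iat_load=$(systemctl show -p LoadState "$1" 2>/dev/null) || return 2
    case "$iat_load" in
        LoadState=loaded) ;;
        *) return 2 ;;      # unknown unit or load problem: "trouble"
    esac
    if systemctl -q is-active "$1"; then
        return 0            # known and active
    fi
    return 1                # known but not active
}
```

With this, `is_active_tristate syslog` and `is_active_tristate syslg` would give different results (1 and 2) instead of both returning 3.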

Comment 3 David Williams 2017-10-28 19:06:19 UTC

I too ran into this issue and at first thought something was "wrong" with my service. 

In my case, the service is of "Type=oneshot", so it is normal for it to not be running. I found this while scripting a check on 'certbot.service', which is run periodically by 'certbot.timer', which is "always running".

Seems it should at least be better documented? Though, I'll admit, I googled it before looking at the man page. :)
