Bug 50184

Summary: systemd-journal sometimes loses _CMDLINE, _COMM, _EXE, _SYSTEMD_CGROUP, _SYSTEMD_UNIT
Product: systemd Reporter: Maksim Melnikau <maxposedon>
Component: generalAssignee: Lennart Poettering <lennart>
Status: RESOLVED MOVED QA Contact:
Severity: critical    
Priority: medium CC: bevan, crrodriguez, freedesktop, jeff, rektide
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard: kernel-bug
i915 platform: i915 features:

Description Maksim Melnikau 2012-05-21 11:26:52 UTC
libsystemd-journal or systemd-journald loses some systemd's parts of message context when few messages logged in a moment

# systemd-journalctl --version 
systemd 44
gentoo
+PAM +LIBWRAP -AUDIT -SELINUX -SYSVINIT -LIBCRYPTSETUP

Steps for repoduce
1. % cat example.c 
#include <stdlib.h>
#include <systemd/sd-journal.h>
int main() {
    int i=0, j=0;
    while(i<10 && j<10) {
        random()&1 ? ++i : ++j;
        sd_journal_send(
            "MESSAGE=(%d,%d)", i, j,
            "DI=%d", i, 
            "DJ=%d", j,
            NULL
        );
    }
    return 0;
}

2. % gcc -O0 -Wall -std=gnu99 -g3 -lsystemd-journal example.c -o example
3. # systemd-journalctl -f -o export
4. % ./example
5. few first messages looks like
.cursor=....
.realtime=1337624286115673
.monotonic=93667700928
.boot_id=97af7b9ccc9341a78bff939661d1e53b
MESSAGE=(1,0)
DI=1
DJ=0
_TRANSPORT=journal
_PID=6392
_UID=1001
_GID=1001
_COMM=example
_EXE=/home/max_posedon/Development/talks/systemd-journald/example
_CMDLINE=./example
_SYSTEMD_CGROUP=/system/kdm@.service/tty7
_SYSTEMD_UNIT=kdm@tty7.service
_SOURCE_REALTIME_TIMESTAMP=1337624286114631
_BOOT_ID=97af7b9ccc9341a78bff939661d1e53b
_MACHINE_ID=be1598bd68baa268cb48e33d0000000f
_HOSTNAME=m_melnikau-vaio
6. the rest looks like
.cursor=...
.realtime=1337624286122174
.monotonic=93667707429
.boot_id=97af7b9ccc9341a78bff939661d1e53b
MESSAGE=(10,5)
DI=10
DJ=5
_TRANSPORT=journal
_PID=6392
_UID=1001
_GID=1001
_SOURCE_REALTIME_TIMESTAMP=1337624286118123
_BOOT_ID=97af7b9ccc9341a78bff939661d1e53b
_MACHINE_ID=be1598bd68baa268cb48e33d0000000f
_HOSTNAME=m_melnikau-vaio
7. _CMDLINE, _COMM, _EXE, _SYSTEMD_CGROUP, _SYSTEMD_UNIT lost(!)
Comment 1 Lennart Poettering 2012-05-21 16:39:07 UTC
This is a race. We get the client PID with SCM_CREDENTIALS and then try to read _CMDLINE/_COMM/_EXEC and the cgroupd ata from /proc quickly but at that time the process might already have died.

To fix this properly we need SCM_COMM or so, to get the process cmdline/comm/exe, and SCM_CGROUPS to get the cgroup data in a race free fashion. Until this is added to the kernel this will continue to be racy. These items are listed on the plumbers wishlist, because we need this from the kernel folks.
Comment 2 Maksim Melnikau 2012-05-22 04:13:56 UTC
Is some nice workaround exists?

E.g. as I understand now program should do smth like "sleep 1", atexit, which is not very nice.

May be should be added some call to ensure that data is logged.
On some runner, which keeps process as ZOMBIE until all data is logged.
Comment 3 rektide 2014-11-15 21:27:06 UTC
I've noticed that often times- for example- `systemctl status keystone.service` might show a fail but have very minimal output, but `journalctl _SYSTEMD_UNIT=keystone.service` has all the output.

This discrepancy doesn't wash well with me. If systemctl for whatever reason cannot do this itself, ought it not rely on whatever underpins journalctl?

Asked in #systemd, was pointed at this bug.
Comment 4 steubens 2015-11-23 08:33:53 UTC
(In reply to Lennart Poettering from comment #1)
> To fix this properly we need SCM_COMM or so, to get the process
> cmdline/comm/exe, and SCM_CGROUPS to get the cgroup data in a race free
> fashion. Until this is added to the kernel this will continue to be racy.

can something be done about this in the interim? if systemd or journald knows that a program has exited without collecting some information, insert a sentinel into the journal that it has happened (which systemctl status could use), and maybe an informational 'xxx might be incomplete, use journalctl _EXE' or similar to facilitate people looking for information without asking for assistance and only being told there's basically a bug and to keep it in mind
Comment 5 Alain Kalker 2016-02-02 02:23:55 UTC
Why not have systemd-journald ask systemd for the process metadata needed to add the required fields to the journal?

Have systemd maintain a cache of process metadata so that even if a process dies before a journal message is processed, systemd can still supply systemd-journald with the necessary data.

IMO this would also be a better separation of concerns: have systemd manage process metadata which is to end up in fields such as _SYSTEMD_CGROUP, _SYSTEMD_SESSION, etc., and have systemd-journald take care only of maintaining the journal, adding fields such as PRIORITY, MESSAGE, etc.
Comment 6 Lennart Poettering 2016-02-02 11:55:33 UTC
We can cache all we want, but that doesn't help at all for very short-lived processes. If a process is created, logs something and immediately exits, then we have the message and its PID but cannot use the PID anymore to retriev metainformation about it. journald can't and systemd can't either.

The right fix is to get cgroup metadata sent along in the message. There where even kernel patches to add that, but the submitter dropped the ball unfortunately.
Comment 7 Mark Stosberg 2016-06-23 02:57:53 UTC
There's a related pull request which improves the logging-just-before-exit problem by caching the metadata available for review and comment:

https://github.com/systemd/systemd/pull/2280

It doesn't address the case of super-short-lived processes that start, log and exit, but it narrows the scope of the problem.
Comment 8 Lennart Poettering 2017-02-10 10:04:57 UTC
There's an issue about this one also on github, let's continue this there, and close this one:

https://github.com/systemd/systemd/issues/2913

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.