libsystemd-journal or systemd-journald loses some systemd's parts of message context when few messages logged in a moment # systemd-journalctl --version systemd 44 gentoo +PAM +LIBWRAP -AUDIT -SELINUX -SYSVINIT -LIBCRYPTSETUP Steps for repoduce 1. % cat example.c #include <stdlib.h> #include <systemd/sd-journal.h> int main() { int i=0, j=0; while(i<10 && j<10) { random()&1 ? ++i : ++j; sd_journal_send( "MESSAGE=(%d,%d)", i, j, "DI=%d", i, "DJ=%d", j, NULL ); } return 0; } 2. % gcc -O0 -Wall -std=gnu99 -g3 -lsystemd-journal example.c -o example 3. # systemd-journalctl -f -o export 4. % ./example 5. few first messages looks like .cursor=.... .realtime=1337624286115673 .monotonic=93667700928 .boot_id=97af7b9ccc9341a78bff939661d1e53b MESSAGE=(1,0) DI=1 DJ=0 _TRANSPORT=journal _PID=6392 _UID=1001 _GID=1001 _COMM=example _EXE=/home/max_posedon/Development/talks/systemd-journald/example _CMDLINE=./example _SYSTEMD_CGROUP=/system/kdm@.service/tty7 _SYSTEMD_UNIT=kdm@tty7.service _SOURCE_REALTIME_TIMESTAMP=1337624286114631 _BOOT_ID=97af7b9ccc9341a78bff939661d1e53b _MACHINE_ID=be1598bd68baa268cb48e33d0000000f _HOSTNAME=m_melnikau-vaio 6. the rest looks like .cursor=... .realtime=1337624286122174 .monotonic=93667707429 .boot_id=97af7b9ccc9341a78bff939661d1e53b MESSAGE=(10,5) DI=10 DJ=5 _TRANSPORT=journal _PID=6392 _UID=1001 _GID=1001 _SOURCE_REALTIME_TIMESTAMP=1337624286118123 _BOOT_ID=97af7b9ccc9341a78bff939661d1e53b _MACHINE_ID=be1598bd68baa268cb48e33d0000000f _HOSTNAME=m_melnikau-vaio 7. _CMDLINE, _COMM, _EXE, _SYSTEMD_CGROUP, _SYSTEMD_UNIT lost(!)
This is a race. We get the client PID with SCM_CREDENTIALS and then try to read _CMDLINE/_COMM/_EXEC and the cgroupd ata from /proc quickly but at that time the process might already have died. To fix this properly we need SCM_COMM or so, to get the process cmdline/comm/exe, and SCM_CGROUPS to get the cgroup data in a race free fashion. Until this is added to the kernel this will continue to be racy. These items are listed on the plumbers wishlist, because we need this from the kernel folks.
Is some nice workaround exists? E.g. as I understand now program should do smth like "sleep 1", atexit, which is not very nice. May be should be added some call to ensure that data is logged. On some runner, which keeps process as ZOMBIE until all data is logged.
I've noticed that often times- for example- `systemctl status keystone.service` might show a fail but have very minimal output, but `journalctl _SYSTEMD_UNIT=keystone.service` has all the output. This discrepancy doesn't wash well with me. If systemctl for whatever reason cannot do this itself, ought it not rely on whatever underpins journalctl? Asked in #systemd, was pointed at this bug.
(In reply to Lennart Poettering from comment #1) > To fix this properly we need SCM_COMM or so, to get the process > cmdline/comm/exe, and SCM_CGROUPS to get the cgroup data in a race free > fashion. Until this is added to the kernel this will continue to be racy. can something be done about this in the interim? if systemd or journald knows that a program has exited without collecting some information, insert a sentinel into the journal that it has happened (which systemctl status could use), and maybe an informational 'xxx might be incomplete, use journalctl _EXE' or similar to facilitate people looking for information without asking for assistance and only being told there's basically a bug and to keep it in mind
Why not have systemd-journald ask systemd for the process metadata needed to add the required fields to the journal? Have systemd maintain a cache of process metadata so that even if a process dies before a journal message is processed, systemd can still supply systemd-journald with the necessary data. IMO this would also be a better separation of concerns: have systemd manage process metadata which is to end up in fields such as _SYSTEMD_CGROUP, _SYSTEMD_SESSION, etc., and have systemd-journald take care only of maintaining the journal, adding fields such as PRIORITY, MESSAGE, etc.
We can cache all we want, but that doesn't help at all for very short-lived processes. If a process is created, logs something and immediately exits, then we have the message and its PID but cannot use the PID anymore to retriev metainformation about it. journald can't and systemd can't either. The right fix is to get cgroup metadata sent along in the message. There where even kernel patches to add that, but the submitter dropped the ball unfortunately.
There's a related pull request which improves the logging-just-before-exit problem by caching the metadata available for review and comment: https://github.com/systemd/systemd/pull/2280 It doesn't address the case of super-short-lived processes that start, log and exit, but it narrows the scope of the problem.
There's an issue about this one also on github, let's continue this there, and close this one: https://github.com/systemd/systemd/issues/2913
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.