Bug 61411

Summary: All logs since last boot gone after crash/hard reboot
Product: systemd Reporter: Philipp Überbacher <murks>
Component: generalAssignee: systemd-bugs
Status: CLOSED FIXED QA Contact: systemd-bugs
Severity: major    
Priority: medium CC: peter
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Philipp Überbacher 2013-02-24 21:42:09 UTC
Here's the rough timeline:
Shutdown at 4AM
Boot at 12AM
Freeze at 10PM -> hard shutdown
Boot at 10PM


And the logs contain:
Shutdown at 4AM
Boot at 10PM

This means that full ten hours are missing from the logs, including the reason for the freeze. In my opinion that makes the journal quite useless a logging solution.


System information:
Arch Linux, fully updated, ext4
linux 3.7.9
systemd 197
Comment 1 Philipp Überbacher 2013-03-09 19:26:23 UTC
I just had the same freeze-like issue again and noticed something new.
The system wouldn't react to anything, but sound was still running (hard to say whether it was looping or not). The only thing I could do was to press the power button, after which I got to see the tty output of systemd shutting down the machine as if everything was normal. After the next boot however the journal was empty, no way to figure out what went wrong.

This is even worse than I thought before, systemd still works when this happens, it shuts down the system but loses the logfile anyway.
Comment 2 Kay Sievers 2013-03-12 20:32:17 UTC
You don't happen to run laptop mode, or any other thing that does
tune the writeback or disk syncing?

Can you please try something like:
  $ logger thisissometextilookfor; strings /var/log/journal/6a...9ea/system.journal | grep thisis
  MESSAGE=thisissometextilookfor

It ends up immediately on disk here, as you can see.

You can also check:
  $ strings /var/log/journal/6a...9ea/*..journal | tail -5
  ...
  _SOURCE_REALTIME_TIMESTAMP=1363120145422193

And translate the usec to the readable date:
  $ date --date=@1363120145
  Tue Mar 12 21:29:05 CET 2013
Comment 3 Philipp Überbacher 2013-03-24 08:04:01 UTC
This works:
$ logger thisissometextilookfor; strings /var/log/journal/6a...9ea/system.journal | grep thisis

This doesn't:
$ date --date=@1364111653000416
date: invalid date @1364111653000416

One thing I'm aware of is powertop, which has an option named 'VM writeback timeout', but I never enabled it.
laptop-mode-tools weren't installed when that happened, I have them installed now but not yet configured and enabled.
I have a rather minimal Arch installation here, no DEs that would do fancy things. What could cause this?
Comment 4 Peter Wu 2013-05-13 21:55:01 UTC
journald is unreliable for keeping logs[1] (at this moment), you should run another syslog daemon if you need the logs. I have described such a method on the Arch wiki, using rsyslog[2].

 [1]: http://unix.stackexchange.com/q/67394/8250
 [2]: https://wiki.archlinux.org/index.php/Rsyslog#journald_with_rsyslog_for_kernel_messages
Comment 5 Kay Sievers 2013-05-13 22:51:42 UTC
(In reply to comment #4)
> journald is unreliable for keeping logs[1] (at this moment)

This should not be the case since a while. Please read the NEWS file.
Comment 6 Zbigniew Jedrzejewski-Szmek 2013-05-15 02:34:19 UTC
Such problems should be alleviated by http://cgit.freedesktop.org/systemd/systemd/commit/?id=26687bf8a90. If a large quantity of messages are lost again with journald set up to fsync periodically, please reopen.
Comment 7 Peter Wu 2013-05-16 11:14:40 UTC
`ls -l /var/log/journal/*/` shows that the logs are indeed modified every 5 minutes (if there is a change). The following results are observed on a Raspberry Pi ARM board.

mtime of system.journal says 12:37. Thus I waited until 12:42 where I synced the file system and did a hard reset after some seconds (basically by taking the power off).

After rebooting, I see messages from 12:34:20, that looks correct.

I was however confused by the behavior of bug #64566, the fact that log messages are interleaved (no RTC, time is retrieved using NTP) and a lot spurious "-- Reboot --" lines.
Comment 8 Philipp Überbacher 2013-05-19 11:47:44 UTC
Thanks so far.
Do I understand correctly that with default settings you could still lose the 5 minutes leading up to the freeze/crash?
How do other logging solutions deal with that?
Comment 9 Peter Wu 2013-05-19 12:49:04 UTC
The original issue in this bug (logs gone on crash) is solved. Entries are properly written to the journals.

I have a different issue where the order of messages is interleaved (journalctl is apparently sorting by date?), but that fails horribly because there is no RTC in my device. Therefore, all boots starts on January 1st.

rsyslogd would write log entries to disk and if power is suddenly gone, you can see the new boot log entries appended to the previous one.

rsyslogd example (after reboot, truncated to preserve space):
Jan  1 01:00:16 arpi systemd[1]: Starting OpenNTP Daemon...
Jan  1 01:00:17 arpi systemd[1]: Failed to read PID from file /run/nginx...
Jan  1 01:00:17 arpi systemd[1]: Started A high performance ...
May 19 14:32:10 arpi systemd[1]: Time has been changed
May 19 14:32:10 arpi systemd[1]: Started OpenNTP Daemon.

journalctl (diff on journal before "crash" and after reboot):
+Jan 01 01:00:16 arpi systemd[1]: Starting OpenNTP Daemon...
+-- Reboot --
 (old log entries here)
+-- Reboot --
+Jan 01 01:00:17 arpi systemd[1]: Failed to read PID from file /run/nginx...
+-- Reboot --

 (more old log entries and newer entries interleaved)

+May 19 14:31:10 arpi sshd[358]: pam_unix(sshd:session): ...
+May 19 14:31:10 arpi systemd-logind[115]: New session c2 ...
+-- Reboot --
+May 19 14:32:10 arpi systemd[1]: Time has been changed
+May 19 14:32:10 arpi systemd[1]: Started OpenNTP Daemon.

Again, this is a completely different issue, I am not sure what it is related to.
Comment 10 Peter Wu 2013-05-19 13:01:16 UTC
I did more tests, it really occurs only on crash (unclean shutdown), not with normal reboots. Filed a new bug at #64760.
Comment 11 Philipp Überbacher 2013-06-02 21:27:07 UTC
I didn't have such a hang in quite a while, so I couldn't test it. Now I had one, and the error is logged. Assuming this is the same kind of error, which seems likely to me, it seems logging is indeed improved.

Case anyone is interested what it looks like:
...
Jun 02 20:49:41 eeyore devmon[1128]: media:          []
Jun 02 20:49:41 eeyore devmon[1128]: partition:      [1]
Jun 02 23:12:45 eeyore kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 02 23:12:45 eeyore kernel: [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
Jun 02 23:12:57 eeyore systemd-logind[1132]: Power key pressed.
Jun 02 23:12:57 eeyore systemd-logind[1132]: Powering Off...
Jun 02 23:12:57 eeyore systemd-logind[1132]: System is powering down.
...

Sadly I couldn't do anything and hence not check the error state. However, thanks to the log I can now be reasonably sure that the graphics driver is the culprit.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.