Summary: | All logs since last boot gone after crash/hard reboot | ||
---|---|---|---|
Product: | systemd | Reporter: | Philipp Überbacher <murks> |
Component: | general | Assignee: | systemd-bugs |
Status: | CLOSED FIXED | QA Contact: | systemd-bugs |
Severity: | major | ||
Priority: | medium | CC: | peter |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Philipp Überbacher
2013-02-24 21:42:09 UTC
I just had the same freeze-like issue again and noticed something new. The system wouldn't react to anything, but sound was still running (hard to say whether it was looping or not). The only thing I could do was to press the power button, after which I got to see the tty output of systemd shutting down the machine as if everything was normal. After the next boot however the journal was empty, no way to figure out what went wrong. This is even worse than I thought before, systemd still works when this happens, it shuts down the system but loses the logfile anyway. You don't happen to run laptop mode, or any other thing that does tune the writeback or disk syncing? Can you please try something like: $ logger thisissometextilookfor; strings /var/log/journal/6a...9ea/system.journal | grep thisis MESSAGE=thisissometextilookfor It ends up immediately on disk here, as you can see. You can also check: $ strings /var/log/journal/6a...9ea/*..journal | tail -5 ... _SOURCE_REALTIME_TIMESTAMP=1363120145422193 And translate the usec to the readable date: $ date --date=@1363120145 Tue Mar 12 21:29:05 CET 2013 This works: $ logger thisissometextilookfor; strings /var/log/journal/6a...9ea/system.journal | grep thisis This doesn't: $ date --date=@1364111653000416 date: invalid date @1364111653000416 One thing I'm aware of is powertop, which has an option named 'VM writeback timeout', but I never enabled it. laptop-mode-tools weren't installed when that happened, I have them installed now but not yet configured and enabled. I have a rather minimal Arch installation here, no DEs that would do fancy things. What could cause this? journald is unreliable for keeping logs[1] (at this moment), you should run another syslog daemon if you need the logs. I have described such a method on the Arch wiki, using rsyslog[2]. [1]: http://unix.stackexchange.com/q/67394/8250 [2]: https://wiki.archlinux.org/index.php/Rsyslog#journald_with_rsyslog_for_kernel_messages (In reply to comment #4) > journald is unreliable for keeping logs[1] (at this moment) This should not be the case since a while. Please read the NEWS file. Such problems should be alleviated by http://cgit.freedesktop.org/systemd/systemd/commit/?id=26687bf8a90. If a large quantity of messages are lost again with journald set up to fsync periodically, please reopen. `ls -l /var/log/journal/*/` shows that the logs are indeed modified every 5 minutes (if there is a change). The following results are observed on a Raspberry Pi ARM board. mtime of system.journal says 12:37. Thus I waited until 12:42 where I synced the file system and did a hard reset after some seconds (basically by taking the power off). After rebooting, I see messages from 12:34:20, that looks correct. I was however confused by the behavior of bug #64566, the fact that log messages are interleaved (no RTC, time is retrieved using NTP) and a lot spurious "-- Reboot --" lines. Thanks so far. Do I understand correctly that with default settings you could still lose the 5 minutes leading up to the freeze/crash? How do other logging solutions deal with that? The original issue in this bug (logs gone on crash) is solved. Entries are properly written to the journals. I have a different issue where the order of messages is interleaved (journalctl is apparently sorting by date?), but that fails horribly because there is no RTC in my device. Therefore, all boots starts on January 1st. rsyslogd would write log entries to disk and if power is suddenly gone, you can see the new boot log entries appended to the previous one. rsyslogd example (after reboot, truncated to preserve space): Jan 1 01:00:16 arpi systemd[1]: Starting OpenNTP Daemon... Jan 1 01:00:17 arpi systemd[1]: Failed to read PID from file /run/nginx... Jan 1 01:00:17 arpi systemd[1]: Started A high performance ... May 19 14:32:10 arpi systemd[1]: Time has been changed May 19 14:32:10 arpi systemd[1]: Started OpenNTP Daemon. journalctl (diff on journal before "crash" and after reboot): +Jan 01 01:00:16 arpi systemd[1]: Starting OpenNTP Daemon... +-- Reboot -- (old log entries here) +-- Reboot -- +Jan 01 01:00:17 arpi systemd[1]: Failed to read PID from file /run/nginx... +-- Reboot -- (more old log entries and newer entries interleaved) +May 19 14:31:10 arpi sshd[358]: pam_unix(sshd:session): ... +May 19 14:31:10 arpi systemd-logind[115]: New session c2 ... +-- Reboot -- +May 19 14:32:10 arpi systemd[1]: Time has been changed +May 19 14:32:10 arpi systemd[1]: Started OpenNTP Daemon. Again, this is a completely different issue, I am not sure what it is related to. I did more tests, it really occurs only on crash (unclean shutdown), not with normal reboots. Filed a new bug at #64760. I didn't have such a hang in quite a while, so I couldn't test it. Now I had one, and the error is logged. Assuming this is the same kind of error, which seems likely to me, it seems logging is indeed improved. Case anyone is interested what it looks like: ... Jun 02 20:49:41 eeyore devmon[1128]: media: [] Jun 02 20:49:41 eeyore devmon[1128]: partition: [1] Jun 02 23:12:45 eeyore kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung Jun 02 23:12:45 eeyore kernel: [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state Jun 02 23:12:57 eeyore systemd-logind[1132]: Power key pressed. Jun 02 23:12:57 eeyore systemd-logind[1132]: Powering Off... Jun 02 23:12:57 eeyore systemd-logind[1132]: System is powering down. ... Sadly I couldn't do anything and hence not check the error state. However, thanks to the log I can now be reasonably sure that the graphics driver is the culprit. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.