|Summary:||journalctl and parsing corrupted journals|
|Product:||systemd||Reporter:||Oleksii Shevchuk <public.avatar>|
|Status:||NEW ---||QA Contact:||systemd-bugs|
|Priority:||high||CC:||fweimer, nmiell, radek, rektide|
Description Oleksii Shevchuk 2012-09-23 13:20:28 UTC
At the moment journalctl cannot parse corrupted journals. A simple test:

test > journalctl -D $(pwd) > out
test > wc -l out
82154 out
test > for i in $(seq 1 512); do let "out=$i*10"; dd if=/dev/urandom of=system.journal count=1 conv=notrunc seek=$out; done 2>/dev/null
test > journalctl -D $(pwd) > out2
test > wc -l out2
4 out2
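To make the extent of the damage in this test concrete, here is a rough Python sketch of which byte ranges the dd loop overwrites. It assumes dd's default block size of 512 bytes (no bs= is given in the command); the function name and parameters are only illustrative.

```python
def corrupted_ranges(iterations=512, stride_blocks=10, block_size=512):
    """Return (start, end) byte ranges overwritten by the dd loop.

    Assumes dd's default 512-byte block size, so seek=$out places each
    write at byte offset out * 512 and count=1 writes one 512-byte block.
    """
    ranges = []
    for i in range(1, iterations + 1):
        start = i * stride_blocks * block_size
        ranges.append((start, start + block_size))
    return ranges


ranges = corrupted_ranges()
total = sum(end - start for start, end in ranges)
print("first range:", ranges[0])
print("last range:", ranges[-1])
print("bytes overwritten:", total)
```

Under that assumption, one 512-byte block out of every 5120 bytes is replaced with random data, starting at offset 5120.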
Comment 1 Zbigniew Jedrzejewski-Szmek 2014-07-16 03:24:14 UTC
But what should it do? Writing at those offsets basically nukes basic header structures...
Comment 2 Nicholas Miell 2014-10-18 15:38:34 UTC
It should recognize portions of the file are corrupted and start searching for magic numbers indicating a record from that point. When it finds a candidate magic number, it should check the record's checksum at the known offset from the magic number to verify it actually found a record and not just a spurious magic number. Once it finds an intact record again, it should continue parsing journal records as normal (until the next corrupt region, if any).

ObjectHeader has 6 reserved uint8_t's to use for the magic number and checksum, so this would be a compatible change.

Then you could make a tool (or just modify journalctl) that takes a damaged journal as input and converts all the damaged regions into artificially generated log entries saying something like MESSAGE="Damaged Region #1" with an attached DATA=the original corrupted binary data.

Essentially, journalctl's current behavior of pretending the journal file ends at the first damaged region is unacceptable.
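A minimal Python sketch of the resynchronization idea described above. The record layout here (a 4-byte magic "JREC", a length, and a CRC32 of the payload) is entirely hypothetical and is not the journal's on-disk format; it only illustrates scanning for a magic number, validating the candidate by checksum, and reporting the skipped bytes as a damaged region.

```python
import struct
import zlib

MAGIC = b"JREC"                  # hypothetical per-record magic number
HEADER = struct.Struct("<4sII")  # magic, payload length, CRC32 of payload


def pack_record(payload: bytes) -> bytes:
    """Serialize one record in the hypothetical layout."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def try_parse(buf: bytes, pos: int):
    """Return the payload of a valid record at pos, or None."""
    if pos + HEADER.size > len(buf):
        return None
    magic, length, crc = HEADER.unpack_from(buf, pos)
    end = pos + HEADER.size + length
    if magic != MAGIC or end > len(buf):
        return None
    payload = buf[pos + HEADER.size:end]
    return payload if zlib.crc32(payload) == crc else None


def recover(buf: bytes):
    """Yield ("record", offset, payload) and ("damaged", offset, raw) items."""
    pos = 0
    damaged_start = None
    while pos < len(buf):
        payload = try_parse(buf, pos)
        if payload is not None:
            # An intact record ends any damaged region in progress.
            if damaged_start is not None:
                yield ("damaged", damaged_start, buf[damaged_start:pos])
                damaged_start = None
            yield ("record", pos, payload)
            pos += HEADER.size + len(payload)
            continue
        if damaged_start is None:
            damaged_start = pos
        # Resynchronize: jump to the next candidate magic number, if any.
        nxt = buf.find(MAGIC, pos + 1)
        pos = nxt if nxt != -1 else len(buf)
    if damaged_start is not None:
        yield ("damaged", damaged_start, buf[damaged_start:])


# Usage: corrupt the middle of three records and recover the other two.
records = [pack_record(b"one"), pack_record(b"two"), pack_record(b"three")]
data = bytearray(b"".join(records))
data[len(records[0]) + HEADER.size] ^= 0xFF  # flip bits in record 2's payload
items = list(recover(bytes(data)))
for kind, offset, raw in items:
    print(kind, offset, len(raw))
```

A recovery tool along the lines suggested could turn each "damaged" item into a synthetic entry carrying the raw bytes, rather than stopping at the first one.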
Comment 3 Nicholas Miell 2014-10-18 15:41:36 UTC
Oh, and because apps can log arbitrary binary data, they could potentially create fake records inside their payloads, so the checksum/hash should be over the ObjectHeader and some random per-journal file value an attacker can't access.
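One way to sketch that keyed check in Python is a truncated HMAC over the header bytes, using a secret generated per journal file. Everything here is an assumption for illustration: the choice of HMAC-SHA256, the 4-byte tag length, the header contents, and the function names are not taken from systemd.

```python
import hashlib
import hmac
import os

TAG_LEN = 4  # hypothetical: number of reserved header bytes used for the tag


def new_file_key() -> bytes:
    """Generate a random per-journal-file secret (hypothetical scheme)."""
    return os.urandom(32)


def header_tag(file_key: bytes, header: bytes) -> bytes:
    """Compute a truncated keyed hash over the serialized header."""
    return hmac.new(file_key, header, hashlib.sha256).digest()[:TAG_LEN]


def verify_header(file_key: bytes, header: bytes, tag: bytes) -> bool:
    """Check a candidate header found while scanning a damaged file."""
    return hmac.compare_digest(header_tag(file_key, header), tag)


# Usage: a header tagged with the file key verifies; a header forged by a
# logging application that does not know the key is expected to fail.
key = new_file_key()
header = b"\x01\x00" + (64).to_bytes(8, "little")  # made-up header bytes
tag = header_tag(key, header)
print(verify_header(key, header, tag))
print(verify_header(key, header, b"\x00" * TAG_LEN))
```

With only a few bytes available the tag is short, so a forged header still passes with small probability (about 1 in 2^32 per attempt for 4 bytes); the sketch trades strength for fitting in the reserved field.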