Bug 55239 - journalctl and parsing corrupted journals
Summary: journalctl and parsing corrupted journals
Status: NEW
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: high enhancement
Assignee: systemd-bugs
QA Contact: systemd-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-23 13:20 UTC by Oleksii Shevchuk
Modified: 2015-06-30 10:10 UTC
CC List: 4 users

See Also:



Description Oleksii Shevchuk 2012-09-23 13:20:28 UTC
At the moment journalctl can't parse corrupted logs. A simple test:

test > journalctl -D $(pwd) > out
test > wc -l out
82154 out

test > for i in $(seq 1 512); do let "out=$i*10"; dd if=/dev/urandom of=system.journal count=1 conv=notrunc seek=$out; done 2>/dev/null
test > journalctl -D $(pwd) > out2
test > wc -l out2
4 out2
test >
Comment 1 Zbigniew Jedrzejewski-Szmek 2014-07-16 03:24:14 UTC
But what should it do? Writing at those offsets basically nukes the basic header structures...
Comment 2 Nicholas Miell 2014-10-18 15:38:34 UTC
It should recognize that portions of the file are corrupted and, from that point, start searching for magic numbers indicating a record. When it finds a candidate magic number, it should check the record's checksum at the known offset from the magic number to verify it actually found a record and not just a spurious magic number. Once it finds an intact record again, it should continue parsing journal records as normal (until the next corrupt region, if any).

ObjectHeader has 6 reserved uint8_t's to use for the magic number and checksum, so this would be a compatible change.

Then you could make a tool (or just modify journalctl) that takes a damaged journal as input and converts all the damaged regions into artificially generated log entries saying something like MESSAGE="Damaged Region #1" with an attached DATA=the original corrupted binary data.


Essentially, journalctl's current behavior of pretending the journal file ends at the first damaged region is unacceptable.
Comment 3 Nicholas Miell 2014-10-18 15:41:36 UTC
Oh, and because apps can log arbitrary binary data, they could potentially create fake records inside their payloads, so the checksum/hash should be over the ObjectHeader and some random per-journal-file value an attacker can't access.

