74714 – journalctl crashes when journal is corrupted

Bug 74714 - journalctl crashes when journal is corrupted

Summary: journalctl crashes when journal is corrupted

Status:	RESOLVED FIXED

Alias:	None

Product:	systemd
Classification:	Unclassified
Component:	general (show other bugs)
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	systemd-bugs
QA Contact:	systemd-bugs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-02-08 16:58 UTC by Janna Martl
Modified:	2016-06-07 13:14 UTC (History)
CC List:	3 users (show)

See Also:
i915 platform:
i915 features:

Attachments

Description Janna Martl 2014-02-08 16:58:15 UTC

Some of my journal files seem to have gotten corrupted as a result of a failing drive, and journalctl (or 'journalctl -n N' for sufficiently large N) dies with either a segfault or (more recently) a bus error. I have also sometimes seen the journalctl process stick around as a D-state process. These crashes happen immediately after getting an I/O error (see below). This is annoying because journalctl is one of the tools I'd like to use in troubleshooting hard drive failure.

Here is a backtrace:

(gdb) bt full
#0  journal_file_move_to_object (f=f@entry=0x969cbc0, type=type@entry=3, offset=2638264, ret=ret@entry=0xbfc6505c) at src/journal/journal-file.c:434
        r = 1
        t = 0xb2c781b8
        o = 0xb2c781b8
        __PRETTY_FUNCTION__ = "journal_file_move_to_object"
#1  0x0805fb37 in generic_array_get (f=f@entry=0x969cbc0, first=375376, i=<optimized out>, ret=ret@entry=0xbfc651ac, offset=offset@entry=0xbfc651b0)
    at src/journal/journal-file.c:1443
        o = 0xb2c68b40
        p = <optimized out>
        a = <optimized out>
        t = 3158
        r = <optimized out>
        ci = 0x9687280
        __PRETTY_FUNCTION__ = "generic_array_get"
#2  0x08060269 in journal_file_next_entry (f=f@entry=0x969cbc0, o=o@entry=0xb2c77c38, p=2636856, direction=direction@entry=DIRECTION_DOWN, ret=ret@entry=0xbfc651ac,
    offset=offset@entry=0xbfc651b0) at src/journal/journal-file.c:1920
        i = 3181
        r = <optimized out>
        __PRETTY_FUNCTION__ = "journal_file_next_entry"
#3  0x0805855c in next_with_matches (j=j@entry=0x9670850, f=f@entry=0x969cbc0, direction=direction@entry=DIRECTION_DOWN, ret=ret@entry=0xbfc651ac, offset=offset@entry=0xbfc651b0)
    at src/journal/sd-journal.c:814
        c = 0xb2c77c38
        cp = 2636856
        __PRETTY_FUNCTION__ = "next_with_matches"
#4  0x0805a744 in next_beyond_location (offset=<synthetic pointer>, ret=0xbfc651a4, direction=DIRECTION_DOWN, f=0x969cbc0, j=0x9670850) at src/journal/sd-journal.c:836
        c = 0xb2c77c38
        cp = 2636856
        r = <optimized out>
#5  real_journal_next (j=j@entry=0x9670850, direction=direction@entry=DIRECTION_DOWN) at src/journal/sd-journal.c:895
        found = <optimized out>
        f = 0x969cbc0
        new_file = 0x969b4e0
        new_offset = 399336
        o = 0xb324f718
        p = <optimized out>
        i = 0x968df04
        r = <optimized out>
        __func__ = "real_journal_next"
#6  0x0805b08e in sd_journal_next (j=0x9670850) at src/journal/sd-journal.c:931
No locals.
#7  0x0804daa1 in main (argc=1, argv=0xbfc65574) at src/journal/journalctl.c:1573
        flags = <optimized out>
        r = 0
        j = 0x9670850
        need_seek = true
        previous_boot_id = {bytes = "X\357`\266L\307O̟\274\356\232\177\241\b2", qwords = {14722204839188688728, 3605309071142337695}}
        previous_boot_id_valid = true
        first_line = true
        n_shown = 41695
        ellipsized = false
        __func__ = "main"


and here an I/O error that occured immediately before one of these crashes:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x5
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:c8:ca:92/00:00:2e:00:00/e0 tag 0 dma 4096 in
         res 51/40:05:cb:ca:92/40:00:2e:00:00/ee Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/100
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]
mResult: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda]
mSense Key : 0x3 [current] [descriptor]
mDescriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        2e 92 ca cb
sd 0:0:0:0: [sda] 
mASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB:
mcdb[0]=0x28: 28 00 2e 92 ca c8 00 00 08 00
end_request: I/O error, dev sda, sector 781372107
ata1: EH complete

Comment 1 Zbigniew Jedrzejewski-Szmek 2014-02-08 17:57:48 UTC

If they get stuck in D-state that's probably a kernel issue.

It seems SIGBUS on IO error is legitimate. We could install a handler for the signal.

Comment 2 Lennart Poettering 2014-02-10 21:37:19 UTC

Is there any chance you could pass us the journal files triggering this?

Comment 3 Zbigniew Jedrzejewski-Szmek 2014-02-11 02:36:39 UTC

(In reply to comment #2)
> Is there any chance you could pass us the journal files triggering this?

Do you think that it's journalctl getting confused? I thought that the kernel is sending SIGBUS because it can't read the mmap'ed file.

Comment 4 Lennart Poettering 2014-02-11 10:23:27 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > Is there any chance you could pass us the journal files triggering this?
> 
> Do you think that it's journalctl getting confused? I thought that the
> kernel is sending SIGBUS because it can't read the mmap'ed file.

Oh, hmm, this is a bit confusing in the original bug report.

Janna, the IO error you saw, did that happen when the file was corrupted, or when you viewed the file? In the latter case the SIGBUS is to be expected. In the former case this would be a bug in journalctl.

Comment 5 Janna Martl 2014-02-14 14:20:18 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > Is there any chance you could pass us the journal files triggering this?
> > 
> > Do you think that it's journalctl getting confused? I thought that the
> > kernel is sending SIGBUS because it can't read the mmap'ed file.
> 
> Oh, hmm, this is a bit confusing in the original bug report.
> 
> Janna, the IO error you saw, did that happen when the file was corrupted, or
> when you viewed the file? In the latter case the SIGBUS is to be expected.
> In the former case this would be a bug in journalctl.

I meant the latter. I'm pretty sure the files were corrupted before journalctl got around to reading them, and the crash was caused by a read error. Sorry for the confusion.

Comment 6 Lennart Poettering 2016-06-07 13:14:27 UTC

current journalctl should be pretty happy with corrupted journal files and read them as far as they aren't corrupted. Also, we del nicely with the SIGBUS issues now. Hence closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.