Some of my journal files seem to have gotten corrupted as a result of a failing drive, and journalctl (or 'journalctl -n N' for sufficiently large N) dies with either a segfault or (more recently) a bus error. I have also sometimes seen the journalctl process stick around as a D-state process. These crashes happen immediately after getting an I/O error (see below). This is annoying because journalctl is one of the tools I'd like to use in troubleshooting hard drive failure. Here is a backtrace: (gdb) bt full #0 journal_file_move_to_object (f=f@entry=0x969cbc0, type=type@entry=3, offset=2638264, ret=ret@entry=0xbfc6505c) at src/journal/journal-file.c:434 r = 1 t = 0xb2c781b8 o = 0xb2c781b8 __PRETTY_FUNCTION__ = "journal_file_move_to_object" #1 0x0805fb37 in generic_array_get (f=f@entry=0x969cbc0, first=375376, i=<optimized out>, ret=ret@entry=0xbfc651ac, offset=offset@entry=0xbfc651b0) at src/journal/journal-file.c:1443 o = 0xb2c68b40 p = <optimized out> a = <optimized out> t = 3158 r = <optimized out> ci = 0x9687280 __PRETTY_FUNCTION__ = "generic_array_get" #2 0x08060269 in journal_file_next_entry (f=f@entry=0x969cbc0, o=o@entry=0xb2c77c38, p=2636856, direction=direction@entry=DIRECTION_DOWN, ret=ret@entry=0xbfc651ac, offset=offset@entry=0xbfc651b0) at src/journal/journal-file.c:1920 i = 3181 r = <optimized out> __PRETTY_FUNCTION__ = "journal_file_next_entry" #3 0x0805855c in next_with_matches (j=j@entry=0x9670850, f=f@entry=0x969cbc0, direction=direction@entry=DIRECTION_DOWN, ret=ret@entry=0xbfc651ac, offset=offset@entry=0xbfc651b0) at src/journal/sd-journal.c:814 c = 0xb2c77c38 cp = 2636856 __PRETTY_FUNCTION__ = "next_with_matches" #4 0x0805a744 in next_beyond_location (offset=<synthetic pointer>, ret=0xbfc651a4, direction=DIRECTION_DOWN, f=0x969cbc0, j=0x9670850) at src/journal/sd-journal.c:836 c = 0xb2c77c38 cp = 2636856 r = <optimized out> #5 real_journal_next (j=j@entry=0x9670850, direction=direction@entry=DIRECTION_DOWN) at src/journal/sd-journal.c:895 found = <optimized out> f = 0x969cbc0 new_file = 0x969b4e0 new_offset = 399336 o = 0xb324f718 p = <optimized out> i = 0x968df04 r = <optimized out> __func__ = "real_journal_next" #6 0x0805b08e in sd_journal_next (j=0x9670850) at src/journal/sd-journal.c:931 No locals. #7 0x0804daa1 in main (argc=1, argv=0xbfc65574) at src/journal/journalctl.c:1573 flags = <optimized out> r = 0 j = 0x9670850 need_seek = true previous_boot_id = {bytes = "X\357`\266L\307OĢ\274\356\232\177\241\b2", qwords = {14722204839188688728, 3605309071142337695}} previous_boot_id_valid = true first_line = true n_shown = 41695 ellipsized = false __func__ = "main" and here an I/O error that occured immediately before one of these crashes: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata1.00: BMDMA stat 0x5 ata1.00: failed command: READ DMA EXT ata1.00: cmd 25/00:08:c8:ca:92/00:00:2e:00:00/e0 tag 0 dma 4096 in res 51/40:05:cb:ca:92/40:00:2e:00:00/ee Emask 0x9 (media error) ata1.00: status: { DRDY ERR } ata1.00: error: { UNC } ata1.00: configured for UDMA/100 sd 0:0:0:0: [sda] Unhandled sense code sd 0:0:0:0: [sda] mResult: hostbyte=0x00 driverbyte=0x08 sd 0:0:0:0: [sda] mSense Key : 0x3 [current] [descriptor] mDescriptor sense data with sense descriptors (in hex): 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 2e 92 ca cb sd 0:0:0:0: [sda] mASC=0x11 ASCQ=0x4 sd 0:0:0:0: [sda] CDB: mcdb[0]=0x28: 28 00 2e 92 ca c8 00 00 08 00 end_request: I/O error, dev sda, sector 781372107 ata1: EH complete
If they get stuck in D-state that's probably a kernel issue. It seems SIGBUS on IO error is legitimate. We could install a handler for the signal.
Is there any chance you could pass us the journal files triggering this?
(In reply to comment #2) > Is there any chance you could pass us the journal files triggering this? Do you think that it's journalctl getting confused? I thought that the kernel is sending SIGBUS because it can't read the mmap'ed file.
(In reply to comment #3) > (In reply to comment #2) > > Is there any chance you could pass us the journal files triggering this? > > Do you think that it's journalctl getting confused? I thought that the > kernel is sending SIGBUS because it can't read the mmap'ed file. Oh, hmm, this is a bit confusing in the original bug report. Janna, the IO error you saw, did that happen when the file was corrupted, or when you viewed the file? In the latter case the SIGBUS is to be expected. In the former case this would be a bug in journalctl.
(In reply to comment #4) > (In reply to comment #3) > > (In reply to comment #2) > > > Is there any chance you could pass us the journal files triggering this? > > > > Do you think that it's journalctl getting confused? I thought that the > > kernel is sending SIGBUS because it can't read the mmap'ed file. > > Oh, hmm, this is a bit confusing in the original bug report. > > Janna, the IO error you saw, did that happen when the file was corrupted, or > when you viewed the file? In the latter case the SIGBUS is to be expected. > In the former case this would be a bug in journalctl. I meant the latter. I'm pretty sure the files were corrupted before journalctl got around to reading them, and the crash was caused by a read error. Sorry for the confusion.
current journalctl should be pretty happy with corrupted journal files and read them as far as they aren't corrupted. Also, we del nicely with the SIGBUS issues now. Hence closing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.