Bug 64616

Summary: Not all cursors seem to work correctly.
Product: systemd Reporter: Marius Vollmer <marius.vollmer>
Component: generalAssignee: systemd-bugs
Status: NEW --- QA Contact: systemd-bugs
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Ttry to avoid binary search when looking up by seqnum.

Description Marius Vollmer 2013-05-15 07:13:25 UTC
I can see this curious behavior:

# ./journalctl -D /var/log/journal.cursor-bug -p3 -u rhsc-ws.service _BOOT_ID=a12b9c5d73a14fd49bcd51b99faa0c17 -o verbose | grep s=
Wed 2013-05-08 15:03:43 EDT [s=2d3aee2eda874445a91bcb0f92718eea;i=3d8;b=a12b9c5d73a14fd49bcd51b99faa0c17;m=261477778;t=4dc3997b0904f;x=a971f4f19117d9f]
Fri 2013-05-10 04:21:41 EDT [s=2d3aee2eda874445a91bcb0f92718eea;i=48b;b=a12b9c5d73a14fd49bcd51b99faa0c17;m=21a4e8dab1;t=4dc58db51f388;x=f89f9d5aaec9bed9]
# ./journalctl -D /var/log/journal.cursor-bug -c 's=2d3aee2eda874445a91bcb0f92718eea;i=48b;b=a12b9c5d73a14fd49bcd51b99faa0c17;m=21a4e8dab1;t=4dc58db51f388;x=f89f9d5aaec9bed9' -n1 -o verbose | grep s=
Mon 2013-05-13 02:49:19 EDT [s=c66de0241a874e3cac6e7a34baccf561;i=1;b=19efc07fb36244ae97f9e568897c4547;m=8cb56;t=4dc93ea840fa9;x=8e7e4637eff8a6d1]

That is, when feeding a certain cursor back into journalctl, it will fail to seek to that cursor and instead output something else.

Since the cursor has just been output by journalctl, I would expect it to be valid.

In any case, a error or warning message would be appropriate, I think.

I'll try to debug this right away.  If you are interested, I can upload 
/var/log/journal.cursor-bug somewhere.  It's about 23 MiB.
Comment 1 Marius Vollmer 2013-05-15 16:34:13 UTC
Right, it turned out that some of the files in /var/log/journal.cursor-bug are corrupted.  Some entry objects in them are zeroed-out.  I am quite brutal with this machine (a VM), so this is not that much of a surprise.

Unfortunately, the journal is not just shorter than expected, it really is corrupted and becomes inconsistent as shown in the bug description.

_Fortunately_, binary search doesn't really seem to be necessary for finding entries by seqnum.  A entry with seqnum S should always be at index S-1 in the entry array of its file, no?

This patch works for my scenario:

--- a/src/journal/journal-file.c
+++ b/src/journal/journal-file.c
@@ -1767,6 +1767,21 @@ int journal_file_move_to_entry_by_seqnum(
                 Object **ret,
                 uint64_t *offset) {
 
+        int r;
+
+        if (seqnum >= 1 && seqnum <= f->header->n_entries) {
+                Object *o;
+                r = generic_array_get(f,
+                                      le64toh(f->header->entry_array_offset),
+                                      seqnum-1,
+                                      &o, offset);
+                if (r >= 0 && le64toh(o->entry.seqnum) == seqnum) {
+                        if (ret)
+                                *ret = o;
+                        return 1;
+                }
+        }
+
         return generic_array_bisect(f,
                                     le64toh(f->header->entry_array_offset),
                                     le64toh(f->header->n_entries),

(It is quite conservative, and falls back to the existing code if my assumption is wrong.)

I haven't yet looked at journal_file_move_to_entry_by_seqnum_for_data.  Something similar will probably be appropriate there as well.

Even without considering corruption, using generic_array_get with seqnums should be a small performance improvement, no?


In general, we might try to detect corruption during a binary search, issue a warning, and try to repair the damage.  For example, we might do a linear search for a non-corrupted entry and continue the bisecting from there.

I'll look at the _data case and then make a complete patch.
Comment 2 Marius Vollmer 2013-05-16 07:20:21 UTC
The _data case needs binary search, and seqnums are probably not guaranteed to strictly correspond to array indices.  Correct?

I am happy for now with the attached patch, but that doesn't do much about robustness against file corruption.
Comment 3 Marius Vollmer 2013-05-16 07:21:24 UTC
Created attachment 79382 [details] [review]
Ttry to avoid binary search when looking up by seqnum.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.