Bug 64616 - Not all cursors seem to work correctly.
Summary: Not all cursors seem to work correctly.
Status: NEW
Alias: None
Product: systemd
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-15 07:13 UTC by Marius Vollmer
Modified: 2013-05-16 07:21 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Ttry to avoid binary search when looking up by seqnum. (1.58 KB, patch)
2013-05-16 07:21 UTC, Marius Vollmer
Details | Splinter Review

Description Marius Vollmer 2013-05-15 07:13:25 UTC
I can see this curious behavior:

# ./journalctl -D /var/log/journal.cursor-bug -p3 -u rhsc-ws.service _BOOT_ID=a12b9c5d73a14fd49bcd51b99faa0c17 -o verbose | grep s=
Wed 2013-05-08 15:03:43 EDT [s=2d3aee2eda874445a91bcb0f92718eea;i=3d8;b=a12b9c5d73a14fd49bcd51b99faa0c17;m=261477778;t=4dc3997b0904f;x=a971f4f19117d9f]
Fri 2013-05-10 04:21:41 EDT [s=2d3aee2eda874445a91bcb0f92718eea;i=48b;b=a12b9c5d73a14fd49bcd51b99faa0c17;m=21a4e8dab1;t=4dc58db51f388;x=f89f9d5aaec9bed9]
# ./journalctl -D /var/log/journal.cursor-bug -c 's=2d3aee2eda874445a91bcb0f92718eea;i=48b;b=a12b9c5d73a14fd49bcd51b99faa0c17;m=21a4e8dab1;t=4dc58db51f388;x=f89f9d5aaec9bed9' -n1 -o verbose | grep s=
Mon 2013-05-13 02:49:19 EDT [s=c66de0241a874e3cac6e7a34baccf561;i=1;b=19efc07fb36244ae97f9e568897c4547;m=8cb56;t=4dc93ea840fa9;x=8e7e4637eff8a6d1]

That is, when feeding a certain cursor back into journalctl, it will fail to seek to that cursor and instead output something else.

Since the cursor has just been output by journalctl, I would expect it to be valid.

In any case, a error or warning message would be appropriate, I think.

I'll try to debug this right away.  If you are interested, I can upload 
/var/log/journal.cursor-bug somewhere.  It's about 23 MiB.
Comment 1 Marius Vollmer 2013-05-15 16:34:13 UTC
Right, it turned out that some of the files in /var/log/journal.cursor-bug are corrupted.  Some entry objects in them are zeroed-out.  I am quite brutal with this machine (a VM), so this is not that much of a surprise.

Unfortunately, the journal is not just shorter than expected, it really is corrupted and becomes inconsistent as shown in the bug description.

_Fortunately_, binary search doesn't really seem to be necessary for finding entries by seqnum.  A entry with seqnum S should always be at index S-1 in the entry array of its file, no?

This patch works for my scenario:

--- a/src/journal/journal-file.c
+++ b/src/journal/journal-file.c
@@ -1767,6 +1767,21 @@ int journal_file_move_to_entry_by_seqnum(
                 Object **ret,
                 uint64_t *offset) {
 
+        int r;
+
+        if (seqnum >= 1 && seqnum <= f->header->n_entries) {
+                Object *o;
+                r = generic_array_get(f,
+                                      le64toh(f->header->entry_array_offset),
+                                      seqnum-1,
+                                      &o, offset);
+                if (r >= 0 && le64toh(o->entry.seqnum) == seqnum) {
+                        if (ret)
+                                *ret = o;
+                        return 1;
+                }
+        }
+
         return generic_array_bisect(f,
                                     le64toh(f->header->entry_array_offset),
                                     le64toh(f->header->n_entries),

(It is quite conservative, and falls back to the existing code if my assumption is wrong.)

I haven't yet looked at journal_file_move_to_entry_by_seqnum_for_data.  Something similar will probably be appropriate there as well.

Even without considering corruption, using generic_array_get with seqnums should be a small performance improvement, no?


In general, we might try to detect corruption during a binary search, issue a warning, and try to repair the damage.  For example, we might do a linear search for a non-corrupted entry and continue the bisecting from there.

I'll look at the _data case and then make a complete patch.
Comment 2 Marius Vollmer 2013-05-16 07:20:21 UTC
The _data case needs binary search, and seqnums are probably not guaranteed to strictly correspond to array indices.  Correct?

I am happy for now with the attached patch, but that doesn't do much about robustness against file corruption.
Comment 3 Marius Vollmer 2013-05-16 07:21:24 UTC
Created attachment 79382 [details] [review]
Ttry to avoid binary search when looking up by seqnum.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.