Bug 63672

Summary: journalctl skips first entry after reboot in certain scenario
Product: systemd Reporter: Marius Vollmer <marius.vollmer>
Component: generalAssignee: systemd-bugs
Status: RESOLVED FIXED QA Contact: systemd-bugs
Severity: major    
Priority: medium CC: diego.viola, fred
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Test program for this bug.

Description Marius Vollmer 2013-04-18 07:40:10 UTC
There is a inconsistency when using "-n".  First without:

# journalctl UNIT=boot.mount | tail -20
Apr 11 05:37:28 f18 systemd[1]: Mounting /boot...
Apr 11 05:37:28 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 12 02:09:41 f18 systemd[1]: Mounting /boot...
Apr 12 02:09:41 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 15 02:53:17 f18 systemd[1]: Mounting /boot...
Apr 15 02:53:17 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 15 13:01:50 f18 systemd[1]: Mounting /boot...
Apr 15 13:01:50 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 16 02:17:29 f18 systemd[1]: Mounting /boot...
Apr 16 02:17:29 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 16 09:06:05 f18 systemd[1]: Mounting /boot...
Apr 16 09:06:05 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 17 11:12:33 f18 systemd[1]: Mounting /boot...
Apr 17 11:12:33 f18 systemd[1]: Mounted /boot.

There seem to be two messages per boot about the "boot.mount" unit.

Now with -n:

sh-4.2# journalctl UNIT=boot.mount -n10 
-- Logs begin at Fri 2013-04-05 04:13:15 EDT, end at Thu 2013-04-18 03:01:01 EDT. --
Apr 15 02:53:17 f18 systemd[1]: Mounting /boot...
Apr 15 02:53:17 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 15 13:01:50 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 16 02:17:29 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 16 09:06:05 f18 systemd[1]: Mounted /boot.
-- Reboot --
Apr 17 11:12:33 f18 systemd[1]: Mounted /boot.

After crossing a "-- Reboot --" line, the "Mounting /boot" message is skipped.

I haven't yet checked this with master.  I'll investigate a bit myself also.

(I think this is quite major since silently omitting messages is never good.)
Comment 1 Marius Vollmer 2013-04-18 11:18:57 UTC
Here is a patch that fixes this for me:

diff --git a/src/journal/sd-journal.c b/src/journal/sd-journal.c
index 064929b..15239b5 100644
--- a/src/journal/sd-journal.c
+++ b/src/journal/sd-journal.c
@@ -109,6 +109,9 @@ static void set_location(sd_journal *j, LocationType type, JournalFile *f, Objec
 
         init_location(&j->current_location, type, f, o);
 
+        if (j->current_file)
+                j->current_file->current_offset = 0;
+
         j->current_file = f;
         j->current_field = 0;

The theory is the following:  When sd_journal_prev goes from entry e_a in file f_a to entry e_b in file f_b, then f_a->current_offset will still point to e_a.  When a subsequent sd_journal_next then goes in the other direction and switches from file f_b to f_a, it will advance beyond the current position in f_a, and thereby skip e_a.

Of course, that's just my theory.  Please check this carefully.
Comment 2 Lennart Poettering 2013-04-18 20:32:42 UTC
Hmm, so, yeah, you found the issue I think, this is indeed broken the way you describe. That said, it was actually supposed to be a good thing to cache the last positions in all files, to make simple iteration faster. If we drop it, then we always have to find the location where we are first, which is O(log(n)) rather than O(1) as before. Now, the real fix is probably to reset the location as soon as we "change direction" during traversal. But for that we'd have to remember the last direction...

Anyway, I'll commit your fix for now, given that O(log(n)) isn't so much worse than O(1), and I want to get a new release out, but I will leave this bug open, to look into this, add a comment about this and add a TODO item, so I really won't forget...

BTW, I greatly appreciate the patches from you! It's awesome getting bug reports plus the right patches all the time. That's awesome! Thanks a ton!
Comment 3 Marius Vollmer 2013-04-19 06:16:24 UTC
(In reply to comment #2)
> Hmm, so, yeah, you found the issue I think, this is indeed broken the way
> you describe. That said, it was actually supposed to be a good thing to
> cache the last positions in all files, to make simple iteration faster.

You mean, in the case where multiple files overlap in time?  I didn't think of that.  Isn't my fix broken for that case?  Is that actually supported?

(E.g, when iterating, you get entry 1 from file A, then entries 2 and 3 from file B, then entry 2 from file A, etc.  I think with my fix we would go back to entry 1 from file A after entries 2 and 3 from file B.)

My first idea was to reset current_offset when a file is exhausted (when next_beyond_location returns 0 in real_journal_next), but that would break the 'guarantee' that you are still on the same position after sd_journal_next has returned 0, and can get a cursor for that position, for example.  Is there such a guarantee, actually?  If not, that should be documented.  If there is such a guarantee, we can do the resetting of exhausted files only if they are not the current_file, at the end of real_journal_next (I think).

In any case, current_offset should not be reset to 0 when a file is exhausted, but to a value that indicates that we are one step beyond the beginning _or_ end, as appropriate.

> Anyway, I'll commit your fix for now, given that O(log(n)) isn't so much
> worse than O(1), and I want to get a new release out, but I will leave this
> bug open, to look into this, add a comment about this and add a TODO item,
> so I really won't forget...

Please consider whether this breaks the overlapping files scenario...  I'll try to make another patch as outlined above.

> BTW, I greatly appreciate the patches from you! It's awesome getting bug
> reports plus the right patches all the time. That's awesome! Thanks a ton!

My pleasure!  It's fun to hack on some real algorithms and data structures for a change. :)
Comment 4 Marius Vollmer 2013-04-19 11:19:41 UTC
(In reply to comment #3)

> > Anyway, I'll commit your fix for now, given that O(log(n)) isn't so much
> > worse than O(1), and I want to get a new release out, but I will leave this
> > bug open, to look into this, add a comment about this and add a TODO item,
> > so I really won't forget...
> 
> Please consider whether this breaks the overlapping files scenario...  I'll
> try to make another patch as outlined above.

Here is a small test program for this bug.  It also tests interleaved files, and they are broken without the patch and fixed with it.  It is, of course, exactly as you say: setting current_offset to zero just triggers a search for the right location, it doesn't rewind the file back to its start.  Sorry for getting confused there a bit.
Comment 5 Marius Vollmer 2013-04-19 11:20:34 UTC
Created attachment 78233 [details]
Test program for this bug.
Comment 6 Frederic Crozat 2013-05-31 13:05:29 UTC
This patch caused https://bugzilla.novell.com/show_bug.cgi?id=817778, infinite loop on some specific interleaved journals..
Comment 7 Diego Viola 2013-06-07 15:53:41 UTC
Hello,

I'm getting some problem with journalctl and infinite loops as well.

When I try to do: `journalctl --since=yesterday --until=now', and then I scroll down, it scrolls down forever.

If I hit the End key, it also loops forever/endlessly, until I hit <C-c> (ctrl-c).

My system is Arch Linux x86_64, systemd 204.

Thanks.
Comment 8 Zbigniew Jedrzejewski-Szmek 2013-06-10 15:00:50 UTC
I verified that the test program exposes the original bug described in the report, and that it was fixed in http://cgit.freedesktop.org/systemd/systemd/commit/?id=a3e6f050.

Test program committed as testcase: http://cgit.freedesktop.org/systemd/systemd/commit/?id=7a050b54b.

Cache is restored in http://cgit.freedesktop.org/systemd/systemd/commit/?id=87011c2.

So, since the original bug is fixed, and the other bug that the bugfix exposed should be fixed too, closing now.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.