Bug 68000

Summary: RFE: Journal: apply per-driver rate-limiting to kernel messages
Product: systemd Reporter: Adam Williamson <adamw>
Component: generalAssignee: systemd-bugs
Status: RESOLVED DUPLICATE QA Contact: systemd-bugs
Severity: critical    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Adam Williamson 2013-08-11 18:23:44 UTC
I just looked at the journalctl output on my laptop, and saw this:

Aug 10 18:41:52 laptop.localdomain systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Aug 10 18:41:52 laptop.localdomain systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Aug 10 18:41:52 laptop.localdomain systemd-journal[28662]: Allowing system journal files to grow to 4.0G.
Aug 10 18:41:52 laptop.localdomain systemd-journal[28662]: Journal started
Aug 10 18:41:52 laptop.localdomain systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Aug 10 18:41:52 laptop.localdomain systemd-journal[28665]: Allowing system journal files to grow to 4.0G.
Aug 10 18:41:52 laptop.localdomain systemd-journal[28665]: Journal started
Aug 10 18:41:52 laptop.localdomain systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Aug 10 18:41:52 laptop.localdomain systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Aug 10 18:41:52 laptop.localdomain systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.

Note it's now 14:19 on August 11, so it appears that journald has pretty much given up and gone home...

Looking in /var/log/journal, clearly something has spammed the journal in a major way:

[adamw@laptop fbb0c4896a9c0f680475c0a07edd2c5a]$ pwd
/var/log/journal/fbb0c4896a9c0f680475c0a07edd2c5a
[adamw@laptop fbb0c4896a9c0f680475c0a07edd2c5a]$ du -h
2.7G	.

Eyeballing the journalctl output, what I'm seeing is a whole huge bunch of this:

Jun 27 02:53:35 laptop.localdomain kernel: psmouse serio1: Trackpad at isa0060/serio1/input0 - driver resynced.
Jun 27 02:53:35 laptop.localdomain kernel: psmouse serio1: Trackpad at isa0060/serio1/input0 lost sync at byte 1
Jun 27 02:53:35 laptop.localdomain kernel: psmouse serio1: Trackpad at isa0060/serio1/input0 - driver resynced.
Jun 27 02:53:35 laptop.localdomain kernel: psmouse serio1: Trackpad at isa0060/serio1/input0 lost sync at byte 1

over and over and over and over again - there are literally hundreds of those messages each second at some points.

It seems like the rate limiting stuff is not catching this case for some reason; maybe it doesn't handle repeating pairs of messages like this?
Comment 1 Harald Hoyer 2013-08-30 08:37:46 UTC
(In reply to comment #0)
> It seems like the rate limiting stuff is not catching this case for some
> reason; maybe it doesn't handle repeating pairs of messages like this?

Hmm, the journal is designed to handle repeating stuff. It stores strings and only references them. So, if the same string comes along again, the string is not stored again, but only referenced. So, it seems something else was going wrong and all the metadata sourrounding that string accumulated to 2.7G.

Is your disk full?

# df -h /var/log/journal
Comment 2 Lennart Poettering 2013-09-12 17:23:36 UTC
We currently do not apply rate limiting to kernel message, trusting that the kernel gets that right. But I figure we should extend our stuff to also rate limit kernel messages per driver, since that is apparently a major problem and and log rate limiting is really something drivers have to opt in for rather than automatic.
Comment 3 Adam Williamson 2013-09-13 15:53:42 UTC
harald: I'm on vacation and don't have the system handy, but IIRC, yes, the system's disk was filled up.
Comment 4 Lennart Poettering 2014-10-24 00:44:28 UTC

*** This bug has been marked as a duplicate of bug 83221 ***

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.