Bug 90364

Summary: snoopy (LD_PRELOAD logging of exec calls) prevents boot since jounald start times out and is repeated over and over
Product: systemd Reporter: Stefan Schmidt <zaphodb>
Component: generalAssignee: systemd-bugs
Status: RESOLVED NOTOURBUG QA Contact: systemd-bugs
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Stefan Schmidt 2015-05-07 17:29:54 UTC
On two instances of Xen "PV on HVM" Servers running Debian that were freshly dist-upgraded from wheezy to jessie i found that they were unable to boot while snoopy ( https://packages.debian.org/jessie/snoopy ) was enabled. They did however boot up just fine after snoopy was disabled.

It seems that the logging of all execve calls by snoopy via an LD_PRELOAD wrapping of that function somehow interferes with the upbringing of journald and apparently also prevents it from responding to systemd in time so that it gets restarted over and over without ever succeeding to launch and thus preventing systemd to continue booting up.

It seems the issue has already been reported at Fedora in https://bugzilla.redhat.com/show_bug.cgi?id=745603 .

I realize that this might not be exactly the fault of systemd but rather of the crude way that snoopy is implemented however maybe you can think of an elegant way to work around the issue.

Sadly i was unable to produce any more detailed output by setting ForwardToConsole=yes TTYPath=/dev/ttyS0 (i also tried /dev/console) in /etc/systemd/journald.conf .

Console looked like this booting up while snoopy was enabled: 

[  OK  ] Stopped Trigger Flushing of Journal to Persistent Storage.
         Stopping Journal Service...
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Started Journal Service.
[***   ] (4 of 5) A start job is running for LSB: Set keymap ... 9s / no limit)[ 2771.768127] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
         Starting Journal Service...
[  OK  ] Stopped Trigger Flushing of Journal to Persistent Storage.
         Stopping Journal Service...
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Started Journal Service.
[FAILED] Failed to start udev Kernel Device Manager.
See 'systemctl status systemd-udevd.service' for details.
         Stopping udev Kernel Device Manager...
[  OK  ] Stopped udev Kernel Device Manager.
         Starting udev Kernel Device Manager...
[   ***] (3 of 5) A start job is running for Apply Kernel Var... 9s / no limit)[ 2832.004137] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
         Starting Journal Service...
[  OK  ] Stopped Trigger Flushing of Journal to Persistent Storage.
         Stopping Journal Service...
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Started Journal Service.
[FAILED] Failed to start udev Kernel Device Manager.
See 'systemctl status systemd-udevd.service' for details.
         Stopping udev Kernel Device Manager...
[  OK  ] Stopped udev Kernel Device Manager.
         Starting udev Kernel Device Manager...
[ 2892.256176] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
         Starting Journal Service...
[  OK  ] Stopped Trigger Flushing of Journal to Persistent Storage.
         Stopping Journal Service...
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Started Journal Service.
Comment 1 Stefan Schmidt 2015-05-08 08:58:39 UTC
Debian found that the issue might have been fixed within snoopy upstream.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784682
-> https://bugzilla.redhat.com/show_bug.cgi?id=948417
by closing its logging file descriptor.

Not sure if it would be possible/feasible for journald to work around such occurrences.
Having something interfere with system startup via LD_PRELOAD surely counts as obscure.
Comment 2 Lennart Poettering 2015-05-13 17:38:50 UTC
Sorry, but we don't really support that. Use a tool like "perf" to trace all invocations of execve(), but our code uses fds > 2 to pass fds across exec(), and in some occasions closes unknown fds, and that's unlikely to change.

I figure "snoopy" the way it is written is simply incompatible with systemd, and that's nothing to fix in systemd. Sorry.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.