Bug 56043 - dbus-daemon loses journald sockets on restart and does not sanitise when subsequently calling dbus-daemon-launch-helper leading to SIGPIPE failures
Summary: dbus-daemon loses journald sockets on restart and does not sanitise when subs...
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Havoc Pennington
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-16 18:15 UTC by Colin Guthrie
Modified: 2012-10-18 17:50 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
dbus-spawn: set SIGPIPE to SIG_IGN before activating services (1.43 KB, patch)
2012-10-16 18:52 UTC, Ray Strode [halfline]
Details | Splinter Review

Description Colin Guthrie 2012-10-16 18:15:07 UTC
OK, the summary is a bit long, but it does at least explain things correctly!

This issue is a bit hard to describe, but I'll do my best.

Versions: systemd 194, gnome-shell 3.6.1, dbus 1.6.8, fprintd 0.4.1

After a fresh boot, login to gnome. fprintd is started via dbus activation at login and exits after a short timeout. If I lock the screen it is launched again when trying to unlock and again exits after a short timeout. I do not have any finger print h/w on this machine so this is largely unused by me but it operates correctly. Several lock+unlock attempts show now problems.


Then I restart the systemd journal: systemctl restart systemd-journald.service.

This is where things go wrong. I lock fine, but when trying to unlock (slide the shield up) all I get is a blank screen - no dialog to enter my password. Why is this, I hear you ask. Well, gnome-shell ultimately tries to launch fprintd via dbus activation as before. dbus calls dbus-daemon-launch-helper to launch /usr/lib64/fprintd. This exits with SIGPIPE and bubbles up as an exception to gnome-shell. Now, gnome-shell should be fixed to still show me the password dialog here, but that's a separate issue, the real issue is why the crash?

Well, with some investigation, the SIGPIPE is caused by fprintd output to stdout or stderr which is in the closed state.

Upon even further investigation, fd's 1 and 2 are both sockets.

While I cannot confirm 100%, it's my guess that dbus-daemon has simply lost it's connection to journald to which it's stdout/err are normally piped. When it then calls dbus-daemon-launch-helper it inherits these closed pipes, and if any launched application then tries to output anything and hasn't called signal(SIG_PIPE, SIG_IGN); itself, then it will crash and this will bubble up the stack.

So suggested work arounds are:

1. When the stdout/err are closed, try to re-establish connection to the journal.
 or
2. Sanitise stdout/err in dbus-launch-helper such that if they are closed, then redirect them to /dev/null instead such that at least apps are started even if logging is sacrificed - i.e. make the best of a bad situation!


Hope this is clear, but feel free to ask for clarification!
Comment 1 Ray Strode [halfline] 2012-10-16 18:52:19 UTC
Created attachment 68642 [details] [review]
dbus-spawn: set SIGPIPE to SIG_IGN before activating  services

dbus's service activation code sets SIGPIPE to SIG_DFL as a
sort of poor man's prctl(... PR_SET_PDEATHSIG) to detect when
the parent goes away.

It neglects to reignore the infamous signal before performing
activation, however.

This means if, for instance, journald is restarted all
services activated after it will die with SIGPIPE when logging
messages unless they explicitly ignore SIGPIPE themselves.

This commit changes dbus's service activation code to correctly
ignore SIGPIPE to protect activated services from a gruesome,
premature death.
Comment 2 Colin Guthrie 2012-10-16 19:10:22 UTC
Hmm, the buglink in the patch itself seems wrong.... :)
Comment 3 Colin Guthrie 2012-10-16 19:12:27 UTC
Sorry, forgot to say for the benefit of the bug history after IRC discussions, but I tested this patch and it does indeed solve the problem for me in my test case.
Comment 4 Ray Strode [halfline] 2012-10-16 19:32:32 UTC
i pushed this and did a follow up commit to correct the bug link.
Comment 5 Simon McVittie 2012-10-17 14:14:40 UTC
(In reply to comment #0)
> Then I restart the systemd journal: systemctl restart
> systemd-journald.service.

Er... if this is a supported action, shouldn't the new journald still be listening on the same socket, somehow?

(This is the same can of worms as having the dbus-daemon not disconnect everyone when it's restarted; we resolved that by declaring that not to be a supported action.)

> While I cannot confirm 100%, it's my guess that dbus-daemon has simply lost
> it's connection to journald to which it's stdout/err are normally piped.

dbus-daemon does not know nothing about journald; it does the same thing regardless of whether it's writing to a pipe, a tty, a file or journald.

> 1. When the stdout/err are closed, try to re-establish connection to the
> journal.

This is certainly inappropriate; dbus-daemon knows nothing about journald.

> 2. Sanitise stdout/err in dbus-launch-helper such that if they are closed,
> then redirect them to /dev/null instead such that at least apps are started
> even if logging is sacrificed - i.e. make the best of a bad situation!

The default for SIGPIPE (SIG_DFL) is "terminate process" for a reason. Shell utilities (and daemons, I would argue) should not exit successfully if the recipient of their output has gone away; and as a general principle, silently losing logged messages seems rather bad.

So, I don't think this is necessarily an appropriate change.

One way to solve this would be to direct stdout/stderr to /dev/null when run under systemd, and tell dbus-daemon to log via syslog (which really means whatever is at the other end of /dev/log, so it might end up in journald anyway). This would match what traditionally happens under sysvinit on, e.g., Debian and old Fedora.

It does have the disadvantage of losing the stdout/stderr of processes spawned from dbus-daemon-launch-helper, but I would hope that on systemd systems, the number of such processes decreases over time, in favour of using systemd activation. (System services are spawned via systemd if there is a corresponding systemd unit, or via the traditional dbus-daemon-launch-helper if not.)
Comment 6 Colin Guthrie 2012-10-17 14:44:12 UTC
(In reply to comment #5)
> (In reply to comment #0)
> > Then I restart the systemd journal: systemctl restart
> > systemd-journald.service.
> 
> Er... if this is a supported action, shouldn't the new journald still be
> listening on the same socket, somehow?

So after discussing this with Lennart yesterday, the longer term journal plan is basically to support a re-exec mode but not a restart. When re-execing it should be able to reuse the same fds.

The service file will then likely disable restart (or at very least mark it as an "unsupported" action.)

That said, restarting the journal shouldn't actively create problems. The primary issue is that SIGPIPE just gets in the way generally and used to basically take down your whole system if you restarted the journal. Nowadays SIGPIPE is automatically set to SIG_IGN before spawning any daemons.

So dbus as a daemon would have had it set to SIG_IGN anyway, and this patch simply restores this state after the fork for any exec'ed processes.


> > While I cannot confirm 100%, it's my guess that dbus-daemon has simply lost
> > it's connection to journald to which it's stdout/err are normally piped.
> 
> dbus-daemon does not know nothing about journald; it does the same thing
> regardless of whether it's writing to a pipe, a tty, a file or journald.

In that case "s/to journald/the sockets/" in my comment and it still holds true.


> > 2. Sanitise stdout/err in dbus-launch-helper such that if they are closed,
> > then redirect them to /dev/null instead such that at least apps are started
> > even if logging is sacrificed - i.e. make the best of a bad situation!
> 
> The default for SIGPIPE (SIG_DFL) is "terminate process" for a reason. Shell
> utilities (and daemons, I would argue) should not exit successfully if the
> recipient of their output has gone away; and as a general principle,
> silently losing logged messages seems rather bad.

Right now all daemons started via systemd will have SIGPIPE set to SIG_IGN anyway. Only when they explicitly change it back to SIG_DFL internally will they exit on it.

Arguably the fix should maybe be to get the current state of SIGPIPE and restore to that rather than explicitly being set to SIG_IGN but as we know dbus via systemd will be SIG_IGN anyway, it's not a crazy assumption generally.

Also, SIGPIPE is pretty useless outside of shell pipelines anyway from what I gather, other than tracking when log output is going nowhere.

Losing logging is certainly not ideal, but I'd suggest it shouldn't be fatal and take 99% of your system down with it (extrapolating beyond just dbus).

> So, I don't think this is necessarily an appropriate change.
> 
> One way to solve this would be to direct stdout/stderr to /dev/null when run
> under systemd, and tell dbus-daemon to log via syslog (which really means
> whatever is at the other end of /dev/log, so it might end up in journald
> anyway). This would match what traditionally happens under sysvinit on,
> e.g., Debian and old Fedora.
> 
> It does have the disadvantage of losing the stdout/stderr of processes
> spawned from dbus-daemon-launch-helper, but I would hope that on systemd
> systems, the number of such processes decreases over time, in favour of
> using systemd activation. (System services are spawned via systemd if there
> is a corresponding systemd unit, or via the traditional
> dbus-daemon-launch-helper if not.)


Personally I think the longer term reexec behaviour of journald and the current approach to the signal handling is fine.

If SIGPIPE is genuinely required then this should be argued and implemented generally (i.e. for all daemons in systemd generally), not just in dbus as a lone ranger off the SIGPIPE cause! :)
Comment 7 Ray Strode [halfline] 2012-10-18 17:50:04 UTC
Hi,

(In reply to comment #5)
> This is certainly inappropriate; dbus-daemon knows nothing about journald.
It probably makes sense for dbus-daemon to optionally use the journald apis internally, just so the log messages can have better semantic information attached to them.  That's an independent issue, though, and will suffer the same problems across journald restarts until journald is fixed to re-exec. 

> > 2. Sanitise stdout/err in dbus-launch-helper such that if they are closed,
> > then redirect them to /dev/null instead such that at least apps are started
> > even if logging is sacrificed - i.e. make the best of a bad situation!
> 
> The default for SIGPIPE (SIG_DFL) is "terminate process" for a reason. Shell
> utilities (and daemons, I would argue) should not exit successfully if the
> recipient of their output has gone away; and as a general principle,
> silently losing logged messages seems rather bad.
An argument could be made that SIGPIPE's default behavior is right for shell utilities, and very few people would disagree with that.

As far as daemons go... almost all daemons should be ignoring SIGPIPE. Certainly any dbus service (which is what we're talking about here) should be ignoring SIGPIPE, since not doing so can circumvent org.freedesktop.DBus.Local.Disconnected .  Granted, we don't really support system bus restarts, but that should still should make it clear SIGPIPE should be ignored.

Any service that that serves clients should be ignoring or handling SIGPIPE as well, otherwise any client can bring the service down.

I file SIGPIPE's default behavior in the same historical bucket in my brain as close-on-exec being off FDs by default.  It's something we can't change the defaults for, but really need to do regardless.

Anyway, from the code, it's clear the intent wasn't to reenable SIGPIPE for services, it was to detect the parent dying. It clearly looks like an oversight, and I'm pretty confident the fix is right independent of anything to do with logging and program stdio.

Your argument that daemons should die with SIGPIPE if they can't log to stdout is one I don't really agree with.  Sure, there are cases where specific, security-sensitive services should die if they aren't able to log to their well-known log files.  No argument there.  But it doesn't make sense as an across-the-board policy.  And it makes little sense for STDOUT/STDERR.  

> 
> So, I don't think this is necessarily an appropriate change.
> 
> One way to solve this would be to direct stdout/stderr to /dev/null when run
> under systemd, 
Your argument just reversed from wanting daemons to die if they can't get their message out to silently ignoring their messages always.

> and tell dbus-daemon to log via syslog (which really means
> whatever is at the other end of /dev/log, so it might end up in journald
> anyway). This would match what traditionally happens under sysvinit on,
> e.g., Debian and old Fedora.
I think directing messages to the journal makes a ton of sense, but I wouldn't go through the syslog middle man.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.