Bug 92546

Summary: consider making D-Bus "checks" non-fatal, either for dbus-daemon or in general
Product: dbus Reporter: Simon McVittie <smcv>
Component: coreAssignee: D-Bus Maintainers <dbus>
Status: RESOLVED NOTABUG QA Contact: D-Bus Maintainers <dbus>
Severity: normal    
Priority: medium CC: slomo
Version: 1.10   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Simon McVittie 2015-10-19 19:56:42 UTC
D-Bus has two categories of assertion-like mechanism:

* "assertions", analogous to Standard C assert() and GLib g_assert().
  We recommend to distributions that these should be disabled for
  production builds, analogous to Standard C -DNDEBUG. If enabled,
  assertion failures are always fatal.

* "checks", analogous to GLib g_return_if_fail() and g_return_val_if_fail().
  We recommend that these are enabled for production builds.
  If enabled, they are fatal by default; they can be made non-fatal
  by setting the environment variable DBUS_FATAL_WARNINGS to 0,
  or made fatal by setting it to 1.

Debian's version of D-Bus has a long-standing patch[1] to make "checks" non-fatal by default. This approximately matches how GLib's g_return_[val_]if_fail() works: the GLib check logs a "critical warning", which is not normally fatal. Debian derivatives like Ubuntu inherit this patch.

See <https://bugzilla.gnome.org/show_bug.cgi?id=660809> for more on the philosophy of g_return_[val_]if_fail(); our "checks" are basically that. "Checks" are always at the API boundary of a libdbus public API, whereas assertions can go anywhere. In principle, this means that "checks" are making assertions about how a third-party project is using our API. However, in practice, lots of parts of dbus call libdbus public APIs too - in particular, libdbus code can call other libdbus public APIs, and dbus-daemon calls a lot of libdbus public APIs.

I see three ways we can go with this:

* The current situation: anything that fails a "check" crashes.
  In particular, this turns attacker-triggerable failed "checks"
  into denial-of-service security vulnerabilities; the trade-off
  is that it makes (some) bugs very visible, and in some rare
  situations it might mitigate a more serious bug (e.g. code execution)
  down to merely being a denial of service.

* What Debian has: anything that fails a "check" warns on stderr
  but tries to continue, returning a hopefully safe value
  (NULL, 0, "not successful") if the function is one that would
  normally return a value. This turns some denial-of-service
  vulnerabilities into not a vulnerability at all.

* A compromise between those two: dbus-daemon explicitly turns off
  fatal warnings for itself, so that it will try to keep stumbling along
  even when a check fails; but libdbus defaults to fatal warnings,
  so bugs in ordinary applications' interactions with libdbus
  are extremely visible.

Whichever way we go, we should continue to force DBUS_FATAL_WARNINGS=1 for the regression tests, so that bugs caught by the tests reliably make them fail.

[1] https://anonscm.debian.org/cgit/pkg-utopia/dbus.git/tree/debian/patches/Don-t-abort-on-fatal-warnings-by-default.patch

(I am one of the maintainers of dbus in Debian, but the patch we apply was introduced long before my involvement.)
Comment 1 Simon McVittie 2015-10-19 19:57:46 UTC
I personally think the current behaviour is not really sufficiently tolerant for a service that aims to be usable as critical system infrastructure, so I think we should either apply Debian's patch, or go for the compromise approach.
Comment 2 Philip Withnall 2015-10-19 22:37:56 UTC
(In reply to Simon McVittie from comment #1)
> I personally think the current behaviour is not really sufficiently tolerant
> for a service that aims to be usable as critical system infrastructure, so I
> think we should either apply Debian's patch, or go for the compromise
> approach.

How many user-visible bugs do you think have been caused by this in recent times? If it's just the monitoring bug, then I would say it's better to keep both checks and assertions fatal (the current situation). Personally, I have only seen the one bug (the monitoring one) recently which has been caused by this. I don't think preventing a crash there is worth the trade off of potentially missing other bugs or allowing escalations.
Comment 3 Simon McVittie 2017-11-09 13:57:48 UTC
(In reply to Philip Withnall from comment #2)
> Personally, I have
> only seen the one bug (the monitoring one) recently which has been caused by
> this. I don't think preventing a crash there is worth the trade off of
> potentially missing other bugs or allowing escalations.

This reasoning is persuasive.

Also, upstreams who happen to develop on Debian or Ubuntu will semi-frequently release code that "works" for them (with spam to stderr that they ignore or don't notice), but has undefined behaviour, and aborts on non-Debian, non-Ubuntu distributions. This is not great, particularly since Ubuntu is rather popular for upstream development. One example of the undefined behaviour these checks catch in real life is missing off the DBUS_TYPE_INVALID from a varargs function call, causing uninitialized stack contents to be interpreted as the next type to append.

If we made these checks non-fatal by default, we'd get the same situation, but replacing "Debian and Ubuntu" with "distributions that contain a recent dbus". I don't think that would be good either.

So I'm withdrawing this suggestion, and instead I'm going to try removing the patch in Debian.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.