Bug 95619 - _dbus_timeout_set_enabled doesn't reset timeout ticking
Summary: _dbus_timeout_set_enabled doesn't reset timeout ticking
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: git master
Hardware: Other All
: medium normal
Assignee: D-Bus Maintainers
QA Contact: D-Bus Maintainers
URL:
Whiteboard: review+
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-05-24 09:47 UTC by Michal Koutný
Modified: 2017-07-20 19:50 UTC (History)
7 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Add _dbus_timeout_set_reenabled method to properly restart timeouts (4.97 KB, patch)
2016-05-24 09:47 UTC, Michal Koutný
Details | Splinter Review
bus: Fix timeout restarts (9.47 KB, patch)
2016-11-11 20:53 UTC, Simon McVittie
Details | Splinter Review
DBusMainLoop: ensure all required timeouts are restarted (992 bytes, text/plain)
2017-07-13 09:50 UTC, Michal Koutný
Details

Description Michal Koutný 2016-05-24 09:47:58 UTC
Created attachment 124018 [details] [review]
Add _dbus_timeout_set_reenabled method to properly restart timeouts

Calling _dbus_timeout_set_enabled won't reset period when the timeout is ticking. Currently, timeout's period is reset only when timeout is added into the event loop. When one timeout object is used multiple times, it will not fire at proper moments. 

This was observed under heavy load, when dbus-daemon randomly (actually after period of the *first* enablement) terminated connections. This has been also reported in 95263.

I propose alternate patch to bug 95263 that adds new field in DBusTimout structure and can be used without direct loop context. However, filing this as a separate issue targeted on timeouts and not directly related to pending fds counting.

From code inspection, bus_expire_timeout_set_interval also assumes the timeout is restarted when reenabled and would need review.
Comment 1 Simon McVittie 2016-06-30 18:34:47 UTC
(In reply to Michal Koutný from comment #0)
> I propose alternate patch to bug 95263

I don't think Lennart's patch on Bug #95263 achieves the original goal of the pending fd timer in defeating DoS attacks; it weakens the defence against those DoS attacks to a point where it is no longer practically useful.

> From code inspection, bus_expire_timeout_set_interval also assumes the
> timeout is restarted when reenabled and would need review.

Are you saying that this is an area that has a bug that is (or can be) fixed by your patch, or that this is an area that relies on the current behaviour and would be caused to regress by your patch, or what?
Comment 2 Simon McVittie 2016-06-30 18:42:47 UTC
Comment on attachment 124018 [details] [review]
Add _dbus_timeout_set_reenabled method to properly restart timeouts

Review of attachment 124018 [details] [review]:
-----------------------------------------------------------------

I like this approach a bit better than what's in Bug #95263, but I don't think either one is a correct way to deal with the pending fd time limit.

::: bus/connection.c
@@ +655,4 @@
>      {
>        _dbus_timeout_set_interval (d->pending_unix_fds_timeout,
>                bus_context_get_pending_fd_timeout (d->connections->context));
> +      _dbus_timeout_set_reenabled (d->pending_unix_fds_timeout, TRUE);

Restarting the timer every time we add a new fd is harmful; as noted on Bug #95263, under the default system bus configuration it would mean that an attacker can keep fds in-flight for up to 2h40, which is certainly not what we want.

::: dbus/dbus-connection.h
@@ +490,5 @@
>  dbus_bool_t dbus_timeout_get_enabled  (DBusTimeout      *timeout);
> +DBUS_EXPORT
> +dbus_bool_t dbus_timeout_needs_restart(DBusTimeout      *timeout);
> +DBUS_EXPORT
> +void        dbus_timeout_restarted    (DBusTimeout      *timeout);

If ordinary callers of libdbus are now expected to call these once per iteration, that would surely be an incompatible change for third-party libdbus main-loop integration libraries (dbus-glib, QtDBus, edbus, etc.)?

If libdbus (DBusConnection) never actually restarts timers and the only thing that restarts timers is dbus-daemon, then special handling in DBusLoop is sufficient, in which case these should underscore-prefixed, non-exported, not in a public header, and internal to libdbus.
Comment 3 Simon McVittie 2016-06-30 18:50:56 UTC
(In reply to Michal Koutný from comment #0)
> From code inspection, bus_expire_timeout_set_interval also assumes the
> timeout is restarted when reenabled and would need review.

If you're saying that bus_expire_timeout_set_interval is currently wrong and should be resetting the timeout, I'd like to see a regression test that demonstrates that, for instance by extending bus_expire_list_test().

It appears that BusExpireList has one use, namely tracking messages that have a finite reply timeout. The default reply timeout has been infinite since 2009, so if there is a bug in BusExpireList it can only affect non-default configurations.
Comment 4 Simon McVittie 2016-06-30 18:57:49 UTC
(In reply to Simon McVittie from comment #3)
> tracking messages that
> have a finite reply timeout. The default reply timeout has been infinite
> since 2009

This is the timeout that applies at the bus level, acting as a cap on the effective values of all of the timeouts set by individual method callers. Background:

https://lists.freedesktop.org/archives/dbus/2008-July/010125.html
https://bugs.freedesktop.org/show_bug.cgi?id=16841
Comment 5 Michal Koutný 2016-07-11 12:30:51 UTC
(In reply to Simon McVittie from comment #2)
> Restarting the timer every time we add a new fd is harmful; as noted on Bug
> #95263, under the default system bus configuration it would mean that an
> attacker can keep fds in-flight for up to 2h40, which is certainly not what
> we want.
Now, I understand better and I don't think any more that per fd timeouts are
correct. I agree the single aggregated timeout makes sense.

OTOH, my patch keeps the original semantics, i.e. timeout restarted only when
the first fd is added (n_pending_unix_fds_old == 0 && n_pending_unix_fds_new > 0).

> If ordinary callers of libdbus are now expected to call these once per
> iteration, that would surely be an incompatible change for third-party
> libdbus main-loop integration libraries (dbus-glib, QtDBus, edbus, etc.)?
If they wanted _dbus_timeout_set_reenabled to work for them, they'd have to.
However, it's not intended and these "methods" can be privatized.
This clumsy solution is due to split of data between DBusTimeout and
TimeoutCallback structures. Way how to avoid this, is the Lennart's variant
that requires O(n) to update the proper TimeoutCallback timestamps.

(In reply to Simon McVittie from comment #3)
> If you're saying that bus_expire_timeout_set_interval is currently wrong and
> should be resetting the timeout, I'd like to see a regression test that
> demonstrates that, for instance by extending bus_expire_list_test().
Unfortunately, I don't have a test case, this is just theoretical conclusion --
as the current implementation can leave any value in TimeoutCallback.

I just took random caller to bus_expire_timeout_set_interval to demonstrate:

	1) bus_connections_expire_incomplete calculates next_interval to be 10
	   seconds
	2) the connection completes within the interval (say after 6 seconds),
	   DBusTimeout is disabled and TimeoutCallback.last_tv_* fields aren't
	   updated (since timeout didn't expire)
	3) in another call to bus_connections_expire_incomplete next_interval
	   is calculated to (new) 10 seconds and DBusTimeout is enabled 4) now
	   the loop evaluates old TimeoutCallback.last_tv_* values and may
	   trigger the callback earlier than those 10 seconds

In reality, this may not happen, however, this should illustrate enabling a timer with stale last_tv_* values may result in unintended duration of the timeout.
Comment 6 Simon McVittie 2016-11-11 20:52:36 UTC
Comment on attachment 124018 [details] [review]
Add _dbus_timeout_set_reenabled method to properly restart timeouts

Review of attachment 124018 [details] [review]:
-----------------------------------------------------------------

::: dbus/dbus-connection.h
@@ +488,5 @@
>  dbus_bool_t dbus_timeout_handle       (DBusTimeout      *timeout);
>  DBUS_EXPORT
>  dbus_bool_t dbus_timeout_get_enabled  (DBusTimeout      *timeout);
> +DBUS_EXPORT
> +dbus_bool_t dbus_timeout_needs_restart(DBusTimeout      *timeout);

Minor coding style thing: space before opening parenthesis

::: dbus/dbus-mainloop.c
@@ +617,1 @@
>                int msecs_remaining;

Minor coding style thing: we are still stuck in C89, so declare variables before any statements

::: dbus/dbus-timeout.c
@@ +167,5 @@
> + * @param enabled #TRUE if timeout should be enabled.
> + */
> +void
> +_dbus_timeout_set_reenabled (DBusTimeout  *timeout,
> +                             dbus_bool_t   enabled)

I can't find any situations where we would want to:

* set the interval without also re-enabling and restarting the timer
* enable the timer without resetting the interval and re-enabling

So I think better methods here would be to replace _dbus_timeout_set_enabled() and _dbus_timeout_set_interval() with _dbus_timeout_restart (DBusTimeout, msecs) and dbus_timeout_disable (DBusTimeout).

@@ +514,5 @@
> + * @param timeout the DBusTimeout object
> + * @returns #TRUE if restart is needed
> + */
> +dbus_bool_t
> +dbus_timeout_needs_restart (DBusTimeout *timeout)

These are only for DBusLoop, so they can be private. They need to be DBUS_PRIVATE_EXPORT because DBusLoop is in libdbus-internal.la while DBusTimeout is in the public libdbus-1.la.
Comment 7 Simon McVittie 2016-11-11 20:53:43 UTC
Created attachment 127921 [details] [review]
bus: Fix timeout restarts

From: Michal Koutný

The code counting pending fds relied on restart of timeouts when they are
enabled. This patch adds function that ensures that such enabled timeouts
have their timekeeping data reset (and not only when timeout is
registered into event loop processing).

When timeouts weren't reset, they'd fire at rather random and mainly
incorrect moments leading to interruption of connections of dbus-daemon.

Every time we reset the interval, we also need to re-enable the timeout
and mark its end time to be recalculated by the event loop, so combine
the old set_enabled(TRUE) with set_interval() as a new restart() method.
This leaves all the set_enabled() calls having a FALSE parameter, so
remove the parameter and rename the method to disable().

[smcv: fix minor coding style issues]
[smcv: replace set_reenabled()/set_interval() pair with restart()]
[smcv: replace set_enabled(FALSE) with disable()]
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=95619
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

---

Does this make sense to you?
Comment 8 Simon McVittie 2016-11-11 21:01:44 UTC
If someone can put together a unit test (automated or manual) for this, that would be great - it would be a lot easier to be confident about it if we had a simple example of a BusExpireList and a DBusLoop being used together in a way that has the failure mode Michal described, with either Michal's original patch or my revised version making the test pass.

Alternatively, if we get positive test results on Bug #95263 (which I still can't reproduce), that would be another route to being more confident about this.
Comment 9 Simon McVittie 2017-04-05 15:19:28 UTC
Fixed in 1.11.10
Comment 10 Michal Koutný 2017-07-13 09:50:53 UTC
Created attachment 132660 [details]
DBusMainLoop: ensure all required timeouts are restarted

TL;DR The fix for timer resets isn't complete. Rarely, it can still trigger
unintentionally.

Hi,
there are still reports of apparently random disconnects of systemd-logind from
dbus even with the patch (comment 7) applied, see [1].

I was not able to reproduce it by myself but in all instances problems start
with set of messages like:
> pam_systemd(crond:session): Failed to release session: Message did not receive a reply (timeout by message bus)

It is not because reply_timeout is configured, it is really a secondary
consequence of disconnecting the client which happens from the the pending_fds
timeout and this is just a cleanup:
    bus_connection_drop_pending_replies()
      is called by bus_connection_disconnected()
        is called by bus_connections_unref() // not relevant
        is called by bus_dispatch()
           when dispatching org.freedesktop.DBus.Local.Disconnected signal

When looking for the cause of the disconnects, I figured out that the timer reset might not work as expected if there is another timer instance with zero timeout preceding the pending_fds timer in the loop->timeouts list.

Could you please review the patch? Thanks.

[1] https://github.com/systemd/systemd/issues/2925
Comment 11 Michal Koutný 2017-07-13 09:51:45 UTC
Forgot to reopen with the previous comment.
Comment 12 Simon McVittie 2017-07-20 19:50:15 UTC
(In reply to Michal Koutný from comment #10)
> When looking for the cause of the disconnects, I figured out that the timer
> reset might not work as expected if there is another timer instance with
> zero timeout preceding the pending_fds timer in the loop->timeouts list.

Thanks! Good catch. Fixed in git for 1.11.16.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.