Bug 86442

Summary: dbus-daemon should syslog when limits are exceeded
Product: dbus Reporter: Simon McVittie <smcv>
Component: coreAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: D-Bus Maintainers <dbus>
Severity: enhancement    
Priority: medium Keywords: patch
Version: git master   
Hardware: Other   
OS: All   
Whiteboard: review?
i915 platform: i915 features:
Bug Depends on: 86431    
Bug Blocks: 95263    
Attachments: Log to syslog when pending_fd_timeout is exceeded
Log to syslog if max_completed_connections or max_connections_per_user are exceeded
Log service_start_timeout (aka activation_timeout) when we exceed it
Log max_pending_service_starts (aka max_pending_activations) when exceeded
Log when we exceed max_replies_per_connection
Log when we exceed max_match_rules_per_connection
Log when we exceed max_services_per_connection
Document default limits in system.conf.in
Log to syslog when pending_fd_timeout is exceeded
Log to syslog if max_completed_connections or max_connections_per_user are exceeded
Log service_start_timeout (aka activation_timeout) when we exceed it
Log max_pending_service_starts (aka max_pending_activations) when exceeded
Log when we exceed max_replies_per_connection
Log when we exceed max_match_rules_per_connection
Log when we exceed max_names_per_connection (aka max_services_per_connection)
Document default limits in system.conf.in
dbus-daemon test: exercise maximum connection counts
dbus-daemon test: avoid reaching pending reply limit
dbus-daemon test: exercise maximum replies per connection
dbus-daemon test: exercise maximum match rules per connection
dbus-daemon test: exercise max_names_per_connection limit
dbus-daemon test: exercise pending fd timeout test

Description Simon McVittie 2014-11-18 19:06:48 UTC
dbus-daemon should syslog when it detects a limit being exceeded. If it did, Bug #86431 would have been easier to diagnose.

The part for auth_timeout is on Bug #86431 and will not be repeated here.

See also Bug #85108 which covers violations of maximum message size, maximum number of fds and other transport-related things, so it's rather more complicated (an error needs to be propagated out through libdbus into the dbus-daemon).
Comment 1 Simon McVittie 2014-11-18 19:21:58 UTC
Created attachment 109682 [details] [review]
Log to syslog when pending_fd_timeout is exceeded

Similarly [to Bug #86431], this is either a denial-of-service attempt or a
pathological performance problem.

Bug: https://bugs.freedesktop.org/show_bug.cgi?id=86442
Comment 2 Simon McVittie 2014-11-18 19:22:21 UTC
Created attachment 109683 [details] [review]
Log to syslog if max_completed_connections or  max_connections_per_user are exceeded
Comment 3 Simon McVittie 2014-11-18 19:22:34 UTC
Created attachment 109684 [details] [review]
Log service_start_timeout (aka activation_timeout) when  we exceed it
Comment 4 Simon McVittie 2014-11-18 19:22:49 UTC
Created attachment 109685 [details] [review]
Log max_pending_service_starts (aka  max_pending_activations) when exceeded
Comment 5 Simon McVittie 2014-11-18 19:23:04 UTC
Created attachment 109686 [details] [review]
Log when we exceed max_replies_per_connection
Comment 6 Simon McVittie 2014-11-18 19:23:20 UTC
Created attachment 109687 [details] [review]
Log when we exceed max_match_rules_per_connection
Comment 7 Simon McVittie 2014-11-18 19:23:35 UTC
Created attachment 109688 [details] [review]
Log when we exceed max_services_per_connection
Comment 8 Simon McVittie 2014-11-18 19:23:54 UTC
Created attachment 109689 [details] [review]
Document default limits in system.conf.in

I left them commented out to minimize the impact if we change them in
one place and not the other, but we should try to keep them in sync
anyway.
Comment 9 Simon McVittie 2014-11-18 19:27:20 UTC
> Created attachment 109684 [details] [review]
> Log service_start_timeout (aka activation_timeout) when  we exceed it
> Created attachment 109685 [details] [review]
> Log max_pending_service_starts (aka  max_pending_activations) when exceeded

These are more for debuggability than to detect DoS, but still seem worthwhile. The first name for each is the user-facing syntax for system.conf, the "aka" is the variable in our C code.

> Created attachment 109686 [details] [review]
> Log when we exceed max_replies_per_connection
> Created attachment 109687 [details] [review]
> Log when we exceed max_match_rules_per_connection
> Created attachment 109688 [details] [review]
> Log when we exceed max_services_per_connection

This is mostly a way to detect programmer error, but could also detect a DoS attempt.
Comment 10 Simon McVittie 2014-11-18 19:29:16 UTC
max_incoming_bytes, max_incoming_unix_fds are soft limits: we don't disconnect anyone for those, we just throttle reading from that socket.

max_outgoing_bytes, max_outgoing_unix_fds log when we drop the message on the floor I think? Bug #35358 says we do, anyway.

max_message_size, max_message_unix_fds are harder to do because they involve the DBusTransport, Bug #85108.

I now realise I haven't added one for max_incomplete_connections yet. That's related to Bug #86431.
Comment 11 Simon McVittie 2015-11-17 15:17:24 UTC
Any feedback on these? It's been almost a year.

(I haven't confirmed that they still apply, but they probably do.)
Comment 12 Ralf Habacker 2015-11-17 21:48:42 UTC
Because this is system bus related, which is not supported on windows, I guess a review needs to come the unix/linux fraction.
Comment 13 Simon McVittie 2016-07-01 12:18:31 UTC
Created attachment 124822 [details] [review]
Log to syslog when pending_fd_timeout is exceeded

This is either a denial-of-service attempt, a pathological performance
problem or a dbus-daemon bug. Sysadmins should be told about any of
these.
Comment 14 Simon McVittie 2016-07-01 12:18:50 UTC
Created attachment 124823 [details] [review]
Log to syslog if max_completed_connections or  max_connections_per_user are exceeded
Comment 15 Simon McVittie 2016-07-01 12:19:12 UTC
Created attachment 124824 [details] [review]
Log service_start_timeout (aka activation_timeout) when  we exceed it

Also upgrade it to a warning.
Comment 16 Simon McVittie 2016-07-01 12:19:25 UTC
Created attachment 124825 [details] [review]
Log max_pending_service_starts (aka  max_pending_activations) when exceeded
Comment 17 Simon McVittie 2016-07-01 12:19:47 UTC
Created attachment 124826 [details] [review]
Log when we exceed max_replies_per_connection
Comment 18 Simon McVittie 2016-07-01 12:20:05 UTC
Created attachment 124827 [details] [review]
Log when we exceed max_match_rules_per_connection
Comment 19 Simon McVittie 2016-07-01 12:20:42 UTC
Created attachment 124828 [details] [review]
Log when we exceed max_names_per_connection (aka  max_services_per_connection)

---
v2: use the name as given in the configuration file, not as given in the source code
Comment 20 Simon McVittie 2016-07-01 12:21:05 UTC
Created attachment 124829 [details] [review]
Document default limits in system.conf.in

I left them commented out to minimize the impact if we change them in
one place and not the other, but we should try to keep them in sync
anyway.
Comment 21 Simon McVittie 2016-07-01 12:21:31 UTC
Created attachment 124830 [details] [review]
dbus-daemon test: exercise maximum connection counts
Comment 22 Simon McVittie 2016-07-01 12:22:08 UTC
Created attachment 124831 [details] [review]
dbus-daemon test: avoid reaching pending reply limit

We were not actually doing what was intended (flooding the bus with
10k or 100k messages for the other side) because the bus was limiting
the sender to 128 parallel method calls.

---

Detected by having Attachment #124826 [details].
Comment 23 Simon McVittie 2016-07-01 12:22:29 UTC
Created attachment 124832 [details] [review]
dbus-daemon test: exercise maximum replies per  connection

This time we're doing so deliberately.
Comment 24 Simon McVittie 2016-07-01 12:22:44 UTC
Created attachment 124833 [details] [review]
dbus-daemon test: exercise maximum match rules per  connection
Comment 25 Simon McVittie 2016-07-01 12:23:09 UTC
Created attachment 124834 [details] [review]
dbus-daemon test: exercise max_names_per_connection  limit
Comment 26 Simon McVittie 2016-07-01 12:24:41 UTC
No test-cases yet for the ones that are harder to test:

* activation timeout (would probably be done via sd-activation.c)
* max. pending activations (ditto)
* pending fd timeout (Unix-specific, related to Bug #80559 and Bug #95263)
Comment 27 Simon McVittie 2016-07-01 12:29:27 UTC
(In reply to Ralf Habacker from comment #12)
> Because this is system bus related, which is not supported on windows, I
> guess a review needs to come the unix/linux fraction.

Not necessarily. These limits are practically useful on the system bus and on custom buses, but there's nothing really Unix-specific about them.

The messages about reaching the limits go to syslog on the Unix system bus, or to stderr if you're running a dbus-daemon manually like the regression tests do. Either way, it seems useful.

% DBUS_TEST_DAEMON=$(pwd)/../bus/dbus-daemon DBUS_TEST_DATA=$(pwd)/data ./test-dbus-daemon --tap
# random seed: R02S2ad53e3db1d5dad6bd7e85bc278a1a10
1..13
# ProcessID of this process is 22303
# UnixUserID of this process is 1000
ok 1 /creds
# GetConnectionUnixProcessID returned 22303
ok 2 /processid
# Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=61303
# max perf: 2000 messages / 0.265155 seconds
ok 3 /unix-runtime-is-default
# Start of echo tests
# max perf: 2000 messages / 0.219887 seconds
ok 4 /echo/session
# Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=34393
# max perf: 10000 messages / 1.257470 seconds
ok 5 /echo/limited
# End of echo tests
# Start of no-reply tests
# Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=76112
ok 6 /no-reply/disconnect
# Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=76112
ok 7 /no-reply/timeout
# End of no-reply tests
# Start of canonical-path tests
ok 8 /canonical-path/uae
# End of canonical-path tests
# Start of limits tests
The maximum number of active connections has been reached (max_completed_connections=3)
ok 9 /limits/max-completed-connections
The maximum number of active connections for UID 1000 has been reached (max_connections_per_user=3)
ok 10 /limits/max-connections-per-user
The maximum number of pending replies for ":1.0" (uid=1000 pid=22303 comm="/home/smcv/tmp/build/dbus/debug/test/.libs/lt-test") has been reached (max_replies_per_connection=3)
# message of type 3
The maximum number of pending replies for ":1.0" (uid=1000 pid=22303 comm="/home/smcv/tmp/build/dbus/debug/test/.libs/lt-test") has been reached (max_replies_per_connection=3)
# message of type 3
# message of type 2
# message of type 2
# message of type 2
ok 11 /limits/max-replies-per-connection
Connection ":1.0" is not allowed to add more match rules (increase limits in configuration file if required; max_match_rules_per_connection=3)
ok 12 /limits/max-match-rules-per-connection
Connection ":1.0" is not allowed to own more services (increase limits in configuration file if required; max_names_per_connection=4)
ok 13 /limits/max-names-per-connection
# End of limits tests
Comment 28 Simon McVittie 2016-07-01 14:41:29 UTC
Created attachment 124838 [details] [review]
dbus-daemon test: exercise pending fd timeout test
Comment 29 Simon McVittie 2016-08-16 17:53:22 UTC
It's been months, so I assume anyone who objected would already have done so. Pushed unreviewed, for 1.11.6; please revert anything that is problematic.

I re-ordered the commits so that "syslog when foo is exceeded" comes immediately before "add a test for the foo limit" in each case where we have both.

Minor patch changes:

(In reply to Simon McVittie from comment #13)
> Log to syslog when pending_fd_timeout is exceeded

I added the unit (ms) to the log message.

(In reply to Simon McVittie from comment #28)
> dbus-daemon test: exercise pending fd timeout test

I omitted that test when building with CMake, by adding HAVE_GIO_UNIX to the Autotools detection of gio-unix-2.0 and conditioning on that. This is because the FindGLib2 macros do not look for the gio-unix-2.0 pkg-config module.
Comment 30 RobbieTheK 2017-05-19 20:29:34 UTC
Seemingly out of nowhere this started to happen on our Fedora 25 server. I was playing with dnsmasq, and simply restarting network.service and NetworkManager when I got disconnected from SSH and had to run to the console which eventually stopped responding and I had to reboot. Thousands of these logs:

May 17 16:18:31 our server dbus-daemon[861]: [system] Connection ":1.271" is not allowed to add more match rules (increase limits in configuration file if required; max_match_rules_per_connection=2048)
May 17 16:18:27 our server kernel: [  443.794835] perf: interrupt took too long (3142 > 3138), lowering kernel.perf_event_max_sample_rate to 63000

rpm -q dbus
dbus-1.11.12-1.fc25.x86_64

Some logs just before things went awry as I also tried to restart Lightdm:
May 17 15:47:54 ourserver mate-session[29921]: WARNING: Failed to acquire org.gnome.SessionManager
May 17 15:47:57 ourserver systemd-logind: Removed session 30376.
May 17 15:47:58 ourserver mate-session[29921]: Gtk-CRITICAL: gtk_main_quit: assertion 'main_loops != NULL' failed
May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: couldn't access control socket: /run/user/0/keyring/control: No such file or directory
May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: couldn't access control socket: /run/user/0/keyring/control: No such file or directory
May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: another secret service is running
May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: The Secret Service was already initialized
May 17 15:47:59 ourserver gnome-keyring-daemon[30183]: The PKCS#11 component was already initialized
May 17 15:47:59 ourserver spice-vdagent[30239]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
May 17 15:47:59 ourserver gnome-keyring-daemon[30183]: The SSH agent was already initialized
May 17 15:47:59 ourserver hp-systray: hp-systray[30201]: warning: hp-systray should not be run as root/superuser.
May 17 15:47:59 ourserver hp-systray: hp-systray[30201]: error: hp-systray cannot be run as root. Exiting.
May 17 15:47:59 ourserver pulseaudio: module-x11-xsmp may not be loaded twice.
May 17 15:47:59 ourserver pulseaudio: Failed to load module "module-x11-xsmp" (argument: "display=:0 session_manager=local/unix:@/tmp/.ICE-unix/29921,unix/unix:/tmp/.ICE-u
nix/29921"): initialization failed.
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.132004]: IMSettings-Daemon[23273]: INFO: Attempting to switch IM to none [lang=en_US.UTF-8, desktop=MATE, upd
ate=false]
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.179990]: IMSettings-Daemon[23273]: INFO:   Stopping the main process for IBus [pid: 23454]
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.180727]: IMSettings-Daemon[23273]: INFO: Sent a signal to stop /usr/bin/ibus-daemon [pid: 23454, time: 2017-0
5-17T19:47:59.180714Z]
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.186141]: IMSettings-Daemon[23273]: INFO:   no need to invoke any auxiliary process for none
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.186701]: IMSettings-Daemon[23273]: INFO:   no need to invoke any main process for none
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.224715]: IMSettings-Qt backend[23273]: INFO: Setting up xim as Qt immodule
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.226390]: IMSettings-MATE-GSettings backend[23273]: INFO: Setting up gtk-im-context-simple:xim as gtk+ immodul
e
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.227644]: IMSettings-GSettings backend[23273]: INFO: Setting up gtk-im-context-simple:xim as gtk+ immodule
May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.228156]: IMSettings-Daemon[23273]: INFO: Stopped main process for IBus with the signal 15 [pid: 23454]
May 17 15:48:04 ourserver audit: NETFILTER_CFG table=filter family=2 entries=187
May 17 15:48:13 ourserver systemd-logind: Removed session 30377.
May 17 15:48:35 ourserver systemd: Stopping LSB: Bring up/down networking...
May 17 15:48:36 ourserver NetworkManager[28562]: <error> [1495050516.0498] platform-linux: do-add-ip4-route[2: 0.0.0.0/0 100]: failure 101 (Network is unreachable)
May 17 15:48:36 ourserver NetworkManager[28562]: <warn>  [1495050516.0498] default-route: failed to add default route 0.0.0.0/0 via 150.108.68.1 dev 2 metric 100 mss 0 src user with effective metric 100
May 17 15:48:36 ourserver network: Shutting down loopback interface:  [  OK  ]
Comment 31 Simon McVittie 2017-05-19 20:53:41 UTC
(In reply to RobbieTheK from comment #30)
> Seemingly out of nowhere this started to happen on our Fedora 25 server.

The feature requested on this bug report is working correctly: you are getting log messages. If those log messages indicate a problem, that does not mean that this bug has not been fixed.

If this recurs, please check which process is exceeding the dbus-daemon's limits and open a new bug on whatever bug tracking system is appropriate for that process, or in Fedora if that's where you received it.

> Thousands of these logs:
> 
> May 17 16:18:31 our server dbus-daemon[861]: [system] Connection ":1.271" is
> not allowed to add more match rules (increase limits in configuration file
> if required; max_match_rules_per_connection=2048)

You can find out what this connection is (in particular its process ID, with which you can get more information from /proc, and its user ID) with a command like:

dbus-send --system --dest=org.freedesktop.DBus --print-reply /org/freedesktop/DBus org.freedesktop.DBus.GetConnectionCredentials string::1.271
Comment 32 Michal Jaegermann 2017-05-21 20:18:58 UTC
(In reply to Simon McVittie from comment #31)
> (In reply to RobbieTheK from comment #30)
> > Seemingly out of nowhere this started to happen on our Fedora 25 server.

> 
> You can find out what this connection is (in particular its process ID, with
> which you can get more information from /proc, and its user ID) with a
> command like:
> 
> dbus-send --system --dest=org.freedesktop.DBus --print-reply
> /org/freedesktop/DBus org.freedesktop.DBus.GetConnectionCredentials
> string::1.271

I was attempting to identify processes in question using 'busctl'.  For a recent series of events of such sort this was pointing to wpa_supplicant, but on earlier occasions "culprits" apparently where polkitd and gnome-settings-daemon and gnome-shell and .... and other times some processes with connection identifiers which were NOT displayed by busctl.  In other words reported results were changing depending on a phase of the moon.  You may have a look at https://bugzilla.redhat.com/show_bug.cgi?id=1427721 if interested.  Oh, and once "...is not allowed to add more match rules..." starts to flood logs the a reported connection id stays the same until the next reboot.

When recently I tried 'dbus-send --system ....' as above process identification results were not different than what 'busctl' was showing.

So far I observed this happening only on one particular machine, and this was a fairly recent phenomenon, but searching internet one can find multiple reports of that sort going many years back.
Comment 33 Simon McVittie 2017-05-22 10:44:34 UTC
(In reply to Michal Jaegermann from comment #32)
>  For a
> recent series of events of such sort this was pointing to wpa_supplicant,
> but on earlier occasions "culprits" apparently where polkitd and
> gnome-settings-daemon and gnome-shell and ...

If you believe you have found a bug in dbus-daemon or libdbus, please open a separate bug report with details.

Bug #86442 is a feature request, and the feature has been implemented, so the feature request is closed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.