Description
Simon McVittie
2014-11-18 19:06:48 UTC
Created attachment 109682 [details] [review] Log to syslog when pending_fd_timeout is exceeded Similarly [to Bug #86431], this is either a denial-of-service attempt or a pathological performance problem. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=86442 Created attachment 109683 [details] [review] Log to syslog if max_completed_connections or max_connections_per_user are exceeded Created attachment 109684 [details] [review] Log service_start_timeout (aka activation_timeout) when we exceed it Created attachment 109685 [details] [review] Log max_pending_service_starts (aka max_pending_activations) when exceeded Created attachment 109686 [details] [review] Log when we exceed max_replies_per_connection Created attachment 109687 [details] [review] Log when we exceed max_match_rules_per_connection Created attachment 109688 [details] [review] Log when we exceed max_services_per_connection Created attachment 109689 [details] [review] Document default limits in system.conf.in I left them commented out to minimize the impact if we change them in one place and not the other, but we should try to keep them in sync anyway. > Created attachment 109684 [details] [review] > Log service_start_timeout (aka activation_timeout) when we exceed it > Created attachment 109685 [details] [review] > Log max_pending_service_starts (aka max_pending_activations) when exceeded These are more for debuggability than to detect DoS, but still seem worthwhile. The first name for each is the user-facing syntax for system.conf, the "aka" is the variable in our C code. > Created attachment 109686 [details] [review] > Log when we exceed max_replies_per_connection > Created attachment 109687 [details] [review] > Log when we exceed max_match_rules_per_connection > Created attachment 109688 [details] [review] > Log when we exceed max_services_per_connection This is mostly a way to detect programmer error, but could also detect a DoS attempt. max_incoming_bytes, max_incoming_unix_fds are soft limits: we don't disconnect anyone for those, we just throttle reading from that socket. max_outgoing_bytes, max_outgoing_unix_fds log when we drop the message on the floor I think? Bug #35358 says we do, anyway. max_message_size, max_message_unix_fds are harder to do because they involve the DBusTransport, Bug #85108. I now realise I haven't added one for max_incomplete_connections yet. That's related to Bug #86431. Any feedback on these? It's been almost a year. (I haven't confirmed that they still apply, but they probably do.) Because this is system bus related, which is not supported on windows, I guess a review needs to come the unix/linux fraction. Created attachment 124822 [details] [review] Log to syslog when pending_fd_timeout is exceeded This is either a denial-of-service attempt, a pathological performance problem or a dbus-daemon bug. Sysadmins should be told about any of these. Created attachment 124823 [details] [review] Log to syslog if max_completed_connections or max_connections_per_user are exceeded Created attachment 124824 [details] [review] Log service_start_timeout (aka activation_timeout) when we exceed it Also upgrade it to a warning. Created attachment 124825 [details] [review] Log max_pending_service_starts (aka max_pending_activations) when exceeded Created attachment 124826 [details] [review] Log when we exceed max_replies_per_connection Created attachment 124827 [details] [review] Log when we exceed max_match_rules_per_connection Created attachment 124828 [details] [review] Log when we exceed max_names_per_connection (aka max_services_per_connection) --- v2: use the name as given in the configuration file, not as given in the source code Created attachment 124829 [details] [review] Document default limits in system.conf.in I left them commented out to minimize the impact if we change them in one place and not the other, but we should try to keep them in sync anyway. Created attachment 124830 [details] [review] dbus-daemon test: exercise maximum connection counts Created attachment 124831 [details] [review] dbus-daemon test: avoid reaching pending reply limit We were not actually doing what was intended (flooding the bus with 10k or 100k messages for the other side) because the bus was limiting the sender to 128 parallel method calls. --- Detected by having Attachment #124826 [details]. Created attachment 124832 [details] [review] dbus-daemon test: exercise maximum replies per connection This time we're doing so deliberately. Created attachment 124833 [details] [review] dbus-daemon test: exercise maximum match rules per connection Created attachment 124834 [details] [review] dbus-daemon test: exercise max_names_per_connection limit No test-cases yet for the ones that are harder to test: * activation timeout (would probably be done via sd-activation.c) * max. pending activations (ditto) * pending fd timeout (Unix-specific, related to Bug #80559 and Bug #95263) (In reply to Ralf Habacker from comment #12) > Because this is system bus related, which is not supported on windows, I > guess a review needs to come the unix/linux fraction. Not necessarily. These limits are practically useful on the system bus and on custom buses, but there's nothing really Unix-specific about them. The messages about reaching the limits go to syslog on the Unix system bus, or to stderr if you're running a dbus-daemon manually like the regression tests do. Either way, it seems useful. % DBUS_TEST_DAEMON=$(pwd)/../bus/dbus-daemon DBUS_TEST_DATA=$(pwd)/data ./test-dbus-daemon --tap # random seed: R02S2ad53e3db1d5dad6bd7e85bc278a1a10 1..13 # ProcessID of this process is 22303 # UnixUserID of this process is 1000 ok 1 /creds # GetConnectionUnixProcessID returned 22303 ok 2 /processid # Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=61303 # max perf: 2000 messages / 0.265155 seconds ok 3 /unix-runtime-is-default # Start of echo tests # max perf: 2000 messages / 0.219887 seconds ok 4 /echo/session # Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=34393 # max perf: 10000 messages / 1.257470 seconds ok 5 /echo/limited # End of echo tests # Start of no-reply tests # Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=76112 ok 6 /no-reply/disconnect # Bug Reference: https://bugs.freedesktop.org/show_bug.cgi?id=76112 ok 7 /no-reply/timeout # End of no-reply tests # Start of canonical-path tests ok 8 /canonical-path/uae # End of canonical-path tests # Start of limits tests The maximum number of active connections has been reached (max_completed_connections=3) ok 9 /limits/max-completed-connections The maximum number of active connections for UID 1000 has been reached (max_connections_per_user=3) ok 10 /limits/max-connections-per-user The maximum number of pending replies for ":1.0" (uid=1000 pid=22303 comm="/home/smcv/tmp/build/dbus/debug/test/.libs/lt-test") has been reached (max_replies_per_connection=3) # message of type 3 The maximum number of pending replies for ":1.0" (uid=1000 pid=22303 comm="/home/smcv/tmp/build/dbus/debug/test/.libs/lt-test") has been reached (max_replies_per_connection=3) # message of type 3 # message of type 2 # message of type 2 # message of type 2 ok 11 /limits/max-replies-per-connection Connection ":1.0" is not allowed to add more match rules (increase limits in configuration file if required; max_match_rules_per_connection=3) ok 12 /limits/max-match-rules-per-connection Connection ":1.0" is not allowed to own more services (increase limits in configuration file if required; max_names_per_connection=4) ok 13 /limits/max-names-per-connection # End of limits tests Created attachment 124838 [details] [review] dbus-daemon test: exercise pending fd timeout test It's been months, so I assume anyone who objected would already have done so. Pushed unreviewed, for 1.11.6; please revert anything that is problematic. I re-ordered the commits so that "syslog when foo is exceeded" comes immediately before "add a test for the foo limit" in each case where we have both. Minor patch changes: (In reply to Simon McVittie from comment #13) > Log to syslog when pending_fd_timeout is exceeded I added the unit (ms) to the log message. (In reply to Simon McVittie from comment #28) > dbus-daemon test: exercise pending fd timeout test I omitted that test when building with CMake, by adding HAVE_GIO_UNIX to the Autotools detection of gio-unix-2.0 and conditioning on that. This is because the FindGLib2 macros do not look for the gio-unix-2.0 pkg-config module. Seemingly out of nowhere this started to happen on our Fedora 25 server. I was playing with dnsmasq, and simply restarting network.service and NetworkManager when I got disconnected from SSH and had to run to the console which eventually stopped responding and I had to reboot. Thousands of these logs: May 17 16:18:31 our server dbus-daemon[861]: [system] Connection ":1.271" is not allowed to add more match rules (increase limits in configuration file if required; max_match_rules_per_connection=2048) May 17 16:18:27 our server kernel: [ 443.794835] perf: interrupt took too long (3142 > 3138), lowering kernel.perf_event_max_sample_rate to 63000 rpm -q dbus dbus-1.11.12-1.fc25.x86_64 Some logs just before things went awry as I also tried to restart Lightdm: May 17 15:47:54 ourserver mate-session[29921]: WARNING: Failed to acquire org.gnome.SessionManager May 17 15:47:57 ourserver systemd-logind: Removed session 30376. May 17 15:47:58 ourserver mate-session[29921]: Gtk-CRITICAL: gtk_main_quit: assertion 'main_loops != NULL' failed May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: couldn't access control socket: /run/user/0/keyring/control: No such file or directory May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: couldn't access control socket: /run/user/0/keyring/control: No such file or directory May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: another secret service is running May 17 15:47:58 ourserver gnome-keyring-daemon[30183]: The Secret Service was already initialized May 17 15:47:59 ourserver gnome-keyring-daemon[30183]: The PKCS#11 component was already initialized May 17 15:47:59 ourserver spice-vdagent[30239]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0 May 17 15:47:59 ourserver gnome-keyring-daemon[30183]: The SSH agent was already initialized May 17 15:47:59 ourserver hp-systray: hp-systray[30201]: warning: hp-systray should not be run as root/superuser. May 17 15:47:59 ourserver hp-systray: hp-systray[30201]: error: hp-systray cannot be run as root. Exiting. May 17 15:47:59 ourserver pulseaudio: module-x11-xsmp may not be loaded twice. May 17 15:47:59 ourserver pulseaudio: Failed to load module "module-x11-xsmp" (argument: "display=:0 session_manager=local/unix:@/tmp/.ICE-unix/29921,unix/unix:/tmp/.ICE-u nix/29921"): initialization failed. May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.132004]: IMSettings-Daemon[23273]: INFO: Attempting to switch IM to none [lang=en_US.UTF-8, desktop=MATE, upd ate=false] May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.179990]: IMSettings-Daemon[23273]: INFO: Stopping the main process for IBus [pid: 23454] May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.180727]: IMSettings-Daemon[23273]: INFO: Sent a signal to stop /usr/bin/ibus-daemon [pid: 23454, time: 2017-0 5-17T19:47:59.180714Z] May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.186141]: IMSettings-Daemon[23273]: INFO: no need to invoke any auxiliary process for none May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.186701]: IMSettings-Daemon[23273]: INFO: no need to invoke any main process for none May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.224715]: IMSettings-Qt backend[23273]: INFO: Setting up xim as Qt immodule May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.226390]: IMSettings-MATE-GSettings backend[23273]: INFO: Setting up gtk-im-context-simple:xim as gtk+ immodul e May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.227644]: IMSettings-GSettings backend[23273]: INFO: Setting up gtk-im-context-simple:xim as gtk+ immodule May 17 15:47:59 ourserver com.redhat.imsettings: [ 1495050479.228156]: IMSettings-Daemon[23273]: INFO: Stopped main process for IBus with the signal 15 [pid: 23454] May 17 15:48:04 ourserver audit: NETFILTER_CFG table=filter family=2 entries=187 May 17 15:48:13 ourserver systemd-logind: Removed session 30377. May 17 15:48:35 ourserver systemd: Stopping LSB: Bring up/down networking... May 17 15:48:36 ourserver NetworkManager[28562]: <error> [1495050516.0498] platform-linux: do-add-ip4-route[2: 0.0.0.0/0 100]: failure 101 (Network is unreachable) May 17 15:48:36 ourserver NetworkManager[28562]: <warn> [1495050516.0498] default-route: failed to add default route 0.0.0.0/0 via 150.108.68.1 dev 2 metric 100 mss 0 src user with effective metric 100 May 17 15:48:36 ourserver network: Shutting down loopback interface: [ OK ] (In reply to RobbieTheK from comment #30) > Seemingly out of nowhere this started to happen on our Fedora 25 server. The feature requested on this bug report is working correctly: you are getting log messages. If those log messages indicate a problem, that does not mean that this bug has not been fixed. If this recurs, please check which process is exceeding the dbus-daemon's limits and open a new bug on whatever bug tracking system is appropriate for that process, or in Fedora if that's where you received it. > Thousands of these logs: > > May 17 16:18:31 our server dbus-daemon[861]: [system] Connection ":1.271" is > not allowed to add more match rules (increase limits in configuration file > if required; max_match_rules_per_connection=2048) You can find out what this connection is (in particular its process ID, with which you can get more information from /proc, and its user ID) with a command like: dbus-send --system --dest=org.freedesktop.DBus --print-reply /org/freedesktop/DBus org.freedesktop.DBus.GetConnectionCredentials string::1.271 (In reply to Simon McVittie from comment #31) > (In reply to RobbieTheK from comment #30) > > Seemingly out of nowhere this started to happen on our Fedora 25 server. > > You can find out what this connection is (in particular its process ID, with > which you can get more information from /proc, and its user ID) with a > command like: > > dbus-send --system --dest=org.freedesktop.DBus --print-reply > /org/freedesktop/DBus org.freedesktop.DBus.GetConnectionCredentials > string::1.271 I was attempting to identify processes in question using 'busctl'. For a recent series of events of such sort this was pointing to wpa_supplicant, but on earlier occasions "culprits" apparently where polkitd and gnome-settings-daemon and gnome-shell and .... and other times some processes with connection identifiers which were NOT displayed by busctl. In other words reported results were changing depending on a phase of the moon. You may have a look at https://bugzilla.redhat.com/show_bug.cgi?id=1427721 if interested. Oh, and once "...is not allowed to add more match rules..." starts to flood logs the a reported connection id stays the same until the next reboot. When recently I tried 'dbus-send --system ....' as above process identification results were not different than what 'busctl' was showing. So far I observed this happening only on one particular machine, and this was a fairly recent phenomenon, but searching internet one can find multiple reports of that sort going many years back. (In reply to Michal Jaegermann from comment #32) > For a > recent series of events of such sort this was pointing to wpa_supplicant, > but on earlier occasions "culprits" apparently where polkitd and > gnome-settings-daemon and gnome-shell and ... If you believe you have found a bug in dbus-daemon or libdbus, please open a separate bug report with details. Bug #86442 is a feature request, and the feature has been implemented, so the feature request is closed. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.