Bug 72816 - Interaction between systemd and systemd-logind fails during reloads with many units
Summary: Interaction between systemd and systemd-logind fails during reloads with many...
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: medium major
Assignee: systemd-bugs
QA Contact: systemd-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-17 23:11 UTC by David Strauss
Modified: 2019-02-28 10:56 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description David Strauss 2013-12-17 23:11:56 UTC
systemd sends DBus messages for every unit added and removed, which is all of them for reloads. This is on v208 with the Fedora 20 patches, but it looks like the same occurs in master.

systemd-logind can't keep up with these:

Dec 17 23:07:02 519149-11pod01 dbus[1636]: [system] Rejected: destination has a full message queue, 0 matched rules; type="signal", sender=":1.55216" (uid=0 pid=1 comm="/usr/lib/systemd/systemd --system --deserialize 26") interface="org.freedesktop.systemd1.Manager" member="UnitRemoved" error name="(unset)" requested_reply="0" destination="org.freedesktop.DBus" (uid=0 pid=32361 comm="/usr/lib/systemd/systemd-logind ")
Dec 17 23:07:02 519149-11pod01 dbus[1636]: [system] Rejected: destination has a full message queue, 0 matched rules; type="signal", sender=":1.55216" (uid=0 pid=1 comm="/usr/lib/systemd/systemd --system --deserialize 26") interface="org.freedesktop.systemd1.Manager" member="UnitRemoved" error name="(unset)" requested_reply="0" destination="org.freedesktop.DBus" (uid=0 pid=32361 comm="/usr/lib/systemd/systemd-logind ")

Here's what it looks like from dbus-monitor --system:

signal sender=:1.55216 -> dest=(null destination) serial=24348 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitRemoved
   string "nginx_c4d9e5763f294ca19d6abf242a2575cc.socket"
   object path "/org/freedesktop/systemd1/unit/nginx_5fc4d9e5763f294ca19d6abf242a2575cc_2esocket"
signal sender=:1.55216 -> dest=(null destination) serial=24349 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitRemoved
   string "nginx_add76a5f40704da0832bb6065de7f342.socket"
   object path "/org/freedesktop/systemd1/unit/nginx_5fadd76a5f40704da0832bb6065de7f342_2esocket"
signal sender=:1.55216 -> dest=(null destination) serial=24350 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitRemoved
Comment 1 David Strauss 2013-12-17 23:19:36 UTC
Because messages get lost, systemd-logind gets out of sync with systemd's unit data. This causes a proliferation of orphaned systemd-related processes on the system.
Comment 2 David Strauss 2013-12-17 23:25:10 UTC
This also causes heavy context-switching in systemd and DBus.

Linux 3.11.10-200.fc19.x86_64 (519149-11pod01.panth.io.pod1.panth.io) 	12/17/2013 	_x86_64_	(24 CPU)

11:22:09 PM   UID      TGID       TID   cswch/s nvcswch/s  Command
[...]
Average:       81      1636         -    419.12      0.37  dbus-daemon
Average:       81         -      1636    419.12      0.37  |__dbus-daemon
[...]
Average:        0         1         -    645.82      0.09  systemd
Average:        0         -         1    645.82      0.09  |__systemd

That's a 10-second sample, and it's sustained for a long time when a few reloads get queued.
Comment 3 David Strauss 2013-12-19 00:29:26 UTC
Because we don't actually need systemd user sessions, we're working around this issue by masking logind:

# systemctl mask systemd-logind.service

Because the PAM module checks for a path created by logind to see if it's running, masking the service and rebooting puts things in good shape for us.
Comment 4 David Strauss 2013-12-19 00:32:58 UTC
Just removing the lagging subscriber leaves D-Bus (and the journal, which has to log D-Bus's complaints) much happier. This is the same sort of 10-second sample from pidstat as before, just with logind masked.

Linux 3.11.10-200.fc19.x86_64 (519149-11pod01.panth.io.pod1.panth.io) 	12/19/2013 	_x86_64_	(24 CPU)

12:29:53 AM   UID      TGID       TID   cswch/s nvcswch/s  Command
[...]
Average:       81      1808         -     16.90      0.00  dbus-daemon
Average:       81         -      1808     16.90      0.00  |__dbus-daemon
Comment 5 Lennart Poettering 2019-02-28 10:56:20 UTC
I am pretty sure this has been fixed since. If this is reproducible on current sytemd versions please file a new bug on github.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.