Please review this bug for more information.
Description of problem:
messagebus service hangs on boot on a system with LDAP auth configured.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
hang until bored
The workaround is to remove the entry for ldap from the group line in
/etc/nsswitch.conf, which is not an acceptable long-term solution.
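For illustration, the workaround amounts to a one-line change like the following. The exact contents of the original group line are an assumption; the report only says that it contains an ldap entry.

```
# /etc/nsswitch.conf (assumed original line)
# group: files ldap

# After the workaround: LDAP is no longer consulted for group lookups
group: files
```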
It's also this issue.
I also see this issue on SUSE Linux. Any ideas? Thanks!
Over on the redhat bugzilla, I provided the following analysis for
redhat bug 182464, which also has to do with this same bug.
------ copied from redhat bugzilla bug 182464 -------
The root cause apparently has not been investigated yet. Reading the
source code of dbus-daemon has revealed the following:
dbus-daemon reads all the groups of the user root when it parses
the user="root" attributes in the configuration file. This triggers
many ldap lookups, that trigger the exponential back off of the
bind_policy hard setting in /etc/ldap.conf. So parsing the config
file takes a long time, and dbus-daemon forks only after parsing the config.
At that point, the boot continues.
The point is that dbus-daemon has a logical error in it. It is
not necessary to read the list of groups of a user ever. Such a
list is dynamic, it changes when naming services become available,
or when the ldap contents are changed. So dbus-daemon should rather
check group memberships when it needs to, i.e. when it has to
authorize a request. This could be done much more efficiently
using the getgrent family of calls instead of the getgrouplist
call dbus-daemon is currently using.
So I propose that the upstream providers of dbus-daemon are contacted
to get dbus-daemon fixed. Possible fixes:
1. quick and dirty: add an option to stop dbus-daemon from expanding
2. fix the logical error, don't use getgrouplist, check group membership
late and rely on nscd's caching mechanism for performance.
------ end of copy ------
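To make the "check group membership late" idea concrete, here is a minimal sketch in Python rather than the daemon's C. It looks up only the one group that an authorization decision needs instead of enumerating every group of the user. The function name user_in_group and the injectable lookup parameters are hypothetical; nothing here is taken from the dbus-daemon source.

```python
import grp
import pwd


def user_in_group(user, group, getpwnam=pwd.getpwnam, getgrnam=grp.getgrnam):
    """Return True if `user` belongs to `group`, looking up only that one group.

    The lookup functions are parameters (an assumption made for this sketch)
    so that they can be replaced in tests.
    """
    try:
        group_entry = getgrnam(group)
        user_entry = getpwnam(user)
    except KeyError:
        # Unknown user or group: treat as "not a member".
        return False
    # Membership comes either from the primary GID or from the member list.
    return user_entry.pw_gid == group_entry.gr_gid or user in group_entry.gr_mem
```

Each call still goes through NSS, so with LDAP configured this only helps if the lookups are deferred until the naming service is reachable or are answered from a cache such as nscd.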
In addition to what BinLi reports, there _is_ a better workaround, although
again an undesirable one: don't use "bind_policy hard" in /etc/ldap.conf,
use "bind_policy soft" instead. This causes the ldap lookups to fail, so
dbus-daemon will not get the LDAP groups but instead will quickly continue,
allowing the boot to go forward.
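As a config fragment, this workaround would look roughly like the following. Only the bind_policy line comes from the comment above; the rest of the file is assumed to be left unchanged.

```
# /etc/ldap.conf
# "hard" retries with exponential back-off; "soft" fails the lookup immediately
bind_policy soft
```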
No, we can't call getgroups() dynamically; that implies parsing /etc/group on every message the daemon processes. This is obviously even worse with LDAP. While I haven't measured it, I'm sure it would be noticeable overhead even in the non-LDAP case.
The operating system needs a caching layer for this stuff. And it turns out one exists:
Actually there are two things here:
1) Move system bus services to PolicyKit, and thus gradually phase out all dbus daemon authorization. Actually...an intermediate step here is to detect if any config file specifies group="". If not, then we don't call getgroups().
2) Cache the groups, and get a notification from SSSD (over dbus even!) when the group list changes, and then do a reload.
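The intermediate step in 1) could be sketched as follows, in Python for brevity. The function name config_mentions_groups is hypothetical, and the check is simplified: it treats any element carrying a group attribute as a reason to look up groups, and it does not follow include directives the way the real config parser would have to.

```python
import xml.etree.ElementTree as ET


def config_mentions_groups(xml_text):
    """Return True if any element in the bus config carries a `group` attribute."""
    root = ET.fromstring(xml_text)
    # Walk every element; one group="..." anywhere means groups are needed.
    return any("group" in element.attrib for element in root.iter())


# Hypothetical config text; the service name is made up for this example.
example = """
<busconfig>
  <policy user="root">
    <allow own="org.example.Foo"/>
  </policy>
</busconfig>
"""

if not config_mentions_groups(example):
    print("no group rules found; group lookup could be skipped")
```

A daemon using this approach would run the check over all loaded config files and skip the group expansion only if none of them matches.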
(In reply to comment #3)
> No, we can't call getgroups() dynamically; that implies parsing /etc/group on
> every message the daemon processes. This is obviously even worse with LDAP.
> While I haven't measured it, I'm sure it would be noticeable overhead even in
> the non-LDAP case.
Maybe do the compromise? Lazily calling getgroups(), only when needed but then cache it for later?
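A rough sketch of that compromise, in Python rather than the daemon's C: the group list is fetched the first time it is needed and reused afterwards. The class name LazyGroupCache, the injectable lookup, and the invalidate() hook (for example, to be called on a config reload) are all assumptions made for illustration.

```python
import os
import pwd


def _system_group_ids(user):
    """Ask the OS (and therefore NSS, including LDAP) for all group IDs of `user`."""
    return os.getgrouplist(user, pwd.getpwnam(user).pw_gid)


class LazyGroupCache:
    """Look up a user's groups on first use only, then serve them from memory."""

    def __init__(self, lookup=_system_group_ids):
        self._lookup = lookup
        self._cache = {}

    def groups_for(self, user):
        # Nothing is fetched at construction time; the first request pays the cost.
        if user not in self._cache:
            self._cache[user] = frozenset(self._lookup(user))
        return self._cache[user]

    def invalidate(self):
        # Drop everything so the next request hits the naming service again.
        self._cache.clear()
```

With this shape, parsing the configuration triggers no lookups at all; the price is that a cached list can go stale until invalidate() is called.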
*** Bug 66867 has been marked as a duplicate of this bug. ***
(In reply to comment #4)
> Maybe do the compromise? Lazily calling getgroups(), only when needed but
> then cache it for later?
I'd consider patches, but it sounds as though SSSD is a better solution to the problem of slow/potentially-offline NIS and LDAP than we're going to be able to NIH in libdbus.
(In reply to comment #3)
> Actually...an intermediate step here is to
> detect if any config file specifies group="". If not, then we don't call
I'd certainly consider patches for that - it sounds relatively unintrusive and moves us towards where we think we ought to be anyway.