Summary: | Gnome session won't start because d-bus auth fails abjectly | ||
---|---|---|---|
Product: | dbus | Reporter: | Jim Carter <jimc> |
Component: | core | Assignee: | Havoc Pennington <hp> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | 1.5 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Description
Jim Carter
2013-07-13 00:59:54 UTC
(In reply to comment #0)
> Another suggestion: Add another auth mechanism using SO_PASSCRED and
> SCM_CREDENTIALS and desist with the cookie file business. This could only
> be on Linux (kernel 2.4 and above), and some BSD variants with a different
> protocol.

That's the EXTERNAL mechanism, which has been supported for years - I think it might have been the first one implemented, in fact. It must be failing for you for some reason... The maintainers of D-Bus in SuSE might have some useful insight?

As you point out, EXTERNAL can only work on vaguely modern Linux and *BSD, but that should cover the majority of D-Bus users.

> Symptom: Our users' home directories reside on file servers and are mounted
> by NFS (root squashed) on workstations.

I believe the current status of NFS-home with D-Bus is "none of the maintainers use it; good luck".

> Suggestion to the developers: Put the cookie file/directory in a location
> known to be on the local machine such as /var/run or /tmp.

/var is not known to be local; neither is /tmp; neither is the root filesystem.

@simon, thanks for pointing me in a useful direction. This bug (or my understanding of it) has mutated and evolved. It turns out that D-Bus successfully does most of EXTERNAL authentication, but then tries to obtain user info and fails. The given UID is not in the local /etc/passwd and D-Bus does not try either nscd or net directory services (NIS in our case) even though they are available. If I add the user to the local /etc/passwd (not practical for production), EXTERNAL authentication succeeds the first time and every time.

The same D-Bus symptom is seen for all 3 desktop environments: Gnome, KDE and XFCE; but it is only fatal for Gnome; the others stumble forward without being able to contact ConsoleKit or power control. After a reboot the D-Bus failure is seen two or three times, after which it mysteriously self-heals.

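
As an illustration of the lookup failure described above (this is not D-Bus's actual code, which is written in C), here is a minimal Python sketch of a passwd lookup by numeric UID through the C library's name service configuration. The helper name `lookup_user` and its injectable `getpwuid` parameter are hypothetical, introduced only for this example. When no configured source returns an entry, the caller learns neither the login name nor the home directory, which is why DBUS_COOKIE_SHA1 would then have no way to locate ~/.dbus-keyrings.

```python
import pwd


def lookup_user(uid, getpwuid=pwd.getpwuid):
    """Return (login_name, home_dir) for uid, or None if no entry is found.

    pwd.getpwuid() asks the C library, which consults whatever sources
    /etc/nsswitch.conf lists (files, NIS, LDAP, ...). The getpwuid
    parameter is a hypothetical hook so the lookup can be stubbed out.
    """
    try:
        entry = getpwuid(uid)
    except KeyError:
        # No source that was consulted returned an entry for this uid.
        return None
    return entry.pw_name, entry.pw_dir


if __name__ == "__main__":
    import os

    info = lookup_user(os.getuid())
    if info is None:
        print("no passwd entry: name and home directory are unknown")
    else:
        name, home = info
        print(f"{name} has home directory {home}")
```

With a files-only nsswitch.conf, a user who exists only in NIS would take the `None` branch here even though the numeric UID itself is perfectly valid.
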
The kludge with symlinking ~/.dbus-keyrings is ineffective, because D-Bus can't find the home directory to do DBUS_COOKIE_SHA1, and even if it could, on OpenSuSE D-Bus runs as the messagebus user, so it could not deposit the cookie. I had just assumed it was running as root, and that later success was caused by the intervention.

The reason is bizarre: when I run OpenSuSE 12.3 out of the box, just modifying /etc/nsswitch.conf to use NIS and DNS as appropriate, it hangs while booting, and I can't even get on the machine to identify positively which service is hanging. My kludgey workaround was to link in a files-only /etc/nsswitch.conf very early, and only after the network has started do I link in the network-enabled /etc/nsswitch.conf. D-Bus starts way before that, knowing only of the files, and it has experience from early tries to authenticate root that nscd has not started. This scenario is not proven step by step in straces, but it explains all the symptoms observed, including eventual self-healing when a timeout passes and it can attempt nscd again, succeeding.

My workaround was to put in a systemd unit "after" network.target and nscd.service which restarts dbus.service. This kills some but not all connected daemons; I made it "before" upower.service and console-kit-daemon.service, which do not reconnect. (OpenSuSE starts ConsoleKit preemptively as an optimization, even though it's bus-activated.) This is brutal but it gets the job done; now my users can start up a Gnome session and get it right the first time and every time.

It's probably not justified for D-Bus to change to accommodate my specific use-case, but D-Bus is very important and general robustness is valuable. How could D-Bus be changed to be more immune from the vagaries of directory services? And I'm wondering if my hang on boot could be related and could be fixed by an intervention to avoid using directory services.

If a user is authenticated by SO_PEERCRED (the EXTERNAL mechanism for D-Bus) but has no user info in any accessible directory service (/etc/passwd), is it really necessary to reject the authentication? If not, the whole can of worms can be bypassed in contexts where EXTERNAL works, i.e. Linux. If D-Bus has a service to report user info about the connection, it could give a "service not available" error or report the user as Nobody.

If you have to fall back to DBUS_COOKIE_SHA1, you need to put the cookie in some directory which the server is assured of permission to write on. Linux can be used on a diskless workstation so I shouldn't have previously said "on the local machine", but there has got to be someplace that the D-Bus user has write access to. If this were in /tmp or /var/run or something like that, and if the keyring file were named using the user's numeric UID, it could all be done without reference to directory services. But in OpenSuSE, D-Bus runs as -u messagebus, so it's not clear how it could ever chown the file to prove that the authenticating user could read it. (It might run as root on other OSes or there might be a setUID helper that the Linux implementation doesn't have.) We'd better make sure that EXTERNAL auth always succeeds in Linux.

Flexibility would be added if the server would tell the client, in its first challenge, where it had put the keyring. But of course this would be a different mechanism.

(In reply to comment #2)
> My workaround was to put in a systemd unit "after" network.target and
> nscd.service which restarts dbus.service.

Restarting the system dbus-daemon is not a supported action. It might coincidentally work, but if you do that, "you're on your own".

I think the root cause of your problem is the use of NIS, rather than the remote home directories.

> when I run OpenSuSE 12.3 out of the box, just modifying
> /etc/nsswitch.conf to use NIS and DNS as appropriate, it hangs while
> booting, and I can't even get on the machine to identify positively which
> service is hanging.

I suspect this is because a service that starts before networking needs a uid that is not in your /etc/passwd. System users (such as messagebus) should be in /etc/passwd or in some sort of local cache, so that system services that are essential for networking can start, so that networking can come up, so that the rest of the system can work.

> D-Bus starts way before that, knowing only of the files, and it
> has experience from early tries to authenticate root that nscd has
> not started.

Have you tried giving dbus-daemon "Wants=nscd" and "After=nscd", and ensuring that nscd does not depend on anything that can't happen that early?

If you're using the glibc nscd, there were a lot of serious bug reports about it during Debian 7 development - Debian now seems to be recommending use of unscd instead. You might have better results with that.

Where I work, our sysadmins do remote uid synching by writing a local cache (nss-db or something, I think) on each server, and keeping that up-to-date out-of-band - this makes our servers considerably more reliable, by ensuring that they can boot (albeit maybe with a slightly outdated user database) even when disconnected. I would advocate that approach, if possible.

> My kludgey workaround was to link in a files-only
> /etc/nsswitch.conf very early, and only after the network has
> started do I link in the network-enabled /etc/nsswitch.conf.

It's possible that dbus-daemon has already cached a negative query result for a user it expects to see, or something?

> If a user is authenticated by SO_PEERCRED (the EXTERNAL mechanism for D-Bus)
> but has no user info in any accessible directory service (/etc/passwd), is
> it really necessary to reject the authentication?

To try to avoid subtle security flaws, the general philosophy is "if strange things are going on in security-sensitive code, if in doubt, reject". This might be a situation where D-Bus is being too strict, but we'd have to think about it carefully to make sure we're not opening up vulnerabilities.

> If you have to fall back to DBUS_COOKIE_SHA1, you need to put the cookie in
> some directory which the server is assured of permission to write on.

As far as I understand it, it's the clients (not the dbus-daemon) that write these files - the client is proving that it can write to a file in its own home directory.

I'm not sure whether DBUS_COOKIE_SHA1 is even relevant for the system bus, though - the unprivileged dbus-daemon user can't necessarily read users' home directories either. The system bus should really be using EXTERNAL.

> If this were in /tmp or /var/run or something
> like that, and if the keyring file were named using the user's numeric UID,
> it could all be done without reference to directory services.

Using well-known filenames in /tmp results in trivial denial-of-service, and sometimes also symlink attacks.

> But in OpenSuSE, D-Bus runs as -u messagebus

The system dbus-daemon should always run as an unprivileged user, typically called messagebus or dbus.

Definitely the root cause of this mess is using network directory services. If I didn't use NIS or LDAP I wouldn't have the problem. I haven't actually proven on the net that uses LDAP that D-Bus can't be contacted shortly after boot, but the hang on boot, cured by linking in a files-only /etc/nsswitch.conf, definitely happens equally with LDAP and NIS.

I do have all system users/groups like messagebus in a local password/shadow/group file, in sync on all hosts. The OS should not refer to any ordinary users until one of them types his loginID in the greeter box, and a cached negative result is therefore unlikely.

The symptom happens not during boot but afterward, caused by a snarl-up early in the boot process.

I tried your suggestion for hacking the unit files for D-Bus (minus the restarting kludge). In the first attempt, I altered the unit for dbus.socket, putting it After=nscd.service. The OS booted, but when I (as root) tried to start a session using SSH (pubkey) or a console login, I authenticated successfully but was kicked off. The console login reported: "Cannot make/remove an entry for the specified session", i.e. console-kit-daemon is inoperative. I can see no sign that D-Bus started, though I could see the message for rsyslogd, and dbus-daemon usually comes just after that.

In the second attempt I reverted dbus.socket, and made dbus.service After=nscd.service. "After" was not honored; dbus-daemon started just before console-kit-daemon (the first client of D-Bus), and just after rsyslogd (i.e. early, way before nscd). The OS booted normally and would let users on, except that early Gnome sessions came to a terrible demise, i.e. users logging in early could not authenticate to the system D-Bus.

Paranoia about security is certainly a good idea. Maybe you or one of the other developers could remember what bad thing might happen if you believe in EXTERNAL auth for a nonexistent user. Certainly DBUS_COOKIE_SHA1 is impossible without user info (the homedir). But I can't see any way a hacker could gain any advantage by being nonexistent, and I think it's reasonable to rely on the login authentication system to keep out the riff-raff. It's valuable to disentangle mission-critical infrastructure from the possibly unavailable network directory service.

On unscd, OpenSuSE v12.2 had that and I used it, but in v12.3 they have reverted to the non-U nscd and I went along. In v11.2 or v11.4 I had nscd freeze on me once too often, and discontinued it, but I was attracted by the promised improvements in unscd. Likely unscd and nscd were merged in the past year or so.

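
The question raised above, whether a peer identified by SO_PEERCRED needs a passwd entry at all, can be made concrete with a small Linux-only Python sketch. It is an illustration under stated assumptions, not how dbus-daemon is implemented: `peer_credentials` and `identity_for` are hypothetical helpers, and the `uid:N` fallback label is invented for this example. The kernel supplies the numeric UID of the peer without consulting any directory service; only the optional name lookup goes through NSS.

```python
import pwd
import socket
import struct

# Layout of Linux's struct ucred: pid_t pid; uid_t uid; gid_t gid;
_UCRED = struct.Struct("iII")


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


def identity_for(uid, getpwuid=pwd.getpwuid):
    """Return the login name if a passwd entry exists, else a numeric label.

    Falling back instead of rejecting is the policy proposed in this
    thread, not current D-Bus behaviour.
    """
    try:
        return getpwuid(uid).pw_name
    except KeyError:
        return f"uid:{uid}"


if __name__ == "__main__":
    # Both ends of a socketpair belong to this process, so the "peer"
    # reported by the kernel is the current process itself.
    left, right = socket.socketpair()
    try:
        pid, uid, gid = peer_credentials(left)
        print(f"peer pid={pid} gid={gid} identity={identity_for(uid)}")
    finally:
        left.close()
        right.close()
```

Whether such a fallback is safe is exactly the open security question in this discussion; the sketch only shows that the UID itself is available without NIS or LDAP.
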
I think I have a feasible workaround. The unit that restarted D-Bus has been removed (yay!). /etc/nsswitch.conf was reverted to use files then NIS or DNS. Units that switched in a files-only nsswitch.conf were removed. nscd.service was hacked to be Before=dbus.service. See Simon's comment 3.

This combination does not hang during booting, and as soon as the network comes up and user logins are allowed, users in NIS can start a Gnome session and successfully authenticate to the system D-Bus.

I'm assuming that D-Bus gets user info from glibc's NSS, which first tries nscd, successfully because of Before=dbus.service. Apparently when the network is not yet up, nscd is less picky or prone to hanging than the lookup code intrinsic to NSS.

So I think my issue is now finished. However, I do suggest that D-Bus could avoid potential nasty snarl-ups if it didn't need to look up passwd info, except for DBUS_COOKIE_SHA1.

Thank you for your help in this, getting me pointed on a path that led to a solution.

(In reply to comment #5)
> nscd.service was hacked to be Before=dbus.service.

This...

> However, I do suggest that D-Bus could
> avoid potential nasty snarl-ups if it didn't need to look up passwd info,
> except for DBUS_COOKIE_SHA1

basically agrees with Bug #28355 (which is about LDAP, not NIS, but the general idea is the same - "get your users from the network").

*** This bug has been marked as a duplicate of bug 28355 ***