When /var/lib/dbus/machine-id is missing, applications that uses dbus basically abort because dbus does so, sample pstack, $ pidgin process 8300: D-Bus library appears to be incorrectly set up; failed to read machine uuid: Failed to open "/var/lib/dbus/machine-id": No such file or directory See the manual page for dbus-uuidgen to correct this issue. D-Bus not built with -rdynamic so unable to print a backtrace Abort (core dumped) -bash-3.00$ pstack core core 'core' of 8300: pidgin fef3e8d7 _lwp_kill (1, 6) + 7 feef6a52 raise (6) + 22 feed4fdd abort (fb3e461c, 8047444, fb3c8a18, fb3e4f90, fb3e461c, 8047478) + cd fb3cc2b9 _dbus_abort (fb3e4f90, fb3e461c, 8047478, 8047478, fb3c8fcf, fb3d37f8) + 61 fb3c8a18 _dbus_warn_check_failed (fb3d37f8, 8136758) + 68 fb3c8fcf _dbus_get_local_machine_uuid_encoded (804749c) + 8f fb3ce8ca _dbus_get_autolaunch_address (80474fc, 8047540) + 2a fb3c57f1 _dbus_transport_open_autolaunch (8133e80, 8047560, 8047540) + 5d fb3c5929 _dbus_transport_open (8133e80, 80475b0) + 8d ... This seems a bit drastic, it gave the impression that applications crashes while in actual fact this is a set up problem for dbus. google dbus machine-id seems that there have been lots of bugs/problem assocaited with this. Can dbus lib uses a less severe form of error handling? like uses _dbus_warn() which leads eventually to _dbus_abort if it fatal_warning is set. Hence, allowing stable release to run without unncessary core dump!
Having no id file is just the same as installing dbus without its config files, or patching it to have "exit(1)" in it, or otherwise breaking things on purpose. The reason it aborts loudly is because you need to fix your system. The error message says *exactly* how to fix your system and points to the docs, and fixing it takes only a minute or two. Normally your distribution will have done it for you, though. "make install" should probably run dbus-uuidgen, but if the library is fatally misinstalled, there isn't a way to continue. Aborting is appropriate. Leaving the bug open since we need to add dbus-uuidgen to make install.
Thanks Havoc ;)
(In reply to comment #1) > "make install" should probably run dbus-uuidgen I started implementing this, but there are subtleties: - if cross-compiling, we can't run it - if installing into a DESTDIR to create packages, we don't want to run it, to avoid packagers accidentally setting the same machine ID on every machine running their distribution - if installing into a prefix, ideally the user would have set the machine-ID path to the normal system-wide one, to avoid split-brain problems - distcheck will fail, because we certainly don't want to uninstall the machine ID! so I'm not sure it's such a good idea. To avoid split-brain problems where two parts of an OS think they have different machine IDs, I think we want to encourage the use of a distro-provided D-Bus package, which will have a single value for the machine ID's path, and on any sane system will arrange for dbus-uuidgen --ensure to be run. If we accepted the patch from Bug #23679 (or similar), we'd solve this globally for Linux by using the kernel's idea of the boot ID. Bug #29618 is related.
Bug #42531 (Android support) has a related patch.
(In reply to comment #3) > To avoid split-brain problems where two parts of an OS think they have > different machine IDs, I think we want to encourage the use of a > distro-provided D-Bus package, which will have a single value for the > machine ID's path, and on any sane system will arrange for dbus-uuidgen > --ensure to be run. ... or systemd (which installs /etc/machine-id, with which dbus is perfectly happy), or any piece of OS infrastructure that installs a systemd-compatible /etc/machine-id. Having said that, making dbus_connection_open() / dbus_bus_get() fail with a DBusError seems like it would be a nicer way to signal "this system is broken". In practice, things that *need* D-Bus will respond to that by aborting/crashing/warning and exiting anyway, and things that have D-Bus as nonessential functionality (Pidgin) can presumably survive it.
I've come around to the opinion that we should deal with this much more gracefully than we do, and that Havoc's point of view in Comment #1, while certainly defensible, is unnecessarily dogmatic. (In reply to Havoc Pennington from comment #1) > Having no id file is just the same as installing dbus without its config > files, or patching it to have "exit(1)" in it, or otherwise breaking things > on purpose. Yes ish... but in practice, the dbus library actually mostly works without a machine ID (and it turns out Travis-CI has been testing that situation for months). I agree that the error message produced when failing to do something due to lack of a machine ID should indicate that this is a misconfiguration. However, we have better ways to deal with recoverable errors than just aborting, and in particular applications know better than we do whether using dbus is optional or mandatory for that particular application. Also, having libdbus installed does not *necessarily* mean you have dbus-daemon and dbus-uuidgen installed - some software links to libdbus at build time but does not hard-require a D-Bus bus at runtime. (In reply to Havoc Pennington from comment #1) > "make install" should probably run dbus-uuidgen It can't, because: * dbus-uuidgen doesn't understand DESTDIR * dbus-uuidgen is totally inappropriate to run when building packages for something like RPM or dpkg, which are a much more frequent way to obtain dbus than building it yourself, and IMO something we should be strongly recommending - if dbus is critical OS infrastructure then you should normally get it from your OS vendor * creating the machine ID but not uninstalling it makes distcheck fail * uninstalling the machine ID is never correct (In reply to Simon McVittie from comment #5) > Having said that, making dbus_connection_open() / dbus_bus_get() fail with a > DBusError seems like it would be a nicer way to signal "this system is > broken". In practice, things that *need* D-Bus will respond to that by > aborting/crashing/warning and exiting anyway, and things that have D-Bus as > nonessential functionality (Pidgin) can presumably survive it. In fact dbus_connection_open()/dbus_bus_get() work fine without a machine ID if DBUS_SESSION_BUS_ADDRESS is already set or if $XDG_RUNTIME_DIR/bus is available, but then crash out if and only if you try to use X11 autolaunch. Similarly, dbus-daemon actually works fine without a machine ID until you call Peer.GetMachineId() on it, at which point it crashes. This does not seem a consistent position to take. There are two things that would be consistent: * Work as well as we can without a machine ID, while still indicating in error messages for the things that don't work that this situation is misconfiguration. * Check for the machine ID up-front even if we don't need it, and refuse to work without it. We do not do either of those right now, and we never have. I strongly prefer the first, because the second seems like being difficult for the sake of being difficult.
The patches I'm about to attach depend on those from Bug #101257, but could be easily be rebased the other way round.
Created attachment 131783 [details] [review] [1/3] uuidgen: Remove unimplemented declaration As far as I can tell from git history, this function never existed.
Created attachment 131784 [details] [review] [2/3] Make _dbus_get_local_machine_uuid_encoded() properly failable This function already raised an error, and all callers handled that error as gracefully as they could (because _dbus_generate_uuid() is failable, since 2015). Given that, it seems unnecessarily hostile to do a _dbus_warn_check_failed() unless we have no better alternative: yes, it indicates that dbus has not been installed correctly, but during build-time tests it's entirely reasonable that dbus has not yet been installed. Callers are: * DBusConnection, to implement Peer.GetMachineId() * The bus driver, to implement Peer.GetMachineId() * X11 autolaunching * dbus_get_local_machine_id() Of those, only the last one is not in a position to return an error gracefully, so move the _dbus_warn_check_failed() to there. Migrate the text about the D-Bus library being incorrectly set up into the error emitted by the Unix implementation, and to make it less misleading, include separate error messages for both the files we try to read: $ bwrap --ro-bind / / --dev /dev --tmpfs /etc --tmpfs /var \ ./tools/dbus-uuidgen --get D-Bus library appears to be incorrectly set up: see the manual page for dbus-uuidgen to correct this issue. (Failed to open "/var/lib/dbus/machine-id": No such file or directory; Failed to open "/etc/machine-id": No such file or directory)
Created attachment 131785 [details] [review] [3/3] Add dbus_try_get_local_machine_id() --- dbus_get_local_machine_id() continues to issue a warning that defaults to fatal. We could demote it to _dbus_warn() so that it defaults to non-fatal if people would prefer that, but in any case the long-term solution is to use a function that has proper error handling.
(In reply to Simon McVittie from comment #9) > $ bwrap --ro-bind / / --dev /dev --tmpfs /etc --tmpfs /var \ > ./tools/dbus-uuidgen --get > D-Bus library appears to be incorrectly set up: see the manual > page for dbus-uuidgen to correct this issue. (Failed to open > "/var/lib/dbus/machine-id": No such file or directory; Failed to open > "/etc/machine-id": No such file or directory) Also: $ bwrap --ro-bind / / --dev /dev --tmpfs /etc --tmpfs /var \ ./tools/dbus-launch --exit-with-x11 dbus[12857]: D-Bus library appears to be incorrectly set up: see the manual page for dbus-uuidgen to correct this issue. (Failed to open "/var/lib/dbus/machine-id": No such file or directory; Failed to open "/etc/machine-id": No such file or directory) So that seems to be behaving as desired: you get a big fat warning, an unsuccessful exit, and no crash.
Comment on attachment 131783 [details] [review] [1/3] uuidgen: Remove unimplemented declaration Review of attachment 131783 [details] [review]: ----------------------------------------------------------------- Trivially yes.
Comment on attachment 131784 [details] [review] [2/3] Make _dbus_get_local_machine_uuid_encoded() properly failable Review of attachment 131784 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-internals.c @@ -912,5 @@ > - > - if (!_dbus_read_local_machine_uuid (&machine_uuid, FALSE, > - &local_error)) > - { > -#ifndef DBUS_ENABLE_EMBEDDED_TESTS Is the conditional compilation on DBUS_ENABLE_EMBEDDED_TESTS no longer necessary? It’s been omitted from where the _dbus_warn_check_failed() has been moved to.
Comment on attachment 131785 [details] [review] [3/3] Add dbus_try_get_local_machine_id() Review of attachment 131785 [details] [review]: ----------------------------------------------------------------- This looks good. There are a couple of references to dbus_get_local_machine_id() in documentation comments elsewhere in the code (dbus/dbus-bus.c, dbus/dbus-connection.c) — do you want to update them to mention dbus_try_get_local_machine_id() instead, so people are consistently pointed in the right direction? Could be a follow-up patch if you prefer.
(In reply to Philip Withnall from comment #13) > Is the conditional compilation on DBUS_ENABLE_EMBEDDED_TESTS no longer > necessary? It’s been omitted from where the _dbus_warn_check_failed() has > been moved to. I think removing that conditional is fine. It was always rather a hack: either something is defined behaviour or it isn't, and I'm very wary of code that changes observable behaviour depending on whether we compiled in tests. Previously, the behaviour was this, if we assume fatal warnings: if no machine ID if embedded tests compiled in silently make up a new random UUID else if warnings are fatal warn crash else warn make up a new random UUID Now, the callers that can cope with errors get an error that contains the old warning, the caller that can't cope spams stderr with that warning, and we never just make up a new random UUID. Travis-CI's build workers do not have a machine ID (demonstrated by one of my new tests for o.fd.DBus.Peer failing until I fixed it), and the "embedded tests" pass, so we can infer that the "embedded tests" never exercised dbus_get_local_machine_id() anyway.
(In reply to Philip Withnall from comment #14) > There are a couple of references to > dbus_get_local_machine_id() in documentation comments elsewhere in the code > (dbus/dbus-bus.c, dbus/dbus-connection.c) — do you want to update them to > mention dbus_try_get_local_machine_id() instead, so people are consistently > pointed in the right direction? Could be a follow-up patch if you prefer. Good idea.
(In reply to Simon McVittie from comment #15) > Travis-CI's build workers do not have a machine ID (demonstrated by one of > my new tests for o.fd.DBus.Peer failing until I fixed it), and the "embedded > tests" pass, so we can infer that the "embedded tests" never exercised > dbus_get_local_machine_id() anyway. Sigh, no, this is not actually the case - the embedded tests rely on GetMachineId() making up a new UUID on the spot.
(In reply to Simon McVittie from comment #17) > (In reply to Simon McVittie from comment #15) > > Travis-CI's build workers do not have a machine ID (demonstrated by one of > > my new tests for o.fd.DBus.Peer failing until I fixed it), and the "embedded > > tests" pass, so we can infer that the "embedded tests" never exercised > > dbus_get_local_machine_id() anyway. > > Sigh, no, this is not actually the case - the embedded tests rely on > GetMachineId() making up a new UUID on the spot. How so? If it’s easy to do so, it might be better to fix the tests rather than keep the DBUS_ENABLE_EMBEDDED_TESTS conditional here, for all the reasons you gave in comment #15. Probably best if we keep the conditional in this patchset, though, and fix the tests and remove the conditional in a follow-up patch (in this bug or another).
(In reply to Philip Withnall from comment #18) > How so? If it’s easy to do so, it might be better to fix the tests rather > than keep the DBUS_ENABLE_EMBEDDED_TESTS conditional here Yes. I think I can see how to fix the tests without it being horrible, but I'm waiting for re-testing of Bug #101257 to finish first.
This failure mode can be exercised on a less deficient machine with: bwrap --bind / / \ --dev /dev \ --tmpfs /etc \ --bind /etc/passwd /etc/passwd \ --bind /etc/group /etc/group \ --bind /etc/nsswitch.conf /etc/nsswitch.conf \ --tmpfs /var \ make check (It turns out those are the only things in /etc that the dbus tests need.)
Created attachment 131809 [details] [review] [1/3] tests: Don't exercise GetMachineId() or autolaunch if no machine ID At the moment there is a hack in the implementation of GetMachineId() to stop tests from failing during "make check" on a system where dbus has never been installed, by silently generating a new unique fake "machine ID" for each process. I'm about to change that behaviour to report errors properly; skip affected test-cases if we can't read the real machine ID. The shell scripts to test dbus-launch are run both as "make check" tests (for which it is valid for dbus to be not correctly installed) and as installed-tests (for which that is not valid), so make them pass during "make check" but fail during installed testing. The tests in bus/ and test/name-test/ are only run during "make check" so they only have the code path where they are skipped.
Comment on attachment 131783 [details] [review] [1/3] uuidgen: Remove unimplemented declaration Applied, thanks
If https://travis-ci.org/smcv/dbus/builds/240863222 succeeds, then this is working fine. Attachment #131809 [details] is intended to fix the previously-failing "debug" and "cmake" builds. (If the build in a debian stretch container fails, that is probably not this bug - check the log. The apt mirror that it uses has been producing intermittent 503 errors recently.)
Comment on attachment 131809 [details] [review] [1/3] tests: Don't exercise GetMachineId() or autolaunch if no machine ID Review of attachment 131809 [details] [review]: ----------------------------------------------------------------- ++, with one nit. ::: test/name-test/test-autolaunch.c @@ +24,5 @@ > > + if (!_dbus_read_local_machine_uuid (&uuid, FALSE, &error)) > + { > + /* We can't expect autolaunching to work in this situation */ > + fprintf (stderr, "%s\n", error.message); Do you want to prefix this message with `*** ` like the others? It’s not TAP, but let’s be consistent.
(In reply to Simon McVittie from comment #16) > (In reply to Philip Withnall from comment #14) > > There are a couple of references to > > dbus_get_local_machine_id() in documentation comments elsewhere in the code > > (dbus/dbus-bus.c, dbus/dbus-connection.c) — do you want to update them to > > mention dbus_try_get_local_machine_id() instead, so people are consistently > > pointed in the right direction? Could be a follow-up patch if you prefer. > > Good idea. I pushed the obvious change unreviewed. (In reply to Philip Withnall from comment #24) > Do you want to prefix this message with `*** ` like the others? It’s not > TAP, but let’s be consistent. Done. One more patch: dbus-launch still used dbus_get_local_machine_id().
Created attachment 131813 [details] [review] dbus-launch: Use dbus_try_get_local_machine_id()
Comment on attachment 131813 [details] [review] dbus-launch: Use dbus_try_get_local_machine_id() Review of attachment 131813 [details] [review]: ----------------------------------------------------------------- Yes.
Thanks, pushed. Fixed in git for 1.11.14.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.