@hewittc on GitHub reported a boot sequence issue on an ARM device running Arch Linux, apparently triggered by a libexpat upgrade: with libexpat >= 2.2.1, system services like systemd-logind and systemd process 1 would frequently fail to connect to the dbus-daemon. This caused subsequent uses of systemd-logind (in particular system logins) to fail, making the system rather broken. Enabling verbose logging for the dbus-daemon indicated that the dbus-daemon starts 7.2 seconds after boot, stops logging anything at 7.3 seconds, then wakes up again and resumes logging things at 38.1 seconds. That's shortly after the kernel reports this: [ 37.789094] ngsa-2 kernel: random: nonblocking pool is initialized which I think is the time at which getrandom() stops blocking. So I suspect that the dbus-daemon is blocking in libexpat for approximately 30 seconds while waiting for entropy. 30ish seconds is too long for the timeouts used in systemd, systemd-logind and systemd-networkd - by the time the dbus-daemon has woken up, those system services have already timed out. The motivation for using high-quality entropy seems to be to avoid hash collision complexity attacks involving crafted XML files that collide. dbus-daemon does not actually need this: the only XML that we parse is completely trusted (it's the dbus-daemon's configuration). So this failure mode could be avoided by asking libexpat to not wait for high-quality entropy. XML_SetHashSalt() with some arbitrary nonzero hash_salt parameter is an undocumented way to do this, which a libexpat developer has confirmed should work. As a short term fix we can add a call to this. It might become a no-op one day, but it can't hurt. Longer term, libexpat might gain a function like XML_UseWeakRandomness() or something, which we could also call.
https://github.com/libexpat/libexpat/issues/91 is the Expat side of this.
See also https://github.com/systemd/systemd/issues/6418.
Created attachment 132813 [details] [review] [1.10] config-loader-expat: Tell Expat not to defend against hash collisions By default, Expat uses cryptographic-quality random numbers as a salt for its hash algorithm, and since 2.2.1 it gets them from the getrandom syscall on Linux. That syscall refuses to return any entropy until the kernel's CSPRNG (random pool) has been initialized. Unfortunately, this can take as long as 40 seconds on embedded devices with few entropy sources, which is too long: if the system dbus-daemon blocks for that length of time, important D-Bus clients like systemd and systemd-logind time out and fail to connect to it. We're parsing small configuration files here, and we trust them completely, so we don't need to defend against hash collisions: nobody is going to be crafting them to cause pathological performance. --- This is for 1.10. A slightly different version is needed for master because we changed the build around a bit. This also doesn't solve anything for CMake builds, but I'm not sure we care. The Autotools build system is recommended for anything other than: (a) MSVC on Windows (b) building on Linux to check that the CMake build system isn't completely broken
Created attachment 132814 [details] [review] [master] config-loader-expat: Tell Expat not to defend against hash collisions --- Same thing but for master, where XML_CFLAGS and XML_LIBS have become EXPAT_* because we stopped pretending to support any other XML parser.
Testing on the failing hardware, with the unpatched Expat >= 2.2.1, would be appreciated.
I reverted any local modifications to expat and dbus, verified that the boot behavior was broken, then applied the patch for dbus 1.10 and built a new dbus package. Indeed the system now boots as expected once again.
Thanks, added your Tested-by in my local version of those commits. Hopefully a reviewer can take a look at these soon. (Also at <https://github.com/smcv/dbus/commit/101858-expat-entropy> and <https://github.com/smcv/dbus/commit/101858-expat-entropy-1.10> for CI testing)
Comment on attachment 132813 [details] [review] [1.10] config-loader-expat: Tell Expat not to defend against hash collisions Review of attachment 132813 [details] [review]: ----------------------------------------------------------------- r+
Comment on attachment 132814 [details] [review] [master] config-loader-expat: Tell Expat not to defend against hash collisions Review of attachment 132814 [details] [review]: ----------------------------------------------------------------- r+
Thanks, fixed in git for 1.10.24 and 1.11.18.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.