Summary: | D-Bus connections fail | ||
---|---|---|---|
Product: | systemd | Reporter: | rdbirt |
Component: | general | Assignee: | systemd-bugs |
Status: | RESOLVED NOTABUG | QA Contact: | systemd-bugs |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | ARM | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: |
boot log of wandboard quad running linux 3.18.3 and systemd-216
output of 'journalctl -b' after the failure has occurred
strace before the issue occurs
strace after the issue has occurred |
Description
rdbirt
2015-03-05 00:34:58 UTC
This could be a sign that PID 1 crashed (segfault, assertion error, etc). Can you provide the full boot log (journalctl -b)?

Created attachment 114032 [details]
boot log of wandboard quad running linux 3.18.3 and systemd-216
This seems to have something to do with the time of day. The board has no battery, so on each reset the time goes back to 'Wed Dec 31 1969'. Setting the time after reset, either manually using 'date' or over the network using 'rdate', causes something to fail about 14 minutes later. If the current time is not set immediately on reset, the problem does not occur.

Is the boot log in #c2 from a failed boot?

It doesn't fail at boot time. It fails about 14 minutes after boot if I set the time of day right after booting is complete. If I don't set the time of day, the failure does not occur. So boot logs always look the same.

Can you provide a log from boot all the way until after the failure?

Yes, I can, but there is nothing in the system log after the failure occurs.

Created attachment 114331 [details]
output of 'journalctl -b' after the failure has occurred
Sorry, I was incorrect; there is something in the log around the time the failure occurs.

I have now tested this a bit, and the results differ somewhat between running as an unprivileged user and running as root. I tried removing /var/run/dbus/system_bus_socket and /run/systemd/private, killing the dbus daemon, and freezing PID 1 with kill -ABRT. In all cases, the error is either "connection refused", "no such file or directory", or a timeout. You get a permission error, which suggests something different. Do you have SELinux or another LSM?

No, no SELinux or any other LSM.

Can you strace 'systemctl status'? Preferably with -e network,file to reduce the amount of logs.

Created attachment 114357 [details]
strace before the issue occurs
Created attachment 114358 [details]
strace after the issue has occurred
So in the "good" trace, socket() is called. In the "bad" trace, this does not even happen, and systemctl fails with "Failed to get D-Bus connection: Unknown error -1". This smells like some strange dbus problem.

Do you have any advice about what I should do next?

If I set the time of day before the board has been up for 15 minutes, then the problem occurs. If I set the time of day after the board has been up for at least 15 minutes, then there is no problem. Does anyone know what the magic is that happens at 15 minutes of uptime?

The problem seems to be caused by systemd-tmpfiles-clean.service. If I disable it, then the problem does not occur. Also, systemd-tmpfiles-clean.timer is set for 15 minutes after boot and every 24 hours thereafter.

If either of the following lines is in /usr/lib/tmpfiles.d/tmp.conf

    d /tmp 1777 root root 10d
    d /var/tmp 1777 root root 30d

then the problem occurs.

My rather limited understanding is that these lines should just ensure that the specified directory exists and, if it does not, create it. It seems, however, to do more than that, and a lot of files and directories are removed.

(In reply to rdbirt from comment #19)
> if either of the following lines is in /usr/lib/tmpfiles.d/tmp.conf
>
> d /tmp 1777 root root 10d
> d /var/tmp 1777 root root 30d
>
> then the problem occurs.
>
> my rather limited understanding is that these lines should just ensure that
> the specified directory should exist and, if it does not, to create it. it
> seems, however, to do more than that and a lot of file and directories are
> removed.

It also deletes files and directories not modified in the specified time (10 days in this case). See http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#Age.

But the board has been powered up for only 15 minutes.
Right, but when the time is updated, files suddenly become much older. So what I think is happening is that a file is created with a timestamp, the time jumps forward by about three months, the files that were created previously now appear much older, and some are deleted.

For some reason it appears as if your /run/systemd/system has been cleaned up. The question is why, though? What does your mount table look like when this happens? If you say that the tmpfiles lines for /tmp or /var/tmp clean this up, it looks as if in some weird way those caused tmpfiles to iterate through /run. Any idea how that could happen?

Yes, because in the default Buildroot directory layout /run is a link to /tmp. Changing that resolves the issue. Thanks for the help! |
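
For readers who land here with the same symptom, the two ingredients of the diagnosis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how systemd-tmpfiles is implemented: the helper names `is_expired`, `aliases_cleaned_dir`, and `demo` are hypothetical, the age check looks at the modification time only (the real tool consults more than one timestamp), and the "real" clock value is simply midnight UTC on the report date. The symlink is created in a scratch directory, so nothing on the system is touched.

```python
import os
import tempfile
from typing import Iterable

DAY = 24 * 60 * 60  # seconds in one day


def is_expired(mtime: float, now: float, max_age: float) -> bool:
    """True if an entry last modified at `mtime` is older than `max_age` at `now`.

    Simplification: only the modification time is considered here.
    """
    return (now - mtime) > max_age


def aliases_cleaned_dir(path: str, cleaned_dirs: Iterable[str]) -> bool:
    """True if `path` resolves, through symlinks, to or inside a cleaned directory."""
    real = os.path.realpath(path)
    for directory in cleaned_dirs:
        real_dir = os.path.realpath(directory)
        if real == real_dir or real.startswith(real_dir + os.sep):
            return True
    return False


def demo() -> dict:
    # A file written one minute after boot, while the clock still reads the epoch.
    mtime_at_boot = 60.0
    # Clock never set: at 15 minutes of uptime the file is 14 minutes old.
    before_jump = is_expired(mtime_at_boot, now=15 * 60.0, max_age=10 * DAY)
    # Clock set to 2015-03-05 00:00:00 UTC: the same file now looks decades old.
    after_jump = is_expired(mtime_at_boot, now=1425513600.0, max_age=10 * DAY)

    # Recreate the reported layout in a scratch directory: run -> tmp.
    with tempfile.TemporaryDirectory() as root:
        tmp_dir = os.path.join(root, "tmp")
        run_link = os.path.join(root, "run")
        os.mkdir(tmp_dir)
        os.symlink(tmp_dir, run_link)
        aliased = aliases_cleaned_dir(run_link, [tmp_dir])

    return {"before_jump": before_jump, "after_jump": after_jump, "aliased": aliased}


if __name__ == "__main__":
    print(demo())
```

On a real system, calling `aliases_cleaned_dir("/run", ["/tmp", "/var/tmp"])` is one way to check whether /run ends up inside a directory that tmp.conf ages out, which is the situation the Buildroot layout created here.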