Bug 89434 - D-Bus connections fail
Summary: D-Bus connections fail
Status: RESOLVED NOTABUG
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: ARM Linux (All)
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-05 00:34 UTC by rdbirt
Modified: 2015-04-21 01:48 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
boot log of wandboard quad running linux 3.18.3 and systemd-216 (34.82 KB, text/plain)
2015-03-05 17:28 UTC, rdbirt
output of 'journalctl -b' after the failure has occurred (34.90 KB, text/plain)
2015-03-15 21:51 UTC, rdbirt
strace before the issue occurs (26.44 KB, text/plain)
2015-03-17 02:02 UTC, rdbirt
strace after the issue has occurred (9.53 KB, text/plain)
2015-03-17 02:02 UTC, rdbirt

Description rdbirt 2015-03-05 00:34:58 UTC
Running a system generated by Buildroot 2014.11 on a Freescale Wandboard Quad; the kernel version is 3.18.3.

The system boots and runs fine for a while; after a seemingly random amount of time, invoking the 'systemctl' command returns an error:

    'Failed to get D-Bus connection: Operation not permitted'

The 'halt' and 'shutdown' commands return

    'Failed to talk to init daemon.'

I have tried systemd-216 and systemd-219, and both show this behaviour.
Comment 1 Zbigniew Jedrzejewski-Szmek 2015-03-05 01:15:47 UTC
This could be a sign that PID 1 crashed (segfault, assertion error, etc). Can you provide the full boot log (journalctl -b)?
Comment 2 rdbirt 2015-03-05 17:28:20 UTC
Created attachment 114032 [details]
boot log of wandboard quad running linux 3.18.3 and systemd-216
Comment 3 rdbirt 2015-03-13 22:52:45 UTC
This seems to have something to do with the time of day.

The board has no battery so on each reset the time goes back to 'Wed Dec 31 1969'.  Setting the time after reset, either manually using 'date' or over the network using 'rdate', causes something to fail about 14 minutes later.

If the current time is not set immediately on reset the problem does not occur.
Comment 4 Zbigniew Jedrzejewski-Szmek 2015-03-15 19:11:54 UTC
Is the boot log in #c2 from a failed boot?
Comment 5 rdbirt 2015-03-15 21:20:23 UTC
It doesn't fail at boot time.  It fails about 14 minutes after boot if I set the time of day right after booting is complete.  If I don't set the time of day the failure does not occur.

So boot logs always look the same.
Comment 6 Zbigniew Jedrzejewski-Szmek 2015-03-15 21:22:55 UTC
Can you provide a log from boot all the way until after the failure?
Comment 7 rdbirt 2015-03-15 21:32:11 UTC
Yes, I can, but there is nothing in the system log after the failure occurs.
Comment 8 rdbirt 2015-03-15 21:51:45 UTC
Created attachment 114331 [details]
output of 'journalctl -b' after the failure has occurred
Comment 9 rdbirt 2015-03-15 21:52:46 UTC
Sorry, I was incorrect; there is something in the log around the time the failure occurs.
Comment 10 Zbigniew Jedrzejewski-Szmek 2015-03-16 01:20:23 UTC
I have now tested this a bit, and the results differ somewhat when running as an unprivileged user and as root. I tried removing /var/run/dbus/system_bus_socket and /run/systemd/private, killing the dbus daemon, and freezing PID 1 with kill -ABRT. In all cases, the error is either "connection refused", "no such file or directory", or a timeout. You get a permission error, which suggests something different. Do you have SELinux or another LSM?
Comment 11 rdbirt 2015-03-16 17:01:06 UTC
No, no SELinux or any other LSM.
Comment 12 Zbigniew Jedrzejewski-Szmek 2015-03-16 17:21:36 UTC
Can you strace 'systemctl status'? Preferably with -e network,file to reduce the amount of logs.
Comment 13 rdbirt 2015-03-17 02:02:27 UTC
Created attachment 114357 [details]
strace before the issue occurs
Comment 14 rdbirt 2015-03-17 02:02:48 UTC
Created attachment 114358 [details]
strace after the issue has occurred
Comment 15 Zbigniew Jedrzejewski-Szmek 2015-03-17 02:30:39 UTC
So in the "good" trace, socket() is called. In the "bad" trace, this does not even happen, and systemctl fails with "Failed to get D-Bus connection: Unknown error -1". This smells like some strange dbus problem.
Comment 16 rdbirt 2015-03-18 00:07:26 UTC
Do you have any advice about what I should do next?
Comment 17 rdbirt 2015-03-19 02:12:40 UTC
If I set the time of day before the board has been up for 15 minutes, the problem occurs. If I set the time of day after the board has been up for at least 15 minutes, there is no problem. Does anyone know what magic happens at 15 minutes of uptime?
Comment 18 rdbirt 2015-03-24 02:19:47 UTC
The problem seems to be caused by systemd-tmpfiles-clean.service. If I disable it, the problem does not occur. Also, systemd-tmpfiles-clean.timer is set to fire 15 minutes after boot and every 24 hours thereafter.
Comment 19 rdbirt 2015-03-27 19:53:48 UTC
If either of the following lines is in /usr/lib/tmpfiles.d/tmp.conf

    d /tmp 1777 root root 10d
    d /var/tmp 1777 root root 30d

then the problem occurs.

My rather limited understanding is that these lines should just ensure that the specified directory exists and, if it does not, create it. It seems, however, to do more than that, and a lot of files and directories are removed.
Comment 20 Zbigniew Jedrzejewski-Szmek 2015-03-28 03:27:41 UTC
(In reply to rdbirt from comment #19)
> if either of the following lines is in /usr/lib/tmpfiles.d/tmp.conf
> 
>     d /tmp 1777 root root 10d
>     d /var/tmp 1777 root root 30d
> 
> then the problem occurs.
> 
> my rather limited understanding is that these lines should just ensure that
> the specified directory should exist and, if it does not, to create it.  it
> seems, however, to do more than that and a lot of file and directories are
> removed.
It also deletes files and directories not modified within the specified time (10 days in this case). See http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#Age.
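To make the fields of those tmp.conf lines concrete, here is a minimal Python sketch that splits a line such as `d /tmp 1777 root root 10d` into type, path, mode, user, group and age, and converts the age into a duration. The helpers `parse_tmpfiles_line` and `parse_age` are hypothetical, only a subset of systemd's time-span suffixes is handled (s, m/min, h, d, w; a bare number is seconds), and this is not how systemd-tmpfiles itself parses the file.

```python
import re
from datetime import timedelta

# Subset of systemd time-span suffixes accepted here (assumption: enough for
# ages like "10d" and "30d"; systemd itself understands more units).
_UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_age(text: str) -> timedelta:
    """Convert an age field such as '10d' into a timedelta (bare number = seconds)."""
    match = re.fullmatch(r"(\d+)([a-z]*)", text)
    unit = (match.group(2) or "s") if match else None
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported age: {text!r}")
    return timedelta(seconds=int(match.group(1)) * _UNIT_SECONDS[unit])


def parse_tmpfiles_line(line: str) -> dict:
    """Split 'type path mode user group age' into a dict (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected: type path mode user group age")
    kind, path, mode, user, group, age = fields[:6]
    return {
        "type": kind,
        "path": path,
        "mode": int(mode, 8),   # "1777" is octal: sticky bit + rwx for everyone
        "user": user,
        "group": group,
        "age": parse_age(age),  # entries older than this become cleanup candidates
    }


if __name__ == "__main__":
    for line in ("d /tmp 1777 root root 10d", "d /var/tmp 1777 root root 30d"):
        entry = parse_tmpfiles_line(line)
        print(entry["path"], oct(entry["mode"]), entry["age"])
```

The age column is what turns a "make sure this directory exists" line into a periodic cleanup rule for everything underneath that directory.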
Comment 21 rdbirt 2015-03-30 18:47:47 UTC
> It also deletes files and directories not modified within the specified time (10 days in this case). See http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#Age.

But the board has been powered up for only 15 minutes.
Comment 22 Zbigniew Jedrzejewski-Szmek 2015-03-31 13:40:02 UTC
Right, but when the time is updated, files suddenly become much older. So what I think is happening is this: a file is created with a timestamp, the time then jumps forward by about three months, the files that were created earlier now appear much older, and some of them are deleted.
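The arithmetic behind this explanation can be sketched in Python. The sketch assumes a simplified cleanup rule (an entry is a candidate once its newest timestamp is older than the current time minus the configured age) and uses illustrative timestamps: the board boots near the Unix epoch, which is what the reporter's clock shows after a reset ('Wed Dec 31 1969' in a timezone west of UTC), and the clock is then set to a 2015 date. The function name `is_cleanup_candidate` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_cleanup_candidate(newest_stamp: datetime, now: datetime, age: timedelta) -> bool:
    """Simplified age rule: eligible once the newest timestamp is older than now - age."""
    return newest_stamp < now - age


AGE = timedelta(days=10)  # the age from "d /tmp 1777 root root 10d"

# No RTC battery: the clock restarts near the Unix epoch on every reset.
boot = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
# A file written shortly after boot (illustrative timestamp).
created = boot + timedelta(seconds=30)

# Timer fires 15 minutes after boot and the clock was never set:
# the file is only about 15 minutes old, so it is kept.
run_unset = boot + timedelta(minutes=15)

# Clock set to a 2015 date shortly after boot, then the timer fires:
# the same file now appears to be decades old.
run_after_set = datetime(2015, 3, 5, 0, 15, tzinfo=timezone.utc)

print(is_cleanup_candidate(created, run_unset, AGE))      # False
print(is_cleanup_candidate(created, run_after_set, AGE))  # True
print("apparent age:", run_after_set - created)
```

Nothing about the file changed between the two runs; only the wall clock moved, which is enough to push it past the 10-day threshold.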
Comment 23 Lennart Poettering 2015-04-20 14:34:10 UTC
For some reason it appears as if your /run/systemd/system has been cleaned up. The question is why, though?

What does your mount table look like when this happens?

If you say that the tmpfiles lines for /tmp or /var/tmp clean this up, it looks as if, in some weird way, they caused tmpfiles to iterate through /run. Any idea how that could happen?
Comment 24 rdbirt 2015-04-21 01:47:25 UTC
Yes, because in the default Buildroot directory layout /run is a link to /tmp. Changing that resolves the issue. Thanks for the help!
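A quick way to check whether a system has the same layout problem is to resolve /run and see whether it ends up inside a directory that tmp.conf ages. The following is a minimal Python sketch; `resolves_under` is a hypothetical helper, and the directory list assumes the two tmp.conf entries quoted in comment 19.

```python
import os


def resolves_under(path: str, parent: str) -> bool:
    """True if `path`, after following symlinks, is `parent` or lies inside it."""
    real = os.path.realpath(path)
    real_parent = os.path.realpath(parent)
    # commonpath compares whole path components, so /tmpfoo is not "under" /tmp.
    return os.path.commonpath([real, real_parent]) == real_parent


if __name__ == "__main__":
    # Directories aged by the tmp.conf lines quoted in comment 19.
    for aged_dir in ("/tmp", "/var/tmp"):
        if resolves_under("/run", aged_dir):
            print(f"warning: /run resolves into {aged_dir}; "
                  "systemd-tmpfiles-clean may delete runtime files under /run")
```

If /run is a symlink to /tmp, then /run/systemd/... is physically /tmp/systemd/..., so the age-based cleanup of /tmp walks straight into PID 1's runtime directory.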

