Bug 71483

Summary: Issues during boot with automounts
Product: systemd Reporter: David Strauss <david>
Component: generalAssignee: systemd-bugs
Status: NEW --- QA Contact: systemd-bugs
Severity: critical    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Screenshot of boot with bar Requires= in VM console
Non-debug boot log with bad Requires=
Debug boot log with bad Requires=
Debug boot log with good Requires=
Non-debug boot log with good Requires=
Screenshot of boot with bad Requires= in VM console

Description David Strauss 2013-11-11 10:52:19 UTC
Created attachment 89008 [details]
Screenshot of boot with bar Requires= in VM console

Initially, I thought this was a performance issue with the CGroups refactoring in v205+, but I'm seeing more curious things now. This is with master as of 2bba9a57e795ba873b76d8a63a743807d70b1603.

With 12 mount + automount pairs, boot hangs for 60 seconds *each* on the first five of them if there's a misconfigured dependency.

Attachments:

 * A screenshot of VM console bootup (non-debug). This shows the 60 second delay for each of those five automounts.
 * A non-debug log from journalctl --boot. Note that there's a time gap of 60 * 5 seconds.
 * A debug log (using "debug" in the kernel options) from journalctl --boot.

The bizarre thing is that the delay doesn't occur in debug mode.

If I fix the mounts to be correct, I don't get a boot delay, but the automounts still don't work after initial boot. This is the more critical issue, but I'm not sure if it's related to the first.

[root@systemd-f19 ~]# systemctl status var-tmp-1.automount
var-tmp-1.automount - test automounter 1
   Loaded: loaded (/etc/systemd/system/var-tmp-1.automount; enabled)
   Active: failed (Result: resources)
    Where: /var/tmp/1

[root@systemd-f19 ~]# systemctl start var-tmp-1.automount
[root@systemd-f19 ~]# systemctl status var-tmp-1.automount
var-tmp-1.automount - test automounter 1
   Loaded: loaded (/etc/systemd/system/var-tmp-1.automount; enabled)
   Active: active (waiting) since Mon 2013-11-11 02:50:55 PST; 1s ago
    Where: /var/tmp/1

Nov 11 02:50:55 systemd-f19 systemd[1]: Set up automount test automounter 1.
Comment 1 David Strauss 2013-11-11 10:53:44 UTC
Created attachment 89009 [details]
Non-debug boot log with bad Requires=
Comment 2 David Strauss 2013-11-11 10:54:10 UTC
Created attachment 89010 [details]
Debug boot log with bad Requires=
Comment 3 David Strauss 2013-11-11 10:55:20 UTC
Created attachment 89011 [details]
Debug boot log with good Requires=
Comment 4 David Strauss 2013-11-11 10:56:32 UTC
Created attachment 89012 [details]
Non-debug boot log with good Requires=
Comment 5 David Strauss 2013-11-11 10:57:26 UTC
Created attachment 89013 [details]
Screenshot of boot with bad Requires= in VM console
Comment 6 David Strauss 2013-11-11 11:05:15 UTC
This is a regression versus systemd v204, which is what Fedora 19 normally uses.
Comment 7 David Strauss 2013-11-12 04:17:25 UTC
Here are the unit files that should work fine. One thing I've newly noticed is that manually starting the automount from SSH not only makes it start fine; it also starts properly on its own during subsequent boots.

These are not the ones that hang. I'll post those, too.

I have 12 of these: var-tmp-1 through var-tmp-12



[root@systemd-f19 ~]# cat /etc/systemd/system/var-tmp-1.automount
[Unit]
Description=test automounter 1

[Automount]
Where=/var/tmp/1
DirectoryMode=0770

[Install]
WantedBy=multi-user.target



[root@systemd-f19 ~]# cat /etc/systemd/system/var-tmp-1.mount
[Unit]
Requires=var-tmp-1.automount
After=var-tmp-1.automount

[Mount]
What=/usr/bin
Where=/var/tmp/1
Options=bind



[root@systemd-f19 ~]# systemctl status var-tmp-3.automount
var-tmp-3.automount - test automounter 3
   Loaded: loaded (/etc/systemd/system/var-tmp-3.automount; enabled)
   Active: failed (Result: resources)
    Where: /var/tmp/3
Comment 8 David Strauss 2013-11-12 04:26:01 UTC
Here are some units that cause the aforementioned 60-second timeouts in non-debug boots.

I have test-1 through test-8. It's important that you have *more than seven or so* to trigger the bug. It doesn't occur at six or fewer mount/automount unit pairs. The makeitrainfds.service should not exist.



[root@systemd-f19 ~]# cat /etc/systemd/system/test-1.automount
[Unit]
Description=test automounter 1
Requires=makeitrainfds.service
After=makeitraindfds.service

[Automount]
Where=/test/1
DirectoryMode=0770

[Install]
WantedBy=multi-user.target



[root@systemd-f19 ~]# cat /etc/systemd/system/test-1.mount
[Unit]
Requires=test-1.automount
After=test-1.automount

[Mount]
What=/usr/bin
Where=/test/1
Options=bind
Comment 9 David Strauss 2013-12-07 01:15:25 UTC
We're seeing this on v208 as well.
Comment 10 Zbigniew Jedrzejewski-Szmek 2013-12-07 01:26:04 UTC
Does it help if you create /etc/sysctl.d/dgram-qlen.conf with the contents:

net.unix.max_dgram_qlen=1000

?
Comment 11 David Strauss 2013-12-07 02:10:50 UTC
We haven't. We'll try it now.
Comment 12 David Strauss 2013-12-07 02:38:21 UTC
Setting net.unix.max_dgram_qlen=1000 seems to have no effect.
Comment 13 David Strauss 2013-12-08 16:16:26 UTC
We're also seeing this on machines with mounts that have no invalid Requires=. Booting in debug mode still fixes it.
Comment 14 David Strauss 2013-12-08 18:31:43 UTC
Just for context, this is from #systemd on Freenode:
> <zbyszek> davidstrauss: If it works with debug, but not without, I'd guess that it has to do with hanging on journal queue. With debug, we output to kmsg, and straight to journal otherwise. One minute time-outs are also reminiscent of journal hangs. But if it doesn't change anything, than I was probably wrong.

Considering this, I tried booting with systemd.log_target=kmsg. This fixes the issue without enabling debug mode.
Comment 15 David Strauss 2013-12-10 00:53:40 UTC
kmsg turns out to not be a terribly great workaround, given the performance overhead of logging to it. If a bunch of units have, say, configuration values that are no longer valid, logging can be over 60% of systemd's daemon-reload runtime while logging to kmsg, enough to cause timeouts.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.