Summary: | systemd-networkd: VLAN on bond fails due to incorrect order of enslavement | ||
---|---|---|---|
Product: | systemd | Reporter: | Malte Starostik <bugs> |
Component: | general | Assignee: | Tom Gundersen <teg> |
Status: | RESOLVED NOTOURBUG | QA Contact: | systemd-bugs |
Severity: | normal | ||
Priority: | medium | CC: | jay |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Description
Malte Starostik
2014-03-12 12:55:31 UTC
We were able to reproduce this bug, and we were also able to solve it by doing a one-shot exec before systemd-networkd runs where we did a modprobe 8021q. Try that and see if it resolves the issue. If so, it's just dependent on loading the correct support.

If loading the module manually solves the problem, then a newer kernel should hopefully fix it. If not, can you confirm that the problem is still present in a recent kernel?

Force-loading 8021q and/or bonding doesn't help. In the odd case it works; most of the time it doesn't. Tested with 3.12.20, will try with 3.14 tomorrow to check if that does any good.

Nope. Just checked with 3.14.5 and still the same. Whether bonding and/or 8021q modules are explicitly loaded in modules-load.d or not, there's still a race. If the NICs are enslaved to the bond before creating VLANs it works (the rare case). In most cases, the VLAN creation precedes the enslavement of the NICs, and that still fails with:

    kernel: 8021q: VLANs not supported on lan

and the VLAN interfaces are missing.
Trying this manually confirms that VLANs cannot be created on a bond before it has any slaves:

    # ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    19: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:54 brd ff:ff:ff:ff:ff:ff
    20: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:55 brd ff:ff:ff:ff:ff:ff
    21: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:56 brd ff:ff:ff:ff:ff:ff
    22: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:57 brd ff:ff:ff:ff:ff:ff
    # ip link add test type bond
    # ip link add test950 link test type vlan id 950
    8021q: VLANs not supported on test
    RTNETLINK answers: Operation not supported
    # ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    19: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:54 brd ff:ff:ff:ff:ff:ff
    20: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:55 brd ff:ff:ff:ff:ff:ff
    21: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:56 brd ff:ff:ff:ff:ff:ff
    22: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:57 brd ff:ff:ff:ff:ff:ff
    23: test: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT
        link/ether d8:9d:67:21:29:54 brd ff:ff:ff:ff:ff:ff
    # ip link set eno1 master test
    IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
    bonding: test: enslaving eno1 as a backup interface with a down link.
    tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
    tg3 0000:03:00.0 eno1: Flow control is off for TX and off for RX
    tg3 0000:03:00.0 eno1: EEE is disabled
    # ip link add test950 link test type vlan id 950
    # ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    19: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master test state UP mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:54 brd ff:ff:ff:ff:ff:ff
    20: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:55 brd ff:ff:ff:ff:ff:ff
    21: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:56 brd ff:ff:ff:ff:ff:ff
    22: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether d8:9d:67:21:29:57 brd ff:ff:ff:ff:ff:ff
    23: test: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT
        link/ether d8:9d:67:21:29:54 brd ff:ff:ff:ff:ff:ff
    24: test950@test: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT
        link/ether d8:9d:67:21:29:54 brd ff:ff:ff:ff:ff:ff

This behaviour can be reproduced after deleting and recreating the bond.

Finally got around to looking into this properly. It appears Malte's analysis in comment #4 is spot on. The situation is that VLANs cannot be created on devices that have the feature NETIF_F_VLAN_CHALLENGED (some feature...) set. Annoyingly that includes empty bonds (but I don't know why[0]), and I have so far not figured out an elegant way for us to get notified about when this flag changes. I think we should do one of two things (preferably both):

1) hook up the feature flag with rtnetlink so we get notified about changes
2) figure out if empty bonds can be made to work with vlans (then the order will no longer matter).
We could attempt to work around this in networkd by trying to add the VLAN every time the number of slaves of a bond changes, but that feels too hacky (I have already applied such hacks only to revert them four times in the past few months), so I'd really prefer if we could fix this in the kernel.

[0]: <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/bonding/bond_main.c#n1005>

I can confirm this bug. The workaround in my environment currently is writing out configs, restarting systemd-networkd, waiting 15 seconds, and restarting systemd-networkd again.

This is being fixed in the kernel. Explanation and patch can be found in this thread: <http://www.spinics.net/lists/netdev/msg287573.html>. Closing for now, but please reopen if any action is required on the systemd side.
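
On kernels that still show the race described in this thread, one possible stopgap is to delay VLAN creation until the bond reports at least one slave, instead of sleeping for a fixed time. The following is only a minimal shell sketch of that idea, not what networkd does. It assumes the bonding driver exposes its slave list in the sysfs file `bonding/slaves`; the names `SYSFS_ROOT`, `bond_slave_count` and `wait_for_bond_slaves` are hypothetical and introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT, bond_slave_count and wait_for_bond_slaves are
# hypothetical names introduced for this example.

# Root of sysfs; overridable so the helpers can be tried without real hardware.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the number of interfaces currently enslaved to bond "$1".
# The bonding driver lists them, space-separated, in bonding/slaves.
# Prints 0 if the file is missing or unreadable.
bond_slave_count() {
    slaves_file="$SYSFS_ROOT/class/net/$1/bonding/slaves"
    if [ -r "$slaves_file" ]; then
        # wc -w counts the whitespace-separated slave names;
        # tr strips the padding some wc implementations add.
        wc -w < "$slaves_file" | tr -d ' '
    else
        echo 0
    fi
}

# Poll until bond "$1" has at least one slave, checking up to "$2" times,
# one second apart. Returns 0 once a slave is present, 1 on timeout.
wait_for_bond_slaves() {
    tries="$2"
    while [ "$tries" -gt 0 ]; do
        [ "$(bond_slave_count "$1")" -gt 0 ] && return 0
        tries=$((tries - 1))
        # Only sleep if another check is still going to happen.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```

With the interface names from the transcript above, this could be used (as root) as `wait_for_bond_slaves test 15 && ip link add test950 link test type vlan id 950`. Polling is crude and only papers over the ordering problem; the kernel fix referenced in the thread removes the need for it.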