I've sent an email with this problem to the systemd-devel mailing list weeks ago (http://lists.freedesktop.org/archives/systemd-devel/2014-July/021602.html) and never received any feedback, so I'm just going to assume that this is indeed a bug. I'll reproduce what I've said in that email. I run an Archlinux home server connected to a netgear home router. I use a bridge because among other things, the server does KVM virtualization. My problem is that once the DHCP lease on br0 expires, it is not renewed. Here's `ip a` before and after br0 loses the ip. You can see the "valid_lft 42sec" just before the IP is lost. The lease appears to be valid for 24 hours. -------------before--------------- 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP group default qlen 1000 link/ether bc:5f:f4:90:23:85 brd ff:ff:ff:ff:ff:ff inet6 fe80::be5f:f4ff:fe90:2385/64 scope link valid_lft forever preferred_lft forever 3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether b6:29:d8:20:20:11 brd ff:ff:ff:ff:ff:ff inet 192.168.1.33/24 brd 192.168.1.255 scope global dynamic br0 valid_lft 42sec preferred_lft 42sec inet6 fe80::b429:d8ff:fe20:2011/64 scope link valid_lft forever preferred_lft forever 7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 500 link/ether fe:54:00:55:09:68 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc54:ff:fe55:968/64 scope link valid_lft forever preferred_lft forever -------------after--------------- 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP group default qlen 1000 link/ether bc:5f:f4:90:23:85 brd ff:ff:ff:ff:ff:ff inet6 fe80::be5f:f4ff:fe90:2385/64 scope link valid_lft forever preferred_lft forever 3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether b6:29:d8:20:20:11 brd ff:ff:ff:ff:ff:ff inet6 fe80::b429:d8ff:fe20:2011/64 scope link valid_lft forever preferred_lft forever 7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 500 link/ether fe:54:00:55:09:68 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc54:ff:fe55:968/64 scope link valid_lft forever preferred_lft forever ---------------------------------- Below are the relevant configs. /etc/systemd/network/bridge-data.netdev: ---------------------------------- [NetDev] Name=br0 Kind=bridge ---------------------------------- /etc/systemd/network/bridge-data.network: ---------------------------------- [Match] Name=br0 [Network] DHCP=yes ---------------------------------- /etc/systemd/network/enterprise.network: ---------------------------------- [Match] Name=en* [Network] Bridge=br0 ---------------------------------- I've enabled DEBUG logging for systemd-networkd, but it's not giving me anything useful. Here's what I see at the moment the IP is lost (at 17:44:49) with some context: ---------------------------------- Jul 23 17:39:02 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Next retransmission in 1min 48.764460s Jul 23 17:40:51 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Sent SOLICIT Jul 23 17:40:51 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Next retransmission in 1min 55.518924s Jul 23 17:42:47 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Sent SOLICIT Jul 23 17:42:47 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Next retransmission in 2min 4.769016s Jul 23 17:44:49 enterprise systemd-networkd[25474]: br0 : removed address: 192.168.1.33/24 Jul 23 17:44:52 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Sent SOLICIT Jul 23 17:44:52 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Next retransmission in 1min 51.959760s Jul 23 17:46:44 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Sent SOLICIT Jul 23 17:46:44 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Next retransmission in 2min 5.687352s Jul 23 17:48:49 enterprise systemd-networkd[25474]: DHCPv6 CLIENT: Sent SOLICIT ---------------------------------- As you can see, it just says "br0: removed address:" and that's it. Every now and then (less than once a day) there will be something like this: ---------------------------------- Jul 25 10:11:25 enterprise systemd-networkd[19106]: DHCPv6 CLIENT: Sent SOLICIT Jul 25 10:11:25 enterprise systemd-networkd[19106]: DHCPv6 CLIENT: Next retransmission in 1min 57.978432s Jul 25 10:12:31 enterprise systemd-networkd[19106]: DHCP CLIENT (0x461b140f): REQUEST (rebinding) Jul 25 10:12:31 enterprise systemd-networkd[19106]: DHCP CLIENT (0x461b140f): NAK Jul 25 10:12:31 enterprise systemd-networkd[19106]: DHCP CLIENT (0x461b140f): STOPPED: No lease Jul 25 10:12:31 enterprise systemd-networkd[19106]: br0 : IP address in use. Jul 25 10:13:23 enterprise systemd-networkd[19106]: DHCPv6 CLIENT: Sent SOLICIT Jul 25 10:13:23 enterprise systemd-networkd[19106]: DHCPv6 CLIENT: Next retransmission in 2min 1.184604s ---------------------------------- but it doesn't seem to have any negative effect and it's probably unrelated to the problem. I also don't see anything in dmesg. Other machines on the network are unaffected so I doubt it has anything to do with the router. It's a Realtek onboard RTL8111/8168/8411 adaptor and Linux uses the r8169 driver automatically. I'm aware that there's a dedicated r8168 driver by realtek, but I'm using the same mainboard (same revision) with Arch in another machine (without a bridge) and I never had any problems. Hence, I doubt that the r8169 driver is the problem but I will try using it on occasion. (You can imagine that debugging this problem is a pain because I always have to wait 24h to see if a change had any effect...) This is systemd 215 built on Jul. 8. For now, I've enabled a cron job that checks if the lease is about to expire within the next 1h and restarts systemd-networkd if necessary, but that's just silly.
Sorry for net getting back to you sooner. The fact that you are getting a NAK from your server is what is causing the problem. networkd is asking to renew the lease, but the DHCP server is refusing, so the lease simply expires and the kernel drops the address. A dirty work-around is to mark this as a CriticalConnection. To fix the problem: May I ask for a package dump aronud the time you get the NAK? That way we can verify whether or not the problem is with the request networkd sends to the server, with the server itself or with the way networkd parses the reply. Thanks!
Created attachment 104988 [details] journal
Created attachment 104989 [details] tcpdump
Thanks for your feedback. I've set up some timers inside a tmux session so I could run tcpdump a few seconds before the lease runs out and then restart systemd-networkd a few seconds later. I've attached both the journal and the tcpdump log from around that time. As can be seen in the journal at... ... 17:02:01 the connection went down ... 17:02:18 the timed script restarted systemd-networkd to re-establish connection The tcpdump doesn't show anything happening at 17:02:01, but one second later there's a dhcp6 solicit. The other stuff only happens once systemd-networkd was restarted. I stripped some ssh/nfs related stuff from the tcpdump but I'm 100% sure I didn't trim anything relevant. I'm not sure if this helps. I can post any additional info you need. Thanks again.
Wait, I should have done this for br0 and not enp3s0... Let me do that again and come back later. Sorry about that.
Ok, I'll check back later with the updated version. Just a note: what we are looking for is DHCP traffic, which sholud happen around the time you get the NAK (which is probably 50% or 75% of the way to the lease expiration).
I see. In that case, I'll just let tcpdump run until tomorrow to make sure I catch it. Is it sufficient if I monitor port 67 and 68, i.e. tcpdump -i br0 -n port 67 and port 68 ?
That should do it.
Created attachment 106133 [details] tcpdump-br0 Sorry for the delay. I attached a lengthy dump, hoping that it contains the information you require. I wasn't able to find anything about a NAK in it, though. The log contains close to 24h of packets of the br0 interface. At the beginning of the log (at 16:12:59) I restarted systemd-networkd.
I am pretty sure this works correctly on current systemd. CLosing hence.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.