Bug 79789 - systemd-networkd assertion failure with Xen vif
Summary: systemd-networkd assertion failure with Xen vif
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other Linux (All)
: medium normal
Assignee: Tom Gundersen
QA Contact: systemd-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-08 08:40 UTC by Steven Noonan
Modified: 2014-06-19 18:44 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
valgrind of systemd-networkd (38.01 KB, text/plain)
2014-06-19 17:30 UTC, Steven Noonan
Details
patch to fix refcounting (812 bytes, patch)
2014-06-19 17:43 UTC, Steven Noonan
Details | Splinter Review

Description Steven Noonan 2014-06-08 08:40:55 UTC
I've got a Xen domU running with systemd 213, set up to use systemd-networkd for initializing networking at boot time. Networking never comes up. journalctl shows that systemd-networkd had an assertion failure:

systemd[1]: Starting Network Service...
systemd-networkd[29143]:             eth0: gained carrier
systemd-networkd[29143]:             eth0: DHCPv4 address 10.0.8.120/19 via 10.0.0.1
systemd-networkd[29143]: [69B blob data]
systemd-networkd[29143]: Assertion 'link->manager->rtnl' failed at src/network/networkd-address.c:241, function address_configure(). Aborting.
systemd[1]: Started Network Service.
systemd[1]: systemd-networkd.service: main process exited, code=dumped, status=6/ABRT
systemd[1]: Unit systemd-networkd.service entered failed state.
systemd[1]: systemd-networkd.service has no holdoff time, scheduling restart.


Installed a build of the same systemd revision with debug symbols (compiled with -O0 -ggdb, split debug symbols) and reproduced the issue. Here's the backtrace:


(gdb) thread apply all bt full

Thread 1 (Thread 0x7f7a4a85d780 (LWP 3823)):
#0  0x00007f7a49c95d67 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f7a49c97118 in abort () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007f7a4a8bb7a9 in log_assert_failed (text=0x7f7a4a8f4bcb "link->manager->rtnl", file=0x7f7a4a8f4b38 "src/network/networkd-address.c", line=241, func=0x7f7a4a8f4fd0 <__PRETTY_FUNCTION__.11501> "address_configure") at src/shared/log.c:709
No locals.
#3  0x00007f7a4a8d32d5 in address_configure (address=0x7f7a4b3f3100, link=0x7f7a4b3f6a30, callback=0x7f7a4a8c663d <address_handler>) at src/network/networkd-address.c:241
        req = 0x0
        r = 32767
        __PRETTY_FUNCTION__ = "address_configure"
        __func__ = "address_configure"
#4  0x00007f7a4a8c6fdf in link_enter_set_addresses (link=0x7f7a4b3f6a30) at src/network/networkd-link.c:581
        address = 0x7f7a4b3f3100
        addr = {s_addr = 2013790218}
        netmask = {s_addr = 14745599}
        prefixlen = 19
        ad = 0x0
        r = 0
        __PRETTY_FUNCTION__ = "link_enter_set_addresses"
        __func__ = "link_enter_set_addresses"
#5  0x00007f7a4a8c87a3 in dhcp_lease_acquired (client=0x7f7a4b3fced0, link=0x7f7a4b3f6a30) at src/network/networkd-link.c:945
        lease = 0x7f7a4b3fba20
        address = {s_addr = 2013790218}
        netmask = {s_addr = 14745599}
        gateway = {s_addr = 16777226}
        prefixlen = 19
        r = -107
        __PRETTY_FUNCTION__ = "dhcp_lease_acquired"
        __func__ = "dhcp_lease_acquired"
#6  0x00007f7a4a8c8ae0 in dhcp_handler (client=0x7f7a4b3fced0, event=2, userdata=0x7f7a4b3f6a30) at src/network/networkd-link.c:1003
        link = 0x7f7a4b3f6a30
        r = 0
        __PRETTY_FUNCTION__ = "dhcp_handler"
        __func__ = "dhcp_handler"
#7  0x00007f7a4a8d8b2f in client_notify (client=0x7f7a4b3fced0, event=2) at src/libsystemd-network/sd-dhcp-client.c:200
No locals.
#8  0x00007f7a4a8db879 in client_handle_message (client=0x7f7a4b3fced0, message=0x7f7a4b3f978c, len=300) at src/libsystemd-network/sd-dhcp-client.c:1146
        r = 0
        notify_event = 2
        __PRETTY_FUNCTION__ = "client_handle_message"
        __func__ = "client_handle_message"
#9  0x00007f7a4a8dbe02 in client_receive_message_raw (s=0x7f7a4b3fb930, fd=11, revents=1, userdata=0x7f7a4b3fced0) at src/libsystemd-network/sd-dhcp-client.c:1270
        client = 0x7f7a4b3fced0
        packet = 0x7f7a4b3f9770
        cmsgbuf = "$\000\000\000\000\000\000\000\a\001\000\000\b\000\000\000\001\000\000\000H\001\000\000H\001\000\000\000\000\000\000\000\000\000"
        iov = {iov_base = 0x7f7a4b3f9770, iov_len = 328}
        msg = {msg_name = 0x0, msg_namelen = 0, msg_iov = 0x7fff9c80e3d0, msg_iovlen = 1, msg_control = 0x7fff9c80e420, msg_controllen = 36, msg_flags = 0}
        cmsg = 0x7fff9c80e420
        checksum = true
        buflen = 328
        len = 300
        r = 0
        __PRETTY_FUNCTION__ = "client_receive_message_raw"
        __func__ = "client_receive_message_raw"
#10 0x00007f7a4a88efbc in source_dispatch (s=0x7f7a4b3fb930) at src/libsystemd/sd-event/sd-event.c:2001
        r = 0
        __PRETTY_FUNCTION__ = "source_dispatch"
        __func__ = "source_dispatch"
#11 0x00007f7a4a88fe9d in sd_event_run (e=0x7f7a4b3f10c0, timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2285
        ev_queue = 0x7fff9c80e4b0
        ev_queue_max = 11
        p = 0x7f7a4b3fb930
        r = 0
        i = 1
        m = 1
        __PRETTY_FUNCTION__ = "sd_event_run"
#12 0x00007f7a4a88ffb3 in sd_event_loop (e=0x7f7a4b3f10c0) at src/libsystemd/sd-event/sd-event.c:2304
        r = 1
        __PRETTY_FUNCTION__ = "sd_event_loop"
#13 0x00007f7a4a876728 in main (argc=1, argv=0x7fff9c80e708) at src/network/networkd.c:100
        m = 0x7f7a4b3f1010
        r = 1
        __func__ = "main"


Any ideas or further information needed?
Comment 1 mike@marineau.org 2014-06-15 00:40:10 UTC
I am seeing this assertion as well in both systemd 213 and 214 on Google Compute Engine.
Comment 2 Tom Gundersen 2014-06-19 16:19:51 UTC
This looks like a memory corruption to me. Any chance you can run it through valgrind? (valgrind /lib/systemd/systemd-networkd)?
Comment 3 Steven Noonan 2014-06-19 17:20:19 UTC
Interesting. I had trouble reproducing it again until I added this to my .network file:

[DHCPv4]
UseMTU=yes

I'll valgrind the result in a moment...
Comment 4 Steven Noonan 2014-06-19 17:30:25 UTC
Created attachment 101368 [details]
valgrind of systemd-networkd

Attached valgrind log for systemd-networkd.
Comment 5 Steven Noonan 2014-06-19 17:34:23 UTC
Well that explains why UseMTU was necessary. When setting the MTU it freed the data structure, which then caused the hostname setup to break it.

==5269== Invalid read of size 8
==5269==    at 0x161953: link_set_hostname (networkd-link.c:690)
==5269==    by 0x16296D: dhcp_lease_acquired (networkd-link.c:942)
==5269==    by 0x162D18: dhcp_handler (networkd-link.c:1007)
==5269==    by 0x172E35: client_notify (sd-dhcp-client.c:200)
==5269==    by 0x175BC3: client_handle_message (sd-dhcp-client.c:1146)
==5269==    by 0x17614E: client_receive_message_raw (sd-dhcp-client.c:1270)
==5269==    by 0x12902F: source_dispatch (sd-event.c:2001)
==5269==    by 0x129F1B: sd_event_run (sd-event.c:2285)
==5269==    by 0x12A031: sd_event_loop (sd-event.c:2304)
==5269==    by 0x110727: main (networkd.c:100)
==5269==  Address 0x5a86d30 is 0 bytes inside a block of size 144 free'd
==5269==    at 0x4C2999C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5269==    by 0x15F43A: link_free (networkd-link.c:132)
==5269==    by 0x15F477: link_unref (networkd-link.c:137)
==5269==    by 0x161F1A: link_set_mtu (networkd-link.c:768)
==5269==    by 0x1628BF: dhcp_lease_acquired (networkd-link.c:930)
==5269==    by 0x162D18: dhcp_handler (networkd-link.c:1007)
==5269==    by 0x172E35: client_notify (sd-dhcp-client.c:200)
==5269==    by 0x175BC3: client_handle_message (sd-dhcp-client.c:1146)
==5269==    by 0x17614E: client_receive_message_raw (sd-dhcp-client.c:1270)
==5269==    by 0x12902F: source_dispatch (sd-event.c:2001)
==5269==    by 0x129F1B: sd_event_run (sd-event.c:2285)
==5269==    by 0x12A031: sd_event_loop (sd-event.c:2304)
Comment 6 Steven Noonan 2014-06-19 17:36:22 UTC
I wonder if this is the issue?

diff --git a/src/network/networkd-link.c b/src/network/networkd-link.c                                               
index 6677b94..88c224d 100644                                                                                        
--- a/src/network/networkd-link.c                                                                                    
+++ b/src/network/networkd-link.c                                                                                    
@@ -761,7 +761,7 @@ static int link_set_mtu(Link *link, uint32_t mtu) {                                              
                 return r;                                                                                           
         }                                                                                                           
 
-        link_unref(link);
+        link_ref(link);
 
         return 0;
 }
Comment 7 Steven Noonan 2014-06-19 17:43:35 UTC
Created attachment 101369 [details] [review]
patch to fix refcounting

Yeah, looks like there's just a refcounting error there. Please 'git am' the attached if it passes muster.
Comment 8 Lennart Poettering 2014-06-19 18:40:57 UTC
Indeed. applied the patch now. thanks! does everything work now, can the bug be closed?
Comment 9 Steven Noonan 2014-06-19 18:44:33 UTC
I tested with this patch and the issue went away, so yes. Closing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.