Summary: | dbus-daemon on Windows: Failed to bind socket "localhost:59367": Address already in use | ||
---|---|---|---|
Product: | dbus | Reporter: | Ralf Habacker <ralf.habacker> |
Component: | core | Assignee: | D-Bus Maintainers <dbus> |
Status: | RESOLVED FIXED | QA Contact: | D-Bus Maintainers <dbus> |
Severity: | normal | ||
Priority: | medium | CC: | thomas.sondergaard |
Version: | 1.8 | Keywords: | patch |
Hardware: | Other | ||
OS: | Windows (All) | ||
Whiteboard: | review? | ||
i915 platform: | i915 features: | ||
Attachments: |
threaded test case
CMake build system file non threaded test case Patch Fix AF_UNSPEC issue Simplified test case Add manual tcp test case Add manual tcp test case (update 1) Fix dbus-daemon on Windows Failed to bind socket [1/2] _dbus_listen_tcp_socket: use NI_NUMERICSERV to determine port number [2/2] _dbus_listen_tcp_socket: comment on another reason to ignore EADDRINUSE _dbus_listen_tcp_socket: Keep windows code to fetch port from tcp listening socket in sync with unix. |
Description
Ralf Habacker
2015-01-03 17:52:15 UTC
Created attachment 111693 [details]
CMake build system file
Created attachment 111694 [details]
non threaded test case
Created attachment 111695 [details] [review] Patch Comment on attachment 111695 [details] [review] Patch Review of attachment 111695 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-sysdeps-win.c @@ +1760,5 @@ > to use the same port */ > if (!port || !strcmp(port, "0")) > { > + int result; > + struct sockaddr_storage addr; I'd prefer to keep using mysockaddr_gen, adding a struct sockaddr_storage member to it if necessary... @@ +1762,5 @@ > { > + int result; > + struct sockaddr_storage addr; > + socklen_t addrlen; > + char portbuf[50]; Why 50? Numeric TCP ports fit in 6 bytes ("65535" + '\0') and you could use NI_NUMERICSERV to ensure that it's returned as a number. If we happen to end up on port 5432, say, I'm pretty sure "127.0.0.1:postgresql" is not the form we want. (Perhaps Windows never returns non-numeric service names, I don't know.) @@ +1765,5 @@ > + socklen_t addrlen; > + char portbuf[50]; > + > + addrlen = sizeof (addr); > + result = getsockname (fd, (struct sockaddr*) &addr, &addrlen); ... because then you could use &addr.Address here and not need the cast. @@ +1768,5 @@ > + addrlen = sizeof (addr); > + result = getsockname (fd, (struct sockaddr*) &addr, &addrlen); > + > + if (result == -1 || > + (res = getnameinfo ((struct sockaddr*)&addr, addrlen, NULL, 0, Likewise here. (In reply to Simon McVittie from comment #4) > > + int result; > > + struct sockaddr_storage addr; > > I'd prefer to keep using mysockaddr_gen, adding a struct sockaddr_storage > member to it if necessary... The other half of this review comment is of course > @@ +1765,5 @@ > > + result = getsockname (fd, (struct sockaddr*) &addr, &addrlen); > > ... because then you could use &addr.Address here and not need the cast. this one. Casting to (struct sockaddr*) is, strictly speaking, not allowed in standard C (it violates "strict aliasing" which can lead the compiler to make incorrect optimizations). On gcc, we specifically tell the compiler not to make the assumptions about aliasing that standard C would otherwise allow; but that's not ideal, and if we can easily replace a cast with a union, we should. On non-gcc, it's entirely possible that this sort of thing is silently mis-optimized. (In reply to Simon McVittie from comment #4) > Comment on attachment 111695 [details] [review] [review] > Patch ... > @@ +1762,5 @@ > > { > > + int result; > > + struct sockaddr_storage addr; > > + socklen_t addrlen; > > + char portbuf[50]; > > Why 50? I don't know. You may ask the author of the related parts in dbus-sysdeps-unix.c, from which this fragement is copied into >Numeric TCP ports fit in 6 bytes ("65535" + '\0') and you could use > NI_NUMERICSERV to ensure that it's returned as a number. which is used later (In reply to Ralf Habacker from comment #6) > (In reply to Simon McVittie from comment #4) > > Comment on attachment 111695 [details] [review] [review] [review] > > Patch > ... > > @@ +1762,5 @@ > > > { > > > + int result; > > > + struct sockaddr_storage addr; > > > + socklen_t addrlen; > > > + char portbuf[50]; > > > > Why 50? > > I don't know. You may ask the author of the related parts in > dbus-sysdeps-unix.c, from which this fragement is copied into http://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c#n1508 Does _dbus_listen_tcp_socket() really have to be so complicated in the first place? I'm not really sure why it is doing the whole rebind thing. If it is because of IPv6 there must be a better way to do it. What is required to move the fix for this bug forward? (In reply to Thomas Sondergaard from comment #8) > Does _dbus_listen_tcp_socket() really have to be so complicated in the first > place? "Maybe". The TCP support originally existed (approx. a decade ago) for the benefit of people who were sharing home directories via NFS and displays via X11 and trying to have a shared dbus-daemon for the whole mess; later, it was re-used as the main transport on Windows, because we prefer AF_LOCAL (aka AF_UNIX) sockets but Windows doesn't have those. The core use-cases for D-Bus are that it provides the session and system message buses on modern Linux (and other Unixes like *BSD) over AF_LOCAL sockets, neither of which requires TCP; so I'm afraid TCP is a bit of a second class citizen. (In reply to Thomas Sondergaard from comment #9) > What is required to move the fix for this bug forward? Please adapt the patch to keep using a union instead of violating strict aliasing, and use NI_NUMERICSERV to guarantee getting a number rather than potentially a non-numeric service name. http://msdn.microsoft.com/en-us/library/windows/desktop/ms738532%28v=vs.85%29.aspx says that NI_NUMERICSERV exists on Windows (and in fact also that not using it would usually make this code fail on Windows Server 2003 or older). Optionally, also size the buffer more sensibly for the resulting requirement (5 digits + \0), but I'm willing to leave that bit alone. As for simplifications, I'd review a follow-up patch to simplify, but it looks as though the complexity might in fact be necessary (so my review of that change would be "sorry, but no"), because what we seem to be trying to do here is: * resolve the hostname from the requested address to one or more IPv4 and/or IPv6 addresses; * listen on each of those addresses. Suppose you try to listen on "tcp:host=myserver.example.com" and myserver.example.com resolves to 1.1.1.1, 2.2.2.2, 6666::6666 and 7777::7777 due to some sort of multi-homed arrangement. We want to end up listening on all of 1.1.1.1:54321, 2.2.2.2:54321, [6666::6666]:54321 and [7777::7777]:54321, where 54321 is arbitrary but consistent; so that we can tell our clients "hey I'm listening on tcp:host=myserver.example.com,port=54321" and whichever of the multiple resolutions of our name they choose, they can connect to us. The more likely scenario in practice is that you try to listen on "localhost", which resolves to 127.0.0.1 and maybe also ::1, or on "0.0.0.0", or on "::". However, the code seems to be trying to deal with connecting by arbitrarily complicated hostname for the benefit of the NFS/X11/TCP crowd... this is what you get when features were added a decade ago without the use-cases being made clear :-( Created attachment 112893 [details] [review] Fix AF_UNSPEC issue (In reply to Ralf Habacker from comment #0) > Created attachment 111692 [details] > threaded test case > > Reported by Thomas Sondergaard <ts@medical-insight.com>: > > When I run the attached test programs I get a handful or so of cases where > the dbus-daemon fails to start with a message like > > Failed to bind socket "localhost:59367": Address already in use I got the same error running cross compiled test-dbus-daemon.exe with wine. I rerun after applying your patch, which then returns /creds: Failed to start message bus: Failed to bind socket "localhost:40361": Invalid argument I only get rid of the error message in the test-dbus-daemon case with the 'Fix AF_UNSPEC issue' patch applied (with or without your patch). According to http://dbus.freedesktop.org/doc/dbus-specification.html#transports-tcp-sockets a workaround would be to use the following listen address: tcp:host=localhost,family=ipv4 without any additional patch from this bug applied. Can you check that ? (In reply to Ralf Habacker from comment #12) > (In reply to Ralf Habacker from comment #0) > > Created attachment 111692 [details] ... > I rerun after applying your patch, which then returns > > /creds: Failed to start message bus: Failed to bind socket > "localhost:40361": Invalid argument > This indicates that the dbus code tries to use a ipv6 address, which has been already documentated at bug 61922. (In reply to Simon McVittie from comment #10) > (In reply to Thomas Sondergaard from comment #8) ... > The more likely scenario in practice is that you try to listen on > "localhost", which resolves to 127.0.0.1 and maybe also ::1, or on > "0.0.0.0", or on "::". in the test-dbus-daemon case this is: /creds: _dbus_listen_tcp_socket localhost 0 (null) initial: host 127.0.0.1 port 0 first address: bind to -> host 127.0.0.1 port 6304 => okay second address bind to -> host ::1 port 6304 => fails Failed to start message bus: Failed to bind socket 1 "localhost:6304": Invalid argument (In reply to Ralf Habacker from comment #0) > This is using D-Bus Message Bus Daemon 1.8.10 on Windows. BTW: On which Windows version and variant (32bit /64bit) do you get this error ? (In reply to Simon McVittie from comment #10) > (In reply to Thomas Sondergaard from comment #8) ... > Optionally, also size the buffer more sensibly for the resulting requirement > (5 digits + \0), but I'm willing to leave that bit alone. from https://msdn.microsoft.com/de-de/library/windows/desktop/ms738532%28v=vs.85%29.aspx ... "To simplify determining buffer requirements for the host and serv parameters, the following values for maximum host name length and maximum service name are defined in the Ws2tcpip.h header file." ... #define NI_MAXSERV 32 #define NI_MAXHOST 1025 (In reply to Ralf Habacker from comment #15) > (In reply to Ralf Habacker from comment #0) > > This is using D-Bus Message Bus Daemon 1.8.10 on Windows. > > BTW: On which Windows version and variant (32bit /64bit) do you get this > error ? Windows 7 Pro 64-bit, dbus build as 32-bit with MSVC 2013. Created attachment 112923 [details] [review] Simplified test case The appended patch contains a simplified test case. (In reply to Ralf Habacker from comment #18) > Created attachment 112923 [details] [review] [review] > Simplified test case > > The appended patch contains a simplified test case. Here are a few results: running on opensuse 13.1 x86_64 wine-32bit (1.7.35) 285: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:47288": Address already in use 450: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:44718": Address already in use 640: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:56540": Address already in use 660: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:59881": Address already in use 661: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:48059": Address already in use running on opensuse 13.1 x86_64 wine-64bit (1.7.35) 293: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:59367": Address already in use 411: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:59881": Address already in use 708: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:42919": Address already in use running on Windows 7 32bit 227: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:63993": Address already in use 483: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:64250": Address already in use 738: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:64507": Address already in use 992: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:64764": Address already in use running on Windows 7 64bit 188: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:51657": Address already in use 444: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:51914": Address already in use 699: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:52171": Address already in use 953: org.freedesktop.DBus.Error.Failed Failed to bind socket "127.0.0.1:52428": Address already in use (In reply to Ralf Habacker from comment #12) > /creds: Failed to start message bus: Failed to bind socket > "localhost:40361": Invalid argument For this to be useful information I think we'd need to know the exact parameters to the failing system call: strace would probably be a useful tool. (In reply to Simon McVittie from comment #10) > Optionally, also size the buffer more sensibly for the resulting requirement > (5 digits + \0), but I'm willing to leave that bit alone. (In reply to Ralf Habacker from comment #16) > "To simplify determining buffer requirements for the host and serv > parameters, the following values for maximum host name length and maximum > service name are defined in the Ws2tcpip.h header file." > ... > #define NI_MAXSERV 32 > #define NI_MAXHOST 1025 Making it NI_MAXSERV + 1 bytes long would also be fine. (In reply to Ralf Habacker from comment #19) > Here are a few results: With which combination of patches applied? Comment on attachment 112893 [details] [review] Fix AF_UNSPEC issue Review of attachment 112893 [details] [review]: ----------------------------------------------------------------- If this is helpful I'd be fine with applying it. Bug #61922 apparently breaks IPv6 anyway. Comment on attachment 112923 [details] [review] Simplified test case Review of attachment 112923 [details] [review]: ----------------------------------------------------------------- Obviously if this is just throwaway code it's fine, but if you want to merge it: Please call it manual-tcp or something. I've been trying to be fully consistent about test-* = automated test, manual-* = manual test. Also please add it to test/Makefile.am. If I'm reading the CMake correctly, its Automake self can be the same as manual-dir-iter: manual_dir_iter_SOURCES = manual-dir-iter.c manual_dir_iter_CPPFLAGS = $(static_cppflags) manual_dir_iter_LDADD = $(top_builddir)/dbus/libdbus-internal.la (but with the obvious name change). ::: test/test-tcp.c @@ +13,5 @@ > + for (i = 0; i < 1000; i++) > + { > + dbus_error_init (&error); > + server = _dbus_server_new_for_tcp_socket ("localhost", "localhost", "0", "ipv4", &error, FALSE); > + if (server == 0) minor: absence of a server isn't 0, it's NULL (just for clarity, the result is functionally identical) @@ +18,5 @@ > + printf("%d: %s %s\n",i, error.name, error.message); > + else > + printf("%d: %s \n",i, dbus_server_get_address(server)); > + > + } Please unref the server (if non-NULL) or free the error (otherwise) each time round the loop, if you want to commit this as a useful future thing. (In reply to Simon McVittie from comment #24) > I've been trying to be fully > consistent about test-* = automated test, manual-* = manual test ... and by "manual test" I mean tests that are useless without human interpretation of their results, not just tests that need help to run (although as it happens, both our current manual tests are manual in both ways). (In reply to Simon McVittie from comment #22) > (In reply to Ralf Habacker from comment #19) > > Here are a few results: > > With which combination of patches applied? attachment 112923 [details] [review] only (I hope bugzilla could resolve it, I did not found the doc page describing how to use bugs/atachments keys) Created attachment 112938 [details] [review] Add manual tcp test case - fix of review issues - autotools support Comment on attachment 112938 [details] [review] Add manual tcp test case Review of attachment 112938 [details] [review]: ----------------------------------------------------------------- Build system is fine, but you have committed test-tcp.c (without fixes) instead of manual-tcp.c (with fixes). Created attachment 112939 [details] [review] Add manual tcp test case (update 1) - fix including wrong file Comment on attachment 112939 [details] [review] Add manual tcp test case (update 1) Review of attachment 112939 [details] [review]: ----------------------------------------------------------------- Much better :-) Please merge this to master if you think you'll want this test-case in future. (In reply to Ralf Habacker from comment #19) > (In reply to Ralf Habacker from comment #18) > > Created attachment 112923 [details] [review] [review] [review] > > Simplified test case > > > > The appended patch contains a simplified test case. > > Here are a few results: > > running on opensuse 13.1 x86_64 wine-32bit (1.7.35) > 285: org.freedesktop.DBus.Error.Failed Failed to bind socket > "127.0.0.1:47288": Address already in use > 450: org.freedesktop.DBus.Error.Failed Failed to bind socket > "127.0.0.1:44718": Address already in use > 640: org.freedesktop.DBus.Error.Failed Failed to bind socket > "127.0.0.1:56540": Address already in use > 660: org.freedesktop.DBus.Error.Failed Failed to bind socket > "127.0.0.1:59881": Address already in use > 661: org.freedesktop.DBus.Error.Failed Failed to bind socket > "127.0.0.1:48059": Address already in use Thanks to wine's build in trace support I got the following with: WINEDEBUG=trace+winsock wine bin/manual-tcp.exe > out.log 2>&1 ... ... i=148: trace:winsock:WS_getaddrinfo "localhost", "0" 0x6ffc80 -> 0x6ffc7c 0 trace:winsock:WS_getaddrinfo => 0x1245b0, flags 0x1, family 2, type 1, protocol 6, len 16, name (null), addr { family AF_INET, address 127.0.0.1, port 0 } trace:winsock:WS_socket af=2 type=1 protocol=0 trace:winsock:WSASocketA af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW created 005c trace:winsock:WS_bind socket 005c, ptr 0x1115d0 { family AF_INET, address 127.0.0.1, port 0 }, length 16 -> note: binding to port 0 => bad trace:winsock:WS_listen socket 005c, backlog 30 trace:winsock:WS_getsockname socket: 005c, ptr 0x6ffc60, len 0000001c trace:winsock:WS_getsockname => { family AF_INET, address 127.0.0.1, port 54992 } trace:winsock:WS_getaddrinfo "localhost", "53462" 0x6ffc80 -> 0x6ffc7c 0 trace:winsock:WS_getaddrinfo => 0x1245b0, flags 0x1, family 2, type 1, protocol 6, len 16, name (null), addr { family AF_INET, address 127.0.0.1, port 53462 } trace:winsock:WS_socket af=2 type=1 protocol=0 trace:winsock:WSASocketA af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW created 0060 trace:winsock:WS_bind socket 0060, ptr 0x1115d0 { family AF_INET, address 127.0.0.1, port 53462 }, length 16 -> note: binding to requested port => good trace:winsock:WS_listen socket 0060, backlog 30 trace:winsock:WSAIoctl 005c, _IOW('f', 126, 4), 0x6ffc1c, 4, 0x6ffc1c, 4, 0x6ffbbc, (nil), (nil) trace:winsock:WSAIoctl -> FIONBIO (1) trace:winsock:WSAIoctl 0060, _IOW('f', 126, 4), 0x6ffc1c, 4, 0x6ffc1c, 4, 0x6ffbbc, (nil), (nil) trace:winsock:WSAIoctl -> FIONBIO (1) trace:winsock:WS_closesocket socket 005c trace:winsock:WS_closesocket socket 0060 148: tcp:host=localhost,port=53462,guid=f28c08fde3728dc00372f7c754ca87f3 trace:winsock:WS_getaddrinfo "localhost", "0" 0x6ffc80 -> 0x6ffc7c 0 trace:winsock:WS_getaddrinfo => 0x1245b0, flags 0x1, family 2, type 1, protocol 6, len 16, name (null), addr { family AF_INET, address 127.0.0.1, port 0 } trace:winsock:WS_socket af=2 type=1 protocol=0 trace:winsock:WSASocketA af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW created 005c trace:winsock:WS_bind socket 005c, ptr 0x1115d0 { family AF_INET, address 127.0.0.1, port 0 }, length 16 -> binding again to port 0 trace:winsock:WS_listen socket 005c, backlog 30 trace:winsock:WS_getsockname socket: 005c, ptr 0x6ffc60, len 0000001c trace:winsock:WS_getsockname => { family AF_INET, address 127.0.0.1, port 44461 } trace:winsock:WS_getaddrinfo "localhost", "44461" 0x6ffc80 -> 0x6ffc7c 0 trace:winsock:WS_getaddrinfo => 0x1245b0, flags 0x1, family 2, type 1, protocol 6, len 16, name (null), addr { family AF_INET, address 127.0.0.1, port 44461 } trace:winsock:WS_socket af=2 type=1 protocol=0 trace:winsock:WSASocketA af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW af=2 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW created 0060 trace:winsock:WS_bind socket 0060, ptr 0x1115d0 { family AF_INET, address 127.0.0.1, port 44461 }, length 16 -> binding to requested port trace:winsock:WS_closesocket socket 0060 trace:winsock:WS_closesocket socket 005c 149: org.freedesktop.DBus.Error.Failed Failed to bind socket "localhost:44461": Address already in use conclusion: The error message is misleading, it should be 149: org.freedesktop.DBus.Error.Failed Failed to bind socket "localhost:0": Address already in use According to http://hea-www.harvard.edu/~fine/Tech/addrinuse.html a possible reuse may be delayed, which causes the "address already in use" error. -> This is an implementation error in _dbus_listen_tcp_socket() (In reply to Ralf Habacker from comment #31) .... > According to http://hea-www.harvard.edu/~fine/Tech/addrinuse.html a possible > reuse may be delayed, which causes the "address already in use" error. > > -> This is an implementation error in _dbus_listen_tcp_socket() and here is an answer: http://stackoverflow.com/questions/20789091/get-a-random-port-for-tcp-connection (In reply to Ralf Habacker from comment #31) > trace:winsock:WS_bind socket 005c, ptr 0x1115d0 { family AF_INET, address > 127.0.0.1, port 0 }, length 16 > -> note: binding to port 0 => bad No, on the first time through the loop (when you have no idea what you want), this is absolutely fine - binding to port 0 is precisely how you ask the kernel to give you an arbitrary unused port. > trace:winsock:WS_bind socket 005c, ptr 0x1115d0 { family AF_INET, address > 127.0.0.1, port 0 }, length 16 > > -> binding again to port 0 This is binding socket 0x5C to "127.0.0.1:0", which means "hello kernel, please give me a real port number, I don't care which one". Fine. > trace:winsock:WS_getsockname socket: 005c, ptr 0x6ffc60, len 0000001c > trace:winsock:WS_getsockname => { family AF_INET, address 127.0.0.1, port > 44461 } Next, we use getsockname() find out which port the kernel has given us. The answer turns out to be 44461. > trace:winsock:WSASocketW created 0060 > trace:winsock:WS_bind socket 0060, ptr 0x1115d0 { family AF_INET, address > 127.0.0.1, port 44461 }, length 16 *This* is bad though. We're trying to bind a different socket, 0x60, to exactly the same port that socket 0x5C already has, 127.0.0.1:44461 - there's no way that can possibly work... > 149: org.freedesktop.DBus.Error.Failed Failed to bind socket > "localhost:44461": Address already in use ... and indeed, it fails, with a correct error message - we tried to bind a second to localhost:44461 and it didn't work. > According to http://hea-www.harvard.edu/~fine/Tech/addrinuse.html a possible > reuse may be delayed, which causes the "address already in use" error. Perhaps that is possible, but that isn't the bug seen in your trace. I think what's going on here is this: * First we do the getaddrinfo for the requested address, localhost:0. Let's suppose it returns 127.0.0.1:0 and [::1]:0. * OK, so we go into the while loop, do a socket() (in your trace this is 0x5C), bind it to 127.0.0.1:0 and listen(). Good so far. * Now, under "if (!_dbus_string_get_length(retport))", we do a getsockname() and find that we have been given port 44461. * Now we do a goto back to redo_lookup_with_port, and look up localhost:44461. We get 127.0.0.1:44461 and [::1]:44461 (let's say). * Back into the while loop, and now we try to make two more socket()s, and bind the first (0x60 in your trace) to 127.0.0.1:44461 and the second to [::1]:44461 - but we never get as far as the second, because the first one quite reasonably fails, because we already have our original socket 0x5C listening on that same port! Based on that analysis, what I can't understand is how this function could ever work correctly with port 0 - as far as I can see, it should *always* hit this error! On Unix, we specifically ignore EADDRINUSE with a slightly misleading comment about IPv4 vs. IPv6. It is true that we can conceivably get EADDRINUSE for that reason, but I think it's also masking the fact that we consistently get EADDRINUSE because we're trying to bind to the same IP:port repeatedly and that can never work. Looking at your trace where it *did* work, to try to find out why it could ever succeed, here's something interesting: trace:winsock:WS_listen socket 005c, backlog 30 trace:winsock:WS_getsockname socket: 005c, ptr 0x6ffc60, len 0000001c trace:winsock:WS_getsockname => { family AF_INET, address 127.0.0.1, port 54992 } trace:winsock:WS_getaddrinfo "localhost", "53462" 0x6ffc80 -> 0x6ffc7c 0 trace:winsock:WS_getaddrinfo => 0x1245b0, flags 0x1, family 2, type 1, protocol 6, len 16, name (null), addr { family AF_INET, address 127.0.0.1, port 53462 } WTF? 53462 is not the same as 54992. Why are we looking up localhost:53462? $ python >>> hex(53462) '0xd0d6' >>> hex(54992) '0xd6d0' ... it's because we're not doing the correct ntohs() to fix the endianness of the port number! So the fact that we are getting the endianness of the port number wrong has *mostly* masked the fact that the function's logic is wrong - it fails if and only if both bytes happen to be the same. Sure enough: >>> hex(44461) '0xadad' Your failing case is failing because in this case ntohs(sin_port) == sin_port. Incidentally, I thin I've also just spotted why IPv6 doesn't work: snprintf( portbuf, sizeof( portbuf ) - 1, "%d", addr.AddressIn.sin_port ); Even with the ntohs() added, that line can only be correct for an IPv4 address. If it is really an IPv6 address, you are "type-punning" - accessing the wrong member of a union - and instead of the port number, you're going to get whatever 2 bytes of an IPv6 address happen to be at the same offset as the port number of an IPv4 address. If you change Attachment #111695 [details] as I suggested (use a union instead of casts, and use NI_NUMERICSERV) I think that should indeed fix this, and possibly also Bug #61922. Now that I understand why these changes work, I'm a bit happier about applying Attachment #111695 [details] as-is, if you'll review and test a follow-up patch from me to apply what I suggested in Comment #4 (to both Unix and Windows - the Unix implementation has the same issues) on master. Comment on attachment 112939 [details] [review] Add manual tcp test case (update 1) committed Created attachment 112983 [details] [review] Fix dbus-daemon on Windows Failed to bind socket Comment on attachment 112893 [details] [review] Fix AF_UNSPEC issue committed (In reply to Simon McVittie from comment #33) > (In reply to Ralf Habacker from comment #31) > > trace:winsock:WS_bind socket 005c, ptr 0x1115d0 { family AF_INET, address > > 127.0.0.1, port 0 }, length 16 > > -> note: binding to port 0 => bad > > No, on the first time through the loop (when you have no idea what you > want), this is absolutely fine - binding to port 0 is precisely how you ask > the kernel to give you an arbitrary unused port. you'r right, I thought this trace where on a lower function level inside wine, where the real port is used. ... > Your failing case is failing because in this case ntohs(sin_port) == > sin_port. nice catch :-) > Incidentally, I thin I've also just spotted why IPv6 doesn't work: > > snprintf( portbuf, sizeof( portbuf ) - 1, "%d", addr.AddressIn.sin_port ); > > Even with the ntohs() added, that line can only be correct for an IPv4 > address. If it is really an IPv6 address, you are "type-punning" - accessing > the wrong member of a union - and instead of the port number, you're going > to get whatever 2 bytes of an IPv6 address happen to be at the same offset > as the port number of an IPv4 address. I fixed that hopefully in the updated patch. Comment on attachment 112983 [details] [review] Fix dbus-daemon on Windows Failed to bind socket Review of attachment 112983 [details] [review]: ----------------------------------------------------------------- Looks good, ship it. Please leave the bug open for now, to remind me that I still need to fix the issues pointed out in my earlier review for both Unix and Windows (unless you feel like preparing that patch yourself of course). (In reply to Ralf Habacker from comment #37) > Comment on attachment 112893 [details] [review] > Fix AF_UNSPEC issue > > committed You can probably revert this, now that you've (hopefully) fixed the reason why IPv6 was broken. Comment on attachment 112983 [details] [review] Fix dbus-daemon on Windows Failed to bind socket committed (In reply to Simon McVittie from comment #40) > (In reply to Ralf Habacker from comment #37) > > Comment on attachment 112893 [details] [review] [review] > > Fix AF_UNSPEC issue > > > > committed > > You can probably revert this, now that you've (hopefully) fixed the reason > why IPv6 was broken. Unfortunally it is not fixed. wine bin/manual-tcp.exe ipv6 trace:winsock:DllMain 0x7ec90000 0x1 0x1 trace:winsock:WSAStartup verReq=2 trace:winsock:WSAStartup succeeded starts: 1 trace:winsock:WS_getaddrinfo "localhost", "0" 0x6ffc80 -> 0x6ffc7c 0 trace:winsock:WS_inet_ntop family 23, addr (0x111628), buffer (0x6ffbce), len 46 trace:winsock:WS_getaddrinfo => 0x1115f8, flags 0x1, family 23, type 1, protocol 6, len 28, name (null), addr { family AF_INET6, address ::1, port 0 } trace:winsock:WS_socket af=23 type=1 protocol=0 trace:winsock:WSASocketA af=23 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW af=23 type=1 protocol=0 protocol_info=(nil) group=0 flags=0x1 trace:winsock:WSASocketW created 005c trace:winsock:WS_inet_ntop family 23, addr (0x111628), buffer (0x6ffb7c), len 46 trace:winsock:WS_bind socket 005c, ptr 0x111620 { family AF_INET6, address ::1, port 0 }, length 28 trace:winsock:WS_closesocket socket 005c 0: org.freedesktop.DBus.Error.Failed Failed to bind socket "localhost:0": Invalid argument on linux: test/manual-tcp ipv6 | less0: org.freedesktop.DBus.Error.FileNotFound Failed to lookup host/port: "localhost:0": Name or service not known (-2) (In reply to Ralf Habacker from comment #42) > (In reply to Simon McVittie from comment #40) > > (In reply to Ralf Habacker from comment #37) > > > Comment on attachment 112893 [details] [review] [review] [review] > > > Fix AF_UNSPEC issue > > > > > > committed > > > > You can probably revert this, now that you've (hopefully) fixed the reason > > why IPv6 was broken. > > Unfortunally it is not fixed. > > wine bin/manual-tcp.exe ipv6 > ... > 0: org.freedesktop.DBus.Error.Failed Failed to bind socket "localhost:0": > Invalid argument > > on linux: > > test/manual-tcp ipv6 | > less0: org.freedesktop.DBus.Error.FileNotFound Failed to lookup host/port: > "localhost:0": Name or service not known (-2) ipv6 works on one machine I have access to, on the other machine not. The message indicates the problem that this system do not has ipv6 enabled. I guess a better error message would be useful. (In reply to Ralf Habacker from comment #43) > ipv6 works on one machine I have access to, on the other machine not. The > message indicates the problem that this system do not has ipv6 enabled. I > guess a better error message would be useful. Moved further ipv6 issue fixing to bug 61922. (In reply to Ralf Habacker from comment #36) > Created attachment 112983 [details] [review] [review] > Fix dbus-daemon on Windows Failed to bind socket As this affects basic dbus functionality on windows, any problems with backporting to 1.8 branch ? (In reply to Ralf Habacker from comment #45) > As this affects basic dbus functionality on windows, any problems with > backporting to 1.8 branch ? No problem, please go ahead, after doing whatever testing you feel is appropriate. (In reply to Simon McVittie from comment #39) > Please leave the bug open for now, to remind me that I > still need to fix the issues pointed out in my earlier review for both Unix > and Windows (unless you feel like preparing that patch yourself of course). This part (when done) should not be backported, I don't think. (In reply to Simon McVittie from comment #46) > (In reply to Ralf Habacker from comment #45) > > As this affects basic dbus functionality on windows, any problems with > > backporting to 1.8 branch ? > > No problem, please go ahead, after doing whatever testing you feel is > appropriate. which would be manual-tcp. > (In reply to Simon McVittie from comment #39) > > Please leave the bug open for now, to remind me that I > > still need to fix the issues pointed out in my earlier review for both Unix > > and Windows (unless you feel like preparing that patch yourself of course). > > This part (when done) should not be backported, I don't think. The remaining patches for unix are 1. the comment change (minor) 2. use NI_NUMERICSERV in getnameinfo() without manual-tcp returns something like this on unix (also minor ?): 3: tcp:host=localhost,port=reachout,guid=456aa710a90a078f7c15178754d21233 41: tcp:host=localhost,port=reachout,guid=ee6934d12f069cc09153faf954d21233 480: tcp:host=localhost,port=csccredir,guid=57c65a486dbbfa6906443e4654d21234 648: tcp:host=localhost,port=allpeers,guid=08bfd44456a7d793ee532a2954d21234 930: tcp:host=localhost,port=guttersnex,guid=ce5b90a831cbb0326f6de9f154d21235 (In reply to Ralf Habacker from comment #47) > The remaining patches for unix are > 1. the comment change (minor) > 2. use NI_NUMERICSERV in getnameinfo() without manual-tcp returns something > like this on unix (also minor ?): > > 3: tcp:host=localhost,port=reachout,guid=456aa710a90a078f7c15178754d21233 > 41: tcp:host=localhost,port=reachout,guid=ee6934d12f069cc09153faf954d21233 > 480: tcp:host=localhost,port=csccredir,guid=57c65a486dbbfa6906443e4654d21234 > 648: tcp:host=localhost,port=allpeers,guid=08bfd44456a7d793ee532a2954d21234 > 930: tcp:host=localhost,port=guttersnex,guid=ce5b90a831cbb0326f6de9f154d21235 Yes. Neither of those should go in 1.8 but it would be nice to fix them for master (on both platforms). Created attachment 113156 [details] [review] [1/2] _dbus_listen_tcp_socket: use NI_NUMERICSERV to determine port number If we happen to have been given (say) port 30865, we want to represent that as host=localhost,port=30865 and not host=localhost,port=csync2. Created attachment 113157 [details] [review] [2/2] _dbus_listen_tcp_socket: comment on another reason to ignore EADDRINUSE Not being aware of the second reason described here caused bug #87999 in the equivalent code on Windows. --- I thought I'd have to fix one or both of those in Windows too (and ask you to test it), but I was misremembering: some of the earlier patches had these issues, but the one you eventually merged was fine. So I'm not touching the Windows code right now, and would appreciate a review for the Unix side. If you later do a patch to make the Windows side use getnameinfo() as well, I'd like it to use NI_NUMERICSERV, but that change isn't important. Comment on attachment 113156 [details] [review] [1/2] _dbus_listen_tcp_socket: use NI_NUMERICSERV to determine port number Review of attachment 113156 [details] [review]: ----------------------------------------------------------------- looks good Comment on attachment 113157 [details] [review] [2/2] _dbus_listen_tcp_socket: comment on another reason to ignore EADDRINUSE Review of attachment 113157 [details] [review]: ----------------------------------------------------------------- looks good Created attachment 113159 [details] [review] _dbus_listen_tcp_socket: Keep windows code to fetch port from tcp listening socket in sync with unix. The usage of NI_MAXSERV is according to the example located at https://msdn.microsoft.com/de-de/library/windows/desktop/ms738532%28v=vs.85%29.aspx Comment on attachment 113159 [details] [review] _dbus_listen_tcp_socket: Keep windows code to fetch port from tcp listening socket in sync with unix. Review of attachment 113159 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-sysdeps-win.c @@ +1826,4 @@ > > + if (getsockname(fd, &addr.Address, &addrlen) == SOCKET_ERROR || > + (res = getnameinfo (&addr.Address, addrlen, NULL, 0, > + portbuf, NI_MAXSERV, I'd still prefer (..., portbuf, sizeof (portbuf), ...) (even though the numeric value is identical) because that way it's instantly obvious that there can't be any overflow, without having to backtrack to the declaration of portbuf. @@ +1826,5 @@ > > + if (getsockname(fd, &addr.Address, &addrlen) == SOCKET_ERROR || > + (res = getnameinfo (&addr.Address, addrlen, NULL, 0, > + portbuf, NI_MAXSERV, > + NI_NUMERICSERV)) != 0) Strictly speaking, getsockname and getnameinfo have different error semantics - getnameinfo isn't guaranteed to set errno unless it returns EAI_SYSTEM - so they should probably each have their own error-handling block with different inputs to dbus_set_error(). But this is good enough until we fix that in Unix too, which we should do at some point. (In reply to Simon McVittie from comment #54) > Strictly speaking, getsockname and getnameinfo have different error > semantics - getnameinfo isn't guaranteed to set errno unless it returns > EAI_SYSTEM ... but looking at MSDN, Microsoft's getnameinfo has the additional guarantee that it sets the result of WSAGetLastError(), so I think this is wrong on Unix but your patch is fine for Windows. Comment on attachment 113156 [details] [review] [1/2] _dbus_listen_tcp_socket: use NI_NUMERICSERV to determine port number fixed for 1.9.10, thanks Comment on attachment 113157 [details] [review] [2/2] _dbus_listen_tcp_socket: comment on another reason to ignore EADDRINUSE fixed for 1.9.10, thanks Comment on attachment 113159 [details] [review] _dbus_listen_tcp_socket: Keep windows code to fetch port from tcp listening socket in sync with unix. committed to git master with mentioned NI_MAXSERV to sizeof(portbuf) change. *** Bug 88858 has been marked as a duplicate of this bug. *** |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.