I made a simple D-BUS server which writes a line into a file when it receives a message, and a client which sends 100 messages to the server with the auto-activation flag set in every one of them. The server is not initially running (it is supposed to be activated by the first message). The D-BUS system bus is used, and proper .service and .conf files are installed for the server. When the client is run, one of two things happens: 1) the D-BUS system bus crashes (the server is not activated), or 2) another(!) D-BUS system bus appears in ps's listing and the server is activated, but the server does not receive any of the 100 messages. If 2) happens and the client is run again, the server receives all 100 messages (but there are still two D-BUS system buses running, according to ps).
Created attachment 497 [details]
Test server and client + .service and .conf files used
Hmm, works fine for me, using the current CVS version and testing with the session bus instead of the system bus.
I tested it again with a v0.21 session bus and the same bug is there, so it must have been fixed in the CVS version.
Tested this with the D-BUS CVS version (dated 21 Jul 04) and strange things happened (server not up, session bus running, only 5 messages with the auto-activation flag set):

kihamala@mylly:~/dbus-test/act-test-progs$ ./bug_client
service bug.svc does not exist
message 1 was put into the queue
message 2 was put into the queue
message 3 was put into the queue
message 4 was put into the queue
message 5 was put into the queue
Message sent
kihamala@mylly:~/dbus-test/act-test-progs$ Fd 4 did not have the close-on-exec flag set!
Fd 5 did not have the close-on-exec flag set!
22758: assertion failed "sitter != NULL" file "dbus-spawn.c" line 244
Error when connecting to the session bus: Disconnected prior to receiving a reply
[1]+  Aborted    ./dbus-daemon-1 --session --print-address  (wd: ~/dbus-prefix/bin)
(wd now: ~/dbus-test/act-test-progs)

I.e. the session bus crashed. The strange thing is that the bus does not crash if I comment out line 43 of the client, where I set the auto-activation flag (but of course then the server is not launched either). I'll attach the source code soon.
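For orientation, here is a minimal sketch of such a client in libdbus. It is not the attached test code: the service, path, and interface names are taken from the transcripts later in this report and may differ from the attachment, and the flag setter is spelled as in modern libdbus (dbus_message_set_auto_start()); the 2004 CVS API used a different name for the same auto-activation flag.

/* Minimal sketch (not the attached test code) of a client that queues
 * several auto-activating messages and exits immediately. */
#include <dbus/dbus.h>
#include <stdio.h>

int
main (void)
{
  DBusError error;
  DBusConnection *conn;
  DBusMessage *msg;
  int i;

  dbus_error_init (&error);
  conn = dbus_bus_get (DBUS_BUS_SESSION, &error);
  if (conn == NULL)
    {
      fprintf (stderr, "connect failed: %s\n", error.message);
      return 1;
    }

  for (i = 1; i <= 5; i++)
    {
      msg = dbus_message_new_method_call ("bug.svc",   /* name from the .service file */
                                          "/bug/svc",  /* server's object path */
                                          "foo.bar",   /* interface */
                                          "foobar");   /* member */
      dbus_message_set_auto_start (msg, TRUE);  /* ask the bus to activate bug.svc */
      dbus_connection_send (conn, msg, NULL);   /* queued, not yet on the wire */
      dbus_message_unref (msg);
      printf ("message %d was put into the queue\n", i);
    }

  dbus_connection_flush (conn);
  return 0;  /* exiting immediately here is what triggers the bug */
}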
Created attachment 511 [details]
test server and client + .service file for activating the server

The client resided in: /home/kihamala/dbus-test/act-test-progs
The server resided in (as specified in the .service file): /home/kihamala/dbus-test
The D-BUS daemon (CVS version, 21 Jul 04) resided in: /home/kihamala/dbus-prefix/bin

ldd output for bug_client:

kihamala@mylly:~/dbus-test/act-test-progs$ ldd bug_client
        libdbus-1.so.0 => /home/kihamala/dbus-prefix/lib/libdbus-1.so.0 (0x40018000)
        libdbus-glib-1.so.0 => /home/kihamala/dbus-prefix/lib/libdbus-glib-1.so.0 (0x4007c000)
        libc.so.6 => /lib/libc.so.6 (0x40092000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x401c6000)
        libgobject-2.0.so.0 => /usr/lib/libgobject-2.0.so.0 (0x401db000)
        libglib-2.0.so.0 => /usr/lib/libglib-2.0.so.0 (0x40214000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
OK, there are several issues. First, we have the same problem as bug #894, i.e. the daemon loses the last messages when the client exits. As for the assertion failure, here comes a patch. Then, about the message loss: what's happening is that by the time the server has spawned, registered with the bus and acquired its service, the client has exited. The problem is that in the case of an auto-activation message, policy checking is delayed until the server is up. So when the daemon wants to send the delayed messages with the auto-activation flag, the client connection is dead, so it cannot check the policy and it just bins these messages.
Created attachment 512 [details] [review]
fix

The first hunk is a cheap optimization; the second hunk fixes the bug.
Created attachment 518 [details] [review]
cleaner fix

Basically, ignore the HANGUP condition.
#$%!£ I posted in the wrong bug. Sorry, that's for bug #894
Comment on attachment 518 [details] [review]
cleaner fix

Marked obsolete to make it clearer that this is not a fix for this bug...
Tried out the patch (dated 2004-07-22 08:31) with CVS D-BUS (dated 21 Jul 04, except patched with the "cleaner fix" for Bug #894) to replicate the test of Comment #4. Now the bus doesn't crash, but the server does not receive any messages on the first run of the client, even though the server does get activated. The server (already running by then) receives all messages when the client is run again. Here is my test session (code from the second attachment):

kihamala@mylly:~/dbus-test/act-test-progs$ /home/kihamala/dbus-patched-prefix/bin/dbus-daemon-1 --session --print-address &
[6] 17565
unix:abstract=/tmp/dbus-pVqTHdAMW7
kihamala@mylly:~/dbus-test/act-test-progs$ export DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-pVqTHdAMW7
kihamala@mylly:~/dbus-test/act-test-progs$ ps xau | grep dbus
message    218  0.0  0.0  2064  968 ?     Ss   Jul23  0:00 /usr/bin/dbus-daemon-1 --system
kihamala 17565  0.0  0.0  2200  788 pts/0 S    16:26  0:00 /home/kihamala/dbus-patched-prefix/bin/dbus-daemon-1 --session --print-address
kihamala@mylly:~/dbus-test/act-test-progs$ cat /tmp/bug_svc.out
cat: /tmp/bug_svc.out: No such file or directory
kihamala@mylly:~/dbus-test/act-test-progs$ ./bug_client
service bug.svc does not exist
message 1 was put into the queue
message 2 was put into the queue
message 3 was put into the queue
message 4 was put into the queue
message 5 was put into the queue
Message sent
kihamala@mylly:~/dbus-test/act-test-progs$ Fd 4 did not have the close-on-exec flag set!
Fd 5 did not have the close-on-exec flag set!
going to the main loop
kihamala@mylly:~/dbus-test/act-test-progs$ cat /tmp/bug_svc.out
cat: /tmp/bug_svc.out: No such file or directory
kihamala@mylly:~/dbus-test/act-test-progs$ ps xau | grep dbus
message    218  0.0  0.0  2064  968 ?     Ss   Jul23  0:00 /usr/bin/dbus-daemon-1 --system
kihamala 17565  0.0  0.0  2304 1128 pts/0 S    16:26  0:00 /home/kihamala/dbus-patched-prefix/bin/dbus-daemon-1 --session --print-address
kihamala 17571  0.0  0.0  2748  936 pts/0 S    16:27  0:00 /home/kihamala/dbus-test/bug_svc
kihamala@mylly:~/dbus-test/act-test-progs$ ./bug_client
service bug.svc exists
message 1 was put into the queue
message 2 was put into the queue
message 3 was put into the queue
message 4 was put into the queue
message 5 was put into the queue
Message sent
kihamala@mylly:~/dbus-test/act-test-progs$ cat /tmp/bug_svc.out
the string argument: 'hello world 1'
the string argument: 'hello world 2'
the string argument: 'hello world 3'
the string argument: 'hello world 4'
the string argument: 'hello world 5'
kihamala@mylly:~/dbus-test/act-test-progs$ ps xau | grep dbus
message    218  0.0  0.0  2064  968 ?     Ss   Jul23  0:00 /usr/bin/dbus-daemon-1 --system
kihamala 17565  0.0  0.0  2304 1128 pts/0 S    16:26  0:00 /home/kihamala/dbus-patched-prefix/bin/dbus-daemon-1 --session --print-address
kihamala 17571  0.0  0.0  2748  948 pts/0 S    16:27  0:00 /home/kihamala/dbus-test/bug_svc
kihamala@mylly:~/dbus-test/act-test-progs$
Yes, that's because of the policy checking: the client exits before the server is up. It would work fine if your client listened for the replies to its method calls, or if it waited for the ServiceCreated signal before exiting.
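A minimal sketch of the first option, continuing the client fragment above (and assuming the server actually replies to the call):

/* Sketch: block on the reply so the client stays alive until the
 * auto-activated server is up and has processed the call. */
DBusError error;
DBusMessage *msg, *reply;

dbus_error_init (&error);
msg = dbus_message_new_method_call ("bug.svc", "/bug/svc",
                                    "foo.bar", "foobar");
reply = dbus_connection_send_with_reply_and_block (conn, msg, -1, &error);
if (reply == NULL)
  fprintf (stderr, "call failed: %s\n", error.message);
else
  dbus_message_unref (reply);
dbus_message_unref (msg);

Alternatively, the client can add a match for the ServiceCreated signal and run its main loop until the signal arrives before exiting.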
Committed the fix for the assertion failure. Ensuring that messages do not get lost is a bit harder; maybe we could split bus_context_check_security_policy in two?
Comment on attachment 512 [details] [review]
fix

This is obsolete, right? Marking it so.
I'm honestly not sure how to fix this one. You have to fully dispatch the messages from a connection before you let that connection be shut down. We could try just popping each message off the connection and dispatching it at the top of bus_connection_disconnected(); a rough sketch of this idea follows. It seems like that could create some pretty scary reentrancy issues, though. It'd be good to have test suite coverage for "connection disconnecting while all its messages have not been dispatched".
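For illustration, a rough sketch of that approach: dbus_connection_pop_message() is real public API, but bus_dispatch_one() is an invented name standing in for whatever dispatch entry point the daemon would need, and the reentrancy concerns above are exactly what this glosses over.

/* Hypothetical sketch: drain and dispatch remaining messages before
 * tearing the connection down.  Not the actual daemon code. */
void
bus_connection_disconnected (DBusConnection *connection)
{
  DBusMessage *message;

  /* Dispatch messages that were read but not yet handled. */
  while ((message = dbus_connection_pop_message (connection)) != NULL)
    {
      bus_dispatch_one (connection, message);  /* hypothetical entry point */
      dbus_message_unref (message);
    }

  /* ... existing teardown (release owned names, free policy, etc.) ... */
}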
Hm, if the problem is that we can't check the message's policy because the sender has gone away, might a solution be to cache the connection credentials until all messages sent from that connection have been destroyed?
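A sketch of what that caching might look like; every name here is invented for illustration and none of it is taken from the dbus source.

/* Invented illustration: snapshot the sender's credentials while the
 * connection is still alive, and keep them with each queued message so
 * the policy check can run even after the sender disconnects. */
typedef struct
{
  unsigned long uid;
  unsigned long pid;
} CachedCredentials;

typedef struct
{
  DBusMessage       *message;
  CachedCredentials  sender_creds;  /* captured at enqueue time */
} PendingActivationEntry;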
I think this has been fixed when we fixed activation. Please confirm.
I think this is still unsolved (see Havoc's last comment). However, it is easy to work around by making sure that the client does not quit too quickly after sending the messages, but e.g. sleeps a few seconds before exiting.
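In client terms the workaround is just a flush plus a grace period before exit; a minimal sketch (the two-second figure is arbitrary):

/* Workaround sketch: flush, then give the bus time to activate the
 * server and deliver the queued messages before this process exits. */
#include <unistd.h>

dbus_connection_flush (conn);  /* push everything onto the socket */
sleep (2);                     /* arbitrary grace period; inherently racy */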
Finally I took the time to test this with (slightly modified) D-Bus v0.61. The messages are still getting lost, but as the workaround (sleeping) is quite simple and unproblematic, this bug could be marked as WONTFIX. The transcript below shows that when the service was not yet running (and was activated), it did not receive any of the sent messages. When it was already running, it received all five messages.

----
Nokia770-10:~# ps xau | grep bug_svc
  943 root       432 S   grep bug_svc
Nokia770-10:~# ./bug_client
sending message 0
sending message 1
sending message 2
sending message 3
sending message 4
Nokia770-10:~# going to the main loop
iface: org.freedesktop.DBus, member: NameAcquired, path: /org/freedesktop/DBus
iface: org.freedesktop.DBus, member: NameAcquired, path: /org/freedesktop/DBus
Nokia770-10:~# ps xau | grep bug_svc
  946 messageb   864 S   /root/bug_svc
  948 root       432 S   grep bug_svc
Nokia770-10:~# ./bug_client
sending message 0
iface: foo.bar, member: foobar, path: /bug/svc
sending message 1
iface: foo.bar, member: foobar, path: /bug/svc
sending message 2
iface: foo.bar, member: foobar, path: /bug/svc
sending message 3
iface: foo.bar, member: foobar, path: /bug/svc
sending message 4
iface: foo.bar, member: foobar, path: /bug/svc
Nokia770-10:~#
Created attachment 4976 [details]
new test programs

I used these test programs.
Provided the info, so setting another state.
This is a pretty bad race really; it potentially breaks any dbus-send. John, you might consider it for your blocker bug.
(In reply to comment #22)
> this is a pretty bad race really, potentially breaks any dbus-send - john you
> might consider it for your blocker bug

Yes, indeed this makes dbus-send unreliable unless it has the '--print-reply' option. I just spent two days realising this.
So - this is badly hurting NetworkManager reliability for us too; we have quite a number of duplicates of this underlying issue from using dbus-send. The option of 'sleeping a while' before closing the connection is never the right solution :-) I'm working on a patch; also, cf. my description of the problem here:

http://lists.freedesktop.org/archives/dbus/2008-March/009526.html
Created attachment 24013 [details] [review]
Implement-org.freedesktop.DBus.Peer-for-th.patch

We want the bus to reply to the Peer interface, for obvious reasons. To implement this, refactor the code in dbus-connection so that we have an (internal public) method to process a message and create the reply. The actual sending function differs between the bus-internal case and a DBusConnection, so we can't just call the peer filter directly. Create a test too.
Created attachment 24014 [details] [review]
Have-dbus-send-always-wait-until-message-i.patch

For method calls, we simply wait for the reply, even if we don't print it. For signals, we send an org.freedesktop.DBus.Peer.Ping message to the destination. Note that as a side effect of this, the dbus-send exit code will now be nonzero if the remote side replies with an error, even if the error is not printed. This is a desirable change, but it could in theory break a script. I am of the opinion that the script is buggy if it's broken by this.
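Since signals have no reply, the trick above leans on the standard org.freedesktop.DBus.Peer interface; roughly like this in libdbus (the destination name is a placeholder):

/* Sketch: after sending a signal, ping the destination and block on the
 * reply.  Because delivery is in order, the reply proves our earlier
 * messages reached the destination first. */
DBusMessage *ping, *reply;
DBusError error;

dbus_error_init (&error);
ping = dbus_message_new_method_call ("some.destination",          /* placeholder */
                                     "/",                         /* Peer works on any path */
                                     "org.freedesktop.DBus.Peer",
                                     "Ping");
reply = dbus_connection_send_with_reply_and_block (conn, ping, -1, &error);
if (reply != NULL)
  dbus_message_unref (reply);
dbus_message_unref (ping);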
Created attachment 24016 [details] [review]
this version has the new test

Forgot to git add the test.
Created attachment 24019 [details] [review]
Ignore-EPIPE-on-writes.patch

From 7a4fd0fb71e698f3c8a04a245df9bf0187c45945 Mon Sep 17 00:00:00 2001
From: Colin Walters <walters@verbum.org>
Date: Wed, 18 Mar 2009 19:26:33 -0400
Subject: [PATCH] Bug 896 - Ignore EPIPE on writes

http://lists.freedesktop.org/archives/dbus/2008-March/009526.html

We can get an EPIPE if the remote end has closed the connection on a write. However, it's possible we still have messages to read (and the socket can still be valid for reading). This is referred to as a "half-open" connection. It turns out that things are correct if we simply ignore the EPIPE. We actually want to treat the connection as closed only when reading is complete (i.e. we get a hangup/EOF).
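For illustration, the half-open situation in plain POSIX terms, independent of the dbus code: once the peer has closed (or exited), write() fails with EPIPE, but data the peer sent before closing is still sitting in our receive buffer and can be read until EOF.

/* Plain-socket illustration of a half-open connection: tolerate EPIPE
 * on write and keep the fd open for reading.  Not dbus code. */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static ssize_t
write_tolerating_epipe (int fd, const void *buf, size_t len)
{
  ssize_t n;

  signal (SIGPIPE, SIG_IGN);  /* fail with EPIPE instead of dying */
  n = write (fd, buf, len);
  if (n < 0 && errno == EPIPE)
    return 0;  /* peer gone for writing; there may still be unread data */
  return n;
}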
Created attachment 24020 [details] [review]
ignore sigpipe

This patch is from http://lists.freedesktop.org/archives/dbus/2008-March/009531.html; it needs updating to the latest code.
Actually, Avery claims (in http://lists.freedesktop.org/archives/dbus/2008-March/009544.html) that something closer to my smaller patch will work. Opinions? It needs tests, I know... I'll look at that.
Hi Colin, thanks for the CC. Of course, I prefer my patch, which (at least) attempts to get the underlying bug fixed right [no doubt band-aiding this one short-lived client will cause grief for others in the future]. My second post even had a hacked-up regression test [which needed some love]:

http://lists.freedesktop.org/archives/dbus/2008-March/009543.html

We don't ship my patch, and it clearly needed testing, but hopefully it shows what is needed, I guess. Happy to help test out whatever ultimate fix you come up with.
Created attachment 25304 [details] [review]
updated version of Michael's patch

(In reply to comment #31)
> Hi Colin, thanks for the CC.
> Of course - I prefer my patch, that (at least attempts) to get the underlying
> bug fixed right [ no doubt band-aiding this one short-lived client will cause
> grief for others in the future ].

By his smaller patch, I think he meant the one that just ignores EPIPE on writes, not the dbus-send patch that makes an extra method call. (I gather that the portion implementing the Peer interface on the bus daemon has already been applied, though.) However, Colin's patch didn't work when I tested it; we also need the change you made to avoid closing the transport on hangup while the read watch is still active. Here's a patch which updates yours to apply to the current code, but using Colin's _dbus_get_is_errno_epipe() since that seemed a bit nicer style-wise. I haven't looked at the regression test.
Created attachment 27579 [details]
failing trace with revents=POLLIN|POLLERR|POLLHUP

This patch doesn't look to be quite enough; I'm able to reproduce the attached failing trace with it. My setup is "strace -s 4000 -o /tmp/dbus.strace" on the bus, and then a while loop running:

while true; do sleep 0.1; dbus-send --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.Introspectable.Introspect; done

If the bus reads the message, I count that as success. System is Fedora 11, Linux megatron 2.6.29.5-191.fc11.x86_64, libc-2.10.1-2.x86_64. This is a different race, though...
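The trace shows revents=POLLIN|POLLERR|POLLHUP arriving together, so the hangup bit must not pre-empt the pending read. A rough illustration of the required ordering, with invented helper names (not the dbus source):

/* Illustration: on a nonblocking fd, drain readable data first and only
 * treat the connection as dead on EOF or a hard error.  handle_bytes()
 * and connection_closed() are invented names. */
if (pfd.revents & (POLLIN | POLLERR | POLLHUP))
  {
    char buf[4096];
    ssize_t n;

    while ((n = read (pfd.fd, buf, sizeof buf)) > 0)
      handle_bytes (buf, (size_t) n);

    if (n == 0)
      connection_closed (pfd.fd);   /* real EOF: safe to close now */
    else if (errno != EAGAIN && errno != EINTR)
      connection_closed (pfd.fd);   /* hard error */
  }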
Created attachment 27585 [details] [review]
0001-Bug-896-Avoid-race-conditions-reading-message-from.patch

This patch looks like it closes both races for me. Comments appreciated... let's get this fixed (sorry for being really slow on this bug).
Looks sane to me
Pushed the patch.
Whoot - thanks Colin, you rock :-) I guess we can bin the sillies in dbus-send now; I believe we added an unnecessary 'ping' after async methods there. As a more advanced question: it was unclear to me quite why we needed two sockets per client instead of one, which is more to poll and fiddle with. ORBit2 makes do with one, and that makes this case a simple ordering problem in the poll handler rather than something more complex like this. But anyway, glad to see it fixed.
The dbus-send patch was never committed AFAICS, only proposed. Two sockets? I believe we only have one socket. Or do you mean the two poll records? I agree that's unfortunate; at a quick glance I think it's an artifact of an attempt to handle threading issues. Havoc would probably know for sure.
> The dbus-send patch was never committed AFAICS, only proposed.

Good 'oh.

> Two sockets? I believe we only have one socket. Or you mean the two poll
> records? I agree that's unfortunate; in a quick glance I think it's an
> artifact of an attempt to handle threading issues. Havoc would probably
> know for sure.

Hmm ;-) I don't believe there needs to be any advantage wrt. threading issues, and there is certainly some performance hit, both in code complexity, in the cost of allocating two of these, and in the cost of polling the same fd twice (~halving the peak performance there). It is not as if the two watches even have different callbacks:

static dbus_bool_t
socket_connection_set (DBusTransport *transport)
{
  DBusTransportSocket *socket_transport = (DBusTransportSocket*) transport;

  _dbus_watch_set_handler (socket_transport->write_watch,
                           _dbus_connection_handle_watch,
                           transport->connection, NULL);
  _dbus_watch_set_handler (socket_transport->read_watch,
                           _dbus_connection_handle_watch,
                           transport->connection, NULL);

Commit comments like:

2004-11-25  Havoc Pennington  <hp@redhat.com>
    The primary change here is to always write() once before adding the
    write watch, which gives us about a 10% performance increase.

make this sound like a performance issue too; in theory, adding a write watch should involve no allocation and no heavy lifting, just twiddling a single bit in a bit-mask of an already existing read/write watch record (surely?). But anyhow ... ;-) perhaps something to investigate for those wanting to improve dbus performance.
I tested on Fremantle, using D-Bus 1.2.14-0maemo4+0m5, and I can still reproduce this bug: when the application is not pre-started, using dbus-send does activate the app but doesn't call the method, unless I give the --print-reply option. Reopening as Will asked.
Don't know exactly if it's the same bug, but messages *are* still lost when the client exits early. These commands don't work:

dbus-send --system --dest=org.freedesktop.NetworkManager \
  /org/freedesktop/NetworkManager org.freedesktop.DBus.Properties.Set \
  string:'org.freedesktop.NetworkManager' string:'WirelessEnabled' \
  variant:boolean:false

dbus-send --system --dest=org.freedesktop.NetworkManager \
  /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.Sleep \
  boolean:false

Adding '--print-reply' helps. For reference, the patch from comment #34 is committed here:

http://cgit.freedesktop.org/dbus/dbus/commit/?id=87ddff6b24d9b9d4bba225c33890db25022d8cbe

So it doesn't fix the bug.

$ rpm -qv dbus
dbus-1.2.24-1.fc13.x86_64
It seems like this bug may have ended up mixing together some different issues. One bug (which the patches tried to fix) is the failure to _read_ from a connection that has had a write failure. This is a problem too. However, for the other issue (which I don't see a patch for): once a connection's socket is closed, which with the fixes now means all messages have been _read_, the messages have not all been _dispatched_. When dispatching a message, I guess the bus tries to get unix credentials, which are not available on closed connections. So maybe that's why people still see issues?
Is there any progress? It seems this bug causes gvfs-open to not work correctly; see:

https://bugzilla.gnome.org/show_bug.cgi?id=746534

It seems that calling g_dbus_connection_flush() after an async call isn't enough.
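For reference, the GDBus-side pattern that avoids the race is to wait for the reply rather than flush and exit; a sketch, with placeholder names and paths:

/* Sketch (GIO/GDBus): block on the reply instead of relying on
 * g_dbus_connection_flush() after an async call. */
#include <gio/gio.h>

GError *error = NULL;
GVariant *ret;

ret = g_dbus_connection_call_sync (connection,
                                   "org.example.Service",     /* placeholder */
                                   "/org/example/Service",    /* placeholder */
                                   "org.example.Interface",   /* placeholder */
                                   "SomeMethod",              /* placeholder */
                                   NULL,                      /* no arguments */
                                   NULL,                      /* any reply type */
                                   G_DBUS_CALL_FLAGS_NONE,
                                   -1,                        /* default timeout */
                                   NULL,                      /* no cancellable */
                                   &error);
if (ret != NULL)
  g_variant_unref (ret);
else
  g_clear_error (&error);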
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/dbus/dbus/issues/1.