dbus-daemon has this logic for replies to method calls: * suppose :1.0 sends a method call to :1.1 * if :1.1 replies to that method call (with success or an error), everything's fine * if :1.1 disconnects without replying, synthesize and send a NoReply error * if reply_timeout > 0 and :1.1 does not reply within reply_timeout milliseconds, synthesize and send a NoReply error Since 2009, the default for reply_timeout is -1 (Bug #16841), so in practice there is no timeout. However, the message in the NoReply error is still "Message did not receive a reply (timeout by message bus)", which is misleading when it appears within milliseconds of the call (<http://bugs.debian.org/597133>).
Created attachment 95717 [details] [review] Use a better NoReply message for disconnection with reply pending As an implementation detail, dbus-daemon handles this situation by artificially triggering a timeout (even if its configured timeout for method calls is in fact infinite). However, using the same debug message for both is misleading, and can lead people who are debugging a service crash to blame dbus-daemon instead, wasting their time. --- Tested on master, but I'd like to apply this to dbus-1.8 too, since it's low-risk and is beneficial for debugging.
Reviews/feedback, anyone? (People who like kdbus: what messages would be produced there, and are they any better than mine?)
Patch looks good. In libsystemd-bus we do the same thing: the kernel will report two different notifications to us depending on timeout or peer dying. However, to stay compatible we make a NoReply bus error from that, but do use different messgae strings. All good from me.
I'm getting intermittent test failures, so the actual implementation is probably fine, but the tests might not be. Here's one such failure: ERROR:/home/smcv/src/fdo/dbus/test/dbus-daemon.c:408:test_no_reply: assertion failed (f->e.name == DBUS_ERROR_NO_REPLY): ("org.freedesktop.DBus.Error.ServiceUnknown" == "org.freedesktop.DBus.Error.NoReply") so that will need some debugging.
Comment on attachment 95717 [details] [review] Use a better NoReply message for disconnection with reply pending Review of attachment 95717 [details] [review]: ----------------------------------------------------------------- connection.c looks good, I can't comment on the test because I don't understand the test framework.
(In reply to comment #4) > I'm getting intermittent test failures, so the actual implementation is > probably fine, but the tests might not be. Here's one such failure: > > ERROR:/home/smcv/src/fdo/dbus/test/dbus-daemon.c:408:test_no_reply: > assertion failed (f->e.name == DBUS_ERROR_NO_REPLY): > ("org.freedesktop.DBus.Error.ServiceUnknown" == > "org.freedesktop.DBus.Error.NoReply") > > so that will need some debugging. How do you run the tests? The test passes with: $ cd test ; make check TESTS=test-dbus-daemon /no-reply/disconnect: OK /no-reply/timeout: OK PASS: test-dbus-daemon But it fails with: $ cd test ; DBUS_TEST_DATA=data ./test-dbus-daemon /no-reply/disconnect: ** ERROR:dbus-daemon.c:414:test_no_reply: assertion failed (f->e.message == "Message recipient disconnected from message bus without replying"): ("Message did not receive a reply (timeout by message bus)" == "Message recipient disconnected from message bus without replying") Aborted Because it runs /usr/bin/dbus-daemon instead of running dbus-daemon built from the source tree I am testing. So I wonder if your $DBUS_TEST_DAEMON is set correctly. I can't reproduce the error when DBUS_TEST_DAEMON is set correctly and your patch looks good to me.
(In reply to comment #4) > I'm getting intermittent test failures, so the actual implementation is > probably fine, but the tests might not be. Here's one such failure: > > ERROR:/home/smcv/src/fdo/dbus/test/dbus-daemon.c:408:test_no_reply: > assertion failed (f->e.name == DBUS_ERROR_NO_REPLY): > ("org.freedesktop.DBus.Error.ServiceUnknown" == > "org.freedesktop.DBus.Error.NoReply") I think this is a rare race condition. I'm not surprised you couldn't reproduce it. % for x in `seq 1 1000`; do DBUS_TEST_DATA=$builddir/test/data DBUS_TEST_DAEMON=$builddir/bus/dbus-daemon $builddir/test/test-dbus-daemon -p /no-reply/disconnect || { echo "run $x failed with $?"; break; } ; done ... ERROR:/home/smcv/src/fdo/dbus/test/dbus-daemon.c:434:test_no_reply: assertion failed (f->e.name == DBUS_ERROR_NO_REPLY): ("org.freedesktop.DBus.Error.ServiceUnknown" == "org.freedesktop.DBus.Error.NoReply") run 598 failed with 134
Created attachment 106341 [details] [review] dbus-daemon test: don't disconnect before WaitForever call is delivered --- To be squashed into Attachment #95717 [details]. The race condition was that on rare occasions, dbus-daemon would process the disconnection of the "right" connection before it processed the method call from the "left" connection; then when it got round to processing the method call, it would reply "no such name", because the "right" connection already went away.
(In reply to comment #7) > I think this is a rare race condition. It seems to work reliably with that patch: > 2000 consecutive runs succeeded.
This has now been here for several weeks with no objections. If nobody reviews it in the next 2 weeks, I intend to go for "easier to ask forgiveness than permission" and commit it to 1.9.
(In reply to Simon McVittie from comment #10) > This has now been here for several weeks with no objections. If nobody > reviews it in the next 2 weeks, I intend to go for "easier to ask > forgiveness than permission" and commit it to 1.9. Done. Fixed in git for 1.9.2
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.