Dbus seems to lose messages when sending lots of messages in a loop. I made a simple server that writes a line into a file whenever a message is received and a client that sends 3000 messages to the server in a loop. It seems that five or so last messages are lost. Interestingly, about the same amount of messages is lost when sending 10000 messages.
Created attachment 496 [details] Code of the test server and client.
are you syncing everything? (flushing all dbus messages from the client, ensuring you've read them all on the server, ensuring you've written out all the text lines)
The bus drops some messages but you do not see it since you're not watching for replies to your messages. I modified your code to use dbus_connection_send_with_reply instead of dbus_connection_send and I get 'org.freedesktop.DBus.Error.LimitsExceeded' errors.
Probably NOTABUG then, but we should probably understand why the limit is exceeded (in particular I don't know why it would be the same with 3000 and 10000 messages?) I still haven't looked at the code, so apologies if this is silly ;-)
(In reply to comment #4) > Probably NOTABUG then, but we should probably understand why the limit is > exceeded (in particular I don't know why it would be the same with 3000 and > 10000 messages?) It's the max_replies_per_connection limit that is exceeded. By default it is 32, indeed not very high. It works fine when the NO_REPLY flag is set on the client messages.
So the client isn't reading replies quickly enough and they get backlogged? I'd be curious whether a higher max replies value helps or not.
(In reply to comment #6) > So the client isn't reading replies quickly enough and they get backlogged? no, the _server_ isn't replying quickly enough, so the daemon stops accepting method calls from the client. Actually the server code in the attachement doesn't reply at all. So the only moment the backlog of pending replies empties is when the daemon timeouts them ... But even once this is fixed by having the server sending a reply right away, the default limit of 32 triggers some rejection on my system (FC2). Maybe some scheduling issue (the client sends the 32 messages before the server has even time to run ??) > I'd be curious whether a higher max replies value helps or not. It sure does.
I tried with the NO_REPLY flag set on, but same message loss happened. I did it like this, using the dbus_message_set_no_reply function: ... if (!succ) { printf("appending an arg. to message %d failed!\n", i); exit(1); } /* set the NO_REPLY flag */ dbus_message_set_no_reply(msg, 1); succ = dbus_connection_send(conn, msg, &ret); if (!succ) { printf("sending of message %d failed\n", i); exit(1); ... So, it seems that the server has to reply, or messages are lost...
(In reply to comment #8) > I tried with the NO_REPLY flag set on, but same message loss happened. OK. Actually the max_replies_per_connection isn't enforced in DBUS 0.21, only in the CVS. So that wasn't explaining your message losses. (But I still think the default value of 32 is too low)
OK, I think I understand what is going on: it's probably a bug in the watch handler logic. unix_handle_watch() in dbus-transport-unix calls do_reading() which reads data from the socket. But do_reading() never reads more than max_bytes_read_per_iteration bytes at a time. Here, the client sends a lot of large messages and then exits; on the daemon side, the socket buffer contains more than max_bytes_read_per_iteration bytes and poll() signals a HUP condition. unix_handle_watch() reads part of the buffer via do_reading() and then disconnects the transport (closes the fd) although there is still unread data. A possible solution is to have do_reading() ignore the max_bytes_read_per_iteration limit when the watch has the HUP condition. I'm not sure that's the cleanest way to solve this bug though :).
Created attachment 508 [details] [review] possible fix
This looks like an unrelated fix, might go ahead and commit it so we don't lose it in this conversation: - if (_dbus_transport_queue_messages (transport) == DBUS_DISPATCH_NEED_MEMORY) + if (!_dbus_transport_queue_messages (transport)) I'm not sure what the HUP semantics are, if we have a bunch of bytes in the buffer and we repeatedly poll and read, do we get: IN IN IN | HUP IN | HUP IN | HUP HUP i.e. can we just say "ignore HUP while there's an IN flag"?
Verified this bug with D-BUS CVS version (session bus), dating 21th July 2004.
(In reply to comment #12) > I'm not sure what the HUP semantics are, if we have a bunch of bytes in the > buffer and we repeatedly poll and read, do we get: > > IN > IN > IN | HUP > IN | HUP > IN | HUP > HUP Apparently the last step (HUP alone) does not occur, ie the IN flag is always present (but I don't know if that's a portable behaviour of poll). What happens is that do_reading eventually do a _dbus_read() which returns 0 and causes the transport disconnection.
Created attachment 519 [details] [review] cleaner fix Basically, ignore the HANGUP condition
Verified the "cleaner fix" by patching D-BUS CVS version (session bus), dating 21th July 2004. Tried with 5, 3000 and 10000 messages, and no message loss was detected. Maybe we get this fix to the next release...
committed, with one difference: I moved the test for ERROR and HANGUP at the beginning of unix_handle_watch (it was there originally)
I think this bug should be closed, seems to work in 0.22...
Right, let's close this bug.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.