We are trying to send file paths across DBus between two Tracker processes. We found that the trackerd (which sends the messages to tracker-indexer) was exiting.
It turns out that DBus calls _exit() for us from inside a library. That is incredible. No error message, no warning, just _exit(). No library should ever do this. What if you use DBus for some critical application/system?
So here is the back trace:
(gdb) break _exit
Breakpoint 2 at 0xb7b9f044
[Thread 0xb58feb90 (LWP 21904) exited]
[Thread 0xb7173b90 (LWP 21900) exited]
[Thread 0xb60ffb90 (LWP 21903) exited]
[Thread 0xb34ffb90 (LWP 21924) exited]
[Thread 0xb46ffb90 (LWP 21912) exited]
[Thread 0xb3efeb90 (LWP 21917) exited]
[Thread 0xb2cfeb90 (LWP 21927) exited]
Breakpoint 2, 0xb7b9f044 in _exit () from /lib/tls/i686/cmov/libc.so.6
Breakpoint 2, 0xb7b9f044 in _exit () from /lib/tls/i686/cmov/libc.so.6
#0 0xb7b9f044 in _exit () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7e5f3dd in _dbus_exit (code=1) at dbus-sysdeps-unix.c:2480
#2 0xb7e41a31 in _dbus_connection_update_dispatch_status_and_unlock (connection=0x8100048, new_status=DBUS_DISPATCH_COMPLETE) at dbus-connection.c:4032
#3 0xb7e44777 in _dbus_connection_send_and_unlock (connection=0x8100048, message=0x8105cc8, client_serial=0x0) at dbus-connection.c:2015
#4 0xb7e447fd in dbus_connection_send (connection=0x8100048, message=0x8105cc8, serial=0x0) at dbus-connection.c:3086
#5 0xb7e3d2a7 in send_no_return_values (connection=0x8100048, msg=0x8105cc8, error=<value optimized out>) at dbus-bus.c:1372
#6 0xb7e3d326 in dbus_bus_remove_match (connection=0x8100048,
rule=0x8101b90 "type='signal',sender='org.freedesktop.Tracker.Indexer',path='/org/freedesktop/Tracker/Indexer',interface='org.freedesktop.Tracker.Indexer'", error=0x0) at dbus-bus.c:1530
#7 0xb7e79589 in ?? () from /usr/lib/libdbus-glib-1.so.2
#8 0xb7e7a5b5 in ?? () from /usr/lib/libdbus-glib-1.so.2
#9 0xb7d66d70 in g_object_run_dispose () from /usr/lib/libgobject-2.0.so.0
After futher conversations on IRC between walters and pvanhoof. Phillip kindly looked into this and all these IRC logs/back traces come from him:
pvanhoof What makes check_disconnected_message_arrived_unlocked happen in dbus-connection.c ?
pvanhoof And why does this take place? connection->disconnected_message_arrived = TRUE;
pvanhoof Which will result in _dbus_connection_update_dispatch_status_and_unlock calling _dbus_exit, which will unsolicited exit my process
pvanhoof Even in such a way that it's hardly debuggable too, abort() would have been ... nicer
walters by default dbus will exit when the bus dies, just like how xlib will when the X server connection is lost
pvanhoof So the bus dies
pvanhoof What can make it die? :)
walters you can use _set_exit_on_disconnect (), but you should try to avoid that
* abock_ (n=abock@nat/novell/x-0c2477506c243d42) has joined #dbus
walters when the session ends
pvanhoof walters, it looks like dbus_g_bus_get does that
walters what kind of connection is this? to the session bus? system bus? private connection?
pvanhoof I don't seem to have control over that with dbus-glib
pvanhoof connection = dbus_g_bus_get (DBUS_BUS_SESSION, &error);
walters yeah, you'd have to get the DBusConnection using dbus_g_connection_get_connection
pvanhoof Is that better than dbus_g_bus_get ?
pvanhoof I also wonder what is making my session-bus die
walters no you call it after you call dbus_g_bus_get
pvanhoof We are being quite aggressive with what we do with poor little dbus
pvanhoof but dying :) ?
pvanhoof We're also not seeing this on all versions of DBus, only on Ubuntu Hardy's
walters is the session bus crashing?
pvanhoof If there's a way how I can debug this (if it's indeed a session-bus bug), let me know
pvanhoof I can reproduce it easily
walters a stack trace would be useful...does hardy have apport? it should be in /var/crash i think then
pvanhoof I did in /root: DEB_BUILD_OPTIONS=nostrip apt-get source dbus -b
pvanhoof And I installed those packages
pvanhoof Just FYI about my environment
* KaL is now known as KaL_out
walters you could attach gdb to the session bus too
pvanhoof apport is already the newest version.
pvanhoof 108 360 0.0 0.1 2876 1320 ? Ss 20:24 0:00 /usr/bin/dbus-daemon --system
pvanhoof that one?
walters no, that's the system bus
pvanhoof pvanhoof 1134 0.0 0.1 2776 1196 ? Ss 20:26 0:00 dbus-daemon --fork --print-address 24 --print-pid 26 --session
pvanhoof that one?
pvanhoof (gdb) cont
pvanhoof ok, let's see
pvanhoof it didn't crash, yet my application had the _exit() problem
* danpb_ltop has quit (Read error: 104 (Connection reset by peer))
* danpb_ltop (email@example.com) has joined #dbus
walters another possibility is the session bus is kicking your application off because it sent something invalid
pvanhoof Detaching from program: /usr/bin/dbus-daemon, process 1134
walters highly probable is invalid UTF-8
pvanhoof oh, yeah. martyn is sending filenames in a GStrv over the dbus
pvanhoof Can that trigger this?
walters you might try dbus-monitor --session, i think you would see the error return from that
walters certainly if the filenames are not UTF-8
pvanhoof that's dbus-monitor --session, and making it happen
* bergie has quit ()
walters hm, actually you wouldn't see an invalid message in the monitor - i think - because the bus would drop it before it sent it to the monitor handler
pvanhoof But the cause is quite likely an invalid UTF-8 filename on my filesystem, that's being collected into the GStrv and being sent over the bus
pvanhoof Sounds likely, because I'm the only guy who's having these problems
pvanhoof (trackerd:28849): Tracker-DEBUG: Found :'/home/pvanhoof/Documents/Presentations/jokela/zami.pp.fi/stuff/Kuvat/2007/Jyv\xe4skyl\xe4ss\xe4 31.10 - 4.11.2007/index.html' (16)
* thiago_home (n=thiago@kde/thiago) has joined #dbus
pvanhoof like that one :)
walters yeah, whichever app is doing that should probably use g_utf8_validate and if that fails try reencoding it using the locale
This is where the _exit() takes place:
3997 static void
3998 _dbus_connection_update_dispatch_status_and_unlock (DBusConnection *connection,
3999 DBusDispatchStatus new_status)
changed = new_status != connection->last_dispatch_status;
connection->last_dispatch_status = new_status;
function = connection->dispatch_status_function;
data = connection->dispatch_status_data;
if (connection->disconnected_message_arrived &&
connection->disconnected_message_processed = TRUE;
/* this does an unref, but we have a ref
* so we should not run the finalizer here
* inside the lock.
3031 _dbus_verbose ("Exiting on Disconnected signal\n");
4032 _dbus_exit (1);
4033 _dbus_assert_not_reached ("Call to exit() returned");
So from the superb advice from walters, we tried checking every string for UTF-8 validity and it stopped DBus from calling _exit().
Can we please change this implementation so we don't call _exit() EVER in DBus? It would be much better for DBus to just drop the message or return a GError in such cases so people know what is going on.
Read the docs for dbus_connection_set_exit_on_disconnect(). Exit is the default on disconnect for the same reason it is in Xlib: nearly all apps should exit on disconnect from dbus, because the session bus only exits on session end, and it's a bug for an app to persist after the session ends.
It is _allowed_ by the API to persist after session end, or exit in a custom way, but you have to take explicit action to turn off the default behavior. The reason this makes sense is that if you turn off the default behavior you MUST write a handler for the Disconnected signal from your DBusConnection - if you don't exit on this signal, you do at least need to unref and free the connection object. So ignoring it is not OK. If you're going to know to write that handler, you'll also know to call dbus_connection_set_exit_on_disconnect() to turn off the default handler which exits. But if you don't think about it, exiting is the right default, just as Xlib exits when you lose the X server connection.
Xlib doesn't have a way (short of longjmp()) to even turn this off, which I think is wrong, but Xlib does have the proper default. Without this Xlib default we'd be plagued by apps that don't die when the session ends. Same for libdbus.
With libdbus, you must do something on disconnect, there's a default handler, and you can replace it if you like.
Now, the real issue you have is that you're getting disconnected. The reason you're disconnected is that you sent not-well-formed protocol. DBUS_TYPE_STRING is required to be UTF-8. The bus daemon strictly validates well-formedness for security reasons and robustness-of-session reasons, so the whole desktop does not crash. If you don't want UTF-8 validation, then use an array of bytes, not a string.
Unfortunately, it's quite challenging to modify dbus-daemon to try sending back an error reply before it closes the socket; there may be another open bug about that, or at least a thread in list archives, and I think a patch makes sense, but, it isn't an easy patch to write iirc. (I don't remember the details.)
The daemon would need to write out an error reply to the not-well-formed message, flush the socket as possible without blocking, then close the socket, all synchronously. 99% of the time the client would get the error, for debugging purposes, though it would not be guaranteed.
That would be a good patch, in the meantime there's a simple fix that would have saved you: add a _dbus_return_if_fail(validate_utf8(arg)) to whatever public API function allowed you to marshal an invalid UTF-8 string. (In fact there's a general rule that libdbus should never send not-well-formed protocol: it should trap any application mistakes in this respect with return_if_fail() statements. But as you see, some of the return_if_fail are missing; patches to add them are welcome.)
And your app when loading untrusted data should be fixed to validate it before passing it in to APIs that require valid UTF-8 (including dbus and GTK). Both libdbus and GTK have undefined behavior if you give invalid UTF-8 to APIs that require valid strings.
Relevant archive link on how to fix this:
I saw simular happen in one project of mine.. My program just exited. After some debugging I found out that dbus called exit(). More debugging showed that my client was disconnected from the dbus server due to communcation error (I had some garbage strings sent over).
But I do agree, would be better if dbus_* functions in the client returns error on disconnected stated instead of calling exit()
One possibility is to have an environment variable like LIBDBUS_VALIDATE=1 or something which would cause the library to check UTF-8 client side, and if there are any other checks that would make sense, we could do those as well.
I don't think any fancy env variables are needed; most things should already be checked by _dbus_return_if_fail
Why not just add _dbus_return_if_fail(validate_utf8(arg)) to the public API entry points that take a utf-8 string? That's the simple fix...
For actual non-checks validation, I'd stick to having just the recipient not client do it, and fix the bug to send an error back before dropping connection. But that case should basically never happen if we add a couple return_if_fail(valid_utf8) in key places; invalid utf8 is one of the few things not checked already.
I have a branch to validate UTF-8 strings, and the syntax of signatures and object paths, before sending. I'll attach the patch in a moment.
(The validation can be turned off with --disable-checks, at the cost of turning programming errors from a noisy assertion failure into a silent crash. As with GLib g_return_if_fail checks, I don't recommend doing this.)
(In reply to comment #5)
> For actual non-checks validation, I'd stick to having just the recipient not
> client do it, and fix the bug to send an error back before dropping connection.
> But that case should basically never happen if we add a couple
> return_if_fail(valid_utf8) in key places
I've cloned Bug #34726 to represent the request to send back an error when invalid messages are received.
Created attachment 43809 [details] [review]
dbus_message_iter_append_basic: check string-like arguments for validity
Strings: UTF-8 with no embedded NULs, by adding a new internal function,
Object paths, signatures: the obvious syntactic checks
This moves some of the burden of validation to the sender.
When sending <http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt>
10240 times with up to 1024 parallel calls pending, on a single-core ARM
Linux device, I found that user CPU time in dbus-spam increased by up to 80%
as a result of the validation. However, when sending messages to dbus-daemon,
overall throughput only reduced by 15%, and when sending messages to an echo
service, overall throughput actually improved by around 14% (presumably
because making the sender CPU-bound influenced kernel scheduling).
(In reply to comment #7)
> When sending ... in dbus-spam
> to an echo service
These are provided by Bug #34140 (to which I'll attach an updated patch).
Non a d-bus developer, but the patch seems OK to me.
Review of attachment 43809 [details] [review]:
Just one observation, actually.
1225 /* this validation function doesn't follow the same naming pattern :-( */
1226 #define _dbus_validate_utf8(s,b,e) _dbus_string_validate_utf8 (s, b, e)
to the .h file, hiding the implementation of the macro.
It makes the reading easier for who does not want to know how the checking actually works.
(In reply to comment #10)
> Review of attachment 43809 [details] [review]:
> Just one observation, actually.
> I'd move
> 1225 /* this validation function doesn't follow the same naming pattern :-(
> 1226 #define _dbus_validate_utf8(s,b,e) _dbus_string_validate_utf8 (s, b, e)
> to the .h file, hiding the implementation of the macro.
> It makes the reading easier for who does not want to know how the checking
> actually works.
Oh, I'd forgotten that hack was still there. :-(
What I'm working around there is:
* there are two sets of API for validating things:
the _dbus_validate_THING family takes a byte-range inside a DBusString,
whereas the _dbus_check_is_valid_THING family are wrappers, only
defined when --disable-checks is not given, which take a const char *
and assume it's \0-terminated
* for this check, we need to do the same for UTF-8 validation
* DEFINE_DBUS_NAME_CHECK expands to definition of a function
_dbus_check_is_valid_THING that calls _dbus_validate_THING
* ... but UTF-8 validation doesn't follow the same naming pattern:
the DBusString version is called _dbus_string_validate_utf8,
not just _dbus_validate_utf8
* ... so I "rename" it locally
I'm not sure why you think it'd be better to move this hack to the header file and make it API? If we keep it at all, I think its scope should be as narrow as possible, so, only in the implementation.
However, a better fix would probably be to either rename _dbus_string_validate_utf8 to _dbus_validate_utf8, or rename the rest of the _dbus_validate_THING family to _dbus_string_validate_THING. Colin/whoever: any opinions on which direction would be preferred?
(Or, I could write _dbus_check_is_valid_utf8 out without using the weird macros, at the cost of a bit more boilerplate.)
Created attachment 44082 [details] [review]
dbus-marshal-validate: expose _dbus_validate_utf8 as API
On second thoughts, giving it an alias consistent with
_dbus_validate_signature etc. (as Cosimo suggests) seems reasonable.
(This could be squashed into Attachment #43809 [details] if desired.)
Created attachment 44083 [details] [review]
dbus_message_iter_append_basic: validate booleans too
Sending, for instance, ((dbus_bool_t) 666) is a programming error and
should be diagnosed as such.
(Again, this could be squashed into Attachment #43809 [details] if desired. The validation done in message recipients will reject these badly-valued booleans, so it has the same problems as a non-UTF-8 string.)
Created attachment 44084 [details] [review]
dbus_message_iter_append_fixed_array: add a check for valid booleans
The reasoning is the same as for dbus_message_iter_append_basic.
Review of attachment 44083 [details] [review]:
Review of attachment 44084 [details] [review]:
Review of attachment 44082 [details] [review]:
I was thinking the same place :) OK, again no hat.
Anyone interested in reviewing this, or opining whether it should be in 1.4 or 1.5? Colin? Will? Lennart?
(In reply to comment #18)
> Anyone interested in reviewing this, or opining whether it should be in 1.4 or
> 1.5? Colin? Will? Lennart?
These patches look reasonable. Given that this stuff is validated by the receiver, who'll drop the connection if it's invalid, I think applying these to 1.4 is acceptable: it will give better error messages than the current behaviour of unexplained disconnection from the bus.
Fixed in dbus-1.4 for 1.4.8 and in master for 1.5.0.