Summary: | assert in dbus-connection.c line 1331 | ||
---|---|---|---|
Product: | dbus | Reporter: | frederic heem <frederic.heem> |
Component: | core | Assignee: | Havoc Pennington <hp> |
Status: | RESOLVED NOTABUG | QA Contact: | John (J5) Palmieri <johnp> |
Severity: | normal | ||
Priority: | high | CC: | kimmo.hamalainen |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
dumpcap-dbus-ok.log
dumpcap-dbus-ko.log possible workaround dbus-mutex-check-return-value.patch dbus-mutex-check-return-value.patch |
Description
frederic heem
2006-09-06 10:18:23 UTC
Created attachment 6846 [details]
dumpcap-dbus-ok.log
Created attachment 6847 [details]
dumpcap-dbus-ko.log
I think this is fixed in CVS. I mistakenly used connection_unref instead of connection_unref_unlocked. Can you try from CVS? Actually come to think of it I think the fix is in my patch which hasn't been committed yet. http://lists.freedesktop.org/archives/dbus/2006-September/005589.html The patch is not the final one but it can be used for testing. I have also seen this when compiling the asserts in with --enable-asserts. My solution was a bit radical: disable connection locks from the daemon, since it's not multi-threaded. I'll attach the patch shortly. Created attachment 6856 [details] [review] possible workaround Here is the workaround I used. The assert is not in the deamon but in a dbus server application (a modified version of dumpcap). The main thread invokes an async callback (start) but an other thread invokes dbus_g_method_return. Is there any test case for this ? I think D-Bus servers use the same code as the daemon. (I saw the assert failing in the daemon.) Here are the results: Version from cvs : ko Version from cvs + patch : ko Actaully, applying the patch wasn't straightforward, default response has been given [heefre@heefre dbus]$ patch -p0 < dbus-connection-fix.bin patching file ChangeLog Hunk #1 FAILED at 1. 1 out of 1 hunk FAILED -- saving rejects to file ChangeLog.rej patching file bus/bus.c patching file bus/connection.c Reversed (or previously applied) patch detected! Assume -R? [n] I committed the patch yesterday a fresh cvs update that includes the patch doesn't solve the problem. Instead of using an asynchrounous method (DBusGMethodInvocation + dbus_g_method_return), a normal synchronous non blocking method is used, and a method with the attribute "NoReply" is used to send the result back. Note another thread is use to invoke this method The assert is still triggered. Actually, I'm wondering if this assert makes sense. Suppose that the main threads has locked the connection mutex and another thread is trying to lock the same mutex, the assert will be always hit. Isn't the purpose of a mutex to protect share resources ? The other case is when the same thread is trying to lock the same mutex. Anyway, the return value of _dbus_mutex_lock and _dbus_mutex_unlock *must* be checked because it detects some problems such as locking recursively a non recursive mutex, destroying a locked mutex or unlocking a non locked mutex etc... I really don't understand this. The code is: lock_mutex(); assert_mutex_locked(); Now how the heck is that failing? Nobody has just explained that, and it isn't obvious to me. lock_mutex() should block until it gets the lock. Is there just some simple bug I'm missing that you guys all see? Are you guys installing a _dbus_mutex_lock implementation that sometimes fails? It's true that this return value is never checked. In fact, john maybe we should remove this return value for 1.0, since I have no idea what one would do if mutex_lock failed. Are you trying to use recursive mutexes? That will not work right now, dbus_mutex_new must create a non-recursive mutex. Yes, this would replace the assertion failure with a deadlock; right now recursive dispatch situations are broken (this is in doc/TODO on the 1.0 list) This is real bug, not just an invention. I'll try to explain in other words on how it fails. The thread where the dispatch is done, inside the glib main loop, locks the mutex, the callback invoked by the dispatch function, typically a function of the glib object registered with dbus, starts a thread to call a blocking legacy function, this function calls dbus_g_method_return immediately on some cases and try to lock the already locked mutex. If the assert is hit, it means that the dbus_mutex_new doesn't block until the glib main loop thread unlocks the mutex, in other word the dbus mutex implementation has a bug. Created attachment 6940 [details] [review] dbus-mutex-check-return-value.patch This patch makes dbus_mutex_lock and dbus_mutex_unlock assert is the mutex is null or if the underlying mutex functions fails. It results in asserts in dbus-deamon and my application that use dbus. [root@heefre ~]# gdb /usr/bin/dbus-daemon GNU gdb Red Hat Linux (6.3.0.0-1.122rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1". (gdb) r --config-file=/etc/dbus-1/system.conf --nofork Starting program: /usr/bin/dbus-daemon --config-file=/etc/dbus-1/system.conf --nofork Reading symbols from shared object read from target memory...done. Loaded system supplied DSO at 0x7f5000 [Thread debugging using libthread_db enabled] [New Thread -1208923952 (LWP 17370)] 17370: assertion failed "return_value == TRUE" file "dbus-threads.c" line 144 function _dbus_mutex_lock Program received signal SIGABRT, Aborted. [Switching to Thread -1208923952 (LWP 17370)] 0x007f5402 in __kernel_vsyscall () (gdb) bt #0 0x007f5402 in __kernel_vsyscall () #1 0x0083b069 in raise () from /lib/libc.so.6 #2 0x0083c671 in abort () from /lib/libc.so.6 #3 0x080ab425 in _dbus_abort () at dbus-sysdeps.c:69 #4 0x08098467 in _dbus_real_assert (condition=Variable "condition" is not available. ) at dbus-internals.c:486 #5 0x0808fb81 in _dbus_mutex_lock (mutex=0x0) at dbus-threads.c:144 #6 0x08094613 in _dbus_data_slot_allocator_alloc (allocator=0x80e4294, mutex_loc=0x80e4290, slot_id_p=0x80e3a28) at dbus-dataslot.c:73 #7 0x0808ad61 in dbus_server_allocate_data_slot (slot_p=0x80e3a28) at dbus-server.c:1052 #8 0x0804e864 in bus_context_new (config_file=0xbfcf05fc, force_fork=FORK_NEVER, print_addr_fd=-1, print_pid_fd=-1, error=0xbfcf0620) at bus.c:541 #9 0x0806a185 in main (argc=3, argv=0xbfcf06e4) at main.c:421 #0 0x007f5402 in __kernel_vsyscall () #1 0x001b8069 in raise () from /lib/libc.so.6 #2 0x001b9671 in abort () from /lib/libc.so.6 #3 0x00920bb5 in _dbus_abort () at dbus-sysdeps.c:69 #4 0x0090dbf7 in _dbus_real_assert (condition=Variable "condition" is not available. ) at dbus-internals.c:486 #5 0x00905311 in _dbus_mutex_lock (mutex=0x0) at dbus-threads.c:144 #6 0x008d9ecf in internal_bus_get (type=DBUS_BUS_SYSTEM, error=0xbfa511bc, private=0) at dbus-bus.c:344 #7 0x00af9539 in dbus_g_bus_get (type=DBUS_BUS_SYSTEM, error=0xbfa51218) at dbus-gmain.c:769 #8 0x08050e5a in sniffer_dbus_register_service (pSniffer=0x83fe8a0) at dbus/wireshark-dbus-sniffer.c:209 #9 0x0805122e in sniffer_dbus_init (pCaptureOptions=0x8054500) at dbus/wireshark-dbus-sniffer.c:140 #10 0x0804f6f1 in main (argc=2, argv=0xbfa51354) at dumpcap.c:381 Created attachment 6941 [details] [review] dbus-mutex-check-return-value.patch This patch doesn't assert if the thread system is not initialized > If the assert is hit, it means
> that the dbus_mutex_new doesn't block until the glib main loop thread
> unlocks the mutex, in other word the dbus mutex implementation has a
> bug.
but dbus doesn't *have* a mutex implementation. What mutex implementation are
you using?
dbus_g_thread_init() was not called... IMHO, this has to be done inside dbus_g_bus_get(), otherwise, this is too error prone. An API shall be easy to use and hard to misuse. Mutex's are never NULL so I am a bit worried about the assert in the last backtrace. They should be the dummy mutex shouldn't they? Or is this just for certaint locks like the connection lock? In any case dbus_g_bus_get most likely should not be calling the thread init functions. If anything we should have a dbus_g_init(gboolean is_threaded); since we also need to initalize the type system apsent of a dbus_g_bus_get call. Have you reviewed the last patch ? Do you mind commit it if validate it. I aggree with you, creating a function dbus_g_init(gboolean is_threaded) is the way forward, Also creating dbus_g_shutdown should be nice to make dbus valgrind friendly Rather than checking that return value, we should just remove it from the function signature. The change might even be binary compatible though it will create compile warnings/errors. glib itself has no shutdown() so if you're using glib you are going to have to use valgrind in garbage collector mode ("detect unreachable blocks") rather than "free was called on everything" mode. Checking the return value of system function such pthread_mutex_lock is not an option, it's necessary. glib and dbus shall have shutdown() function, every professional library have the couple MyLibInit()/MyLibShutdown(). Actually, that's the first 2 two function that must be written when developping a library. pthread_mutex_lock() is guaranteed not to fail in a correct program. dbus has a shutdown, glib does not and it would be very hard to add. feel free to argue with the glib developers if you like. They will just tell you to ignore reachable blocks when looking at valgrind output though. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.