Bug 8150

Summary: assert in dbus-connection.c line 1331
Product: dbus Reporter: frederic heem <frederic.heem>
Component: coreAssignee: Havoc Pennington <hp>
Status: RESOLVED NOTABUG QA Contact: John (J5) Palmieri <johnp>
Severity: normal    
Priority: high CC: kimmo.hamalainen
Version: unspecified   
Hardware: Other   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: dumpcap-dbus-ok.log
dumpcap-dbus-ko.log
possible workaround
dbus-mutex-check-return-value.patch
dbus-mutex-check-return-value.patch

Description frederic heem 2006-09-06 10:18:23 UTC
Hi,
A problem has been found with dbus-0.92 and dbus-lib-0.71. It only happens 
when used with valgrind, since it slows down the program, it exhibits some 
problems that are not found in normal use. Actually, it can also happen when 
DBUS_VERBOSE=1
Here the following assert:
assertion failed "!(connection)->have_connection_lock" 
file "dbus-connection.c" line 1331 function _dbus_connection_handle_watch
To get ths assert, dbus has been configured with:
./configure --enable-verbose-mode --enable-asserts --enable-checks --prefix=/usr
Please find attached the log files, dumpcap-dbus-ko.log is asserting whereas  
dumpcap-dbus-ok.log is working.
Comment 1 frederic heem 2006-09-06 10:19:44 UTC
Created attachment 6846 [details]
dumpcap-dbus-ok.log
Comment 2 frederic heem 2006-09-06 10:20:44 UTC
Created attachment 6847 [details]
dumpcap-dbus-ko.log
Comment 3 John (J5) Palmieri 2006-09-06 10:26:03 UTC
I think this is fixed in CVS.  I mistakenly used connection_unref instead of
connection_unref_unlocked. Can you try from CVS?
Comment 4 John (J5) Palmieri 2006-09-06 10:40:19 UTC
Actually come to think of it I think the fix is in my patch which hasn't been
committed yet.
Comment 5 John (J5) Palmieri 2006-09-06 10:42:26 UTC
http://lists.freedesktop.org/archives/dbus/2006-September/005589.html

The patch is not the final one but it can be used for testing.
Comment 6 Kimmo Hämäläinen 2006-09-07 00:12:28 UTC
I have also seen this when compiling the asserts in with --enable-asserts. My
solution was a bit radical: disable connection locks from the daemon, since it's
not multi-threaded. I'll attach the patch shortly.
Comment 7 Kimmo Hämäläinen 2006-09-07 00:13:57 UTC
Created attachment 6856 [details] [review]
possible workaround

Here is the workaround I used.
Comment 8 frederic heem 2006-09-07 00:30:57 UTC
The assert is not in the deamon but in a dbus server application (a modified 
version of dumpcap). The main thread invokes an async callback (start) but an 
other thread invokes dbus_g_method_return.
Is there any test case for this ? 
Comment 9 Kimmo Hämäläinen 2006-09-07 00:51:57 UTC
I think D-Bus servers use the same code as the daemon. (I saw the assert failing
in the daemon.)
Comment 10 frederic heem 2006-09-07 01:20:59 UTC
Here are the results:
Version from cvs : ko
Version from cvs + patch : ko
Actaully, applying the patch wasn't straightforward, default response has been 
given
[heefre@heefre dbus]$ patch -p0 < dbus-connection-fix.bin
patching file ChangeLog
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file ChangeLog.rej
patching file bus/bus.c
patching file bus/connection.c
Reversed (or previously applied) patch detected!  Assume -R? [n]      
Comment 11 John (J5) Palmieri 2006-09-07 09:08:03 UTC
I committed the patch yesterday
Comment 12 frederic heem 2006-09-12 08:48:45 UTC
a fresh cvs update that includes the patch doesn't solve the problem.
Instead of using  an asynchrounous method (DBusGMethodInvocation + 
dbus_g_method_return), a normal synchronous non blocking method is used, and a 
method with the attribute "NoReply" is used to send the result back. Note 
another thread is use to invoke this method
The assert is still triggered.
Actually, I'm wondering if this assert makes sense. Suppose that the main 
threads has locked the connection mutex and another thread is trying to lock 
the same mutex, the assert will be always hit. Isn't the purpose of a mutex to 
protect share resources ?
The other case is when the same thread is trying to lock the same mutex. 
Anyway,  the return value of _dbus_mutex_lock and _dbus_mutex_unlock *must* be 
checked because it detects some problems such as locking recursively a non 
recursive mutex, destroying a locked mutex or unlocking a non locked mutex 
etc...

Comment 13 Havoc Pennington 2006-09-12 09:21:27 UTC
I really don't understand this. The code is:
 lock_mutex();
 assert_mutex_locked();

Now how the heck is that failing? Nobody has just explained that, and it isn't 
obvious to me. lock_mutex() should block until it gets the lock. Is there just 
some simple bug I'm missing that you guys all see?

Are you guys installing a _dbus_mutex_lock implementation that sometimes 
fails? It's true that this return value is never checked. In fact, john maybe 
we should remove this return value for 1.0, since I have no idea what one 
would do if mutex_lock failed.

Are you trying to use recursive mutexes? That will not work right now, 
dbus_mutex_new must create a non-recursive mutex. Yes, this would replace the 
assertion failure with a deadlock; right now recursive dispatch situations are 
broken (this is in doc/TODO on the 1.0 list)
Comment 14 frederic heem 2006-09-13 00:48:00 UTC
This is real bug, not just an invention. I'll try to explain in other words on 
how it fails.
The thread where the dispatch is done, inside the glib main loop, locks the 
mutex, the callback invoked by the dispatch function, typically a function of 
the glib object registered with dbus, starts a thread to call a blocking 
legacy function, this function calls dbus_g_method_return immediately on some 
cases and try to lock the already locked mutex. If the assert is hit, it means 
that the dbus_mutex_new doesn't block until the glib main loop thread unlocks 
the mutex, in other word the dbus mutex implementation has a bug.

Comment 15 frederic heem 2006-09-13 01:07:39 UTC
Created attachment 6940 [details] [review]
dbus-mutex-check-return-value.patch

This patch makes dbus_mutex_lock and dbus_mutex_unlock assert is the mutex is
null or if the underlying mutex functions fails.
It results in asserts in dbus-deamon and my application that use dbus.
[root@heefre ~]# gdb /usr/bin/dbus-daemon
GNU gdb Red Hat Linux (6.3.0.0-1.122rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) r --config-file=/etc/dbus-1/system.conf --nofork
Starting program: /usr/bin/dbus-daemon --config-file=/etc/dbus-1/system.conf
--nofork
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x7f5000
[Thread debugging using libthread_db enabled]
[New Thread -1208923952 (LWP 17370)]
17370: assertion failed "return_value == TRUE" file "dbus-threads.c" line 144
function _dbus_mutex_lock

Program received signal SIGABRT, Aborted.
[Switching to Thread -1208923952 (LWP 17370)]
0x007f5402 in __kernel_vsyscall ()
(gdb) bt
#0  0x007f5402 in __kernel_vsyscall ()
#1  0x0083b069 in raise () from /lib/libc.so.6
#2  0x0083c671 in abort () from /lib/libc.so.6
#3  0x080ab425 in _dbus_abort () at dbus-sysdeps.c:69
#4  0x08098467 in _dbus_real_assert (condition=Variable "condition" is not
available.
) at dbus-internals.c:486
#5  0x0808fb81 in _dbus_mutex_lock (mutex=0x0) at dbus-threads.c:144
#6  0x08094613 in _dbus_data_slot_allocator_alloc (allocator=0x80e4294,
mutex_loc=0x80e4290, slot_id_p=0x80e3a28) at dbus-dataslot.c:73
#7  0x0808ad61 in dbus_server_allocate_data_slot (slot_p=0x80e3a28) at
dbus-server.c:1052
#8  0x0804e864 in bus_context_new (config_file=0xbfcf05fc,
force_fork=FORK_NEVER, print_addr_fd=-1, print_pid_fd=-1, error=0xbfcf0620)
    at bus.c:541
#9  0x0806a185 in main (argc=3, argv=0xbfcf06e4) at main.c:421

#0  0x007f5402 in __kernel_vsyscall ()
#1  0x001b8069 in raise () from /lib/libc.so.6
#2  0x001b9671 in abort () from /lib/libc.so.6
#3  0x00920bb5 in _dbus_abort () at dbus-sysdeps.c:69
#4  0x0090dbf7 in _dbus_real_assert (condition=Variable "condition" is not
available.
) at dbus-internals.c:486
#5  0x00905311 in _dbus_mutex_lock (mutex=0x0) at dbus-threads.c:144
#6  0x008d9ecf in internal_bus_get (type=DBUS_BUS_SYSTEM, error=0xbfa511bc,
private=0) at dbus-bus.c:344
#7  0x00af9539 in dbus_g_bus_get (type=DBUS_BUS_SYSTEM, error=0xbfa51218) at
dbus-gmain.c:769
#8  0x08050e5a in sniffer_dbus_register_service (pSniffer=0x83fe8a0) at
dbus/wireshark-dbus-sniffer.c:209
#9  0x0805122e in sniffer_dbus_init (pCaptureOptions=0x8054500) at
dbus/wireshark-dbus-sniffer.c:140
#10 0x0804f6f1 in main (argc=2, argv=0xbfa51354) at dumpcap.c:381
Comment 16 frederic heem 2006-09-13 01:53:42 UTC
Created attachment 6941 [details] [review]
dbus-mutex-check-return-value.patch

This patch doesn't assert if the thread system is not initialized
Comment 17 Havoc Pennington 2006-09-13 06:57:13 UTC
> If the assert is hit, it means 
> that the dbus_mutex_new doesn't block until the glib main loop thread 
> unlocks the mutex, in other word the dbus mutex implementation has a
> bug.

but dbus doesn't *have* a mutex implementation. What mutex implementation are 
you using?

Comment 18 frederic heem 2006-09-13 07:04:19 UTC
dbus_g_thread_init() was not called...
IMHO, this has to be done inside dbus_g_bus_get(), otherwise, this is too 
error prone. An API shall be easy to use and hard to misuse.


Comment 19 John (J5) Palmieri 2006-09-13 08:48:40 UTC
Mutex's are never NULL so I am a bit worried about the assert in the last
backtrace.  They should be the dummy mutex shouldn't they?  Or is this just for
certaint locks like the connection lock?

In any case dbus_g_bus_get most likely should not be calling the thread init
functions.  If anything we should have a dbus_g_init(gboolean is_threaded);
since we also need to initalize the type system apsent of a dbus_g_bus_get call.
Comment 20 frederic heem 2006-09-13 09:00:05 UTC
Have you reviewed the last patch ? Do you mind commit it if validate it.
I aggree with you, creating a function dbus_g_init(gboolean is_threaded) is 
the way forward, 
Also creating dbus_g_shutdown should be nice to make dbus valgrind friendly
Comment 21 Havoc Pennington 2006-09-13 09:14:54 UTC
Rather than checking that return value, we should just remove it from the 
function signature. The change might even be binary compatible though it will 
create compile warnings/errors.

glib itself has no shutdown() so if you're using glib you are going to have to 
use valgrind in garbage collector mode ("detect unreachable blocks") rather 
than "free was called on everything" mode.
Comment 22 frederic heem 2006-09-14 00:26:33 UTC
Checking the return value of system function such pthread_mutex_lock is not an 
option, it's necessary.
glib and dbus shall have shutdown() function, every professional library have 
the couple MyLibInit()/MyLibShutdown(). Actually, that's the first 2 two 
function that must be written when developping a library.
Comment 23 Havoc Pennington 2006-09-14 06:42:35 UTC
pthread_mutex_lock() is guaranteed not to fail in a correct program.

dbus has a shutdown, glib does not and it would be very hard to add. feel free 
to argue with the glib developers if you like. They will just tell you to 
ignore reachable blocks when looking at valgrind output though.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.