Bug 9164

Summary: accesses cached message outside message cache lock
Product: dbus Reporter: Jonathan Matthew <notverysmart>
Component: core    Assignee: Havoc Pennington <hp>
Status: RESOLVED FIXED QA Contact: John (J5) Palmieri <johnp>
Severity: normal    
Priority: high CC: kimmo.hamalainen
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
Attachments: test programs

Description Jonathan Matthew 2006-11-26 21:19:22 UTC
There's a tiny race condition here:

static void
dbus_message_cache_or_finalize(DBusMessage *message)
{
 ...

  _DBUS_LOCK (message_cache);

 ...

  message_cache[i] = message;

 ...

 out:
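  _DBUS_UNLOCK (message_cache);

  /* Race window: if the message was cached, another thread may already
   * have taken it from the cache and set its refcount to 1. */
  _dbus_assert (message->refcount.value == 0);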

 ...
}

If the message is cached, it's possible for another thread to remove the message
from the cache and set the reference count to 1 between the unlock and the
assertion check.

I suspect this is causing gnome bug 369214, in which the gnome-vfs dbus code
dies with this message:

24035: assertion failed "message->refcount.value == 0" file "dbus-message.c"
line 672 function dbus_message_cache_or_finalize

under multithreaded use.  I've only seen this happen once or twice in (probably)
hundreds of thousands of operations when I've been deliberately stressing it.
Reporters of that bug and its duplicates don't seem to be able to reproduce it
reliably.
Comment 1 Havoc Pennington 2006-11-26 21:43:41 UTC
Good catch - looks bogus indeed...
Comment 2 Jonathan Matthew 2006-11-27 06:08:48 UTC
Created attachment 7905
test programs

A pair of test programs (client and server) that attempt to mimic gnome-vfs
dbus usage.  That is, each client thread has its own dbus connection, and each
connection at the server is handled in a new thread.  The client just throws
empty messages at the server as fast as it can, and the server responds with
messages containing a single integer (the number of requests handled by the
thread).

For me, this triggers the bug within a couple of minutes.  After patching dbus
to perform the check while still holding the lock, I've left them running for
about half an hour with no problems.
Comment 3 Havoc Pennington 2007-07-18 14:43:35 UTC
Fixed in 1.0.x and 1.1.x, thanks!
