Bug 17754 - Race condition in protected_change_timeout
Summary: Race condition in protected_change_timeout
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.2.x
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Havoc Pennington
QA Contact: John (J5) Palmieri
: 20495 (view as bug list)
Depends on:
Reported: 2008-09-24 08:05 UTC by Plácido Revilla
Modified: 2010-07-01 03:24 UTC (History)
8 users (show)

See Also:
i915 platform:
i915 features:

PoC patch (728 bytes, patch)
2008-09-24 08:05 UTC, Plácido Revilla
Details | Splinter Review

Description Plácido Revilla 2008-09-24 08:05:30 UTC
Created attachment 19153 [details] [review]
PoC patch

If two threads are simultaneously sending messages to the bus it is possible for one to enter this function to add a timeout, drop the connection lock (setting timeouts to NULL) and for the other one to enter at the same time to remove a timeout and find that timeouts is NULL, thus not trying to remove the timeout.
If this last timeout expires in the event loop, an invalid access is due, as the pending structure is already freed.
The attached patch is an "elegant solution" (read: dirty, ugly, shameful hack) to test this condition.
Comment 1 Colin Walters 2008-09-26 06:43:45 UTC
This does look like a race condition.
Comment 2 Havoc Pennington 2008-09-26 20:39:00 UTC
This thread may be helpful:

As the thread and I think some comments in the code explain, protected_change_timeout is a broken hack to begin with.

Note that any fixes will also apply to protected_change_watch()

Comment 3 Havoc Pennington 2008-09-26 20:40:01 UTC
Hmm, see also 
Comment 4 Havoc Pennington 2009-03-05 10:54:18 UTC
*** Bug 20495 has been marked as a duplicate of this bug. ***
Comment 5 Thiago Macieira 2010-05-18 11:05:32 UTC
It's time we applied this patch. The error is serious.

See Qt bugs reported:
Comment 6 Colin Walters 2010-05-19 10:48:25 UTC
I did a little investigating with Google Code Search, and while a lot of things appear fine, e.g. this code would break with Havoc's patch:


(For anyone who hasn't tried it, I highly recommend Code Search as a way to find out how people are actually using one's APIs in the real world)
Comment 7 Colin Walters 2010-05-19 10:48:54 UTC
The specific problem with the Perl bindings is that it calls dbus_connection_get_data, which tries to acquire the lock.
Comment 8 Colin Walters 2010-05-19 10:59:05 UTC
Almost everything else appears fine.  I honestly don't have a strong opinion.  Havoc knows the code here the best, so I'd be fine taking his patch and just saying the Perl bindings need to be patched for 1.4 (and doing so probably isn't hard). 

If we wanted to be maximally compatible we could make it some sort of opt-in behavior (dbus_connection_set_blocking_timeout_functions?  very eww, I admit).

In the big picture, I'm still really uncomfortable with these ad-hoc thread changes over time, because I'm not at all convinced that we're telling a coherent story about how things work to API consumers.  If we need to e.g. change which thread things get called in in certain circumstances, it's going to be really painful.
Comment 9 Thiago Macieira 2010-05-19 11:09:04 UTC
Havoc introduced a separate lock for the get_data / set_data functions, so that those can be used from inside the timeout/watch callbacks. The perl bindings shouldn't break.

I'm not sure I understand what you meant by "these ad-hoc thread changes".
Comment 10 Colin Walters 2010-05-19 11:21:34 UTC
(In reply to comment #9)
> Havoc introduced a separate lock for the get_data / set_data functions, so that
> those can be used from inside the timeout/watch callbacks. The perl bindings
> shouldn't break.

Ah, indeed.  OK.
Comment 11 Ralf Habacker 2010-06-09 02:19:52 UTC
Could this patch be applied unconditional yet (In reply to comment #9)
> Havoc introduced a separate lock for the get_data / set_data functions, so that
> those can be used from inside the timeout/watch callbacks. The perl bindings
> shouldn't break.
I do not see a patch in the last weeks from havoc in dbus-1.2 branch 

Could this patch be applied unconditional now or is still some "api fix enable" functions like dbus_set_api_policy(DBUS_API_THREAD_SAFETY_FIX,1) required, which have to be called from the related binding to activate this fix.
Comment 12 Thiago Macieira 2010-06-09 10:13:21 UTC
That would be bad API.

The patch that is in this bug report is for the 1.3 branch. Forget 1.2 for it.

For 1.2, I would recommend a separate configuration for the callback functions. Forget "policy" -- it doesn't make sense and the config is not global, it has to be per connection anyway.
Comment 13 Ralf Habacker 2010-06-11 09:42:08 UTC
patch applied to master branch
Comment 14 Havoc Pennington 2010-06-12 11:24:30 UTC
I thought you were going to use my patch to add a separate lock, instead of the loop-and-sleep hack
Comment 15 Plácido Revilla 2010-06-15 01:56:03 UTC
Please, take out that patch from the repository. Especially with my name on it. :-P

That patch wasn't intended as a solution to the problem, only as a proof of concept that there is a problem related to the current locking/unlocking mechanism, as I told in my first comment.

I guess the correct solution is Havoc's patch with a separate lock for the protected_change_* functions.
Comment 16 Thiago Macieira 2010-06-22 05:55:22 UTC
Patch reverted with e40c45fb4635321f17f3d6e4f9a4566fb62b3623
Comment 17 rwman 2010-06-23 01:38:36 UTC
Ok, but will Havoc's patch make its way to 1.3 ?
Comment 18 Thiago Macieira 2010-06-23 03:21:05 UTC

I'm running the patch for a few days in my own computer under real-life situations to see if there are problems. Then I'll push it to 1.3.
Comment 19 Thiago Macieira 2010-07-01 03:24:19 UTC
Fix is committed as 6ff1d079316cb730a54b4e0e95bd3e6e31f439de (dbus-1.3.1-2-g6ff1d07)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.