Description
sigmund.augdal
2011-12-12 03:52:41 UTC
See also <http://lists.freedesktop.org/archives/dbus/2011-December/014848.html> and the messages that follow, where I wrote: > If it has the right semantics, then the solution given on the bug is probably > better still, because it delegates correct implementation of a recursive > mutex to the pthreads implementation, which is considerably more likely to > have got it right than we are! But I'm a bit concerned about doing that, > because there's a stated reason in the source why ordinary pthreads > recursive mutexes weren't used (although I haven't worked out why, or > indeed whether, it's still important). Havoc replied: > The approach here is lifted from glib I think, so you could see how they > solve it or if they have the same problem and also why they don't use > pthread builtin recursive locks. The "stated reason in the source" that I referred to is this: /** Creates a new recursively-lockable mutex, or returns #NULL if not * enough memory. Can only fail due to lack of memory. Found in * #DBusThreadFunctions. Do not just use PTHREAD_MUTEX_RECURSIVE for * this, because it does not save/restore the recursion count when * waiting on a condition. libdbus requires the Java-style behavior * where the mutex is fully unlocked to wait on a condition. */ and this: /** Waits on a condition variable. Found in * #DBusThreadFunctions. Must work with either a recursive or * nonrecursive mutex, whichever the thread implementation * provides. Note that PTHREAD_MUTEX_RECURSIVE does not work with * condition variables (does not save/restore the recursion count) so * don't try using simply pthread_cond_wait() and a * PTHREAD_MUTEX_RECURSIVE to implement this, it won't work right. * * Has no error conditions. Must succeed if it returns. */ I still don't know where the assumption of "Java-style" behaviour is actually made, though... The only place where we actually wait on a condition variable is connection->io_path_cond in DBusConnection, which is protected by connection->io_path_mutex. It appears that it might actually be safe to just assume that io_path_mutex is never locked more than once - or indeed divide mutexes into two categories, recursive (but potentially incompatible with condition variables) and non-recursive (but compatible with condition variables - since user code is never called while under io_path_mutex. (In reply to comment #2) > It appears that it might actually be safe to [...] > divide mutexes into two categories, > recursive (but potentially incompatible with condition variables) and > non-recursive (but compatible with condition variables) I tried this, and it seems a reasonable approach. If you care about threads in conjunction with D-Bus, please test/review this (I've cc'd the reporters of various thread-related bugs). If I don't get any feedback, I'm inclined to stop pretending that libdbus works in complex threaded situations at all, break ABI for 1.7.x, and have the new rule be either "you may use each DBusConnection from at most one thread at a time, you must do your own locking" or "libdbus is not thread-safe at all, you must do your own locking". Created attachment 57049 [details] [review] [1] Make _dbus_mutex_new, _dbus_mutex_free static They're only called within their module. Created attachment 57050 [details] [review] [2/4] Distinguish between two flavours of mutex dbus-threads.h warns that recursive pthreads mutexes are not compatible with our expectations for condition variables. However, the only two condition variables we actually use only have their corresponding mutexes locked briefly (and we don't call out to user code from there), so the mutexes don't need to be recursive anyway. That's just as well, because it turns out our implementation of recursive mutexes on pthreads is broken! The goal here is to be able to distinguish between "cmutexes" (mutexes compatible with a condition variable) and "rmutexes" (mutexes which are recursive if possible, to avoid deadlocking if we hold them while calling user code). This is complicated by the fact that callers are not guaranteed to have provided us with both versions of mutexes, so we might have to implement one by using the other (in particular, DBusRMutex *aims to be* recursive, it is not *guaranteed to be* recursive). Created attachment 57051 [details] [review] [3/4] Allow both recursive and non-recursive mutexes to be supplied Created attachment 57052 [details] [review] [4/4] Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong Very loosely based on a patch from Sigmund Augdal. Comment on attachment 57049 [details] [review] [1] Make _dbus_mutex_new, _dbus_mutex_free static Review of attachment 57049 [details] [review]: ----------------------------------------------------------------- You didn't need to remove the docs... Otherwise ti's fine. Comment on attachment 57049 [details] [review] [1] Make _dbus_mutex_new, _dbus_mutex_free static Review of attachment 57049 [details] [review]: ----------------------------------------------------------------- You didn't need to remove the docs... Otherwise ti's fine. Comment on attachment 57050 [details] [review] [2/4] Distinguish between two flavours of mutex Review of attachment 57050 [details] [review]: ----------------------------------------------------------------- I'll assume that it compiles. The change looks correct. But I think we should start requiring both types of mutex so we don't have to try and work around it. ::: dbus/dbus-threads.c @@ +90,5 @@ > + * avoid deadlocks. However, that cannot be relied on. > + * > + * The extra level of indirection given by allocating a pointer > + * to point to the mutex location allows the threading > + * module to swap out dummy mutexes for real a real mutex so libraries s/real a real/a real/ Comment on attachment 57051 [details] [review] [3/4] Allow both recursive and non-recursive mutexes to be supplied Review of attachment 57051 [details] [review]: ----------------------------------------------------------------- ship it Comment on attachment 57052 [details] [review] [4/4] Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong Review of attachment 57052 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-sysdeps-pthread.c @@ +96,5 @@ > > + return DBUS_MUTEX (pmutex); > +} > + > +#ifdef PTHREAD_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE is not a #define, it's an enum value. This seems to be a common mistake, since libupnp does the same thing. PHTREAD_MUTEX_RECURSIVE is in POSIX 2008 Base and was in Single Unix v2. @@ +272,5 @@ > + DBUS_THREAD_FUNCTIONS_MUTEX_NEW_MASK | > + DBUS_THREAD_FUNCTIONS_MUTEX_FREE_MASK | > + DBUS_THREAD_FUNCTIONS_MUTEX_LOCK_MASK | > + DBUS_THREAD_FUNCTIONS_MUTEX_UNLOCK_MASK | > +#ifdef PTHREAD_MUTEX_RECURSIVE Ditto (In reply to comment #10) > The change looks correct. But I think we should start requiring both > types of mutex so we don't have to try and work around it. In theory they're supplied (or can be supplied) by library-users, not necessarily by libdbus itself (in theory we support platforms that aren't Windows and don't have pthreads!); requiring both (or indeed requiring recursive mutexes) is an ABI break. If we change this at all, my preference would be to go the same route as recent GLib: - refuse to build on non-pthreads, non-Windows platforms, and accept patches for additional platforms' thread systems in the unlikely event that anyone cares - probably require PTHREAD_MUTEX_RECURSIVE too, while we're at it - make dbus_threads_init() and dbus_threads_init_default() not do anything - have mutex/condvar creation just always work, and implicitly do any initialization that's desired - optionally, for applications known to be single-threaded (i.e. dbus-daemon) where thread locking is a performance hit, have a global _dbus_disable_thread_safety() which you MUST call before doing anything else (or make it public if it would have users beyond dbus-daemon); only do this if benchmarks prove that it actually helps Given that our recursive mutexes have in fact always been broken, and distributors who sync with GNOME releases (e.g. Lennart on behalf of Fedora) want a dbus-1.6 relatively imminently, I'd be inclined to defer this to dbus-1.7, but I could be persuaded do just do this straight away if other maintainers think it's a better idea. > ::: dbus/dbus-threads.c > @@ +90,5 @@ > > + * avoid deadlocks. However, that cannot be relied on. > > + * > > + * The extra level of indirection given by allocating a pointer > > + * to point to the mutex location allows the threading > > + * module to swap out dummy mutexes for real a real mutex so libraries > > s/real a real/a real/ Thanks, I'll fix that. (In reply to comment #8) > You didn't need to remove the docs... They're no longer going into the Doxygen (not useful for non-public API), so anyone who'd be looking at the (nearly content-free) doc-comment can equally easily look at the implementation; but the version of them at the end of this branch is a bit more complex (so more deserving of documentation), so OK, I'll reinstate a doc-comment. (In reply to comment #12) > PTHREAD_MUTEX_RECURSIVE is not a #define, it's an enum value. > > This seems to be a common mistake, since libupnp does the same thing. > > PHTREAD_MUTEX_RECURSIVE is in POSIX 2008 Base and was in Single Unix v2. Thanks, good catch. Which would you prefer: Autoconf check, or just don't support anything older? To be honest I'm leaning towards just not supporting anything older, and if people running ye olde proprietary Unix want to contribute an Autoconf check, they can do so. For the record, I think the fact that we have mutexes which are "best-effort recursive" is crazy: either they should be guaranteed-recursive, or we should just use non-recursive (i.e. fast) mutexes, and structure code in such a way that we never call out to user code holding a lock. In a couple of places we're basically obliged to call out to user code under a lock for main loop integration. I'm inclined to think that if you call into DBusConnection API in your "add watch" or "toggle timeout" (or whatever) callback, you deserve to deadlock... The thing for which recursive mutexes were added appears to be <http://lists.freedesktop.org/archives/dbus/2006-February/004128.html> in which dbus_connection_dispatch() ends up calling back into dispatching, recursively - which is said to be safe iff you have recursive mutexes, and deadlock otherwise. I'm not sure what high-level API use would make this happen, and I don't think we have a test case. Created attachment 57078 [details] [review] [4 v2] Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong Very loosely based on a patch from Sigmund Augdal. For the moment, we make PTHREAD_MUTEX_RECURSIVE a hard requirement: it's required by POSIX 2008 Base and SUSv2. If your (non-Windows) platform doesn't have PTHREAD_MUTEX_RECURSIVE, please report a bug on freedesktop.org bugzilla with details of the platform in question. --- This removes the faulty PTHREAD_MUTEX_RECURSIVE check that Thiago noticed, and fixes the recursive mutex code to actually compile. Nothing I maintain uses D-Bus in a threaded way, so this could do with testing from anyone who has threaded test-cases! Created attachment 57079 [details] [review] [5] dbus-threads: improve documentation This reinstates documentation for _dbus_mutex_new() and _dbus_mutex_free(), and fixes some typos spotted during review. It also documents the newly-introduced functions. I'd: 1) start requiring pthreads (with PTHREAD_RECURSIVE_MUTEX) everywhere except Windows 2) Windows code uses whatever Windows developers think is best 3) apply this to 1.5, let people know this is happening (changelog, etc.) and then make a release of 1.6 soonish. That implies that dbus_init_threads do nothing. Also, dbus-daemon does not link to libdbus-1, it has a private copy of it. So we can disable locks quite easily on it if we want to. (In reply to comment #17) > 1) start requiring pthreads (with PTHREAD_RECURSIVE_MUTEX) everywhere > except Windows It turns out we already require either pthreads (but possibly without recursive mutexes) everywhere except Windows, since _dbus_threads_init_platform_specific() must be present in libdbus, and the only two implementations of that are for pthreads and for Windows. So that seems fine. > That implies that dbus_init_threads do nothing. ... so, about that. I tried going for a GLib-like approach: automatically initialize threads whenever someone mentions a mutex, have the two threading backends (pthreads and Windows) implement the functions directly, and make dbus_threads_init_default() and dbus_threads_init() into no-ops. Unfortunately, it's looking as though we also have some interesting chicken-and-egg problems. For instance, allocating memory with dbus_new uses an atomic operation on the "number of blocks outstanding" counter, which might be emulated with a mutex, which requires a global mutex, which is memory that has to be allocated. Similarly, initializing threading sets a function to be called on dbus_shutdown(), which takes a global lock which isn't necessarily possible to create yet. So, I wonder whether we might still need to require dbus_threads_init() or dbus_threads_init_default(), and make them both equivalent to what dbus_threads_init_default() now does. This is yet another thing that'd be simpler if we didn't pretend to be simultaneously thread-safe, OOM-safe, usable without a main loop and n other hoops we have to jump through... > Also, dbus-daemon does not link to libdbus-1, it has a private copy > of it. So we can disable locks quite easily on it if we want to. Yes, that was my thought. (In reply to comment #0) > running dbus_signal_get and dbus_signal_send simultaneously in > different threads causes many forms of different crashes. Can you provide a test-case for this, please? Neither of these functions exists under this name in libdbus. I'm sorry. I did not at the time realize that these were names internal to our application. I will try to provide a test case shortly. Created attachment 57166 [details]
testcase
Test case showing bad thread handling in libdbus-1. It repeatedly emits signals in one thread and receives them in another. prints a debug message every ten iterations, stops after 500000.
Using this test I've observed the following:
* deadlocks. Can happen at any time, also after several hundred thousand iterations. The most common issue
* segfaults. Very very seldom
* The following messages followed by the process being aborted:
gone 681 iterations
process 4784: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
Most likely, the application called unref() too many times and removed a reference belonging to libdbus, since this is a shared connection.
D-Bus not built with -rdynamic so unable to print a backtrace
zsh: abort ./dbus-external-test
This typically only happens relatively quickly (within the first thousand iterations)
Confirms problem in dbus lib in fedora core 15. dbus-lib in fedora core 13 with my patch from before does not show any problems with this test
Created attachment 57212 [details] [review] [6] Never use non-libdbus threading primitives This lets us simplify considerably, by assuming that we always have both recursive and suitable-for-condition-variable mutexes. The Windows implementation has been compiled (on 32-bit mingw-w64) but not tested. Justification for the approach used on Windows, and in particular, using the existing "non-recursive" locks as if they were recursive: * We've been using them in conjunction with condition variables all along, so they'd better be suitable * On fd.o #36204, Ralf points out that mutexes created via CreateMutex are, in fact, recursive * Havoc's admonitions about requiring "Java-style" recursive locking (waiting for a condition variable while holding a recursive lock requires releasing that lock n times) turn out not to apply to either of our uses of DBusCondVar in DBusConnection, because the lock is only held for a short time, without calling into user code; indeed, our Unix implementation isn't recursive anyway, so if the Windows implementation reaches the deadlocking situation somehow (waiting for condition variable while locked more than once), the Unix implementation would already have deadlocked on the same code path (trying to lock more than once) One possible alternative to a CreateMutex mutex for use with condition variables would be a CRITICAL_SECTION. I'm not going to implement this, but Windows developers are welcome to do so. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=36204 Bug: https://bugs.freedesktop.org/show_bug.cgi?id=43744 --- This additionally "fixes" Bug #36204 (by re-documenting the Windows mutexes as having been recursive all along) and makes Bug #36205 irrelevant. I tried Attachment #57166 [details]; it passed. Thanks for this test case! Ralf, do you have any test cases for the threading support on Windows, and could you please test my patches 1-6 there? I think I got the Windows implementation here right, and the tests cross-compile successfully, but I haven't got most of them to work under Wine yet (there are probably some wrong assumptions when built with autotools, I suspect). (In reply to comment #23) > Ralf, do you have any test cases for the threading support on Windows sorry, no > and could you please test my patches 1-6 there? how to test - with the test case mentioned in comment 22 ? (In reply to comment #24) > how to test - with the test case mentioned in comment 22 ? That'd be ideal, if you can adapt it to work on Windows. Thanks! (Bug #46234 might also be this bug.) I tried to git am the patched according there order in the attachments list and got problems with the patch number 5. C:\kde\trunk\git\dbus-src-git>git am 43744\57078.patch Applying: Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong error: patch failed: dbus/dbus-sysdeps-pthread.c:44 error: dbus/dbus-sysdeps-pthread.c: patch does not apply Patch failed at 0001 Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong When you have resolved this problem run "git am --resolved". If you would prefer to skip this patch, instead run "git am --skip". To restore the original branch and stop patching run "git am --abort". BTW: There is also a pthread implementation on windows http://sourceware.org/pthreads-win32/. If this library would be feature compatible http://sourceware.org/pthreads-win32/conformance.html, which I cannot estimate it would be possible to switch to pthread also on windows. Created attachment 57326 [details] [review] [1 v2] Make _dbus_mutex_new, _dbus_mutex_free static They're only called within their module. --- Updated version of Attachment #57049 [details], definitely applies cleanly to master as of e90b160011. Created attachment 57327 [details] [review] [2 v2] Distinguish between two flavours of mutex dbus-threads.h warns that recursive pthreads mutexes are not compatible with our expectations for condition variables. However, the only two condition variables we actually use only have their corresponding mutexes locked briefly (and we don't call out to user code from there), so the mutexes don't need to be recursive anyway. That's just as well, because it turns out our implementation of recursive mutexes on pthreads is broken! The goal here is to be able to distinguish between "cmutexes" (mutexes compatible with a condition variable) and "rmutexes" (mutexes which are recursive if possible, to avoid deadlocking if we hold them while calling user code). This is complicated by the fact that callers are not guaranteed to have provided us with both versions of mutexes, so we might have to implement one by using the other (in particular, DBusRMutex *aims to be* recursive, it is not *guaranteed to be* recursive). Created attachment 57328 [details] [review] [3 v2] Allow both recursive and non-recursive mutexes to be supplied Created attachment 57329 [details] [review] [4 v3] Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong Very loosely based on a patch from Sigmund Augdal. For the moment, we make PTHREAD_MUTEX_RECURSIVE a hard requirement: it's required by POSIX 2008 Base and SUSv2. If your (non-Windows) platform doesn't have PTHREAD_MUTEX_RECURSIVE, please report a bug on freedesktop.org bugzilla with details of the platform in question. Created attachment 57330 [details] [review] [5 v2] dbus-threads: improve documentation This reinstates documentation for _dbus_mutex_new() and _dbus_mutex_free(), and fixes some typos spotted during review. It also documents the newly-introduced functions. Created attachment 57331 [details] [review] [6 v2] Never use non-libdbus threading primitives This lets us simplify considerably, by assuming that we always have both recursive and suitable-for-condition-variable mutexes. The Windows implementation has been compiled (on 32-bit mingw-w64) but not tested. Justification for the approach used on Windows, and in particular, using the existing "non-recursive" locks as if they were recursive: * We've been using them in conjunction with condition variables all along, so they'd better be suitable * On fd.o #36204, Ralf points out that mutexes created via CreateMutex are, in fact, recursive * Havoc's admonitions about requiring "Java-style" recursive locking (waiting for a condition variable while holding a recursive lock requires releasing that lock n times) turn out not to apply to either of our uses of DBusCondVar in DBusConnection, because the lock is only held for a short time, without calling into user code; indeed, our Unix implementation isn't recursive anyway, so if the Windows implementation reaches the deadlocking situation somehow (waiting for condition variable while locked more than once), the Unix implementation would already have deadlocked on the same code path (trying to lock more than once) One possible alternative to a CreateMutex mutex for use with condition variables would be a CRITICAL_SECTION. I'm not going to implement this, but Windows developers are welcome to do so. --- With some compiler warnings fixed. Created attachment 57332 [details] [review] [7] Remove _dbus_condvar_wake_all and both of its implementations Neither was used, and the Windows version could lead to live-locks. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=43744 Bug: https://bugs.freedesktop.org/show_bug.cgi?id=44609 --- New patch. (In reply to comment #26) > I tried to git am the patched according there order in the > attachments list and got problems with the patch number 5. That was the same as the order of numeric prefixes, right? I've re-attached a version of the patches that certainly does apply. Also available in git: ssh://people.freedesktop.org/~smcv/dbus.git sane-threads-43744 > BTW: There is also a pthread implementation on windows > http://sourceware.org/pthreads-win32/. If this library would be feature > compatible http://sourceware.org/pthreads-win32/conformance.html, which > I cannot estimate it would be possible to switch to pthread also on windows. I have no idea how good that library is - do any other projects used by KDE on Windows use it? - but the feature list does look sufficient for our Unix threading implementation to work. If it does and you don't mind depending on that library, it'd be a good way to share D-Bus threading work between Unix and Windows. pthreads do have one very nice feature for D-Bus, which normal Windows threading seems to lack: you can initialize a mutex statically, without allocating heap memory. That'd be a really useful feature for libdbus, since the need to allocate memory while initializing threads (which we assume can fail) is one of the things that forces us to keep dbus_threads_init_default(), rather than just having thread support all the time (and a way for dbus-daemon to "opt out"). Comment on attachment 57326 [details] [review] [1 v2] Make _dbus_mutex_new, _dbus_mutex_free static Review of attachment 57326 [details] [review]: ----------------------------------------------------------------- Ship it! Comment on attachment 57327 [details] [review] [2 v2] Distinguish between two flavours of mutex Review of attachment 57327 [details] [review]: ----------------------------------------------------------------- Looks ok. I'd remove completely DBusMutex though. Comment on attachment 57328 [details] [review] [3 v2] Allow both recursive and non-recursive mutexes to be supplied Review of attachment 57328 [details] [review]: ----------------------------------------------------------------- Ship it! Comment on attachment 57329 [details] [review] [4 v3] Use actual recursive pthreads mutexes, rather than NIH'ing them, wrong Review of attachment 57329 [details] [review]: ----------------------------------------------------------------- Ship it! Comment on attachment 57330 [details] [review] [5 v2] dbus-threads: improve documentation Review of attachment 57330 [details] [review]: ----------------------------------------------------------------- Doc only change, so it's fine. But I don't get why we need to keep the _dbus_mutex_* functions at all. Comment on attachment 57331 [details] [review] [6 v2] Never use non-libdbus threading primitives Review of attachment 57331 [details] [review]: ----------------------------------------------------------------- The change looks sane. If there are any bugs in the code being added, the bugs were already present in the copy of the code that was there before. This also answers my question about the functions that should be gone. ::: dbus/dbus-threads.c @@ -76,5 @@ > - * > - * @param ctor a preferred constructor for a new mutex > - */ > -static DBusMutex * > -_dbus_mutex_new (DBusMutexNewFunction ctor) There. It's gone now :-) Comment on attachment 57332 [details] [review] [7] Remove _dbus_condvar_wake_all and both of its implementations Review of attachment 57332 [details] [review]: ----------------------------------------------------------------- Ship it! I was reading whether live-locks or fairness problems existed on the Windows code when I was reading the previous patch. All merged for 1.5.10, thanks! Created attachment 57761 [details]
testcase for windows
Created attachment 57762 [details]
Cmake file for windows test case
(In reply to comment #43) > Created attachment 57761 [details] > testcase for windows Testcase runs without any problem on windows7 (debug build) |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.