Bug 90318 - TSan data races with freed_pool_t's |top| data member
Summary: TSan data races with freed_pool_t's |top| data member
Status: RESOLVED FIXED
Alias: None
Product: cairo
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Chris Wilson
QA Contact: cairo-bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-05 15:30 UTC by Nathan Froyd
Modified: 2016-03-05 13:37 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
data races detected by TSan in Firefox (50.03 KB, text/plain)
2015-05-05 15:30 UTC, Nathan Froyd
Details
Hack adding C11 atomics for setting pool->top (2.48 KB, patch)
2016-02-26 20:39 UTC, Uli Schlachter
Details | Splinter Review
Proposed patch (6.25 KB, patch)
2016-03-02 02:07 UTC, Wan-Teh Chang
Details | Splinter Review

Description Nathan Froyd 2015-05-05 15:30:10 UTC
Created attachment 115548 [details]
data races detected by TSan in Firefox

The attached log file shows two different data races coming from freed_pool_t during a Firefox test run.  (The stacks are from Firefox's in-tree cairo, which, while old, is not significantly different from upstream in this particular case.)

While freed_pool_t is careful to use atomic operations for accessing the pool's list of free things, no such caution is taking with the pointer for the next free thing in the pool, |top|.  |top| can be accessed by multiple threads without any synchronization.

Normally I'd suggest fixing this via mutexes, but it appears cairo doesn't support dynamically allocated mutexes (?).  Atomic accesses to |top| are also a possibility, though I'm not entirely convinced that |top| and |pool| couldn't get out-of-sync somehow.
Comment 1 Chris Wilson 2015-05-05 18:19:35 UTC
The fun here is that the race is expected. All that we need to annotate is that the read/write is atomic (without actually setting a barrier). So long as the actual use of the slot is atomic, there can not be two threads that succeed in accessing the same slot simultaneously.
Comment 2 Dmitry Vyukov 2016-02-25 08:47:34 UTC
A data race cannot be intentional (unless you intention is to spread exploits in software). The C standard is pretty clear that any data race results in undefined behavior of the program. Even a read or a write of an int variable can corrupt memory and crash the program. See the following for some examples:
https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
http://hboehm.info/boehm-hotpar11.pdf
In this case a very real possibility is overflow/underflow of the index. E.g. compiler can compile update of top in increment in _freed_pool_put, then you can easily get overflow and corrupt memory. Or compiler can re-read top after the if (i < 0) check, then you get underflow and corrupt memory again.
There is already _cairo_atomic_int_get. Please add _cairo_atomic_int_set and use these for top manipulation.
Comment 3 Uli Schlachter 2016-02-26 20:39:59 UTC
Created attachment 121989 [details] [review]
Hack adding C11 atomics for setting pool->top

The attached patch uses C11 atomics for setting and reading pool->top. Could someone check if this patch makes TSan happy?

If so, that someone should also pick up this work and add C11 atomic support for cairo-atomic-private.h (and configure). Afterwards, _cairo_atomic_int_set() could be added to this and the freed pool can be fixed up to work properly.
I'm not interested in doing this work myself.
Comment 4 Wan-Teh Chang 2016-03-02 02:06:00 UTC
Comment on attachment 121989 [details] [review]
Hack adding C11 atomics for setting pool->top

Review of attachment 121989 [details] [review]:
-----------------------------------------------------------------

Hi Uli,

I've tested such a patch in
https://bugzilla.mozilla.org/show_bug.cgi?id=1161170#c4.
It works. I will attach a proper patch based on the GCC
__atomic builtin functions next.


Why did't I use C11 atomics in the patch I'm about
to attach? A new implementation of the
_cairo_atomic_xxx functions based on C11 atomics will
require changes to function prototypes because C11
distinguishes between the |atomic_int| and |int| types
whereas Cairo uses the cairo_atomic_int_t type for both
purposes.
Comment 5 Wan-Teh Chang 2016-03-02 02:07:51 UTC
Created attachment 122070 [details] [review]
Proposed patch
Comment 6 Wan-Teh Chang 2016-03-03 22:40:47 UTC
Comment on attachment 122070 [details] [review]
Proposed patch

Please review this patch.

The patch adds atomic int load and store functions, with the relaxed
memory order, to cairo-atomic-private.h and cairo-atomic.c, and uses
the new functions in cairo-freed-pool-private.h and cairo-freed-pool.c.

Thank you.
Comment 7 Behdad Esfahbod 2016-03-04 22:50:14 UTC
lgtm.  Chris, Uli, can we get this in?
Comment 8 Uli Schlachter 2016-03-05 13:37:35 UTC
$ git am -s /tmp/attachment.cgi\?id=122070
 * Running commit-msg hook
fatal: Summary is way too long. (153 characters)

Ok, let's try that again.

commit 3538ac3e68f997d95d76865247717c90ae73630d
Author: Wan-Teh Chang <wtc@google.com>
Date:   Tue Mar 1 17:48:55 2016 -0800

    Fix data race in freed_pool
    
    This adds _cairo_atomic_int_get_relaxed and _cairo_atomic_int_set_relaxed which
    are meant to have a behaviour of relaxed read/writes in C11's memory model. This
    patch also uses these new function to fix a data race with freed_pool_t's |top|
    data member.
    
    Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=90318
    Signed-off-by: Wan-Teh Chang <wtc@google.com>
    Signed-off-by: Uli Schlachter <psychon@znc.in>


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.