Bug 74355

Summary: Line drawing not thread-safe
Product: cairo Reporter: H. Guijt <hguijtra>
Component: image backendAssignee: Chris Wilson <chris>
Status: RESOLVED MOVED QA Contact: cairo-bugs mailing list <cairo-bugs>
Severity: major    
Priority: medium CC: soren.sandmann
Version: 1.10.2   
Hardware: x86-64 (AMD64)   
OS: Windows (All)   
Whiteboard:
i915 platform: i915 features:

Description H. Guijt 2014-02-02 09:40:43 UTC
I'm drawing graphs with lots (millions) of tiny line pieces. To speed up this process I used multiple threads to draw the line pieces. This greatly improves performance, but typically causes a crash within a few minutes.

I've tried to do this in two ways:

1. share a single cairo_surface_t between the threads, and have each thread draw using its own cairo_t. This crashes, but maybe I'm hoping for too much (although an image surface is essentially just a big array of bytes that should be writable from multiple threads). 

2. create a cairo_surface_t and cairo_t for each thread, and merge the results after drawing has completed. This crashes in pixman. The only Cairo calls that happen on a per-thread basis are cairo_set_line_cap, cairo_set_source_rgba, cairo_set_line_width, cairo_set_dash, cairo_move_to, cairo_line_to, and cairo_stroke.

Because I'm using the Gnome binary DLLs from http://www.gtk.org/download/win32.php I cannot provide a meaningful stacktrace. I have tried compiling a later version of Cairo but although I managed to compile it, the resulting DLL is utterly unstable on Windows. However, I do believe this might have something to do with the problem described at http://lists.freedesktop.org/archives/cairo/2011-July/022123.html.
Comment 1 Uli Schlachter 2014-02-03 18:25:06 UTC
(In reply to comment #0)
> 1. share a single cairo_surface_t between the threads, and have each thread
> draw using its own cairo_t. This crashes, but maybe I'm hoping for too much
> (although an image surface is essentially just a big array of bytes that
> should be writable from multiple threads). 

Sure, just an array. And this works as long as you expect anything like useful results. Cairo is supposed to be thread-safe as long as the threads don't share any state (well, this is an oversimplification, but your first approach isn't supposed to work).

> 2. create a cairo_surface_t and cairo_t for each thread, and merge the
> results after drawing has completed. This crashes in pixman. The only Cairo
> calls that happen on a per-thread basis are cairo_set_line_cap,
> cairo_set_source_rgba, cairo_set_line_width, cairo_set_dash, cairo_move_to,
> cairo_line_to, and cairo_stroke.

Since this is so simple, do you have some small, self-contained C code that only depends on some threading API and cairo and that reproduces this problem?
Comment 2 Søren Sandmann Pedersen 2014-02-05 22:52:35 UTC
Once thing you may want to check is whether thread-local state works correctly on your system. Pixman relies on this for an internal cache, and it has been known to be broken with some compilers on Windows.
Comment 3 H. Guijt 2014-02-06 18:41:24 UTC
The following program should reproduce the problem but _doesn't_. I'm at a loss why since it goes through the exact same steps the larger application does; the only thing I can think of right now is that it might actually be a stack problem for the worker threads - this might not occur if the whole thing runs on a single thread. Not quite sure what to make of it right now, but I will continue to investigate.

------

#include <array>
#include <cairo-win32.h>
#include <future>
#include <iostream>

const int Width = 2000;
const int Height = 1000;
const int NumThreads = 4;

double RandomValue (double Range)
{	return (Range * rand ()) / RAND_MAX;
}

void DrawCurve (cairo_t *Cairo)
{	::cairo_set_source_rgba (Cairo, RandomValue (1.0), RandomValue (1.0), RandomValue (1.0), RandomValue (1.0));
	::cairo_set_line_width (Cairo, RandomValue (10.0));

	const struct {
		int Size;
		double Pattern [6];
	} Patterns [] = {
		{ 0, {  0.0, 0.0, 0.0, 0.0, 0.0, 0.0 } },
		{ 2, {  0.0, 4.0, 0.0, 0.0, 0.0, 0.0 } },
		{ 2, { 12.0, 4.0, 0.0, 0.0, 0.0, 0.0 } },
		{ 4, { 12.0, 4.0, 0.0, 4.0, 0.0, 0.0 } },
		{ 6, { 12.0, 4.0, 0.0, 4.0, 0.0, 4.0 } },
	};

	const int LinePattern = static_cast<int> (RandomValue (7.0));
	if (LinePattern == 6) 
		::cairo_set_dash (Cairo, nullptr, 0, 0.0);
	else 
		::cairo_set_dash (Cairo, Patterns [LinePattern].Pattern, Patterns [LinePattern].Size, 0.0);

	::cairo_move_to (Cairo, RandomValue (Width), RandomValue (Height));
	const int NumLines = static_cast<int> (RandomValue (2000.0));
	for (int x=0; x<NumLines; ++x) 
		::cairo_line_to (Cairo, RandomValue (Width), RandomValue (Height));

	::cairo_stroke (Cairo);
}

int main (int argc, const char* argv [])
{
	int It = 0;
for (;;) {
	std::cout << "Iteration " << ++It << "\n";

	struct THREADDATA {
		std::future<void> Future;
		cairo_surface_t *CairoSurface;
		cairo_t *Cairo;

		THREADDATA () {};
		THREADDATA (const THREADDATA &Other) {};
	};

	std::array<THREADDATA, NumThreads> Threads;

	// Set up the surfaces.

	for (auto &Thread : Threads) {
		Thread.CairoSurface = ::cairo_image_surface_create (CAIRO_FORMAT_ARGB32, Width, Height);
		Thread.Cairo = ::cairo_create (Thread.CairoSurface);

		// Make the surface transparent.
		::cairo_set_operator (Thread.Cairo, CAIRO_OPERATOR_SOURCE);
		::cairo_set_source_rgba (Thread.Cairo, 0.0, 0.0, 0.0, 0.0);
		::cairo_rectangle (Thread.Cairo, 0, 0, Width, Height);
		::cairo_fill (Thread.Cairo);
		::cairo_set_operator (Thread.Cairo, CAIRO_OPERATOR_OVER);
	}
	
	// Draw the curves on separate threads.

	for (auto &Thread : Threads) {
		Thread.Future = std::async (std::launch::async, [&Thread] {
			for (int x=0; x<250; ++x)
				DrawCurve (Thread.Cairo);
		});
	}

	// Wait for completion.

	for (auto &Thread : Threads) 
		Thread.Future.get ();

	// Create the final graph.

	cairo_surface_t *CairoSurface = ::cairo_image_surface_create (CAIRO_FORMAT_ARGB32, Width, Height);
	cairo_t *Cairo = ::cairo_create (CairoSurface);

	for (auto &Thread : Threads) {
		::cairo_set_source_surface (Cairo, Thread.CairoSurface, 0, 0);
		::cairo_rectangle (Cairo, 0, 0, Width, Height);
		::cairo_fill (Cairo);

		::cairo_destroy (Thread.Cairo);
		::cairo_surface_destroy (Thread.CairoSurface);
	}

	::cairo_surface_write_to_png (CairoSurface, "c:\\result.png");
	::cairo_destroy (Cairo);
	::cairo_surface_destroy (CairoSurface);
}
	return 0;
}
Comment 4 H. Guijt 2014-02-07 17:52:33 UTC
This is what the call stack looks like when the application crashes:

 	0000000000000000()	Unknown
 	libpixman-1-0.dll!0000000065351f85()	Unknown
 	libpixman-1-0.dll!00000000653898cf()	Unknown
 	libpixman-1-0.dll!0000000065302671()	Unknown
 	libcairo-2.dll!0000000068ddd2f1()	Unknown
 	libcairo-2.dll!0000000068ddebe3()	Unknown
 	libcairo-2.dll!0000000068ddff8b()	Unknown
 	libcairo-2.dll!0000000068de0219()	Unknown
 	libcairo-2.dll!0000000068dfd8e5()	Unknown
 	libcairo-2.dll!0000000068dd6901()	Unknown
 	libcairo-2.dll!0000000068dcf54f()	Unknown
 	libcairo-2.dll!0000000068dcf57d()	Unknown
	DRP_Common.exe!salsa_gui::cCurveDrawer_Quality::DrawCurvePoint(double X, double Y) Line 353	C++

The function it was calling is cairo_fill. It appears to be calling a nullptr but tracing down the exact cause without function names is exceedingly difficult.
Comment 5 H. Guijt 2014-02-07 18:04:09 UTC
Here's another:

 	00000000063e5de0()	Unknown
 	libpixman-1-0.dll!0000000065351f85()	Unknown
 	libpixman-1-0.dll!00000000653898cf()	Unknown
 	libpixman-1-0.dll!0000000065302671()	Unknown
 	libcairo-2.dll!0000000068ddd2f1()	Unknown
 	libcairo-2.dll!0000000068ddebe3()	Unknown
 	libcairo-2.dll!0000000068ddff8b()	Unknown
 	libcairo-2.dll!0000000068de07a5()	Unknown
 	libcairo-2.dll!0000000068dfd789()	Unknown
 	libcairo-2.dll!0000000068dd67f9()	Unknown
 	libcairo-2.dll!0000000068dcf4ef()	Unknown
 	libcairo-2.dll!0000000068dcf51d()	Unknown
>	DRP_Common.exe!salsa_gui::cCurveDrawer_Quality::TerminateCurveLine() Line 370	C++

The function called was cairo_stroke, here.
Comment 6 Søren Sandmann Pedersen 2014-02-10 19:10:17 UTC
When pixman calls a NULL pointer, it's essentially always because thread-local storage is broken on the system. Basically, the pixman binary you are using is miscompiled.
Comment 7 H. Guijt 2014-02-15 18:46:46 UTC
Ok, thanks. I'll take this up with the Gnome people then. Although it boggles the mind that TLS could be broken on any compiler in use in 2013...

Have you considered offering Windows binaries for download yourself? I haven't been able to find a precompiled 1.12 anywhere, and had no luck building it myself either (the binary I managed to build was highly unreliable).

Thanks again!
Comment 8 Søren Sandmann Pedersen 2014-02-17 16:49:02 UTC
It is possible that pixman's support for TLS on Windows is simply buggy; it may be that not a lot of people have been using pixman in a multithreaded way on Windows (or have worked around the problem in some way). We will need some kind of way to reproduce the issue to know.

In pixman 0.32.0 and later there is a test program called 'thread-test' that may reproduce this issue if you can get it running on Windows.

Also, if TLS is indeed the problem, then your test program may reproduce the issue if you modify it to use a larger set of operators (more than 8), so that pixman's caching code will be exercised more.

Finally, if you do file a bug against the GNOME binaries, please CC "sandmann@cs.au.dk".
Comment 9 Fan, Chun-wei 2014-04-21 03:58:20 UTC
Hi,

Just saw this thing going on here, probably this might be info that could be as of note (as I am still looking making PangoCairo thread-safe on Windows)...

I tried to build pixman-0.32.4 with both Visual Studio 2008 and Visual Studio 2012 in x64 flavors (as this bug indicates), using pthreads-win32 2.9.1 (built with the same compilers), and the test program that Søren was talking about does run successfully.

Guijt: BTW, probably you knew already, the Windows binaries from GNOME are built with some version of mingw-w64, quite a while ago, so it could be so that those pixman needs to be rebuilt with the compiler that you are using.

With blessings.
Comment 10 GitLab Migration User 2018-08-25 13:55:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/cairo/cairo/issues/266.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.