Bug 34335

Summary: Breaks the GNUstep cairo backend
Product: pixman
Component: pixman
Version: 0.18.x
Status: RESOLVED NOTOURBUG
Severity: normal
Priority: medium
Hardware: x86 (IA32)
OS: other
Reporter: Yavor Doganov <yavor>
Assignee: Søren Sandmann Pedersen <soren.sandmann>
QA Contact: Søren Sandmann Pedersen <soren.sandmann>
CC: kibi
Attachments:
  Example backtrace
  Example program
  Makefile to compile the example program
  Test program

Description Yavor Doganov 2011-02-16 05:13:00 UTC
Created attachment 43426 [details]
Example backtrace

OS: GNU/Linux

(I reported this to Debian initially, but one of the maintainers said to report it upstream as well.)

All GNUstep GUI apps crash immediately on startup when pixman 0.18.0 or newer is installed (there is no problem with 0.16.4).  An example backtrace is attached.

Another user said:
"It's caused by the thread-local fast_path_cache variable in pixman.c. If you make that non-thread-local (a normal static variable) the problem will go away.

The root problem here is interaction between thread local storage and dlopen, because the gnustep-back bundle, which dynamically links to libpixman, is dlopened by gnustep-gui. However, I'm not sure how to properly fix it other than building pixman without TLS."
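
For readers unfamiliar with the distinction drawn in the quote above, here is a minimal sketch of a thread-local variable next to a normal static one. It assumes GCC or Clang on a POSIX system with pthreads. All names (tls_counter, shared_counter, demo_tls_counters) are hypothetical and are not taken from pixman.c; the sketch only illustrates the storage classes, not the crash itself.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names for illustration only; this is not pixman's code. */
static __thread int tls_counter;   /* one independent copy per thread */
static int shared_counter;         /* one copy shared by the whole process */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

struct worker_result {
    int increments;   /* input: how many times to bump the counters */
    int tls_seen;     /* output: final value of this thread's tls_counter */
};

static void *worker(void *arg)
{
    struct worker_result *res = arg;

    for (int i = 0; i < res->increments; i++) {
        tls_counter++;                      /* no lock needed: private copy */
        pthread_mutex_lock(&shared_lock);   /* shared copy must be guarded */
        shared_counter++;
        pthread_mutex_unlock(&shared_lock);
    }
    res->tls_seen = tls_counter;
    return NULL;
}

/* Runs two workers that each increment n times.
 * Returns 0 on success, -1 if a thread could not be started. */
int demo_tls_counters(int n, int *tls_a, int *tls_b, int *shared_total)
{
    struct worker_result a = { n, 0 }, b = { n, 0 };
    pthread_t ta, tb;

    assert(n >= 0);
    shared_counter = 0;
    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    *tls_a = a.tls_seen;
    *tls_b = b.tls_seen;
    *shared_total = shared_counter;
    return 0;
}
```

Each worker sees only its own increments in tls_counter, while shared_counter accumulates the increments of both. Turning a thread-local cache into a plain static one therefore changes its sharing semantics, which is presumably why the quoted workaround is not offered as a proper fix.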
Comment 1 Søren Sandmann Pedersen 2011-02-16 08:53:38 UTC
Do you have a link to the debian bug?
Comment 2 Søren Sandmann Pedersen 2011-02-16 08:57:30 UTC
Nevermind, it's at

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613221
Comment 3 Yavor Doganov 2011-02-16 09:00:14 UTC
(In reply to comment #2)
> Nevermind, it's at
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613221

Yes, sorry for not including the link in the first place.
Comment 4 Søren Sandmann Pedersen 2011-02-16 09:57:09 UTC
It isn't clear to me that this is a pixman bug. __thread is definitely supposed to work with dlopen().

A self-contained application that demonstrates the issue would be very useful.
Comment 5 Yavor Doganov 2011-02-16 12:41:25 UTC
Created attachment 43449 [details]
Example program

Here is a simple "hello world" app that exhibits the bug.
You can set the backend with `defaults write NSGlobalDomain GSBackend libgnustep-foo', where `foo' can be "art", "cairo" or "x11".  The crash happens with the cairo backend and pixman >= 0.18.0.
Comment 6 Yavor Doganov 2011-02-16 12:43:30 UTC
Created attachment 43450 [details]
Makefile to compile the example program

Needs gnustep-make.  Type `gs_make' (on Debian) or `make GNUSTEP_MAKEFILES=/path/to/GNUstep/makefiles' (usually /usr/share/GNUstep/Makefiles, but it's distro/installation specific).
Comment 7 Søren Sandmann Pedersen 2011-02-17 04:21:02 UTC
By "self-contained" I meant "does not depend on anything outside of pixman".

For what it's worth, the attached program works here (Fedora 14), but it doesn't appear to use pixman either.
Comment 8 Søren Sandmann Pedersen 2011-02-17 04:50:52 UTC
Created attachment 43482 [details]
Test program

Here is a program that opens pixman with dlopen() and does a bunch of compositing. It works here on x86-64 Fedora 14.

We are going to need some clearer evidence that this is a problem in pixman.
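
For reference, the loading step of such a test can be sketched as follows. This is not the attached program: it performs no compositing and only checks that the library can be opened and called through dlopen(). The soname "libpixman-1.so.0" passed by the caller and the helper name pixman_version_via_dlopen are assumptions made for the sketch; pixman_version() is the library's public version query.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Opens the named shared object, calls pixman_version() through dlsym(),
 * and returns the encoded version number.
 * Returns -1 if the library or the symbol cannot be found.
 * The helper name is hypothetical; on older glibc, link with -ldl. */
int pixman_version_via_dlopen(const char *soname)
{
    void *handle;
    int (*version_fn)(void) = NULL;
    int version;

    assert(soname != NULL);

    handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    /* POSIX-recommended way to store a dlsym() result in a function pointer. */
    *(void **)(&version_fn) = dlsym(handle, "pixman_version");
    version = (version_fn != NULL) ? version_fn() : -1;

    dlclose(handle);
    return version;
}
```

A caller would pass the soname used on its system, for example pixman_version_via_dlopen("libpixman-1.so.0"). Note that a version query is not expected to touch the thread-local fast path cache, so a real reproducer would also have to composite something after loading.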
Comment 9 Yavor Doganov 2011-02-17 05:16:53 UTC
(In reply to comment #7)
> For what it's worth, the attached program works here (Fedora 14),

Interesting.  Are you sure you're using the cairo backend?  (You can check if you start the program with --GNU-Debug=dflt.)

> but it doesn't appear to use pixman either.

Right, GNUstep apps do not use pixman, and neither does gnustep-back-cairo; the backend just happens to link with pixman indirectly.
Comment 10 Søren Sandmann Pedersen 2011-02-17 05:35:22 UTC
> Interesting.  Are you sure you're using the cairo backend?  (You can check if
> you start the program with --GNU-Debug=dflt.)

I followed your instructions, but no, it does not appear to be using the cairo backend. As far as I can tell, that is not available for Fedora 14.

To move forward here, we need a *self-contained* application demonstrating a problem with pixman.

I am not going to debug GNUstep.
Comment 11 Yavor Doganov 2011-02-18 11:41:47 UTC
(In reply to comment #10)
> To move forward here, we need a *self-contained* application demonstrating a
> problem with pixman.

Sounds fair.  Please give me some time to investigate the issue from the ground up -- I'm not convinced at all it's a pixman bug; I reported it against pixman only because of the observation that downgrading fixes the problem.
 
> I am not going to debug GNUstep.

Of course.

BTW, your test program does not follow the scenario of GNUstep programs.  GNUstep apps link against libgnustep-gui (which is something like GTK+ but with the backend in a *separate shared object*).  When an application is being started, libgnustep-gui's NSApplication class dynamically loads the GUI backend at runtime using the NSBundle class (done under the hood with dlopen).  Only that particular backend is linked _indirectly_ with libpixman.

Either way, you're right that the __thread usage in recent pixman releases should not be a problem, especially on GNU platforms.
Comment 12 Cyril Brulebois 2011-02-24 05:36:40 UTC
> I'm not convinced at all it's a pixman bug; I reported it against pixman
> only because of the observation that downgrading fixes the problem.

OK for my downgrading the Debian bug's severity to “important” then?
Comment 13 Yavor Doganov 2011-02-24 07:14:56 UTC
(In reply to comment #12)
> OK for my downgrading the Debian bug's severity to “important” then?

Yes, I guess at this point of the cycle it's the lesser evil to have GNUstep broken in testing rather than holding the migration of pixman, X, cairo and a growing number of packages.
Comment 14 Yavor Doganov 2011-03-02 08:38:28 UTC
FYI, I rewrote your test program to mimic the GNUstep behavior (program -> library -> module.so -> libpixman) but could not reproduce.  Then I thought it might be due to some improper usage of pixman within cairo (like http://cgit.freedesktop.org/cairo/commit/?id=71e8a4c23019b01aa43b334fcb2784c70daae9b5), but applying this commit against Debian's cairo version does not lead to reproducibility either.

Finally, I installed the latest pixman and cairo releases on a machine with an old GNUstep version (gNewSense DeltaD, which is based on Ubuntu Hardy), and to my surprise things are working flawlessly.  Which leads me to the horrible suspicion that the bug may lie in GNUstep (most probably somewhere in the NSLock rewrite) and is just being exposed by recent pixman releases...  Or it might be a toolchain issue that is not present on such an old system :-(

FWIW, changing the TLS model in the PIXMAN_DEFINE_THREAD_LOCAL macro, e.g.

  static __thread type name __attribute__((tls_model("local-exec")))

makes the bug go away ("global-dynamic" is implied by -fPIC, AFAIK).  Does this observation give you any clues?
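
The experiment above can be sketched in isolation as follows, assuming GCC or Clang. DEMO_DEFINE_THREAD_LOCAL is a hypothetical macro modelled on the line quoted above; pixman's real PIXMAN_DEFINE_THREAD_LOCAL may be defined differently. As far as I know, "local-exec" is normally only valid for code linked into the main executable, so this sketch is meant to be built as a plain program, not as a shared object.

```c
#include <assert.h>

/* Hypothetical macro for illustration; not pixman's actual definition.
 * The model argument must be a string literal such as "local-exec". */
#define DEMO_DEFINE_THREAD_LOCAL(type, name, model) \
    static __thread type name __attribute__((tls_model(model)))

/* Model left to the compiler (global-dynamic when built with -fPIC). */
static __thread int default_model_hits;

/* Model forced explicitly, as in the experiment described above. */
DEMO_DEFINE_THREAD_LOCAL(int, local_exec_hits, "local-exec");

/* Both accessors behave identically from the caller's point of view;
 * only the code the compiler emits to locate the variable differs. */
int bump_default(void)
{
    return ++default_model_hits;
}

int bump_local_exec(int by)
{
    assert(by > 0);   /* precondition for this demo only */
    local_exec_hits += by;
    return local_exec_hits;
}
```

The attribute changes how the address of the variable is computed, not what the variable does, which is consistent with the workaround hiding a loader or toolchain problem instead of fixing a logic error in pixman.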
Comment 15 Julien Cristau 2011-03-16 02:34:52 UTC
Seems likely to actually be due to mesa libGL using the initial-exec tls model where it shouldn't (bug#35268).
Comment 16 Søren Sandmann Pedersen 2011-04-05 10:09:25 UTC
(In reply to comment #15)
> Seems likely to actually be due to mesa libGL using the initial-exec tls model
> where it shouldn't (bug#35268).

That sounds likely to me, so I am closing this bug. Feel free to reopen if the problem turns out to be in pixman after all.
