Created attachment 43426 [details]
(I reported this to Debian initially, but one of the maintainers said to report it upstream as well.)
All GNUstep GUI apps crash immediately on startup when pixman 0.18.0 or newer is installed (there is no problem with 0.16.4). Example backtrace attached.
Another user said:
"It's caused by the thread-local fast_path_cache variable in pixman.c. If you make that non-thread-local (a normal static variable) the problem will go away.
The root problem here is interaction between thread local storage and dlopen, because the gnustep-back bundle, which dynamically links to libpixman, is dlopened by gnustep-gui. However, I'm not sure how to properly fix it other than building pixman without TLS."
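For readers unfamiliar with the distinction the quote draws, here is a minimal sketch of a thread-local variable next to a normal static one. The names (`tls_counter`, `shared_counter`, `bump_in_other_thread`) are hypothetical stand-ins, not pixman code, and the sketch assumes GCC-style `__thread` and POSIX threads.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a per-thread cache such as fast_path_cache. */
static __thread int tls_counter = 0;  /* one copy per thread  */
static int shared_counter = 0;        /* one copy per process */

/* Thread body: bumps this thread's TLS copy and the shared copy. */
static void *worker(void *arg)
{
    (void)arg;
    tls_counter += 1;
    shared_counter += 1;
    return NULL;
}

/* Runs worker() in a new thread and waits for it; returns 0 on success. */
int bump_in_other_thread(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    return pthread_join(t, NULL);
}

/* Accessors that read the calling thread's view of each counter. */
int tls_value(void)    { return tls_counter; }
int shared_value(void) { return shared_counter; }
```

After `bump_in_other_thread()` returns, the calling thread still sees its own `tls_counter` untouched, while `shared_counter` reflects the worker's increment. On older glibc versions this may need `-pthread` at link time.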
Do you have a link to the debian bug?
Nevermind, it's at
(In reply to comment #2)
> Nevermind, it's at
Yes, sorry for not including the link in the first place.
It isn't clear to me that this is a pixman bug. __thread is definitely supposed to work with dlopen().
A self-contained application that demonstrates the issue would be very useful.
Created attachment 43449 [details]
Here is a simple "hello world" app that exhibits the bug.
You can set the backend with `defaults write NSGlobalDomain GSBackend libgnustep-foo', where `foo' can be "art", "cairo" or "x11". The crash happens with cairo and pixman >= 0.18.0
Created attachment 43450 [details]
Makefile to compile the example program
Needs gnustep-make. Type `gs_make' (on Debian) or `make GNUSTEP_MAKEFILES=/path/to/GNUstep/makefiles' (usually /usr/share/GNUstep/Makefiles, but it's distro/installation specific).
By "self-contained" I meant "does not depend on anything outside of pixman".
For what it's worth, the attached program works here (Fedora 14), but it doesn't appear to use pixman either.
Created attachment 43482 [details]
Here is a program that opens pixman with dlopen() and does a bunch of compositing. It works here on x86-64 Fedora 14.
We are going to need some clearer evidence that this is a problem in pixman.
(In reply to comment #7)
> For what it's worth, the attached program works here (Fedora 14),
Interesting. Are you sure you're using the cairo backend? (You can check if you start the program with --GNU-Debug=dflt.)
> but it doesn't appear to use pixman either.
Right, GNUstep apps do not use pixman, and neither does gnustep-back-cairo. The backend just happens to link indirectly with pixman.
> Interesting. Are you sure you're using the cairo backend? (You can check if
> you start the program with --GNU-Debug=dflt.)
I followed your instructions, but no, it does not appear to be using the cairo backend. As far as I can tell, that backend is not available for Fedora 14.
To move forward here, we need a *self-contained* application demonstrating a problem with pixman.
I am not going to debug GNUstep.
(In reply to comment #10)
> To move forward here, we need a *self-contained* application demonstrating a
> problem with pixman.
Sounds fair. Please give me some time to investigate the issue from the ground up -- I'm not convinced at all it's a pixman bug; I reported it against pixman only because of the observation that downgrading fixes the problem.
> I am not going to debug GNUstep.
BTW, your test program does not follow the scenario of GNUstep programs. GNUstep apps link against libgnustep-gui (which is something like GTK+ but with the backend in a *separate shared object*). When an application is being started, libgnustep-gui's NSApplication class dynamically loads the GUI backend at runtime using the NSBundle class (done under the hood with dlopen). Only that particular backend is linked _indirectly_ with libpixman.
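The loading step described above (a library opening its backend module at runtime, with the module's own dependencies pulled in as a side effect) can be sketched roughly as follows. This is not GNUstep's NSBundle code; `load_backend_symbol`, the module path and the symbol name are all hypothetical and only illustrate the `dlopen`/`dlsym` pattern.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: opens a backend module and looks up one symbol.
 * Returns the symbol's address, or NULL if loading or lookup fails. */
void *load_backend_symbol(const char *path, const char *symbol)
{
    /* RTLD_NOW resolves the module's symbols immediately; any shared
     * libraries the module links against are loaded by this call too. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    void *addr = dlsym(handle, symbol);
    if (addr == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return NULL;
    }

    /* The handle is deliberately kept open so the module stays loaded
     * for the lifetime of the process. */
    return addr;
}
```

A caller would cast the returned address to the expected function type before invoking it. On older glibc versions this may need `-ldl` at link time.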
Either way, you're right that the __thread usage in recent pixman releases should not be a problem, especially on GNU platforms.
> I'm not convinced at all it's a pixman bug; I reported it against pixman
> only because of the observation that downgrading fixes the problem.
OK for my downgrading the Debian bug's severity to “important” then?
(In reply to comment #12)
> OK for my downgrading the Debian bug's severity to “important” then?
Yes, I guess at this point of the cycle it's the lesser evil to have GNUstep broken in testing rather than holding the migration of pixman, X, cairo and a growing number of packages.
FYI, I rewrote your test program to mimic the GNUstep behavior (program -> library -> module.so -> libpixman) but could not reproduce. Then I thought it might be due to some improper usage of pixman within cairo (like http://cgit.freedesktop.org/cairo/commit/?id=71e8a4c23019b01aa43b334fcb2784c70daae9b5), but applying this commit against Debian's cairo version does not lead to reproducibility either.
Finally, I installed the latest pixman and cairo releases on a machine with an old GNUstep version (gNewSense DeltaD, which is based on Ubuntu Hardy), and to my surprise things are working flawlessly. Which leads me to the horrible suspicion that the bug may lie in GNUstep (most probably somewhere in the NSLock rewrite) and is just being exposed by recent pixman releases... Or it might be a toolchain issue that is not present on such an old system :-(
FWIW, changing the TLS model in the PIXMAN_DEFINE_THREAD_LOCAL macro, e.g.
static __thread type name __attribute__((tls_model("local-exec")))
makes the bug go away ("global-dynamic" is implied by -fPIC, AFAIK). Does this observation give you any clues?
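For reference, a minimal sketch of what such a declaration looks like when expanded. `DEFINE_THREAD_LOCAL_EXEC`, `cache_hits` and `record_hit` are hypothetical names modeled on the one-liner above, not pixman's actual macro, and the sketch assumes GCC or a compatible compiler. It is meant to be built into an executable; as far as I know, "local-exec" is intended for code in the main program rather than for shared objects.

```c
/* Hypothetical macro modeled on the declaration quoted above.
 * The tls_model attribute overrides the model the compiler would
 * otherwise pick for this one variable. */
#define DEFINE_THREAD_LOCAL_EXEC(type, name) \
    static __thread type name __attribute__((tls_model("local-exec")))

/* One zero-initialized copy of cache_hits per thread. */
DEFINE_THREAD_LOCAL_EXEC(int, cache_hits);

/* Increments the calling thread's copy and returns the new value. */
int record_hit(void)
{
    return ++cache_hits;
}
```

Swapping the string for "initial-exec", "local-dynamic" or "global-dynamic" selects the other models GCC documents for this attribute.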
Seems likely to actually be due to mesa libGL using the initial-exec tls model where it shouldn't (bug#35268).
(In reply to comment #15)
> Seems likely to actually be due to mesa libGL using the initial-exec tls model
> where it shouldn't (bug#35268).
That sounds likely to me, so I am closing this bug. Feel free to reopen if the problem turns out to be in pixman after all.