Bug 35268 - initial-exec TLS model breaks dlopen'ed libGL
Summary: initial-exec TLS model breaks dlopen'ed libGL
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: git
Hardware: Other Linux (All)
: medium normal
Assignee: mesa-dev
QA Contact:
URL:
Whiteboard:
Keywords:
: 108165 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-03-13 04:46 UTC by Christopher James Halse Rogers
Modified: 2019-09-18 20:22 UTC (History)
9 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Source and Makefile for trivial SIGSEGV reproducer. (1.04 KB, application/zip)
2011-03-13 04:46 UTC, Christopher James Halse Rogers
Details
Partial patch (303.87 KB, patch)
2011-03-13 05:00 UTC, Christopher James Halse Rogers
Details | Splinter Review
Switch to general-dynamic TLS model (303.12 KB, patch)
2011-03-23 23:02 UTC, Christopher James Halse Rogers
Details | Splinter Review
gltest.c (2.40 KB, text/plain)
2014-01-29 09:06 UTC, Natanael Copa
Details
app.c (349 bytes, text/plain)
2014-01-29 09:07 UTC, Natanael Copa
Details
Makefile (297 bytes, text/plain)
2014-01-29 09:08 UTC, Natanael Copa
Details
dso.c (201 bytes, text/plain)
2014-01-29 10:57 UTC, Natanael Copa
Details
updated app.c (741 bytes, text/plain)
2014-01-29 11:00 UTC, Natanael Copa
Details
updated Makefile (349 bytes, text/plain)
2014-01-29 11:00 UTC, Natanael Copa
Details
Patches for musl compatability (7.44 KB, patch)
2018-10-13 07:07 UTC, Sora Lee
Details | Splinter Review
dlopen_test.c (217 bytes, text/plain)
2018-10-13 07:09 UTC, Sora Lee
Details
Patches for musl compatibility (7.44 KB, application/x-zip-compressed)
2018-10-13 07:16 UTC, Sora Lee
Details

Description Christopher James Halse Rogers 2011-03-13 04:46:46 UTC
Created attachment 44411 [details]
Source and Makefile for trivial SIGSEGV reproducer.

Mesa's use of the initial-exec TLS model unpredictably breaks usage of libGL by programs for which the executable isn't directly linked to libGL (so, python apps, mono apps, C apps which use dlopen, etc).

This manifests as a SIGSEGV in libstdc++ when certain conditions are met (stream output and exception handling seem to be examples of this).

Attached is a trivial example program which demonstrates the crash.  (Thanks to Ulrich von Zadow on the associated Ubuntu bug²).

According to the linux TLS ABI reference¹ I could find, initial-exec implies a the static TLS model, which doesn't work for dynamically loaded modules.  However, checking out the dl-tls.c code in eglibc there seems to be some attempt to handle this.  Also, the TLS ABI reference suggests that __tls_get_addr receives its parameter in %eax, but this doesn't appear to be the case in dl-tls.c.

¹: http://www.akkadia.org/drepper/tls.pdf 
²: https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/259219
Comment 1 Christopher James Halse Rogers 2011-03-13 05:00:28 UTC
Created attachment 44412 [details] [review]
Partial patch

Attached is a partial patch - this works for x86-64 (there's probably some optimisation possible with the register save/restore here) and when assembly is disabled, but blows up on IA32 where I still haven't quite grokked the ABI.

x86-64 users and those without assembly should be able to verify that the the test-case fails to crash with this patch applied (and, indeed, libGL no longer has the STATIC_TLS DT_FLAG applied).
Comment 2 Christopher James Halse Rogers 2011-03-23 23:02:34 UTC
Created attachment 44777 [details] [review]
Switch to general-dynamic TLS model
Comment 3 Tom Fogal 2011-04-26 10:25:37 UTC
It looks like init_glapi_relocs is an empty function after this patch.  Can it just be removed, then?

Is there hand-coded asm for other architectures (i.e. not just x86[-64]) as well?  I have not checked.  However, I see nothing about them in this patch, and if such assembly exists it should be patched as well, no?

I am concerned about calling functions such as __tls_get_addr in an unguarded fashion.  These TLS functions/support are language extensions and are not required to exist for all targets; furthermore they definitely will not exist for non-ELF backends (right?).  How prevalent are platforms that implement ELF but not these TLS extensions (OpenBSD?), as well as platforms such as Mac which don't even use ELF, AFAIK.
Comment 4 Julien Cristau 2011-04-26 14:36:31 UTC
> --- Comment #3 from Tom Fogal <tfogal@alumni.unh.edu> 2011-04-26 10:25:37 PDT ---
> Is there hand-coded asm for other architectures (i.e. not just x86[-64]) as
> well?  I have not checked.  However, I see nothing about them in this patch,
> and if such assembly exists it should be patched as well, no?
> 
There's sparc.
Comment 5 Nicholas Miell 2011-09-20 14:15:18 UTC
Mesa is explicitly allowed to use intial-exec, even when dlopen()ed.
See: http://www.redhat.com/archives/phil-list/2003-February/msg00077.html
Comment 6 Nicholas Miell 2011-09-20 15:15:58 UTC
Also, the test case is wrong, the main executable is compiled PIC for no reason and the shared object is compiled without PIC, which barely works on i386 and won't even link on AMD64.

With a non-broken test case, I am unable to reproduce this bug when Mesa is built with GLX_USE_TLS on AMD64.
Comment 7 Fabio Pedretti 2012-03-30 03:05:21 UTC
This patch was dropped in Ubuntu: "no longer needed, now breaks things":
http://anonscm.debian.org/gitweb/?p=pkg-xorg/lib/mesa.git;a=commitdiff;h=5ba285e9ca2f6f6d8baba231c80dcd41c2f9d2a4

It could be closed here eventually.
Comment 8 Natanael Copa 2014-01-29 07:30:30 UTC
initial-exec TLS and dlopen'ed libGL is still an issue with musl libc. Here is a backtrace from a coredump of firefox:

0x664c2c6f5000
Core was generated by `/usr/lib/firefox-26.0/firefox'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000664c1fb519a8 in __glXSetupForCommand (dpy=0xae298156200)
    at glxext.c:917
917	glxext.c: No such file or directory.
(gdb) bt
#0  0x0000664c1fb519a8 in __glXSetupForCommand (dpy=0xae298156200)
    at glxext.c:917
#1  0x0000664c1fb4ffcd in glx_context_init (gc=gc@entry=0xae298165480, 
    psc=psc@entry=0xae298164d60, config=config@entry=0xae29816f3e0)
    at glxcmds.c:258
#2  0x0000664c1fb7b940 in dri2_create_context (base=0xae298164d60, 
    config_base=0xae29816f3e0, shareList=<optimized out>, renderType=32788)
    at dri2_glx.c:228
#3  0x0000664c1fb4f3ed in CreateContext (dpy=dpy@entry=0xae298156200, 
    generic_id=141, config=0xae29816f3e0, 
    shareList_user=shareList_user@entry=0x0, allowDirect=<optimized out>, 
    code=code@entry=3, renderType=32788, screen=0) at glxcmds.c:301
#4  0x0000664c1fb4f6e1 in glXCreateContext (dpy=0xae298156200, 
    vis=0xae2981651e0, shareList=0x0, allowDirect=1) at glxcmds.c:430
#5  0x0000664c28f95774 in ?? () from /usr/lib/firefox-26.0/xulrunner/libxul.so
#6  0x0000664c28f95918 in ?? () from /usr/lib/firefox-26.0/xulrunner/libxul.so
#7  0x0000664c28f8da28 in ?? () from /usr/lib/firefox-26.0/xulrunner/libxul.so
#8  0x0000664c28f90e76 in ?? () from /usr/lib/firefox-26.0/xulrunner/libxul.so
#9  0x0000664c28f91127 in XRE_main ()
   from /usr/lib/firefox-26.0/xulrunner/libxul.so
#10 0x00000ae2946dd30f in do_main (xreDirectory=0x664c2c16e440, 
    argv=0x733c6106a4c8, argc=1)
    at /home/buildozer/aports/main/firefox/src/mozilla-release/browser/app/nsBrowserApp.cpp:275
#11 main (argc=<optimized out>, argv=<optimized out>)
    at /home/buildozer/aports/main/firefox/src/mozilla-release/browser/app/nsBrowserApp.cpp:635
(gdb)

It is mesa 9.2.5, so the segfault happens in: http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/glxext.c?h=9.2#n917

 gc = __glXGetCurrentContext();

I believe that __glxGetCurrentContext is defined as a macro:
http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/glxcurrent.c?h=9.2#n82


I believe that it segfaults due to the use of initial-exec use:
__thread void *__glX_tls_Context __attribute__ ((tls_model("initial-exec")))
   = &dummyContext;
Comment 9 Natanael Copa 2014-01-29 08:13:53 UTC
I don't think musl libc intend to implement the special handling for mesa as described in http://www.redhat.com/archives/phil-list/2003-February/msg00077.html

I think they intend to implement gnu2 tls dialect though so it might be an idea to use that instead.
Comment 10 Alexander Monakov 2014-01-29 08:30:11 UTC
Nowadays nVidia- and AMD-supplied libGL.so's use initial-exec TLS as well (at least on 32-bit x86).  Does LD_PRELOAD'ing libGL.so.1 for Firefox work with musl?
Comment 11 Natanael Copa 2014-01-29 08:44:09 UTC
(In reply to comment #10)
> Nowadays nVidia- and AMD-supplied libGL.so's use initial-exec TLS as well
> (at least on 32-bit x86).

the closed source libGL.sos are linked to GNU libc and is not really expected to work on musl libc systems anyways.

>  Does LD_PRELOAD'ing libGL.so.1 for Firefox work with musl?

yes. That makes it work.
Comment 12 Natanael Copa 2014-01-29 09:06:29 UTC
Created attachment 92975 [details]
gltest.c

testcase. To reproduce, dlopen ./gltest.so and call gltest().
Comment 13 Natanael Copa 2014-01-29 09:07:35 UTC
Created attachment 92976 [details]
app.c

app that call gltest() in a dlopened gltest.so
Comment 14 Natanael Copa 2014-01-29 09:08:12 UTC
Created attachment 92977 [details]
Makefile

Makefile for app.c and gltest.c
Comment 15 Rich Felker 2014-01-29 10:12:20 UTC
Hi. I'm the maintainer of musl and I'd like to confirm that we don't have plans to work around this on our side. musl aims to support extremely minimal thread stacks if desired by the application, and reserving a large amount of additional space like glibc does for late-loaded initial-exec TLS is contrary to that goal. I believe it's already possible for users to "work around" this issue by LD_PRELOAD'ing the library so that it gets assigned initial-exec TLS space at startup, but this has symbol visibility issues if the application only intended to use RTLD_LOCAL at dlopen time; we may consider adding another environment variable like LD_PRELOAD but which preloads the library with RTLD_LOCAL-like semantics, but this would still be an ugly hack and not a proper solution. Note that, at least as I understand it, the current approach that "works" with glibc is already fragile; if another library with large TLS needs gets loaded via dlopen before libGL.so, this may cause libGL.so to fail to load/work.

What I'd like to propose as the solution is eliminating the use of __attribute__((tls_model("initial-exec"))) and instead using -mtls-dialect=gnu2 on targets where it's supported. The "gnu2" TLS dialect, documented at http://www.fsfla.org/~lxoliva/writeups/TLS/paper-gcc2006.pdf,  http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt, etc., allows libraries loaded at startup to achieve TLS access performance roughly equivalent to initial-exec despite using general-dynamic model. This level of performance is also possible via dlopen if the implementation is able to put the library's TLS in extra space reserved at startup (like glibc does in most cases). In all other cases, a variant of traditional general-dynamic model is used, so performance is sub-optimal, but it's still moderately better (IIRC, 15-30% less time for access) than the original general-dynamic model.

So in summary, using gnu2 and leaving off the tls_model attribute instead of using initial-exec gives:

- Essentially same performance when the library is pulled in at load time.
- Essentially same performance when the library is dlopen'd and there's extra initial TLS space that can be assigned to it.
- Working but with reduced performance, instead of complete failure, when the library is dlopen'd and there's no room in the initial TLS space to put its TLS there.

Note that musl does not yet support the gnu2 TLS dialect, so for the time being, a way to turn off both the initial-exec hack AND the gnu2 dialect would be needed to solve the problem. But it's on our agenda, and having this issue with GL fixed via gnu2 dialect would draw more attention to getting it added.
Comment 16 Natanael Copa 2014-01-29 10:56:17 UTC
(In reply to comment #15)
> Note that, at least as I understand
> it, the current approach that "works" with glibc is already fragile; if
> another library with large TLS needs gets loaded via dlopen before libGL.so,
> this may cause libGL.so to fail to load/work.

This should be fairly easy to test. If we add a dlopen of a dso that allocates slightly less than TLS_STATIC_SURPLUS before we dlopen the libGL we should trigger an dlopen error.
Comment 17 Natanael Copa 2014-01-29 10:57:14 UTC
Created attachment 92987 [details]
dso.c

test dso that allocates big enough TLS variable.
Comment 18 Natanael Copa 2014-01-29 11:00:15 UTC
Created attachment 92988 [details]
updated app.c

updated app.c that dlopens the dso.so before gltest.

This should trigger the issue on glibc too. (I tested it on uclibc which has an NPTL implementation that is a copy of old version of glibc and it triggers it there).
Comment 19 Natanael Copa 2014-01-29 11:00:54 UTC
Created attachment 92989 [details]
updated Makefile
Comment 20 Rich Felker 2014-10-23 18:21:31 UTC
Update from the musl libc side: we now have TLSDESC support for x86 and x86_64, so the situation would be much improved (with no performance regression) by using TLSDESC. Support for TLSDESC on ARM and other archs is still missing but should not be hard to add now that the framework is in place.
Comment 21 Nicholas Miell 2018-10-04 00:55:39 UTC
*** Bug 108165 has been marked as a duplicate of this bug. ***
Comment 22 Rich Felker 2018-10-09 19:01:18 UTC
Ping.

Looking at this again, I see the original proposed patch introduced a lot of extra asm to save/restore registers that might be clobbered by __tls_get_addr. That could all go away if tlsdesc support is assumed and used.

The patch also adds some dubious prefix bytes to instructions in _x86_64_get_dispatch. Is there any reason for that?
Comment 23 Sora Lee 2018-10-13 07:03:47 UTC
(In reply to Rich Felker from comment #22)
> Ping.
> 
> Looking at this again, I see the original proposed patch introduced a lot of
> extra asm to save/restore registers that might be clobbered by
> __tls_get_addr. That could all go away if tlsdesc support is assumed and
> used.
> 
> The patch also adds some dubious prefix bytes to instructions in
> _x86_64_get_dispatch. Is there any reason for that?

I believe the operand size override prefixes are necessary to use @TLSGD on the x86-64 architecture.

I've remade the original patches for compatibility with mesa 18.2.2 and implemented TLSDESC as you suggested using https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt as a guide, however I still seem to be getting issues with the _glapi_tls_Dispatch symbol. In particular, my test code (dlopen_test.c) throws:

dlopen failed with message 'Error relocating /usr/lib/libglapi.so: _glapi_tls_Dispatch: initial-exec TLS resolves to dynamic definition in /usr/lib/libglapi.so'
Comment 24 Sora Lee 2018-10-13 07:07:31 UTC
Created attachment 142010 [details] [review]
Patches for musl compatability

The first two patches in this archive are necessary for compatibility with musl but are not directly related to this issue; I've included them here for completeness.
Comment 25 Sora Lee 2018-10-13 07:09:30 UTC
Created attachment 142011 [details]
dlopen_test.c
Comment 26 Sora Lee 2018-10-13 07:16:53 UTC
Created attachment 142012 [details]
Patches for musl compatibility

(In reply to Sora Lee from comment #24)
> Created attachment 142010 [details] [review] [review]
> Patches for musl compatibility
> 
> The first two patches in this archive are necessary for compatibility with
> musl but are not directly related to this issue; I've included them here for
> completeness.

Apologies; I seem to have misconfigured my previous attachment.
Comment 27 Rich Felker 2018-11-07 21:05:42 UTC
A couple issues with latest patches by Sora Lee:

1. There is no need for __attribute__((__tls_model__("global-dynamic"))). It's rightly the default whenever it's needed by the ABI. The attribute should just be removed.

2. There's at least one more place where the invalid initial-exec asm is duplicated, in src/mapi/entry_x86{,-64}_tls.h.

3. Per the ABI, calling the TLSDESC function requires the stack pointer to be aligned mod 16 (as for all function calls). This might not break in practice but should be done.

I have a draft patch with all of these changes that seems to be working. Once I check it over again I'll post it here.

Perhaps more to the point, though, the asm seems largely useless and I think it should just be removed. Modern GCC generates exactly the same code for the C stubs as what you get by doing the TLSDESC access in asm, except for some functions that take char args where gcc performs a spurious zero/sign-extend operation on the argument regissters before tail-calling through the dispatch. Supposedly this is a known gcc bug that's going to be improved in gcc 9.
Comment 28 Rich Felker 2018-11-07 21:11:30 UTC
Note that removing the asm would also fix sparc, for which I don't feel qualified to fix the asm. It would also make it possible to support x86 targets that lack TLSDESC support at the ldso level (maybe some or all BSDs?) without forcing them to revert to the slow TSD key based fallback, since the compiler could just be allowed to use the less-efficient GD dialect rather than needing a variant of the asm for it.
Comment 29 Rich Felker 2018-11-08 18:12:11 UTC
It turns out there's more code that needs to be refitted or removed from src/mapi/entry_*_tls.h, including powerpc64le asm I wasn't aware of before. There's a lot of code to generate executable stubs at runtime, but such generation is only a significant optimization when the TLS offset from the thread pointer is constant, and the whole point of this PR is that that assumption is invalid. Otherwise, the fallback case in entry.c of just handing out slots from a big static pool should perform just as well.

Should I continue along the path of fixing the stub generation to use TLSDESC? Or just submit a match that removes all this stuff? On 32-bit x86 I think there's still a slight benefit of having it -- by generating the stubs dynamically, the absolute address of the GOT slots containing the TLS descriptor can be hard-coded in the stub, rather than requiring a fake call/pop to load the GOT address. But on x86_64 it should make no difference.
Comment 30 Rich Felker 2018-12-20 16:36:00 UTC
Given the lack of response on this issue, I think when I get some time to focus on it again I'll just prepare a patch removing all of the asm dispatch stubs.

As noted before, they have no benefits at all on x86_64. On other archs only the runtime-generated dispatch stubs possibly benefit from generating them by hand, and the same benefit should be obtainable just by building with textrels (omitting -fPIC), if the user is willing to accept what that entails.

As a bonus, if it turns out that upstream is unwilling to merge a fix for this issue, a patch that just removes the problematic code is easier to maintain out-of-tree than one that makes lots of small changes to the code.
Comment 31 GitLab Migration User 2019-09-18 20:22:35 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/966.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.