Summary: | libpthread-stubs must NOT provide symbols which depend on actual implementation | ||
---|---|---|---|
Product: | XCB | Reporter: | Ian Romanick <idr> |
Component: | Library | Assignee: | xcb mailing list dummy <xcb> |
Status: | RESOLVED FIXED | QA Contact: | xcb mailing list dummy <xcb> |
Severity: | normal | ||
Priority: | medium | CC: | ben, chris, jfonseca, jon.turney, mattst88, psychon, robclark |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Ian Romanick
2016-10-04 23:27:37 UTC
I'm slightly confused by the whole thing. On Linux platforms (and modern Solaris) building libpthread-stubs results is a plain .pc file, and no DSO being built/installed. Regardless, the library is used only by libgbm, which based on your waffle/mixed_glx_egl will/should never get loaded. That said, IMHO it would be great if glibc provides a native C11 threads implementation (like musl) or alternatively we could drop the single recursive mutex from mesa. Regardless, can you share a bit more on the topic - do you use LD_PRELOAD/LD_LIBRARY_PATH, is this specific to local builds and/or distribution ones ? Actually seems like libpthreads-stub has decided to pick the hacky patch, despite my suggestion not to :-\ Rob, Ben, you recall when I said it is not a good idea and it will cause grief ;-) Well here it is (ableit not sure what/how exactly libpthread-stubs.so ends up in Ian's link chain) Gents, can we revert fa6db2f9c018c54a47e94c0175450303d700aa92 ? (In reply to Emil Velikov from comment #2) > Actually seems like libpthreads-stub has decided to pick the hacky patch, > despite my suggestion not to :-\ > > Rob, Ben, you recall when I said it is not a good idea and it will cause > grief ;-) Can you remind me of the conversation? Pthread stubs are just stubs, I'm confused why it's a bad idea to add stubbed symbols which are actually required to build. If you need the functionality, and it's not provided by libc, I don't see how not having a stubbed function will save you. > Well here it is (ableit not sure what/how exactly libpthread-stubs.so ends > up in Ian's link chain) > > Gents, can we revert fa6db2f9c018c54a47e94c0175450303d700aa92 ? I'm fine with revert as long as things still build. I'd like to understand things a bit better first. (In reply to Emil Velikov from comment #2) > Actually seems like libpthreads-stub has decided to pick the hacky patch, > despite my suggestion not to :-\ > > Rob, Ben, you recall when I said it is not a good idea and it will cause > grief ;-) > Well here it is (ableit not sure what/how exactly libpthread-stubs.so ends > up in Ian's link chain) > > Gents, can we revert fa6db2f9c018c54a47e94c0175450303d700aa92 ? So I'm a bit confused on how things actually worked before without pthread_mutexattr_settype().. That said, I've no opinion on reverting the libpthreads-stub commit as long as it doesn't break mesa build with various compilers.. iirc that was the motivation in the first place. I believe Ian's explanation perfectly reasonable, but let me try from another angle: libpthreads-stub must _not_ provide stub implementation for APIs that absolutely requires locking/other. Would that be pthread_mutexattr_settype or any other. IIRC a modern GLIBC/Solaris already cover what can be stubbed. Everything else simply requires an actual implementation. In here's example: mesa's dri modules depend on setting up a recursive mutex, which is something that cannot be ignored/stubbed. mesa has been 'fixed' before the libpthreads-stub patch landed. If there's an issue there, do poke me. (In reply to Emil Velikov from comment #5) > I believe Ian's explanation perfectly reasonable, but let me try from > another angle: My confusion wasn't about the problem of mixing/matching pthreads-stub and real pthreads fxns.. that is completely obvious. The confusion was about how it worked at all before when pthread_mutexattr_init()/etc wasn't provided, and if the libc doesn't provide a recursive mutex feature. > libpthreads-stub must _not_ provide stub implementation for APIs that > absolutely requires locking/other. Would that be pthread_mutexattr_settype > or any other. or pthread_mutex_lock()/unlock() for that matter ;-) pthreads-stub should only be used where you don't actually have *any* threads.. so in that regard, stubbing pthread_mutexattr_settype() is completely legit. The problem *sounds* like we somehow have a libc that provides *part* of the pthreads API, but not all, so you end up mixing stub and real implementation? But if that is the case, doesn't that mean that mutexes end up not being recursive, which sounds like a bad thing.. at least this is what I am reading between the lines here. I'm not entirely sure which libc we are talking about here. I have been building xcb and associated bits from source. pthreads-stubs is fa6db2f9, and libxcb is 65b298c7. I only recently updated to fa6db2f9 when I started monkeying about with NV20 and softpipe, and I don't think I ever observed these problems on i965 previously. When I build and install them, pthreads-stubs.pc contains: prefix=/opt/xorg-master-x86_64 exec_prefix=${prefix} libdir=/opt/xorg-master-x86_64/lib64 Name: pthread stubs Description: Stubs missing from libc for standard pthread functions Version: 0.3 Libs: -L${libdir} -lpthread-stubs Mesa picks pthreads-stubs up both directly via voodoo in configure.ac: dnl pthread-stubs is mandatory on targets where it exists case "$host_os" in cygwin* ) pthread_stubs_possible="no" ;; * ) pthread_stubs_possible="yes" ;; esac if test "x$pthread_stubs_possible" = xyes; then PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs) AC_SUBST(PTHREADSTUBS_CFLAGS) AC_SUBST(PTHREADSTUBS_LIBS) fi and indirectly via libxcb.pc. prefix=/opt/xorg-master-x86_64 exec_prefix=${prefix} libdir=/opt/xorg-master-x86_64/lib64 includedir=${prefix}/include xcbproto_version=1.12 Name: XCB Description: X-protocol C Binding Version: 1.12 Requires.private: pthread-stubs xau >= 0.99.2 xdmcp Libs: -L${libdir} -lxcb Libs.private: Cflags: -I${includedir} I tried removing the bits from Mesa's configure.ac, but that didn't seem to have any affect. My system has glibc-2.22-18.fc23.x86_64, and that has: $ nm /lib64/libc.so.6 | grep 'T pthread' 000000000010fae0 T pthread_attr_destroy 000000000010fb40 T pthread_attr_getdetachstate 000000000010fba0 T pthread_attr_getinheritsched 000000000010fc00 T pthread_attr_getschedparam 000000000010fc60 T pthread_attr_getschedpolicy 000000000010fcc0 T pthread_attr_getscope 000000000010fb10 T pthread_attr_init@@GLIBC_2.2.5 000000000010fb70 T pthread_attr_setdetachstate 000000000010fbd0 T pthread_attr_setinheritsched 000000000010fc30 T pthread_attr_setschedparam 000000000010fc90 T pthread_attr_setschedpolicy 000000000010fcf0 T pthread_attr_setscope 000000000010fd20 T pthread_condattr_destroy 000000000010fd50 T pthread_condattr_init 000000000013e8d0 T pthread_cond_broadcast@GLIBC_2.2.5 000000000010fd80 T pthread_cond_broadcast@@GLIBC_2.3.2 000000000013e900 T pthread_cond_destroy@GLIBC_2.2.5 000000000010fdb0 T pthread_cond_destroy@@GLIBC_2.3.2 000000000013e930 T pthread_cond_init@GLIBC_2.2.5 000000000010fde0 T pthread_cond_init@@GLIBC_2.3.2 000000000013e960 T pthread_cond_signal@GLIBC_2.2.5 000000000010fe10 T pthread_cond_signal@@GLIBC_2.3.2 000000000013e9c0 T pthread_cond_timedwait@GLIBC_2.2.5 000000000010fe70 T pthread_cond_timedwait@@GLIBC_2.3.2 000000000013e990 T pthread_cond_wait@GLIBC_2.2.5 000000000010fe40 T pthread_cond_wait@@GLIBC_2.3.2 000000000010fab0 T pthread_equal 000000000010fea0 T pthread_exit 000000000010fed0 T pthread_getschedparam 000000000010ff30 T pthread_mutex_destroy 000000000010ff60 T pthread_mutex_init 000000000010ff90 T pthread_mutex_lock 000000000010ffc0 T pthread_mutex_unlock 000000000010fff0 T pthread_self 0000000000110050 T pthread_setcanceltype 000000000010ff00 T pthread_setschedparam I can try reverting fa6db2f9 tomorrow. (In reply to Rob Clark from comment #6) > (In reply to Emil Velikov from comment #5) > > libpthreads-stub must _not_ provide stub implementation for APIs that > > absolutely requires locking/other. Would that be pthread_mutexattr_settype > > or any other. > > or pthread_mutex_lock()/unlock() for that matter ;-) > > pthreads-stub should only be used where you don't actually have *any* > threads.. so in that regard, stubbing pthread_mutexattr_settype() is > completely legit. > > The problem *sounds* like we somehow have a libc that provides *part* of the > pthreads API, but not all, so you end up mixing stub and real > implementation? This, exactly. If pthread-stubs provides all the stubs for pthread mutexes, then it makes sense for pthread-stubs to provide a stub for pthread_mutexattr_settype, which can always succeed at setting a mutex as recursive, because you can lock and unlock it all you want. However, if you actually have a real non-stub implementation of pthread mutexes, but that implementation doesn't have pthread_mutexattr_settype, pthread-stubs *must not* provide a stub that always succeeds, because then a program would expect that it has successfully set a mutex as recursive, and then deadlock. (As well as the "uninitialized data" issue.) Is Mesa linking against the libc6 stubs, and then running in the single-threaded case? Or does it somehow end up running against the real libpthread but also calling pthread-stubs functions? (In reply to Ian Romanick from comment #7) > My system has glibc-2.22-18.fc23.x86_64, and that has: > > $ nm /lib64/libc.so.6 | grep 'T pthread' > 000000000010fae0 T pthread_attr_destroy > 000000000010fb40 T pthread_attr_getdetachstate > 000000000010fba0 T pthread_attr_getinheritsched > 000000000010fc00 T pthread_attr_getschedparam > 000000000010fc60 T pthread_attr_getschedpolicy > 000000000010fcc0 T pthread_attr_getscope > 000000000010fb10 T pthread_attr_init@@GLIBC_2.2.5 > 000000000010fb70 T pthread_attr_setdetachstate > 000000000010fbd0 T pthread_attr_setinheritsched > 000000000010fc30 T pthread_attr_setschedparam > 000000000010fc90 T pthread_attr_setschedpolicy > 000000000010fcf0 T pthread_attr_setscope > 000000000010fd20 T pthread_condattr_destroy > 000000000010fd50 T pthread_condattr_init > 000000000013e8d0 T pthread_cond_broadcast@GLIBC_2.2.5 > 000000000010fd80 T pthread_cond_broadcast@@GLIBC_2.3.2 > 000000000013e900 T pthread_cond_destroy@GLIBC_2.2.5 > 000000000010fdb0 T pthread_cond_destroy@@GLIBC_2.3.2 > 000000000013e930 T pthread_cond_init@GLIBC_2.2.5 > 000000000010fde0 T pthread_cond_init@@GLIBC_2.3.2 > 000000000013e960 T pthread_cond_signal@GLIBC_2.2.5 > 000000000010fe10 T pthread_cond_signal@@GLIBC_2.3.2 > 000000000013e9c0 T pthread_cond_timedwait@GLIBC_2.2.5 > 000000000010fe70 T pthread_cond_timedwait@@GLIBC_2.3.2 > 000000000013e990 T pthread_cond_wait@GLIBC_2.2.5 > 000000000010fe40 T pthread_cond_wait@@GLIBC_2.3.2 > 000000000010fab0 T pthread_equal > 000000000010fea0 T pthread_exit > 000000000010fed0 T pthread_getschedparam > 000000000010ff30 T pthread_mutex_destroy > 000000000010ff60 T pthread_mutex_init > 000000000010ff90 T pthread_mutex_lock > 000000000010ffc0 T pthread_mutex_unlock > 000000000010fff0 T pthread_self > 0000000000110050 T pthread_setcanceltype > 000000000010ff00 T pthread_setschedparam > > I can try reverting fa6db2f9 tomorrow. Interesting.. I see the same thing. So just fired up glxgears in gdb instead and set a breakpoint on pthread_mutexattr_settype(). Turns out some of the pthreads syms are coming from libc and some from libpthread-2.23.so. Not entirely sure the reasoning for that. Is it possibly you have the libc part but not libpthreads, so you end up mixing libc's partial pthreads w/ pthread-stubs instead of libpthreads.so? Hello, Yes, it is expected that pthread calls go through libc and libpthread. libc's stubs actually include a check to know whether libpthread got loaded or not. In the former case, it calls the libpthread function, and thus the end result is just as if you called libpthread directly Samuel (In reply to Rob Clark from comment #6) > (In reply to Emil Velikov from comment #5) > > I believe Ian's explanation perfectly reasonable, but let me try from > > another angle: > > My confusion wasn't about the problem of mixing/matching pthreads-stub and > real pthreads fxns.. that is completely obvious. > > The confusion was about how it worked at all before when > pthread_mutexattr_init()/etc wasn't provided, and if the libc doesn't > provide a recursive mutex feature. > My take is that an actual implementation (likely libpthread.so) is/was used throughout. At least it should have been ;-) > > libpthreads-stub must _not_ provide stub implementation for APIs that > > absolutely requires locking/other. Would that be pthread_mutexattr_settype > > or any other. > > or pthread_mutex_lock()/unlock() for that matter ;-) > Almost. If you don't require actual locking (!recursive + single thread) they can be empty stubs. > pthreads-stub should only be used where you don't actually have *any* > threads.. Agree. > so in that regard, stubbing pthread_mutexattr_settype() is > completely legit. > Not really :-\ AFAICT setting/changing the mutexattr implies that one requires actual locking. > The problem *sounds* like we somehow have a libc that provides *part* of the > pthreads API, but not all, so you end up mixing stub and real > implementation? But if that is the case, doesn't that mean that mutexes end > up not being recursive, which sounds like a bad thing.. at least this is > what I am reading between the lines here. I'm not entirely sure which libc > we are talking about here. Both libc and libpthreads-stub provide weak symbols for some of the 'safe' pthread API. In practise the libpthread.so symbols should override them and I'm not 100% sure how we end up using the libpthreads-stubs pthread_mutexattr_settype(), which in itself causes the problem. (In reply to Rob Clark from comment #9) > (In reply to Ian Romanick from comment #7) > > My system has glibc-2.22-18.fc23.x86_64, and that has: > > > > $ nm /lib64/libc.so.6 | grep 'T pthread' > > 000000000010fae0 T pthread_attr_destroy > > 000000000010fb40 T pthread_attr_getdetachstate > > 000000000010fba0 T pthread_attr_getinheritsched > > 000000000010fc00 T pthread_attr_getschedparam > > 000000000010fc60 T pthread_attr_getschedpolicy > > 000000000010fcc0 T pthread_attr_getscope > > 000000000010fb10 T pthread_attr_init@@GLIBC_2.2.5 > > 000000000010fb70 T pthread_attr_setdetachstate > > 000000000010fbd0 T pthread_attr_setinheritsched > > 000000000010fc30 T pthread_attr_setschedparam > > 000000000010fc90 T pthread_attr_setschedpolicy > > 000000000010fcf0 T pthread_attr_setscope > > 000000000010fd20 T pthread_condattr_destroy > > 000000000010fd50 T pthread_condattr_init > > 000000000013e8d0 T pthread_cond_broadcast@GLIBC_2.2.5 > > 000000000010fd80 T pthread_cond_broadcast@@GLIBC_2.3.2 > > 000000000013e900 T pthread_cond_destroy@GLIBC_2.2.5 > > 000000000010fdb0 T pthread_cond_destroy@@GLIBC_2.3.2 > > 000000000013e930 T pthread_cond_init@GLIBC_2.2.5 > > 000000000010fde0 T pthread_cond_init@@GLIBC_2.3.2 > > 000000000013e960 T pthread_cond_signal@GLIBC_2.2.5 > > 000000000010fe10 T pthread_cond_signal@@GLIBC_2.3.2 > > 000000000013e9c0 T pthread_cond_timedwait@GLIBC_2.2.5 > > 000000000010fe70 T pthread_cond_timedwait@@GLIBC_2.3.2 > > 000000000013e990 T pthread_cond_wait@GLIBC_2.2.5 > > 000000000010fe40 T pthread_cond_wait@@GLIBC_2.3.2 > > 000000000010fab0 T pthread_equal > > 000000000010fea0 T pthread_exit > > 000000000010fed0 T pthread_getschedparam > > 000000000010ff30 T pthread_mutex_destroy > > 000000000010ff60 T pthread_mutex_init > > 000000000010ff90 T pthread_mutex_lock > > 000000000010ffc0 T pthread_mutex_unlock > > 000000000010fff0 T pthread_self > > 0000000000110050 T pthread_setcanceltype > > 000000000010ff00 T pthread_setschedparam > > > > I can try reverting fa6db2f9 tomorrow. > > Interesting.. I see the same thing. So just fired up glxgears in gdb > instead and set a breakpoint on pthread_mutexattr_settype(). Turns out some > of the pthreads syms are coming from libc and some from libpthread-2.23.so. > Not entirely sure the reasoning for that. > > Is it possibly you have the libc part but not libpthreads, so you end up > mixing libc's partial pthreads w/ pthread-stubs instead of libpthreads.so? Sure that list is pretty normal/constant. Yet you don't see the issues that Ian has reported, do you ? As a simple check - build libpthreads-stub before/after the offending commit and rebuild libxcb/mesa against it. In the former case there will be no libpthread-stub.so while in the latter there will be one. Then at runtime you'll likely end up using the libc/pthreads/pthreads-stub libraries leading to fun experience. (In reply to Emil Velikov from comment #11) > (In reply to Rob Clark from comment #6) > > so in that regard, stubbing pthread_mutexattr_settype() is > > completely legit. > > > Not really :-\ AFAICT setting/changing the mutexattr implies that one > requires actual locking. Well, to be super-anal, pthreads-stub should perhaps flag an error or assert or something for non-recursive mutex if recursively acquired. But I guess no one has cared about that so far. It doesn't help that the default mutex type can be (but doesn't have to be) recursive. But I don't see how setting mutexattr implies that you require locking in a single-threaded environment any more than pthread_mutex_lock() does. So I still think stubbing the mutexattr fxns is not wrong.. but I guess there is an unfortunate interaction with how pthread syms that aren't in libc get resolved? Maybe reverting the stubs is more convenient than fixing glibc if we can't figure out how to get the symbols from libpthread before pthread-stubs. That said, all I have of pthreads-stubs in my filesystem is the .pc file, which I guess explains why I am not seeing this issue? (In reply to Josh Triplett from comment #8) > Is Mesa linking against the libc6 stubs, and then running in the > single-threaded case? Or does it somehow end up running against the real > libpthread but also calling pthread-stubs functions? As far as I can tell, it is single threaded, linked with libpthread (according to ldd on the piglit test in question), and hitting pthread-stubs functions. The Valgrind stack trace in comment #0 shows /usr/lib64/libpthread-2.22.so. Reverting fa6db2f9 seems to have fixed the problems in softpipe and NV20. (In reply to Rob Clark from comment #13) > (In reply to Emil Velikov from comment #11) > > (In reply to Rob Clark from comment #6) > > > so in that regard, stubbing pthread_mutexattr_settype() is > > > completely legit. > > > > > Not really :-\ AFAICT setting/changing the mutexattr implies that one > > requires actual locking. > > Well, to be super-anal, pthreads-stub should perhaps flag an error or assert > or something for non-recursive mutex if recursively acquired. But I guess > no one has cared about that so far. It doesn't help that the default mutex > type can be (but doesn't have to be) recursive. > > But I don't see how setting mutexattr implies that you require locking in a > single-threaded environment any more than pthread_mutex_lock() does. So I > still think stubbing the mutexattr fxns is not wrong.. but I guess there is > an unfortunate interaction with how pthread syms that aren't in libc get > resolved? Maybe reverting the stubs is more convenient than fixing glibc if > we can't figure out how to get the symbols from libpthread before > pthread-stubs. > Yes, we can add a lot more error checking/alike and make things more robust. By then they won't be plain stubs though ;-) If we consider that mesa (and maybe others) do zero error checking for pthread_mutexattr_* and pthread_mutex_init and others, it smells like accident waiting to happen. Guess we should fix that, one of these days as well. > That said, all I have of pthreads-stubs in my filesystem is the .pc file, > which I guess explains why I am not seeing this issue? Pretty much. If you/Ben rebuild pthreads-stubs (and packages which depend on it) you'll see issues similar to the ones Ian reported. Let's imagine for a second that all the stuff I said so far is complete BS ;-) Have we considered why glibc does _not_ expose the new symbols while it does the fa6db2f9c01~1 ones ? Surely they know a thing or two about libc.so and libpthread.so and made the conscious decision to not provide pthread_mutexattr_* by the former. Uli can we please revert fa6db2f9c01 - it's been confirmed to cause problems ? (In reply to Emil Velikov from comment #16) > Uli can we please revert fa6db2f9c01 - it's been confirmed to cause problems? Not really, sorry. So as I read it, this bug report says we are in the following situation: - libpthread is loaded - libpthread-stubs is loaded - Yet, the stub from libpthread-stubs is called instead of the function from libpthread None of this sounds specific to pthread_mutexattr_{init,settype}. Why would removing these function from libpthread-stubs fix things and not hide the brokeness? Put differently: Could the same problem also occur with other symbols that are stubbed? Put differently: I'd like to understand the problem before I try to solve it. Just saw your IRC poke, barring the handles. For the future - myself xexaxo, Rob robclark, and Ian idr. $ nm -CD --defined-only /usr/lib/libpthread-2.24.so | grep pthread_mutexattr_settype 000000000000bc30 T __pthread_mutexattr_settype 000000000000bc30 T pthread_mutexattr_settype [emil@arch-x1c3 mesa]$ nm -CD --defined-only /usr/lib/libc-2.24.so | grep pthread_mutexattr_settype | wc -l 0 Dropping the symbols will fix the "brokenness" since then the pthread-stubs will expand to a plain .pc file (no DSO), thus only libc/libpthread will be used. In practise all Linux distros will end up like ^^ since glibc and musl have provided the fa6db2f9c01~1 symbols for quite a few years. IIRC even latest Solaris does so. Thinking about the days, where libc did not provide the stubs and fa6db2f9c01 were not around - things were bound to break in an identical way. For example: Foo uses the fa6db2f9c01~1 pthreads API and links against pthreads-stubs. Be that directly or indirectly by/in one of its dependencies. Foo uses the stubs and then dlopens libbar2. The latter links against libpthread and the weak pthread-stubs symbols are overwritten. At this point we'll get issues similar to the one described. If we are to fix this: a) in foo - this means that one will need to link against full blown pthreads, even though it does not want/need it. That is because foo cannot know if the modules that it is going to use will pull pthreads, pthread-stubs or neither. b) in pthread-stubs - one will end up re-implementing most of glibc libc<>libpthread heuristics. So if anything I'd consider making things blindingly obvious that if runtime X doesn't provide pthread-stubs behaviour things can break at any point. Thus suggest a pthreads link for the very odd platform. /me goes to add error messages for libdrm/mesa if $pkg-config --libs pthread-stubs is non-null If comment 18 is hard to parse things are explained in a (slightly) different light in https://lists.freedesktop.org/archives/dri-devel/2016-October/122494.html One thing I'm thinking of, as an alternative to doing the ^^ in every project. Have pthread-stubs cflags/libs default to the normal pthread ones, if libc doesn't provide the stubs. Opinions, ideas ? *** Bug 98532 has been marked as a duplicate of this bug. *** (In reply to Emil Velikov from comment #18) > Foo uses the fa6db2f9c01~1 pthreads API and links against pthreads-stubs. Be > that directly or indirectly by/in one of its dependencies. > > Foo uses the stubs and then dlopens libbar2. The latter links against > libpthread and the weak pthread-stubs symbols are overwritten. > > At this point we'll get issues similar to the one described. Ah, dlopen(). Ok, that does indeed explain the problems that everyone is seeing. However, my memory warns me about dlopen()ing libpthread and random google results say that it doesn't work too well, e.g. https://freetz.org/ticket/819 and https://sourceware.org/bugzilla/show_bug.cgi?id=16628. However, only on Linux is this actually an issue with the patch that should be reverted. On other patches, other things will break. Since you seem to be trying to fix this in pthread-stubs (thanks for those patches!), I'll stop complaining about dlopen()ing libpthreads now. Since the problematic commit was reverted/pthread-stubs reworked we can close this. Nobody should see this kind of issues - be that on Linux or other platforms. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.