Bug 98048

Summary: libpthread-stubs must NOT provide symbols which depend on actual implementation
Product: XCB Reporter: Ian Romanick <idr>
Component: LibraryAssignee: xcb mailing list dummy <xcb>
Status: RESOLVED FIXED QA Contact: xcb mailing list dummy <xcb>
Severity: normal    
Priority: medium CC: ben, chris, jfonseca, jon.turney, mattst88, psychon, robclark
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Ian Romanick 2016-10-04 23:27:37 UTC
At least as far back as ecaa81b, Mesa has used pthread_mutexattr_init and pthread_mutexattr_settype.  Neither of these functions is available in libc6, so both are supplied by libpthread-stubs.  However, the implementations in libpthread-stubs don't do anything.  That's a huge problem because pthread_mutex_init expects that pthread_mutexattr_t to contain valid data.

I have observed two adverse affects of this in piglit runs on multiple systems.

1. On softpipe, there are numerous sporadic crashes inside malloc or free (see file:///home/idr/devel/graphics/piglit-results/results/problems.html).  There are also numerous valgrind warnings like:

==27195== Conditional jump or move depends on uninitialised value(s)
==27195==    at 0x99E82A2: pthread_mutex_init (in /usr/lib64/libpthread-2.22.so)
==27195==    by 0xBA9EA1D: mtx_init (threads_posix.h:217)
==27195==    by 0xBA9EA1D: _mesa_alloc_shared_state (shared.c:122)
==27195==    by 0xB9DA749: _mesa_initialize_context (context.c:1188)
==27195==    by 0xBB4478F: st_create_context (st_context.c:544)
==27195==    by 0xBB6C64D: st_api_create_context (st_manager.c:669)
==27195==    by 0xBCA6DD6: dri_create_context (dri_context.c:123)
==27195==    by 0xBCA63EE: driCreateContextAttribs (dri_util.c:448)
==27195==    by 0x52515E8: drisw_create_context_attribs (drisw_glx.c:476)
==27195==    by 0x522B5A2: glXCreateContextAttribsARB (create_context.c:78)
==27195==    by 0x6413353: ??? (in /usr/lib64/libwaffle-1.so.0.5.0)
==27195==    by 0x640F394: waffle_context_create (in /usr/lib64/libwaffle-1.so.0.5.0)
==27195==    by 0x4F72238: make_context_current_singlepass (piglit_wfl_framework.c:476)


2. On NV20 I have observed semi-random deadlocks from within meta.  I have a patch to work around these, but I now believe the real problem is the context shared texture mutex is not being created PTHREAD_MUTEX_RECURSIVE because pthread_mutexattr_settype does nothing.

piglit tests link with libpthread.  I have confirmed that the versions from libpthread-stubs are still called from within Mesa:

[idr@dynamic104 piglit]$ LIBGL_ALWAYS_SOFTWARE=y gdb --args bin/asmparsertest ARBvp1.0 tests/asmparsertest/shaders/ARBvp1.0/sne-02.txt -auto -fbo
GNU gdb (GDB) Fedora 7.10.1-31.fc23
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/asmparsertest...done.
(gdb) b pthread_mutexattr_init
Function "pthread_mutexattr_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (pthread_mutexattr_init) pending.
(gdb) r
Starting program: /home/idr/devel/graphics/piglit/bin/asmparsertest ARBvp1.0 /home/idr/devel/graphics/piglit/tests/asmparsertest/shaders/ARBvp1.0/sne-02.txt -auto -fbo
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-18.fc23.x86_64
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable

Breakpoint 1, __pthread_zero_stub () at stubs.c:203
203	}
(gdb) bt
#0  __pthread_zero_stub () at stubs.c:203
#1  0x00007ffff0fb6a06 in mtx_init (type=4, mtx=0x6fb9d8) at ../../include/c11/threads_posix.h:215
#2  _mesa_alloc_shared_state (ctx=ctx@entry=0x7ffff7fd0010) at main/shared.c:122
#3  0x00007ffff0ef274a in _mesa_initialize_context (ctx=ctx@entry=0x7ffff7fd0010, api=api@entry=API_OPENGL_COMPAT, visual=visual@entry=0x7fffffffd140, share_list=share_list@entry=0x0, driverFunctions=driverFunctions@entry=0x7fffffffcbe0) at main/context.c:1188
#4  0x00007ffff105c790 in st_create_context (api=api@entry=API_OPENGL_COMPAT, pipe=pipe@entry=0x63a2e0, visual=visual@entry=0x7fffffffd140, share=share@entry=0x0, options=options@entry=0x7fffffffd278) at state_tracker/st_context.c:544
#5  0x00007ffff108464e in st_api_create_context (stapi=<optimized out>, smapi=0x62ad00, attribs=0x7fffffffd250, error=0x7fffffffd24c, shared_stctxi=0x0) at state_tracker/st_manager.c:669
#6  0x00007ffff11bedd7 in dri_create_context (api=<optimized out>, visual=0x62f870, cPriv=<optimized out>, major_version=<optimized out>, minor_version=<optimized out>, flags=<optimized out>, notify_reset=false, error=0x7fffffffd3fc, sharedContextPrivate=0x0) at dri_context.c:123
#7  0x00007ffff11be3ef in driCreateContextAttribs (screen=0x62ac60, api=<optimized out>, config=<optimized out>, shared=<optimized out>, num_attribs=<optimized out>, attribs=<optimized out>, error=0x7fffffffd3fc, data=0x623070) at dri_util.c:448
#8  0x00007ffff77d05e9 in drisw_create_context_attribs (base=0x622f70, config_base=0x637040, shareList=<optimized out>, num_attribs=<optimized out>, attribs=<optimized out>, error=0x7fffffffd3fc) at drisw_glx.c:476
#9  0x00007ffff77aa5a3 in glXCreateContextAttribsARB (dpy=0x615390, config=0x637040, share_context=0x0, direct=1, attrib_list=0x7fffffffd450) at create_context.c:78
#10 0x00007ffff65fc354 in glx_context_create () from /lib64/libwaffle-1.so.0
#11 0x00007ffff65f8395 in waffle_context_create () from /lib64/libwaffle-1.so.0
#12 0x00007ffff7b3c239 in make_context_current_singlepass (wfl_fw=0x614c40, test_config=0x7fffffffdb50, flavor=CONTEXT_GL_COMPAT, partial_config_attrib_list=0x0) at tests/util/piglit-framework-gl/piglit_wfl_framework.c:476
#13 0x00007ffff7b3c4f1 in make_context_current (wfl_fw=0x614c40, test_config=0x7fffffffdb50, partial_config_attrib_list=0x0) at tests/util/piglit-framework-gl/piglit_wfl_framework.c:557
#14 0x00007ffff7b3c5ea in piglit_wfl_framework_init (wfl_fw=0x614c40, test_config=0x7fffffffdb50, platform=19, partial_config_attrib_list=0x0) at tests/util/piglit-framework-gl/piglit_wfl_framework.c:613
#15 0x00007ffff7b3b1ae in piglit_fbo_framework_create (test_config=0x7fffffffdb50) at tests/util/piglit-framework-gl/piglit_fbo_framework.c:151
#16 0x00007ffff7b216e2 in piglit_gl_framework_factory (test_config=0x7fffffffdb50) at tests/util/piglit-framework-gl/piglit_gl_framework.c:48
#17 0x00007ffff7b2125d in piglit_gl_test_run (argc=3, argv=0x7fffffffdc88, config=0x7fffffffdb50) at tests/util/piglit-framework-gl.c:191
#18 0x000000000040127e in main (argc=3, argv=0x7fffffffdc88) at tests/asmparsertest/asmparsertest.c:41
Comment 1 Emil Velikov 2016-10-06 16:01:08 UTC
I'm slightly confused by the whole thing.

On Linux platforms (and modern Solaris)  building libpthread-stubs results is a plain .pc file, and no DSO being built/installed. Regardless, the library is used only by libgbm, which based on your waffle/mixed_glx_egl will/should never get loaded.

That said, IMHO it would be great if glibc provides a native C11 threads implementation (like musl) or alternatively we could drop the single recursive mutex from mesa.

Regardless, can you share a bit more on the topic - do you use LD_PRELOAD/LD_LIBRARY_PATH, is this specific to local builds and/or distribution ones ?
Comment 2 Emil Velikov 2016-10-06 16:06:26 UTC
Actually seems like libpthreads-stub has decided to pick the hacky patch, despite my suggestion not to :-\

Rob, Ben, you recall when I said it is not a good idea and it will cause grief ;-)
Well here it is (ableit not sure what/how exactly libpthread-stubs.so ends up in Ian's link chain)

Gents, can we revert fa6db2f9c018c54a47e94c0175450303d700aa92 ?
Comment 3 Ben Widawsky 2016-10-06 16:26:27 UTC
(In reply to Emil Velikov from comment #2)
> Actually seems like libpthreads-stub has decided to pick the hacky patch,
> despite my suggestion not to :-\
> 
> Rob, Ben, you recall when I said it is not a good idea and it will cause
> grief ;-)

Can you remind me of the conversation? Pthread stubs are just stubs, I'm confused why it's a bad idea to add stubbed symbols which are actually required to build. If you need the functionality, and it's not provided by libc, I don't see how not having a stubbed function will save you.

> Well here it is (ableit not sure what/how exactly libpthread-stubs.so ends
> up in Ian's link chain)
> 
> Gents, can we revert fa6db2f9c018c54a47e94c0175450303d700aa92 ?

I'm fine with revert as long as things still build. I'd like to understand things a bit better first.
Comment 4 Rob Clark 2016-10-06 17:34:53 UTC
(In reply to Emil Velikov from comment #2)
> Actually seems like libpthreads-stub has decided to pick the hacky patch,
> despite my suggestion not to :-\
> 
> Rob, Ben, you recall when I said it is not a good idea and it will cause
> grief ;-)
> Well here it is (ableit not sure what/how exactly libpthread-stubs.so ends
> up in Ian's link chain)
> 
> Gents, can we revert fa6db2f9c018c54a47e94c0175450303d700aa92 ?

So I'm a bit confused on how things actually worked before without pthread_mutexattr_settype()..

That said, I've no opinion on reverting the libpthreads-stub commit as long as it doesn't break mesa build with various compilers.. iirc that was the motivation in the first place.
Comment 5 Emil Velikov 2016-10-06 18:02:19 UTC
I believe Ian's explanation perfectly reasonable, but let me try from another angle:

libpthreads-stub must _not_ provide stub implementation for APIs that absolutely requires locking/other. Would that be pthread_mutexattr_settype or any other.

IIRC a modern GLIBC/Solaris already cover what can be stubbed. Everything else simply requires an actual implementation.

In here's example: mesa's dri modules depend on setting up a recursive mutex, which is something that cannot be ignored/stubbed.

mesa has been 'fixed' before the libpthreads-stub patch landed. If there's an issue there, do poke me.
Comment 6 Rob Clark 2016-10-06 18:25:47 UTC
(In reply to Emil Velikov from comment #5)
> I believe Ian's explanation perfectly reasonable, but let me try from
> another angle:

My confusion wasn't about the problem of mixing/matching pthreads-stub and real pthreads fxns.. that is completely obvious.

The confusion was about how it worked at all before when pthread_mutexattr_init()/etc wasn't provided, and if the libc doesn't provide a recursive mutex feature.

> libpthreads-stub must _not_ provide stub implementation for APIs that
> absolutely requires locking/other. Would that be pthread_mutexattr_settype
> or any other.

or pthread_mutex_lock()/unlock() for that matter ;-)

pthreads-stub should only be used where you don't actually have *any* threads.. so in that regard, stubbing pthread_mutexattr_settype() is completely legit.

The problem *sounds* like we somehow have a libc that provides *part* of the pthreads API, but not all, so you end up mixing stub and real implementation?  But if that is the case, doesn't that mean that mutexes end up not being recursive, which sounds like a bad thing..  at least this is what I am reading between the lines here.  I'm not entirely sure which libc we are talking about here.
Comment 7 Ian Romanick 2016-10-06 23:11:55 UTC
I have been building xcb and associated bits from source.  pthreads-stubs is fa6db2f9, and libxcb is 65b298c7.  I only recently updated to fa6db2f9 when I started monkeying about with NV20 and softpipe, and I don't think I ever observed these problems on i965 previously.

When I build and install them, pthreads-stubs.pc contains:

    prefix=/opt/xorg-master-x86_64
    exec_prefix=${prefix}
    libdir=/opt/xorg-master-x86_64/lib64
    
    Name: pthread stubs
    Description: Stubs missing from libc for standard pthread functions
    Version: 0.3
    Libs: -L${libdir} -lpthread-stubs

Mesa picks pthreads-stubs up both directly via voodoo in configure.ac:

    dnl pthread-stubs is mandatory on targets where it exists
    case "$host_os" in
    cygwin* )
        pthread_stubs_possible="no"
        ;;
    * )
        pthread_stubs_possible="yes"
        ;;
    esac
    
    if test "x$pthread_stubs_possible" = xyes; then
        PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs)
        AC_SUBST(PTHREADSTUBS_CFLAGS)
        AC_SUBST(PTHREADSTUBS_LIBS)
    fi

and indirectly via libxcb.pc.

    prefix=/opt/xorg-master-x86_64
    exec_prefix=${prefix}
    libdir=/opt/xorg-master-x86_64/lib64
    includedir=${prefix}/include
    xcbproto_version=1.12
    
    Name: XCB
    Description: X-protocol C Binding
    Version: 1.12
    Requires.private: pthread-stubs xau >= 0.99.2 xdmcp
    Libs: -L${libdir} -lxcb
    Libs.private: 
    Cflags: -I${includedir}

I tried removing the bits from Mesa's configure.ac, but that didn't seem to have any affect.

My system has glibc-2.22-18.fc23.x86_64, and that has:

$ nm /lib64/libc.so.6 | grep 'T pthread'
000000000010fae0 T pthread_attr_destroy
000000000010fb40 T pthread_attr_getdetachstate
000000000010fba0 T pthread_attr_getinheritsched
000000000010fc00 T pthread_attr_getschedparam
000000000010fc60 T pthread_attr_getschedpolicy
000000000010fcc0 T pthread_attr_getscope
000000000010fb10 T pthread_attr_init@@GLIBC_2.2.5
000000000010fb70 T pthread_attr_setdetachstate
000000000010fbd0 T pthread_attr_setinheritsched
000000000010fc30 T pthread_attr_setschedparam
000000000010fc90 T pthread_attr_setschedpolicy
000000000010fcf0 T pthread_attr_setscope
000000000010fd20 T pthread_condattr_destroy
000000000010fd50 T pthread_condattr_init
000000000013e8d0 T pthread_cond_broadcast@GLIBC_2.2.5
000000000010fd80 T pthread_cond_broadcast@@GLIBC_2.3.2
000000000013e900 T pthread_cond_destroy@GLIBC_2.2.5
000000000010fdb0 T pthread_cond_destroy@@GLIBC_2.3.2
000000000013e930 T pthread_cond_init@GLIBC_2.2.5
000000000010fde0 T pthread_cond_init@@GLIBC_2.3.2
000000000013e960 T pthread_cond_signal@GLIBC_2.2.5
000000000010fe10 T pthread_cond_signal@@GLIBC_2.3.2
000000000013e9c0 T pthread_cond_timedwait@GLIBC_2.2.5
000000000010fe70 T pthread_cond_timedwait@@GLIBC_2.3.2
000000000013e990 T pthread_cond_wait@GLIBC_2.2.5
000000000010fe40 T pthread_cond_wait@@GLIBC_2.3.2
000000000010fab0 T pthread_equal
000000000010fea0 T pthread_exit
000000000010fed0 T pthread_getschedparam
000000000010ff30 T pthread_mutex_destroy
000000000010ff60 T pthread_mutex_init
000000000010ff90 T pthread_mutex_lock
000000000010ffc0 T pthread_mutex_unlock
000000000010fff0 T pthread_self
0000000000110050 T pthread_setcanceltype
000000000010ff00 T pthread_setschedparam

I can try reverting fa6db2f9 tomorrow.
Comment 8 Josh Triplett 2016-10-07 03:38:48 UTC
(In reply to Rob Clark from comment #6)
> (In reply to Emil Velikov from comment #5)
> > libpthreads-stub must _not_ provide stub implementation for APIs that
> > absolutely requires locking/other. Would that be pthread_mutexattr_settype
> > or any other.
> 
> or pthread_mutex_lock()/unlock() for that matter ;-)
> 
> pthreads-stub should only be used where you don't actually have *any*
> threads.. so in that regard, stubbing pthread_mutexattr_settype() is
> completely legit.
> 
> The problem *sounds* like we somehow have a libc that provides *part* of the
> pthreads API, but not all, so you end up mixing stub and real
> implementation?

This, exactly.

If pthread-stubs provides all the stubs for pthread mutexes, then it makes sense for pthread-stubs to provide a stub for pthread_mutexattr_settype, which can always succeed at setting a mutex as recursive, because you can lock and unlock it all you want.

However, if you actually have a real non-stub implementation of pthread mutexes, but that implementation doesn't have pthread_mutexattr_settype, pthread-stubs *must not* provide a stub that always succeeds, because then a program would expect that it has successfully set a mutex as recursive, and then deadlock.  (As well as the "uninitialized data" issue.)

Is Mesa linking against the libc6 stubs, and then running in the single-threaded case?  Or does it somehow end up running against the real libpthread but also calling pthread-stubs functions?
Comment 9 Rob Clark 2016-10-07 14:27:53 UTC
(In reply to Ian Romanick from comment #7)
> My system has glibc-2.22-18.fc23.x86_64, and that has:
> 
> $ nm /lib64/libc.so.6 | grep 'T pthread'
> 000000000010fae0 T pthread_attr_destroy
> 000000000010fb40 T pthread_attr_getdetachstate
> 000000000010fba0 T pthread_attr_getinheritsched
> 000000000010fc00 T pthread_attr_getschedparam
> 000000000010fc60 T pthread_attr_getschedpolicy
> 000000000010fcc0 T pthread_attr_getscope
> 000000000010fb10 T pthread_attr_init@@GLIBC_2.2.5
> 000000000010fb70 T pthread_attr_setdetachstate
> 000000000010fbd0 T pthread_attr_setinheritsched
> 000000000010fc30 T pthread_attr_setschedparam
> 000000000010fc90 T pthread_attr_setschedpolicy
> 000000000010fcf0 T pthread_attr_setscope
> 000000000010fd20 T pthread_condattr_destroy
> 000000000010fd50 T pthread_condattr_init
> 000000000013e8d0 T pthread_cond_broadcast@GLIBC_2.2.5
> 000000000010fd80 T pthread_cond_broadcast@@GLIBC_2.3.2
> 000000000013e900 T pthread_cond_destroy@GLIBC_2.2.5
> 000000000010fdb0 T pthread_cond_destroy@@GLIBC_2.3.2
> 000000000013e930 T pthread_cond_init@GLIBC_2.2.5
> 000000000010fde0 T pthread_cond_init@@GLIBC_2.3.2
> 000000000013e960 T pthread_cond_signal@GLIBC_2.2.5
> 000000000010fe10 T pthread_cond_signal@@GLIBC_2.3.2
> 000000000013e9c0 T pthread_cond_timedwait@GLIBC_2.2.5
> 000000000010fe70 T pthread_cond_timedwait@@GLIBC_2.3.2
> 000000000013e990 T pthread_cond_wait@GLIBC_2.2.5
> 000000000010fe40 T pthread_cond_wait@@GLIBC_2.3.2
> 000000000010fab0 T pthread_equal
> 000000000010fea0 T pthread_exit
> 000000000010fed0 T pthread_getschedparam
> 000000000010ff30 T pthread_mutex_destroy
> 000000000010ff60 T pthread_mutex_init
> 000000000010ff90 T pthread_mutex_lock
> 000000000010ffc0 T pthread_mutex_unlock
> 000000000010fff0 T pthread_self
> 0000000000110050 T pthread_setcanceltype
> 000000000010ff00 T pthread_setschedparam
> 
> I can try reverting fa6db2f9 tomorrow.

Interesting.. I see the same thing.  So just fired up glxgears in gdb instead and set a breakpoint on pthread_mutexattr_settype().  Turns out some of the pthreads syms are coming from libc and some from libpthread-2.23.so.  Not entirely sure the reasoning for that.

Is it possibly you have the libc part but not libpthreads, so you end up mixing libc's partial pthreads w/ pthread-stubs instead of libpthreads.so?
Comment 10 Samuel Thibault 2016-10-07 14:36:13 UTC
Hello,

Yes, it is expected that pthread calls go through libc and libpthread. libc's stubs actually include a check to know whether libpthread got loaded or not. In the former case, it calls the libpthread function, and thus the end result is just as if you called libpthread directly

Samuel
Comment 11 Emil Velikov 2016-10-07 16:41:03 UTC
(In reply to Rob Clark from comment #6)
> (In reply to Emil Velikov from comment #5)
> > I believe Ian's explanation perfectly reasonable, but let me try from
> > another angle:
> 
> My confusion wasn't about the problem of mixing/matching pthreads-stub and
> real pthreads fxns.. that is completely obvious.
> 
> The confusion was about how it worked at all before when
> pthread_mutexattr_init()/etc wasn't provided, and if the libc doesn't
> provide a recursive mutex feature.
> 
My take is that an actual implementation (likely libpthread.so) is/was used throughout. At least it should have been ;-)

> > libpthreads-stub must _not_ provide stub implementation for APIs that
> > absolutely requires locking/other. Would that be pthread_mutexattr_settype
> > or any other.
> 
> or pthread_mutex_lock()/unlock() for that matter ;-)
> 
Almost. If you don't require actual locking (!recursive + single thread) they can be empty stubs.

> pthreads-stub should only be used where you don't actually have *any*
> threads..
Agree.

> so in that regard, stubbing pthread_mutexattr_settype() is
> completely legit.
> 
Not really :-\ AFAICT setting/changing the mutexattr implies that one requires actual locking.

> The problem *sounds* like we somehow have a libc that provides *part* of the
> pthreads API, but not all, so you end up mixing stub and real
> implementation?  But if that is the case, doesn't that mean that mutexes end
> up not being recursive, which sounds like a bad thing..  at least this is
> what I am reading between the lines here.  I'm not entirely sure which libc
> we are talking about here.

Both libc and libpthreads-stub provide weak symbols for some of the 'safe' pthread API. In practise the libpthread.so symbols should override them and I'm not 100% sure how we end up using the libpthreads-stubs pthread_mutexattr_settype(), which in itself causes the problem.
Comment 12 Emil Velikov 2016-10-07 16:48:02 UTC
(In reply to Rob Clark from comment #9)
> (In reply to Ian Romanick from comment #7)
> > My system has glibc-2.22-18.fc23.x86_64, and that has:
> > 
> > $ nm /lib64/libc.so.6 | grep 'T pthread'
> > 000000000010fae0 T pthread_attr_destroy
> > 000000000010fb40 T pthread_attr_getdetachstate
> > 000000000010fba0 T pthread_attr_getinheritsched
> > 000000000010fc00 T pthread_attr_getschedparam
> > 000000000010fc60 T pthread_attr_getschedpolicy
> > 000000000010fcc0 T pthread_attr_getscope
> > 000000000010fb10 T pthread_attr_init@@GLIBC_2.2.5
> > 000000000010fb70 T pthread_attr_setdetachstate
> > 000000000010fbd0 T pthread_attr_setinheritsched
> > 000000000010fc30 T pthread_attr_setschedparam
> > 000000000010fc90 T pthread_attr_setschedpolicy
> > 000000000010fcf0 T pthread_attr_setscope
> > 000000000010fd20 T pthread_condattr_destroy
> > 000000000010fd50 T pthread_condattr_init
> > 000000000013e8d0 T pthread_cond_broadcast@GLIBC_2.2.5
> > 000000000010fd80 T pthread_cond_broadcast@@GLIBC_2.3.2
> > 000000000013e900 T pthread_cond_destroy@GLIBC_2.2.5
> > 000000000010fdb0 T pthread_cond_destroy@@GLIBC_2.3.2
> > 000000000013e930 T pthread_cond_init@GLIBC_2.2.5
> > 000000000010fde0 T pthread_cond_init@@GLIBC_2.3.2
> > 000000000013e960 T pthread_cond_signal@GLIBC_2.2.5
> > 000000000010fe10 T pthread_cond_signal@@GLIBC_2.3.2
> > 000000000013e9c0 T pthread_cond_timedwait@GLIBC_2.2.5
> > 000000000010fe70 T pthread_cond_timedwait@@GLIBC_2.3.2
> > 000000000013e990 T pthread_cond_wait@GLIBC_2.2.5
> > 000000000010fe40 T pthread_cond_wait@@GLIBC_2.3.2
> > 000000000010fab0 T pthread_equal
> > 000000000010fea0 T pthread_exit
> > 000000000010fed0 T pthread_getschedparam
> > 000000000010ff30 T pthread_mutex_destroy
> > 000000000010ff60 T pthread_mutex_init
> > 000000000010ff90 T pthread_mutex_lock
> > 000000000010ffc0 T pthread_mutex_unlock
> > 000000000010fff0 T pthread_self
> > 0000000000110050 T pthread_setcanceltype
> > 000000000010ff00 T pthread_setschedparam
> > 
> > I can try reverting fa6db2f9 tomorrow.
> 
> Interesting.. I see the same thing.  So just fired up glxgears in gdb
> instead and set a breakpoint on pthread_mutexattr_settype().  Turns out some
> of the pthreads syms are coming from libc and some from libpthread-2.23.so. 
> Not entirely sure the reasoning for that.
> 
> Is it possibly you have the libc part but not libpthreads, so you end up
> mixing libc's partial pthreads w/ pthread-stubs instead of libpthreads.so?

Sure that list is pretty normal/constant. Yet you don't see the issues that Ian has reported, do you ?

As a simple check - build libpthreads-stub before/after the offending commit and rebuild libxcb/mesa against it.

In the former case there will be no libpthread-stub.so while in the latter there will be one. Then at runtime you'll likely end up using the libc/pthreads/pthreads-stub libraries leading to fun experience.
Comment 13 Rob Clark 2016-10-07 18:09:18 UTC
(In reply to Emil Velikov from comment #11)
> (In reply to Rob Clark from comment #6)
> > so in that regard, stubbing pthread_mutexattr_settype() is
> > completely legit.
> > 
> Not really :-\ AFAICT setting/changing the mutexattr implies that one
> requires actual locking.

Well, to be super-anal, pthreads-stub should perhaps flag an error or assert or something for non-recursive mutex if recursively acquired.  But I guess no one has cared about that so far.  It doesn't help that the default mutex type can be (but doesn't have to be) recursive.

But I don't see how setting mutexattr implies that you require locking in a single-threaded environment any more than pthread_mutex_lock() does.  So I still think stubbing the mutexattr fxns is not wrong..  but I guess there is an unfortunate interaction with how pthread syms that aren't in libc get resolved?  Maybe reverting the stubs is more convenient than fixing glibc if we can't figure out how to get the symbols from libpthread before pthread-stubs.

That said, all I have of pthreads-stubs in my filesystem is the .pc file, which I guess explains why I am not seeing this issue?
Comment 14 Ian Romanick 2016-10-08 22:39:50 UTC
(In reply to Josh Triplett from comment #8)
> Is Mesa linking against the libc6 stubs, and then running in the
> single-threaded case?  Or does it somehow end up running against the real
> libpthread but also calling pthread-stubs functions?

As far as I can tell, it is single threaded, linked with libpthread (according to ldd on the piglit test in question), and hitting pthread-stubs functions.  The Valgrind stack trace in comment #0 shows /usr/lib64/libpthread-2.22.so.

Reverting fa6db2f9 seems to have fixed the problems in softpipe and NV20.
Comment 15 Emil Velikov 2016-10-10 13:03:45 UTC
(In reply to Rob Clark from comment #13)
> (In reply to Emil Velikov from comment #11)
> > (In reply to Rob Clark from comment #6)
> > > so in that regard, stubbing pthread_mutexattr_settype() is
> > > completely legit.
> > > 
> > Not really :-\ AFAICT setting/changing the mutexattr implies that one
> > requires actual locking.
> 
> Well, to be super-anal, pthreads-stub should perhaps flag an error or assert
> or something for non-recursive mutex if recursively acquired.  But I guess
> no one has cared about that so far.  It doesn't help that the default mutex
> type can be (but doesn't have to be) recursive.
> 
> But I don't see how setting mutexattr implies that you require locking in a
> single-threaded environment any more than pthread_mutex_lock() does.  So I
> still think stubbing the mutexattr fxns is not wrong..  but I guess there is
> an unfortunate interaction with how pthread syms that aren't in libc get
> resolved?  Maybe reverting the stubs is more convenient than fixing glibc if
> we can't figure out how to get the symbols from libpthread before
> pthread-stubs.
> 
Yes, we can add a lot more error checking/alike and make things more robust. By then they won't be plain stubs though ;-)

If we consider that mesa (and maybe others) do zero error checking for pthread_mutexattr_* and pthread_mutex_init and others, it smells like accident waiting to happen. Guess we should fix that, one of these days as well.

> That said, all I have of pthreads-stubs in my filesystem is the .pc file,
> which I guess explains why I am not seeing this issue?
Pretty much. If you/Ben rebuild pthreads-stubs (and packages which depend on it) you'll see issues similar to the ones Ian reported.
Comment 16 Emil Velikov 2016-10-24 18:10:55 UTC
Let's imagine for a second that all the stuff I said so far is complete BS ;-)

Have we considered why glibc does _not_ expose the new symbols while it does the fa6db2f9c01~1 ones ? Surely they know a thing or two about libc.so and libpthread.so and made the conscious decision to not provide pthread_mutexattr_* by the former.

Uli can we please revert fa6db2f9c01 - it's been confirmed to cause problems ?
Comment 17 Uli Schlachter 2016-10-30 14:23:39 UTC
(In reply to Emil Velikov from comment #16)
> Uli can we please revert fa6db2f9c01 - it's been confirmed to cause problems?

Not really, sorry.

So as I read it, this bug report says we are in the following situation:
- libpthread is loaded
- libpthread-stubs is loaded
- Yet, the stub from libpthread-stubs is called instead of the function from libpthread

None of this sounds specific to pthread_mutexattr_{init,settype}. Why would removing these function from libpthread-stubs fix things and not hide the brokeness? Put differently: Could the same problem also occur with other symbols that are stubbed? Put differently: I'd like to understand the problem before I try to solve it.
Comment 18 Emil Velikov 2016-10-31 14:27:07 UTC
Just saw your IRC poke, barring the handles. For the future - myself xexaxo, Rob robclark, and Ian idr.

$ nm -CD --defined-only /usr/lib/libpthread-2.24.so  | grep pthread_mutexattr_settype
000000000000bc30 T __pthread_mutexattr_settype
000000000000bc30 T pthread_mutexattr_settype
[emil@arch-x1c3 mesa]$ nm -CD --defined-only /usr/lib/libc-2.24.so   | grep pthread_mutexattr_settype | wc -l
0

Dropping the symbols will fix the "brokenness" since then the pthread-stubs will expand to a plain .pc file (no DSO), thus only libc/libpthread will be used.
In practise all Linux distros will end up like ^^ since glibc and musl have provided the fa6db2f9c01~1 symbols for quite a few years. IIRC even latest Solaris does so.

Thinking about the days, where libc did not provide the stubs and fa6db2f9c01 were not around - things were bound to break in an identical way. For example:

Foo uses the fa6db2f9c01~1 pthreads API and links against pthreads-stubs. Be that directly or indirectly by/in one of its dependencies.

Foo uses the stubs and then dlopens libbar2. The latter links against libpthread and the weak pthread-stubs symbols are overwritten.

At this point we'll get issues similar to the one described.


If we are to fix this:
 a) in foo - this means that one will need to link against full blown pthreads, even though it does not want/need it. That is because foo cannot know if the modules that it is going to use will pull pthreads, pthread-stubs or neither.
 b) in pthread-stubs - one will end up re-implementing most of glibc libc<>libpthread heuristics.

So if anything I'd consider making things blindingly obvious that if runtime X doesn't provide pthread-stubs behaviour things can break at any point. Thus suggest a pthreads link for the very odd platform.


/me goes to add error messages for libdrm/mesa if $pkg-config --libs pthread-stubs is non-null
Comment 19 Emil Velikov 2016-10-31 15:17:10 UTC
If comment 18 is hard to parse things are explained in a (slightly) different light in https://lists.freedesktop.org/archives/dri-devel/2016-October/122494.html

One thing I'm thinking of, as an alternative to doing the ^^ in every project. Have pthread-stubs cflags/libs default to the normal pthread ones, if libc doesn't provide the stubs. Opinions, ideas ?
Comment 20 Chris Wilson 2016-11-02 20:27:04 UTC
*** Bug 98532 has been marked as a duplicate of this bug. ***
Comment 21 Uli Schlachter 2016-11-04 15:23:40 UTC
(In reply to Emil Velikov from comment #18)
> Foo uses the fa6db2f9c01~1 pthreads API and links against pthreads-stubs. Be
> that directly or indirectly by/in one of its dependencies.
> 
> Foo uses the stubs and then dlopens libbar2. The latter links against
> libpthread and the weak pthread-stubs symbols are overwritten.
> 
> At this point we'll get issues similar to the one described.

Ah, dlopen(). Ok, that does indeed explain the problems that everyone is seeing. However, my memory warns me about dlopen()ing libpthread and random google results say that it doesn't work too well, e.g. 
https://freetz.org/ticket/819 and https://sourceware.org/bugzilla/show_bug.cgi?id=16628.

However, only on Linux is this actually an issue with the patch that should be reverted. On other patches, other things will break.

Since you seem to be trying to fix this in pthread-stubs (thanks for those patches!), I'll stop complaining about dlopen()ing libpthreads now.
Comment 22 Emil Velikov 2017-03-15 12:05:08 UTC
Since the problematic commit was reverted/pthread-stubs reworked we can close this.
Nobody should see this kind of issues - be that on Linux or other platforms.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.