OpenGL vendor string: X.Org OpenGL renderer string: Gallium 0.4 on AMD JUNIPER OpenGL version string: 2.1 Mesa 8.0.2 OpenGL shading language version string: 1.20 #0 0x00007ffff6002e98 in pthread_detach () from /lib64/libpthread.so.0 #1 0x00007ffff47f2324 in pipe_thread_destroy (thread=<optimized out>) at ../../../../../src/gallium/auxiliary/os/os_thread.h:78 #2 radeon_drm_cs_destroy (rcs=0x7ffff7fd7010) at radeon_drm_cs.c:475 #3 0x00007ffff47e07b0 in r600_context_fini (ctx=0x642dd8) at r600_hw_context.c:780 #4 0x00007ffff47c4932 in r600_destroy_context (context=0x642ac0) at r600_pipe.c:197 #5 0x00007ffff489397b in st_destroy_context (st=0x7a8ae0) at state_tracker/st_context.c:268 #6 0x00007ffff47ee324 in dri_destroy_context (cPriv=<optimized out>) at dri_context.c:174 #7 0x00007ffff47c1abb in driDestroyContext (pcp=0x6177e0) at ../common/dri_util.c:277 #8 0x00007ffff7bbf01f in dri2_destroy_context (context=0x617640) at dri2_glx.c:132 #9 0x00007ffff7b99b89 in glx_display_free (priv=0x613a30) at glxext.c:228 #10 0x00007ffff7b99c16 in __glXCloseDisplay (dpy=0x607010, codes=<optimized out>) at glxext.c:275 #11 0x00007ffff74c7f9c in XCloseDisplay () from /usr/lib64/libX11.so.6 #12 0x0000000000403590 in main () Gentoo build with ./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-dependency-tracking --enable-dri --enable-glx --enable-texture-float --disable-debug --enable-egl --enable-gbm --disable-gles1 --disable-gles2 --enable-glx-tls --disable-osmesa --enable-asm --enable-shared-dricore --enable-shared-glapi --with-dri-drivers=,swrast,radeon,r200 --with-gallium-drivers=,swrast,r600 --with-egl-platforms=x11,drm --enable-gallium-egl --disable-d3d1x --disable-gallium-g3dvl --enable-gallium-llvm --disable-openvg --disable-vdpau --disable-xa --disable-xvmc
Similar crash in glXDestroyContext from kwin, or Firefox's glxtest.
Bisection gives good: 3b3d2e53bc11f9b5fbda812953700b216cd8ab93 bad: http://cgit.freedesktop.org/mesa/mesa/commit/?id=210ddf0819b5acf87a614214b6d4b02193aafa4a (Bug still exists in git 8a933e36d1f91d3b8a5377a8d23c2dcdd403e8ad.)
Maarten, any ideas?
Woops, looks like I just did overkill. pthread_detach is not needed after a pthread_join, so just remove the pipe_thread_destroy call. I was probably looking at win32 too much at the time, where a thread handle has to be closed after you wait for the thread to terminate. I posted the patch to mesa-dev, but in any case shouldn't libpthread be at least handling this slightly more gracefully than a crash? diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index 2239059..168f455 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -469,7 +469,6 @@ static void radeon_drm_cs_destroy(struct radeon_winsys_cs *rcs) pipe_semaphore_signal(&cs->flush_queued); pipe_semaphore_wait(&cs->flush_completed); pipe_thread_wait(cs->thread); - pipe_thread_destroy(cs->thread); } pipe_semaphore_destroy(&cs->flush_queued); pipe_semaphore_destroy(&cs->flush_completed);
This looks dangerous: 1) pipe_semaphore_signal(&cs->flush_queued); 2) pipe_semaphore_wait(&cs->flush_completed); 3) pipe_thread_wait(cs->thread); There's no guarantee that pipe_semaphore_signal(&cs->flush_completed) in radeon_drm_cs_emit_ioctl() is executed after 2). Getting rid of 2) should fix that and it would be safe since 3) blocks until radeon_drm_cs_emit_ioctl() returns.
What do you mean? flush_completed is always explicitly signalled before the flush thread is exiting, see right before the 'return NULL'
Well, I just skimmed over the patch, so maybe I'm missing something, but what I'm seeing is: 2 threads a) runs radeon_drm_cs_destroy() b) runs radeon_drm_cs_emit_ioctl() When e.g. a context switch occurs on thread a) between 1) and 2) then thread b) can run pipe_semaphore_signal(&cs->flush_completed) before a) runs pipe_semaphore_wait(&cs->flush_completed). The former will effectively be a noop since no other thread is waiting and the latter will block forever. No?
You're complaining about undefined behavior in a destruction handler? When you call radeon_drm_cs_destroy, you better make damned well sure you've finished using it first, you have bigger problems than a deadlock otherwise.
Merely pointing out what I noticed, which might be unrelated to its correctness ;) And that's that the handler itself triggers thread b) and hence might deadlock, unrelated to additional users. But if I'm indeed seeing ghosts please ignore the false alarm.
Could somebody please give an update on this issue? Was a fix committed? According to Michael Dänzer I might have hit this issue crashing Evolution 3.4.3-1 with a segmentation fault [1]. This is Debian Sid/unstable with Mesa 8.0.4-2. [1] http://lists.freedesktop.org/archives/dri-devel/2012-September/028029.html
The Debian BTS tracks this issue under #688108 [1]. [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=688108
http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/winsys/radeon/drm/radeon_drm_cs.c?id=6bb0151f1fd87b4e15f177c7122fc28fea29497e fixes this. That fix is not on the 8.0 branch. ISTR a smaller stacksize limit making libpthread more tolerant to these bugs.
Pushed to the 8.0 branch too now, commit 25da204f692c291ae3f06bbb9245131738bb80da there.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.