Bug 49198 - glxinfo SIGSEGV in pthread_detach under radeon_drm_cs_destroy under dri2_destroy_context
Summary: glxinfo SIGSEGV in pthread_detach under radeon_drm_cs_destroy under dri2_dest...
Status: CLOSED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600 (show other bugs)
Version: 8.0
Hardware: Other All
: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2012-04-26 15:56 UTC by Karl Tomlinson
Modified: 2012-09-20 07:27 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Karl Tomlinson 2012-04-26 15:56:05 UTC
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD JUNIPER
OpenGL version string: 2.1 Mesa 8.0.2
OpenGL shading language version string: 1.20

#0  0x00007ffff6002e98 in pthread_detach () from /lib64/libpthread.so.0
#1  0x00007ffff47f2324 in pipe_thread_destroy (thread=<optimized out>)
    at ../../../../../src/gallium/auxiliary/os/os_thread.h:78
#2  radeon_drm_cs_destroy (rcs=0x7ffff7fd7010) at radeon_drm_cs.c:475
#3  0x00007ffff47e07b0 in r600_context_fini (ctx=0x642dd8)
    at r600_hw_context.c:780
#4  0x00007ffff47c4932 in r600_destroy_context (context=0x642ac0)
    at r600_pipe.c:197
#5  0x00007ffff489397b in st_destroy_context (st=0x7a8ae0)
    at state_tracker/st_context.c:268
#6  0x00007ffff47ee324 in dri_destroy_context (cPriv=<optimized out>)
    at dri_context.c:174
#7  0x00007ffff47c1abb in driDestroyContext (pcp=0x6177e0)
    at ../common/dri_util.c:277
#8  0x00007ffff7bbf01f in dri2_destroy_context (context=0x617640)
    at dri2_glx.c:132
#9  0x00007ffff7b99b89 in glx_display_free (priv=0x613a30) at glxext.c:228
#10 0x00007ffff7b99c16 in __glXCloseDisplay (dpy=0x607010, 
    codes=<optimized out>) at glxext.c:275
#11 0x00007ffff74c7f9c in XCloseDisplay () from /usr/lib64/libX11.so.6
#12 0x0000000000403590 in main ()

Gentoo build with
./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-dependency-tracking --enable-dri --enable-glx --enable-texture-float --disable-debug --enable-egl --enable-gbm --disable-gles1 --disable-gles2 --enable-glx-tls --disable-osmesa --enable-asm --enable-shared-dricore --enable-shared-glapi --with-dri-drivers=,swrast,radeon,r200 --with-gallium-drivers=,swrast,r600 --with-egl-platforms=x11,drm --enable-gallium-egl --disable-d3d1x --disable-gallium-g3dvl --enable-gallium-llvm --disable-openvg --disable-vdpau --disable-xa --disable-xvmc
Comment 1 Karl Tomlinson 2012-04-27 23:23:21 UTC
Similar crash in glXDestroyContext from kwin, or Firefox's glxtest.
Comment 2 Karl Tomlinson 2012-05-22 02:50:17 UTC
Bisection gives good: 3b3d2e53bc11f9b5fbda812953700b216cd8ab93
bad: http://cgit.freedesktop.org/mesa/mesa/commit/?id=210ddf0819b5acf87a614214b6d4b02193aafa4a

(Bug still exists in git 8a933e36d1f91d3b8a5377a8d23c2dcdd403e8ad.)
Comment 3 Michel Dänzer 2012-05-22 03:11:05 UTC
Maarten, any ideas?
Comment 4 Maarten Lankhorst 2012-05-22 03:26:10 UTC
Woops, looks like I just did overkill.

pthread_detach is not needed after a pthread_join, so just remove the pipe_thread_destroy call.

I was probably looking at win32 too much at the time, where a thread handle has to be closed after you wait for the thread to terminate.

I posted the patch to mesa-dev, but in any case shouldn't libpthread be at least handling this slightly more gracefully than a crash?

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index 2239059..168f455 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -469,7 +469,6 @@ static void radeon_drm_cs_destroy(struct radeon_winsys_cs *rcs)
         pipe_semaphore_signal(&cs->flush_queued);
         pipe_semaphore_wait(&cs->flush_completed);
         pipe_thread_wait(cs->thread);
-        pipe_thread_destroy(cs->thread);
     }
     pipe_semaphore_destroy(&cs->flush_queued);
     pipe_semaphore_destroy(&cs->flush_completed);
Comment 5 Andre Heider 2012-05-22 04:36:26 UTC
This looks dangerous:

1) pipe_semaphore_signal(&cs->flush_queued);
2) pipe_semaphore_wait(&cs->flush_completed);
3) pipe_thread_wait(cs->thread);

There's no guarantee that pipe_semaphore_signal(&cs->flush_completed) in radeon_drm_cs_emit_ioctl() is executed after 2).
Getting rid of 2) should fix that and it would be safe since 3) blocks until radeon_drm_cs_emit_ioctl() returns.
Comment 6 Maarten Lankhorst 2012-05-22 07:44:12 UTC
What do you mean? flush_completed is always explicitly signalled before the flush thread is exiting, see right before the 'return NULL'
Comment 7 Andre Heider 2012-05-22 08:42:42 UTC
Well, I just skimmed over the patch, so maybe I'm missing something, but what I'm seeing is: 2 threads
a) runs radeon_drm_cs_destroy()
b) runs radeon_drm_cs_emit_ioctl()

When e.g. a context switch occurs on thread a) between 1) and 2) then thread b) can run pipe_semaphore_signal(&cs->flush_completed) before a) runs pipe_semaphore_wait(&cs->flush_completed).
The former will effectively be a noop since no other thread is waiting and the latter will block forever.
No?
Comment 8 Maarten Lankhorst 2012-05-22 08:46:31 UTC
You're complaining about undefined behavior in a destruction handler?

When you call radeon_drm_cs_destroy, you better make damned well sure you've finished using it first, you have bigger problems than a deadlock otherwise.
Comment 9 Andre Heider 2012-05-22 09:03:31 UTC
Merely pointing out what I noticed, which might be unrelated to its correctness ;)
And that's that the handler itself triggers thread b) and hence might deadlock, unrelated to additional users.
But if I'm indeed seeing ghosts please ignore the false alarm.
Comment 10 Paul Menzel 2012-09-19 10:20:58 UTC
Could somebody please give an update on this issue? Was a fix committed?

According to Michael Dänzer I might have hit this issue crashing Evolution 3.4.3-1 with a segmentation fault [1]. This is Debian Sid/unstable with Mesa 8.0.4-2.


[1] http://lists.freedesktop.org/archives/dri-devel/2012-September/028029.html
Comment 11 Paul Menzel 2012-09-19 13:17:40 UTC
The Debian BTS tracks this issue under #688108 [1].

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=688108
Comment 12 Karl Tomlinson 2012-09-19 21:20:02 UTC
http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/winsys/radeon/drm/radeon_drm_cs.c?id=6bb0151f1fd87b4e15f177c7122fc28fea29497e
fixes this.
That fix is not on the 8.0 branch.

ISTR a smaller stacksize limit making libpthread more tolerant to these bugs.
Comment 13 Maarten Lankhorst 2012-09-20 07:27:45 UTC
Pushed to the 8.0 branch too now, commit 25da204f692c291ae3f06bbb9245131738bb80da there.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.