Created attachment 89406 [details] [review] Proposed patch to fix the problem. Don't call pthread_join() after fork(). When pulseaudio --start runs, it uses a lockfile to ensure that two concurrently spawned 'pulseaudio --start' processes only spawn one daemon. This procedure involves two processes, one of which also spawns threads. With OpenBSD's pthread implementation 'pulseaudio --start' hangs hard after obtaining the lock file. The stuck process is waiting for other threads in pthread_join(), in a process which contains only a single thread! The problem is fixed by calling pthread_join() only in the process that spawns threads. Below is a quote from Philip Guenther, who helped me analyze the problem, with more details: [[[ The output of kdump -Hrf pulseaudio-kdump-4-hanging-process.out confirms that it's bad code in pulseaudio. From the end of that: 8149/1008666 pulseaudio.orig GIO fd 4 wrote 1 bytes "x" 8149/1008666 pulseaudio.orig RET write 1 8149/1008149 pulseaudio.orig RET poll 1 8149/1008149 pulseaudio.orig CALL read(0x3,0x7f7fffff669f,0x1) 8149/1008149 pulseaudio.orig GIO fd 3 read 1 bytes "x" 8149/1008149 pulseaudio.orig RET read 1 So, process 8149 contains at least two threads, with tids 1008666 and 1008149. The gdb session from your previous email shows that the former thread, 1008666, is the one that's the target of the hung pthread_join(). 8149/1008149 pulseaudio.orig CALL read(0x3,0x7f7fffff6680,0x10) 8149/1008149 pulseaudio.orig RET read -1 errno 35 Resource temporarily unava ilable 8149/1008149 pulseaudio.orig CALL pipe(0x7f7fffff67b0) 8149/1008149 pulseaudio.orig RET pipe 0 8149/1008149 pulseaudio.orig CALL sigprocmask(SIG_BLOCK,~0<>) 8149/1008149 pulseaudio.orig RET sigprocmask 0<> 8149/1008149 pulseaudio.orig CALL fork() 8149/1008149 pulseaudio.orig RET fork 15791/0x3daf One of the threads forks, creating process 15791... 8149/1008149 pulseaudio.orig CALL sigprocmask(SIG_SETMASK,0<>) 8149/1008149 pulseaudio.orig RET sigprocmask ~0x10100<SIGKILL|SIGSTOP> 8149/1008149 pulseaudio.orig CALL close(0x7) 8149/1008149 pulseaudio.orig RET close 0 8149/1008149 pulseaudio.orig CALL read(0x6,0x7f7fffff6810,0x4) 8149/1008666 pulseaudio.orig CALL __threxit(0x3d59798e810) ...while the other thread in the original process, the target of the problem pthread_join(), exits. 15791/1015791 pulseaudio.orig RET fork 0 15791/1015791 pulseaudio.orig CALL sigprocmask(SIG_SETMASK,0<>) 15791/1015791 pulseaudio.orig RET sigprocmask ~0x10100<SIGKILL|SIGSTOP> 15791/1015791 pulseaudio.orig CALL getthrid() 15791/1015791 pulseaudio.orig RET getthrid 1015791/0xf7fef 15791/1015791 pulseaudio.orig CALL sendto(0x4,0x7f7fffff66af,0x1,0x400<MSG_NOS IGNAL>,0,0) 15791/1015791 pulseaudio.orig RET sendto -1 errno 38 Socket operation on non- socket 15791/1015791 pulseaudio.orig CALL write(0x4,0x7f7fffff66af,0x1) 15791/1015791 pulseaudio.orig GIO fd 4 wrote 1 bytes "x" 15791/1015791 pulseaudio.orig RET write 1 15791/1015791 pulseaudio.orig CALL __thrsleep(0x3d59798e804,CLOCK_REALTIME,0,0 x3d59798e800,0x3d5999f88e0) The new process, which only has one thread, then tries to pthread_join() that thread in the original process. That's Wrong. To quote XSH 2.9.2: 2.9.2 Thread IDs Although implementations may have thread IDs that are unique in a system, applications should only assume that thread IDs are usable and unique within a single process. The effect of calling any of the functions defined in this volume of POSIX.1-2008 and passing as an argument the thread ID of a thread from another process is unspecified. ]]]
Thank you for the patch! It looks otherwise good, but a small bit of memory is being leaked, because the thread structure is not freed. I think it would be good to create a new function pa_thread_free_nojoin() that would be otherwise the same as pa_thread_free(), but it wouldn't call pa_thread_join().
Created attachment 89600 [details] [review] Revised path, adding pa_thread_join(). Also includes a log message.
Thank you! Patch applied: http://cgit.freedesktop.org/pulseaudio/pulseaudio/commit/?id=d8e2b3a78c6c60ffec9a115a0e35a3cb4c68598f
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.