Bug 71738 - pulseaudio --start triggers undefined behaviour in pthread_join()
Summary: pulseaudio --start triggers undefined behaviour in pthread_join()
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: daemon (show other bugs)
Version: unspecified
Hardware: Other OpenBSD
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
Depends on:
Reported: 2013-11-18 13:17 UTC by Stefan Sperling
Modified: 2013-11-22 15:36 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:

Proposed patch to fix the problem. Don't call pthread_join() after fork(). (724 bytes, patch)
2013-11-18 13:17 UTC, Stefan Sperling
Details | Splinter Review
Revised path, adding pa_thread_join(). Also includes a log message. (2.66 KB, patch)
2013-11-21 18:00 UTC, Stefan Sperling
Details | Splinter Review

Description Stefan Sperling 2013-11-18 13:17:06 UTC
Created attachment 89406 [details] [review]
Proposed patch to fix the problem. Don't call pthread_join() after fork().

When pulseaudio --start runs, it uses a lockfile to ensure that two concurrently spawned 'pulseaudio --start' processes only spawn one daemon.

This procedure involves two processes, one of which also spawns threads.

With OpenBSD's pthread implementation 'pulseaudio --start' hangs hard after obtaining the lock file. The stuck process is waiting for other threads in pthread_join(), in a process which contains only a single thread!
The problem is fixed by calling pthread_join() only in the process that spawns threads.

Below is a quote from Philip Guenther, who helped me analyze the problem, with more details:

The output of
        kdump -Hrf pulseaudio-kdump-4-hanging-process.out

confirms that it's bad code in pulseaudio.  From the end of that:

  8149/1008666 pulseaudio.orig GIO   fd 4 wrote 1 bytes
  8149/1008666 pulseaudio.orig RET   write 1
  8149/1008149 pulseaudio.orig RET   poll 1
  8149/1008149 pulseaudio.orig CALL  read(0x3,0x7f7fffff669f,0x1)
  8149/1008149 pulseaudio.orig GIO   fd 3 read 1 bytes
  8149/1008149 pulseaudio.orig RET   read 1

So, process 8149 contains at least two threads, with tids 1008666 and   
1008149.  The gdb session from your previous email shows that the former
thread, 1008666, is the one that's the target of the hung pthread_join().
  8149/1008149 pulseaudio.orig CALL  read(0x3,0x7f7fffff6680,0x10)
  8149/1008149 pulseaudio.orig RET   read -1 errno 35 Resource temporarily unava
  8149/1008149 pulseaudio.orig CALL  pipe(0x7f7fffff67b0)
  8149/1008149 pulseaudio.orig RET   pipe 0
  8149/1008149 pulseaudio.orig CALL  sigprocmask(SIG_BLOCK,~0<>)
  8149/1008149 pulseaudio.orig RET   sigprocmask 0<>
  8149/1008149 pulseaudio.orig CALL  fork()
  8149/1008149 pulseaudio.orig RET   fork 15791/0x3daf

One of the threads forks, creating process 15791...

  8149/1008149 pulseaudio.orig CALL  sigprocmask(SIG_SETMASK,0<>)
  8149/1008149 pulseaudio.orig RET   sigprocmask ~0x10100<SIGKILL|SIGSTOP>
  8149/1008149 pulseaudio.orig CALL  close(0x7)
  8149/1008149 pulseaudio.orig RET   close 0
  8149/1008149 pulseaudio.orig CALL  read(0x6,0x7f7fffff6810,0x4)
  8149/1008666 pulseaudio.orig CALL  __threxit(0x3d59798e810)

...while the other thread in the original process, the target of the
problem pthread_join(), exits.

 15791/1015791 pulseaudio.orig RET   fork 0
 15791/1015791 pulseaudio.orig CALL  sigprocmask(SIG_SETMASK,0<>)
 15791/1015791 pulseaudio.orig RET   sigprocmask ~0x10100<SIGKILL|SIGSTOP>
 15791/1015791 pulseaudio.orig CALL  getthrid()
 15791/1015791 pulseaudio.orig RET   getthrid 1015791/0xf7fef
 15791/1015791 pulseaudio.orig CALL  sendto(0x4,0x7f7fffff66af,0x1,0x400<MSG_NOS
 15791/1015791 pulseaudio.orig RET   sendto -1 errno 38 Socket operation on non-
 15791/1015791 pulseaudio.orig CALL  write(0x4,0x7f7fffff66af,0x1)
 15791/1015791 pulseaudio.orig GIO   fd 4 wrote 1 bytes
 15791/1015791 pulseaudio.orig RET   write 1
 15791/1015791 pulseaudio.orig CALL  __thrsleep(0x3d59798e804,CLOCK_REALTIME,0,0

The new process, which only has one thread, then tries to pthread_join()
that thread in the original process.  That's Wrong.  To quote XSH 2.9.2:

2.9.2 Thread IDs
      Although implementations may have thread IDs that are unique in a
      system, applications should only assume that thread IDs are usable
      and unique within a single process. The effect of calling any of the
      functions defined in this volume of POSIX.1-2008 and passing as an
      argument the thread ID of a thread from another process is
Comment 1 Tanu Kaskinen 2013-11-18 14:21:55 UTC
Thank you for the patch!

It looks otherwise good, but a small bit of memory is being leaked, because the thread structure is not freed. I think it would be good to create a new function pa_thread_free_nojoin() that would be otherwise the same as pa_thread_free(), but it wouldn't call pa_thread_join().
Comment 2 Stefan Sperling 2013-11-21 18:00:50 UTC
Created attachment 89600 [details] [review]
Revised path, adding pa_thread_join(). Also includes a log message.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.