Bug 25742

Summary: context_connect blocks for a few (4-10) seconds when no pulseaudio process is running
Product: libcanberra
Reporter: jez <jezaustin>
Component: Unspecified
Assignee: Lennart Poettering <lennart>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
CC: bugs.freedesktop, jezaustin
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)

Description jez 2009-12-21 06:19:52 UTC
context_connect blocks in its call to pa_context_connect when no pulseaudio process is running.

See mozilla/firefox bugs https://bugzilla.mozilla.org/show_bug.cgi?id=533470 and 
https://bugzilla.mozilla.org/show_bug.cgi?id=520417

Is this related? https://bugs.freedesktop.org/show_bug.cgi?id=15862

Is it possible to avoid blocking so long in the case where there is no pulseaudio process?
Comment 1 Lennart Poettering 2009-12-21 11:04:33 UTC
Uh, I don't think things are as they might appear.

First of all, most distros configure PA to autospawn if it is not running anyway. That means the whole problem description is not consistent.

Secondly, if you disable autospawning then a non-existent PA should be detected as soon as the connection attempt has failed, which should be right away, unless of course there's a DNS problem or suchlike.
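
To illustrate the "right away" expectation in the comment above: a connection attempt to a local socket that has no listener normally fails at once, with no name resolution involved. The following is a minimal sketch only. The helper name `probe_unix_socket` and the socket path used with it are hypothetical, and the real PulseAudio socket location depends on the installation.

```python
import socket
import time


def probe_unix_socket(path, timeout=1.0):
    """Try to connect to a Unix stream socket; return (connected, seconds)."""
    start = time.monotonic()
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        connected = True
    except OSError:
        # Missing socket file, refused connection, or timeout:
        # all are reported as "not connected".
        connected = False
    finally:
        sock.close()
    return connected, time.monotonic() - start
```

Calling `probe_unix_socket("/nonexistent-dir/native")` (a made-up path) should report a failure almost immediately. A delay of several seconds therefore points at something other than the local connection attempt itself, such as the name lookup mentioned above.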
Comment 2 jez 2009-12-21 11:18:25 UTC
Lennart, you're right, I have PA configured not to autospawn. How could I confirm a DNS problem?
Comment 3 Karl Tomlinson 2009-12-21 15:35:56 UTC
(In reply to comment #2)
> How could I confirm a DNS problem?

Try attaching a debugger while the process is blocked and check all threads to see if libnss_dns or libresolv is in any of the stacks (thread apply all bt).
I don't actually know how to persuade gdb to print library names when debug symbols are available though, so you may need to guess from file/function names.
Comment 4 jez 2009-12-22 02:50:31 UTC
(In reply to comment #3)
> Try attaching a debugger while the process is blocked and check all threads to
> see if libnss_dns or libresolv is in any of the stacks (thread apply all bt).
> I don't actually know how to persuade gdb to print library names when debug
> symbols are available though, so you may need to guess from file/function
> names.
> 

Rather than messing with the enormous firefox, I've been experimenting with the code at http://www.ypass.net/blog/2009/10/pulseaudio-an-async-example-to-get-device-lists/ .
I'm not at all familiar with debugging threads, and GDB throws errors during startup:
Error while reading shared library symbols:
find_new_threads_callback: cannot get thread info: generic error
find_new_threads_callback: cannot get thread info: generic error

while in dl_main (rtld.c), then
[New Thread 0x7f6774e2c730 (LWP 14758)]
Cannot enable thread event reporting for Thread 0x7f6774e2c730 (LWP 14758): generic error

So I'm not sure if I can check all the threads, even if I knew how... are these messages normal?

Anyway, here's a stack trace which is a lot smaller than the one at 
https://bugzilla.mozilla.org/show_bug.cgi?id=533470#comment_text_6:

#0  0x00007f316a82a7d1 in *__GI___nptl_create_event () at events.c:27
#1  0x00007f316a82bc34 in __pthread_create_2_1 (
    newthread=<value optimized out>, attr=<value optimized out>, 
    start_routine=<value optimized out>, arg=<value optimized out>)
    at ../nptl/sysdeps/pthread/createthread.c:224
#2  0x00007f316bb4a532 in asyncns_new (n_proc=1) at asyncns.c:549
#3  0x00007f316e1e46c1 in pa_socket_client_new_string (m=0x1b790e8, 
    use_rtclock=true, name=<value optimized out>, 
    default_port=<value optimized out>) at pulsecore/socket-client.c:480
#4  0x00007f316e763a6a in try_next_connection (c=0x1b79190)
    at pulse/context.c:868
#5  0x00007f316e7649de in pa_context_connect (c=0x1b79190, 
    server=0x1b85f60 "", flags=PA_CONTEXT_NOFLAGS, api=0x0)
    at pulse/context.c:1024
#6  0x000000000040104e in pa_get_devicelist (input=0x7fff7829f0b0, 
    output=0x7fff7829c030) at pulse-test.c:112
#7  0x00000000004011f0 in main (argc=1, argv=0x7fff782a2228)
    at pulse-test.c:203

I think probably frame 2's asyncns_new call is hiding what's really going on, which is an asyncns_getaddrinfo(...) followed by a start_timeout(...).
This would match what I observe: the error "socket(): Address family not supported by protocol" is reported immediately, and the roughly 5-second pause would then just be waiting out the timeout.
Furthermore, compiling pulseaudio without HAVE_ASYNCNS removes the 5s pause.

I guess this is pulseaudio's bug.
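
The hypothesis in this comment (the failure is known immediately, but the caller still waits out a timeout) can be shown with a toy model. This is not PulseAudio's code and says nothing about the patch that was later merged. The function `wait_for_result` and its `fail_fast` switch are invented for illustration.

```python
import threading
import time


def wait_for_result(worker, timeout, fail_fast):
    """Run worker in a thread; return (result, seconds waited).

    fail_fast=False models waiting out the whole timeout even though the
    worker has already failed. fail_fast=True returns as soon as the worker
    reports, or after the timeout if it never does.
    """
    done = threading.Event()
    box = {}

    def run():
        try:
            box["result"] = worker()
        except OSError as exc:
            # Keep the error so the caller can inspect it.
            box["result"] = exc
        done.set()

    start = time.monotonic()
    threading.Thread(target=run, daemon=True).start()
    if fail_fast:
        done.wait(timeout)
    else:
        time.sleep(timeout)
    return box.get("result"), time.monotonic() - start
```

With a worker that raises `OSError` at once, the `fail_fast=True` call returns almost immediately, while the `fail_fast=False` call takes the full timeout, which is the shape of the delay described above.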
Comment 5 jez 2009-12-22 03:50:46 UTC
Bug submitted to pulseaudio http://pulseaudio.org/ticket/752
Comment 6 jez 2010-03-19 03:24:42 UTC
(In reply to comment #5)
> Bug submitted to pulseaudio http://pulseaudio.org/ticket/752
> 

I uploaded a patch to pulseaudio last night, which will hopefully be accepted
and solve the problem.
Comment 7 Lennart Poettering 2010-03-22 08:13:44 UTC
Patch looks good. Merged upstream. Thanks a lot!
