Bug 44680 - "Daemon startup failed" due to "Failed to remove stale UNIX socket" and NFS-mounted homes
Status: RESOLVED FIXED
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: daemon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard: triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-11 07:28 UTC by Peter Schwenk
Modified: 2012-03-28 05:59 UTC

Attachments
Attempt to make runtime paths smaller (2.25 KB, patch)
2012-03-13 18:44 UTC, Colin Guthrie

Description Peter Schwenk 2012-01-11 07:28:38 UTC
When a user that has an NFS-mounted home directory logs into a Linux machine (tried Ubuntu 11.10 and Fedora 13, both x86 and x86_64), PulseAudio will not start.  Users with homes on the local disk filesystem are not affected.

Early in the syslog, we see the following:

Jan 10 12:38:28 mars pulseaudio[2052]: [pulseaudio] module-dbus-protocol.c: dbus_server_listen() failed: org.freedesktop.DBus.Error.BadAddress: Abstract socket name too long
Jan 10 12:38:28 mars pulseaudio[2052]: [pulseaudio] module-dbus-protocol.c: Starting the local D-Bus server failed.

Later in the logs, the following related messages are repeated:

Jan 10 12:38:58 mars pulseaudio[2260]: [pulseaudio] module-protocol-stub.c: Failed to remove stale UNIX socket '/Network/Servers/<fqdn of server>/Volumes/Homes/jdoe/.pulse/e735d8c8be6377c9aa3f7b4c000005f9-runtime/native': No such file or directory
Jan 10 12:38:58 mars pulseaudio[2260]: [pulseaudio] module.c: Failed to load module "module-native-protocol-unix" (argument: ""): initialization failed.
Jan 10 12:38:58 mars pulseaudio[2260]: [pulseaudio] main.c: Module load failed.
Jan 10 12:38:58 mars pulseaudio[2260]: [pulseaudio] main.c: Failed to initialize daemon.
Jan 10 12:38:58 mars pulseaudio[2257]: [pulseaudio] main.c: Daemon startup failed.

If we look in the user's home directory, we see that ~/.pulse/###-runtime is a symlink to a random directory in /tmp, e.g. /tmp/pulse-2L9K88eMlGn7.  In that local directory there is a socket named 'nati', not 'native' as the error messages expect.  It seems that the socket is not created with the expected name.
Comment 1 Colin Guthrie 2012-03-13 18:27:46 UTC
It looks like we're not allowing enough space in a variable somewhere for the full socket path name.

Looking at the code, we're mostly pretty careful about our paths, using flexible allocation etc. However, I do notice this:


pa_socket_server* pa_socket_server_new_unix(pa_mainloop_api *m, const char *filename) {
    int fd = -1;
    struct sockaddr_un sa;
    pa_socket_server *s;

    pa_assert(m);
    pa_assert(filename);

    if ((fd = pa_socket_cloexec(PF_UNIX, SOCK_STREAM, 0)) < 0) {
        pa_log("socket(): %s", pa_cstrerror(errno));
        goto fail;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    pa_strlcpy(sa.sun_path, filename, sizeof(sa.sun_path));



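Here, the limit is the fixed size of sa.sun_path: pa_strlcpy() copies at most sizeof(sa.sun_path) bytes, including the terminating NUL, so a longer filename is silently truncated.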


Looking further in /usr/include/sys/un.h:

struct sockaddr_un
  {
    __SOCKADDR_COMMON (sun_);
    char sun_path[108];         /* Path name.  */
  };


So there we have it. 108 bytes, including the terminating NUL.
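
To make the truncation concrete, here is a minimal sketch of a bounded copy into sun_path that also reports whether anything was cut off. The helper name copy_socket_path is hypothetical and is not part of PulseAudio; it uses plain snprintf() rather than pa_strlcpy() so that it stands alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper, not PulseAudio code.
 * Copies `path` into sa->sun_path with a bounded copy and returns
 * 1 if the path had to be truncated, 0 if it fit completely. */
static int copy_socket_path(struct sockaddr_un *sa, const char *path) {
    int wanted;

    assert(sa);
    assert(path);

    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;

    /* snprintf() always NUL-terminates and returns the length the
     * full string would have needed, which lets us detect truncation. */
    wanted = snprintf(sa->sun_path, sizeof(sa->sun_path), "%s", path);

    return wanted < 0 || (size_t) wanted >= sizeof(sa->sun_path);
}
```

With a 108-byte sun_path, a 109-character path ending in "/native" would keep only its first 107 characters and so end in "/nati", which would be consistent with the socket name seen in the report.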

Not quite sure how to address this. I guess we could pass the path through realpath() first to shorten it, but that's certainly not a universal solution. We should probably fail more gracefully(!) when the path overflows.
Comment 2 Colin Guthrie 2012-03-13 18:44:33 UTC
Created attachment 58413
Attempt to make runtime paths smaller

This patch might solve the problem in your case.

Longer term, we'll be wanting to switch to a folder inside /run/user/$USER/ which we can guarantee no other user can write to (which is why we do the whole crazy symlinking stuff).

Ultimately, the /run/user/$USER/ dir is supplied to us in the XDG_RUNTIME_DIR variable on modern systems.

However, in the meantime, could you please try this patch and let me know?
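
As a rough sketch of the longer-term idea, a runtime directory could be derived from XDG_RUNTIME_DIR like this. The helper name runtime_dir_from_env and the "pulse" subdirectory are assumptions made for illustration, not something this report specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not PulseAudio code.
 * Writes "<XDG_RUNTIME_DIR>/pulse" into `buf`.
 * Returns 0 on success, or -1 if the variable is unset, is not an
 * absolute path, or the result does not fit in `size` bytes. */
static int runtime_dir_from_env(char *buf, size_t size) {
    const char *base;
    int n;

    assert(buf);

    base = getenv("XDG_RUNTIME_DIR");
    if (!base || base[0] != '/')
        return -1; /* caller would fall back to the existing scheme */

    n = snprintf(buf, size, "%s/pulse", base);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Because such a directory is short and local, a socket path built under it would stay well inside the sun_path limit regardless of where the home directory is mounted.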
Comment 3 Colin Guthrie 2012-03-28 04:17:38 UTC
OK, I've tested and committed a similar patch to the one posted here.

It should work around the issues for now, but a "full and proper" fix is to use XDG_RUNTIME_DIR which we will support in the fullness of time.

Cheers
Comment 4 Peter Schwenk 2012-03-28 05:59:48 UTC
Thanks.  I was unable to test the patch, but it seems very reasonable.
