Bug 96873 - tcp transport always fails: PULSE_SERVER=127.0.0.1 src/pacat, Connection failure: Connection terminated
Summary: tcp transport always fails: PULSE_SERVER=127.0.0.1 src/pacat, Connection fail...
Status: RESOLVED MOVED
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-09 20:40 UTC by Sergei Trofimovich
Modified: 2018-07-30 10:00 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Sergei Trofimovich 2016-07-09 20:40:10 UTC
Noticed breakage on real applications like mpv and other media apps.

PA used to work but I'm not sure what exactly changed in my environment.

Currently pulseaudio over TCP is broken on the following versions:
    7.1, 8.0, 9.0.

How to reproduce:
=================

1. run pulseaudio server on one terminal:
    $ pulseaudio -vvv

2. run client with PULSE_SERVER=127.0.0.1 set
    dev/git/pulseaudio $ PULSE_SERVER=127.0.0.1 src/pacat

Analysis:
=========

1. strace shows there is a failed sendmsg(cmsg_type=SCM_CREDENTIALS) on TCP socket.

    $ PULSE_SERVER=127.0.0.1 strace -f src/pacat

    ppoll([{fd=3, events=POLLIN}, {fd=13, events=POLLIN|POLLOUT}, {fd=0, events=POLLIN}, {fd=5, events=POLLIN}], 4, {29, 999755000}, NULL, 8) = 1 ([{fd=13, revents=POLLOUT}], left {29, 999753448})
    write(4, "W", 1)                        = 1
    sendmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\1\24\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 20}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=5412, uid=1000, gid=100}}, msg_flags=0}, MSG_NOSIGNAL) = -1 EINVAL (Invalid argument)
    write(2, "Connection failure: Connection t"..., 42Connection failure: Connection terminated
) = 42
    write(4, "W", 1)                        = 1
    write(4, "W", 1)                        = 1
    close(13)

server side sees it as abrupt connection:
    I: [pulseaudio] socket-server.c: TCP connection accepted by tcpwrap.
    I: [pulseaudio] client.c: Created 8 "Native client (TCP/IP client from 127.0.0.1:46160)"
    I: [pulseaudio] client.c: Freed 8 "Native client (TCP/IP client from 127.0.0.1:46160)"
    I: [pulseaudio] protocol-native.c: Connection died.

Looking at the code there is a few functions that assume UNIX sockets:

    pa_iochannel_write_with_creds()
    pa_iochannel_write_with_fds()

I've added a few asserts there to explicitly state the invariant of UNIX socket being handled:

    diff --git a/src/pulsecore/iochannel.c b/src/pulsecore/iochannel.c
    index e62750b..a85de41 100644
    --- a/src/pulsecore/iochannel.c
    +++ b/src/pulsecore/iochannel.c
    @@ -313,2 +313,3 @@ ssize_t pa_iochannel_write_with_creds(pa_iochannel*io, const void*data, size_t l
         pa_assert(io->ofd >= 0);
    +    pa_assert(pa_iochannel_creds_supported(io));
    
    @@ -365,2 +366,3 @@ ssize_t pa_iochannel_write_with_fds(pa_iochannel*io, const void*data, size_t l,
         pa_assert(nfd <= MAX_ANCIL_DATA_FDS);
    +    pa_assert(pa_iochannel_creds_supported(io));

That allows to get exact backtrace how crash happens:

    dev/git/pulseaudio $ PULSE_SERVER=127.0.0.1 src/pacat

    Assertion 'pa_iochannel_creds_supported(io)' failed at pulsecore/iochannel.c:314, function     pa_iochannel_write_with_creds(). Aborting.
    Aborted (core dumped)

    dev/git/pulseaudio $ gdb src/.libs/pacat core.29381

    #0  0x00007f814e4591c8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
    54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
    (gdb) bt
    #0  0x00007f814e4591c8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
    #1  0x00007f814e45a61a in __GI_abort () at abort.c:89
    #2  0x00007f815164dc5e in pa_iochannel_write_with_creds (io=0xb08830, data=0xb08aa8, l=20, ucred=0xb0a944)
        at pulsecore/iochannel.c:314
    #3  0x00007f815166958a in do_write (p=0xb08a60) at pulsecore/pstream.c:777
    #4  0x00007f8151666fc1 in do_pstream_read_write (p=0xb08a60) at pulsecore/pstream.c:258
    #5  0x00007f8151667358 in io_callback (io=0xb08830, userdata=0xb08a60) at pulsecore/pstream.c:304
    #6  0x00007f815164ceff in callback (m=0xb02748, e=0xb08a10, fd=13, f=PA_IO_EVENT_OUTPUT, userdata=0xb08830)
        at pulsecore/iochannel.c:158
    #7  0x00007f81518d922a in dispatch_pollfds (m=0xb026f0) at pulse/mainloop.c:655
    #8  0x00007f81518da011 in pa_mainloop_dispatch (m=0xb026f0) at pulse/mainloop.c:898
    #9  0x00007f81518da18e in pa_mainloop_iterate (m=0xb026f0, block=1, retval=0x7fff98276708) at pulse/mainloop.c:929
    #10 0x00007f81518da1ee in pa_mainloop_run (m=0xb026f0, retval=0x7fff98276708) at pulse/mainloop.c:944
    #11 0x0000000000406e43 in main (argc=1, argv=0x7fff98276a28) at utils/pacat.c:1202
Comment 1 Sergei Trofimovich 2016-07-09 22:07:31 UTC
Tried git pulseaudio on two machines with old and new kernel.

Old 4.3.0 works:

    sendmsg(12, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\1\24\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 20}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=13237, uid=1000, gid=1000}}, msg_flags=0}, MSG_NOSIGNAL) = 20

BAD 4.7.0-rc6-00148-gee40fb2 (vanilla) does not:

    sendmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\1\24\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 20}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=5412, uid=1000, gid=100}}, msg_flags=0}, MSG_NOSIGNAL) = -1 EINVAL (Invalid argument)

I'll try to pursue kernel changes path.
Comment 2 Tanu Kaskinen 2016-07-10 02:30:43 UTC
(In reply to Sergei Trofimovich from comment #0)
> Noticed breakage on real applications like mpv and other media apps.
> 
> PA used to work but I'm not sure what exactly changed in my environment.
> 
> Currently pulseaudio over TCP is broken on the following versions:
>     7.1, 8.0, 9.0.

To clarify, did you only test those versions, or did you also try older versions and they worked?

Thanks for testing different kernel versions. This issue was reported in IRC too, but we didn't manage to trace it down to kernel differences yet.

> Looking at the code there is a few functions that assume UNIX sockets:
> 
>     pa_iochannel_write_with_creds()
>     pa_iochannel_write_with_fds()

Or rather, the functions assume that setting creds works ("works" as in doesn't result in an error) on IP sockets as well (or at least pa_iochannel_write_with_creds() assumes that - I haven't looked at the call paths for pa_iochannel_write_with_fds()). It seems that the kernel has stopped accepting creds for IP sockets? That behaviour would make sense to me, although it might be against the "never break userspace" rule that the kernel tries to follow.

In any case, I'll write a patch that avoids calling the aforementioned functions on IP sockets.
Comment 3 Sergei Trofimovich 2016-07-10 08:58:53 UTC
(In reply to Tanu Kaskinen from comment #2)
> (In reply to Sergei Trofimovich from comment #0)
> > Noticed breakage on real applications like mpv and other media apps.
> > 
> > PA used to work but I'm not sure what exactly changed in my environment.
> > 
> > Currently pulseaudio over TCP is broken on the following versions:
> >     7.1, 8.0, 9.0.
> 
> To clarify, did you only test those versions, or did you also try older
> versions and they worked?

I tried only 7.1, 8.0, 9.0 on 4.7.0-rc6-00148-gee40fb2 kernel and none of them worked. Didn't try older on this kernel.

I tried 7.1, 8.0, current git on 4.3.0 and all of them work. Didin't try older on this kernel.

> It seems that the kernel has stopped accepting creds for IP sockets? That behaviour would make sense to me, although it might be against the "never break userspace" rule that the kernel tries to follow.

Right. Trying to setup usermode linux binary to bisect that down to kernel commit changing tcp handling of sendmsg().
Comment 4 Sergei Trofimovich 2016-07-10 10:20:57 UTC
Bisected kernel down to bad commit:
    http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c14ac9451c34832554db33386a4393be8bba3a7b

I think tcp.c is where we have started getting -EINVAL:
    http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/net/ipv4/tcp.c?id=c14ac9451c34832554db33386a4393be8bba3a7b

I guess tcp used to ignore control messages completely and now
it's unhappy about those.

I've failed to revert c14ac9451c34832554db33386a4393be8bba3a7b
properly but adding the following workaround allows pulseaudio
work over TCP:

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed14..1bec47c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1122,15 +1122,15 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
                if (tp->repair_queue == TCP_NO_QUEUE)
                        goto out_err;

                /* 'common' sending to sendq */
        }

        sockc.tsflags = sk->sk_tsflags;
-       if (msg->msg_controllen) {
+       if (0 && msg->msg_controllen) {
                err = sock_cmsg_send(sk, msg, &sockc);
                if (unlikely(err)) {
                        err = -EINVAL;
                        goto out_err;
                }
        }


# bad: [ee40fb2948fc99096836995d4f3ddcc0efbac790] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
# good: [2dcd0af568b0cf583645c8a317dd12e344b1c72a] Linux 4.6
git bisect start 'master' 'v4.6'
# bad: [6eb59af580dcffc6f6982ac8ef6d27a1a5f26b27] Merge tag 'mfd-for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
git bisect bad 6eb59af580dcffc6f6982ac8ef6d27a1a5f26b27
# bad: [a7fd20d1c476af4563e66865213474a2f9f473a4] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad a7fd20d1c476af4563e66865213474a2f9f473a4
# bad: [9dc0b289c4c09bc1a92bdcc055cb37af9b72eb28] net/mlx5_core: Firmware commands to support flow counters
git bisect bad 9dc0b289c4c09bc1a92bdcc055cb37af9b72eb28
# bad: [4319a7976722f6925b5bbbdac417d87a0cbde859] ixgbe: Add work around for empty SFP+ cage crosstalk
git bisect bad 4319a7976722f6925b5bbbdac417d87a0cbde859
# bad: [bddf59046d804638d998f9015246d4990f1cab09] Merge tag 'wireless-drivers-next-for-davem-2016-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect bad bddf59046d804638d998f9015246d4990f1cab09
# bad: [307c2431abf0974996356c13b67432f4b35e5f2f] mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*
git bisect bad 307c2431abf0974996356c13b67432f4b35e5f2f
# bad: [afb8ece4326f2151771f4c40b8d9f799cee5ae6e] i40e: remove redundant check on vsi->active_vlans
git bisect bad afb8ece4326f2151771f4c40b8d9f799cee5ae6e
# bad: [532182cd610782db8c18230c2747626562032205] tcp: increment sk_drops for dropped rx packets
git bisect bad 532182cd610782db8c18230c2747626562032205
# bad: [88023beb2a467dcfd9aa958138f0f3b5e1c432e0] net: macb: Fix simple typo
git bisect bad 88023beb2a467dcfd9aa958138f0f3b5e1c432e0
# good: [91979b9db86340d7cd49392a498663fb1ac74639] stmmac: update MAINTAINERS
git bisect good 91979b9db86340d7cd49392a498663fb1ac74639
# bad: [c14ac9451c34832554db33386a4393be8bba3a7b] sock: enable timestamping using control messages
git bisect bad c14ac9451c34832554db33386a4393be8bba3a7b
# good: [6b084928baac562ed61866f540a96120e9c9ddb7] tcp: use one bit in TCP_SKB_CB to mark ACK timestamps
git bisect good 6b084928baac562ed61866f540a96120e9c9ddb7
# good: [24025c465f77c3585f73450bab19501b2edd6fba] ipv4: process socket-level control messages in IPv4
git bisect good 24025c465f77c3585f73450bab19501b2edd6fba
# good: [ad1e46a837163a3e7160a1250825bcfafd2e714b] ipv6: process socket-level control messages in IPv6
git bisect good ad1e46a837163a3e7160a1250825bcfafd2e714b
# first bad commit: [c14ac9451c34832554db33386a4393be8bba3a7b] sock: enable timestamping using control messages
Comment 5 Sergei Trofimovich 2016-07-10 11:51:01 UTC
Sent email to linux-netdev:
    http://marc.info/?l=linux-netdev&m=146815097831670&w=2
Comment 6 Sergei Trofimovich 2016-07-11 07:57:51 UTC
Proposed fix for linux kernel:
    http://marc.info/?l=linux-netdev&m=146816955602701&w=2
Comment 7 GitLab Migration User 2018-07-30 10:00:42 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/issues/164.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.