Bug 92141

Summary: Connection failure: Protocol error
Product: PulseAudio Reporter: robi
Component: clientsAssignee: pulseaudio-bugs
Status: RESOLVED FIXED QA Contact: pulseaudio-bugs
Severity: major    
Priority: medium CC: lennart
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description robi 2015-09-27 15:53:29 UTC
After booting pavucontrol works just fine, but if I try to start it a little later it just says "Establishing connection to PulseAudio. Please wait..." running "pactl stat" in terminal just returns: "Connection failure: Protocol error"

Pulseaudio version is 7.0-1 and pavucontrol version is 3.0-1. I've only recently been affected by this problem, so I'm guessing it might have been introduced with the latest update.

This is the log of it happening:

https://bpaste.net/show/bb7eea84c040

The last part of the log where everything keeps repeating was the exact moment I started pavucontrol and it was trying to establish the connection to pulseaudio.
Comment 1 robi 2015-09-27 15:58:40 UTC
https://bpaste.net/show/6c5f7537174b

this is a second log of what happens when I try to lower the volume with

    pactl set-sink-volume 0 -2%
Comment 2 Tanu Kaskinen 2015-09-28 06:34:50 UTC
Does pactl work right after booting like pavucontrol does? If it does, then it sounds like there's one PulseAudio instance started at login time (maybe by start-pulseaudio-x11), but it exits after being idle for some time, and then later a different PulseAudio version is started via autospawning. This sounds unlikely, but I don't think it's entirely impossible. What distro are you using? Have you ever installed PulseAudio from source?

What does pactl print if you run it with this command:

    PULSE_LOG=99 pactl stat

With luck the extra logging will show more details about what causes the protocol error, but probably not. (The logging of protocol errors really should be improved...)

Feel free to ping me in IRC if you want.
Comment 3 robi 2015-09-28 07:13:51 UTC
Yes, pactl works right after booting just like pavucontrol does.

I have already disabled autospawn and the autostart of pulseaudio via a desktop file as I was trying to see what's causing all of these issues.

Currently I'm starting pulseaudio as a systemd --user service. I'm using arch linux 64 bit and I have never installed pulseaudio from source.

I can't currently run the command as I'm on a different machine, but I will provide the output later today.
Comment 4 robi 2015-09-28 08:19:07 UTC
Output of PULSE_LOG=99 pactl stat:

Parsing configuration file '/etc/pulse/client.conf'
Using shared memory pool with 1024 slots of size 64.0 KiB each, total size is 64.0 MiB, maximum usable slot size is 65472
Trying to connect to /run/user/1000/pulse/native...
SHM possible: yes
Protocol version: remote 30, local 30
Negotiated SHM: yes
Failed to import memory block.
Connection failure: Protocol error
Comment 5 robi 2015-09-28 10:56:09 UTC
I can also confirm that this bug was introduced with 7.0. I do not have any issues on 6.0-2.
Comment 6 Tanu Kaskinen 2015-09-28 11:00:23 UTC
Ok, the extra logging did reveal a bit more information about what goes wrong after all! Importing shared memory from the server fails for some reason. Unfortunately I can't tell what reason. I only know that pa_memimport_get() returns NULL. The best way forward is probably gdb.

For gdb to be useful, you need to have debug symbols installed for libpulse. Unfortunately, Jan Steffens (the PA maintainer in Arch) told me that it's not simple to install those on Arch. You'd need to recompile PulseAudio to get the debug symbols. If that's something that you know how to do, then please do it. Otherwise I'll write up some instructions for how to temporarily install PulseAudio from upstream sources safely without messing up the installation from Arch packages.

[Update: Jan told me to point you to [1]. Installing pulseaudio-git, libpulse-git and any other pulseaudio-*-git packages that you need from Jan's repo should get you the debug symbols.]

[1] https://bbs.archlinux.org/viewtopic.php?id=117157

Once you have the debug symbols, run gdb as follows:

PULSE_LOG=99 gdb --args pactl stat

That will print something like this:

GNU gdb (Debian 7.10-1) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pactl...done.
(gdb)

You're now in the gdb prompt. Enter "break pa_memimport_get". That symbol hasn't yet been loaded, so gdb asks you whether to add the breakpoint when the symbol becomes available. Answer "y". Then enter "run". The execution should soon stop at the breakpoint. From now on enter "n" ("n" is short for "next") repeatedly until the program exits (that will take quite some time). When the program exits, you can quit gdb with command "quit", and then save all the terminal output and attach it here. That should show the exact place that fails.
Comment 7 Tanu Kaskinen 2015-09-28 11:05:01 UTC
This bug has been seen also on Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800120
Comment 8 robi 2015-09-28 11:12:09 UTC
https://bpaste.net/show/c60f972d01ae
Comment 9 Tanu Kaskinen 2015-09-28 11:16:25 UTC
Hmm, that didn't work out as well as on my own machine, since gdb couldn't show the source code. At least the line numbers are there, that might suffice...
Comment 10 Tanu Kaskinen 2015-09-28 11:35:01 UTC
Based on the line numbers, I figured out that segment_attach() fails. segment_attach() can fail for two reasons, and the first one shouldn't be possible (the pa_hashmap_size(i->segments) >= PA_MEMIMPORT_SEGMENTS_MAX check can't fail, because this is the first segment that we're handling).

So, pa_shm_attach() fails. That function has several failure cases, but all but two of them log an error message. There are no error messages from pa_shm_attach() in the log, so the problem is that shm_open() fails, and errno is either EACCES or ENOENT. That is, either pactl doesn't have permission to access the shm file, or the shm file doesn't exist.

What does "ls -l /dev/shm" print?
Comment 11 robi 2015-09-28 11:36:39 UTC
total 26M
drwxr-x--- 2 root   root    40 Sep 28 13:08 firejail/
-rw------- 1 robert robert 25M Sep 28 13:27 i3-log-1648
-rwxr-x--- 1 robert robert  32 Sep 28 13:08 sem.lastpassffsemaphore*
Comment 12 Tanu Kaskinen 2015-09-28 12:14:07 UTC
No pulseaudio shm files at all... Apparently the server is then failing to create those. The server log doesn't have any errors about that, though, but I guess that's kind of expected, since if the server detected errors, it wouldn't pretend to the client that shm is working...

But after looking at the code, it looks like there's no way that could happen. We see "Enabling srbchannel..." messages in the log, and that message won't be logged if core->rw_mempool is NULL, and if it's not NULL, then I don't see any way how creating the shm file could have failed.

Can you double-check that PulseAudio is running for your user while you run "ls -l /dev/shm"? There should be at least one file named pulse-shm-XXX (where XXX is some random number).

Also, after reboot, while PulseAudio still works, does /dev/shm have any files named pulse-shm-XXX? Maybe something is wiping the files after a while...
Comment 13 robi 2015-09-28 12:19:13 UTC
Maybe I should have made myself clearer. Pulseaudio does work. Always. What doesn't work is trying to change the volume with pactl and trying to use pavucontrol.

I'm sitting here listening to music on youtube while checking if there is something inside /dev/shm, so pulseaudio is *currently* running and working, as I check this, and there are no shm files inside that folder.

This is the output of systemctl --user status pulseaudio.socket:

systemctl --user status pulseaudio.socket
‚óŹ pulseaudio.socket - Sound System
   Loaded: loaded (/usr/lib/systemd/user/pulseaudio.socket; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2015-09-28 13:44:56 CEST; 30min ago
   Listen: /run/user/1000/pulse/native (Stream)

Sep 28 13:44:56 StoraArch systemd[472]: Listening on Sound System.

I'll reboot to check the rest.
Comment 14 robi 2015-09-28 12:21:11 UTC
This is what /dev/shm looks like right after booting:

-rwx------ 1 robert robert 65M Sep 28 14:20 pulse-shm-1406240471*
-rwx------ 1 robert robert 65M Sep 28 14:20 pulse-shm-1700429444*
-rwx------ 1 robert robert 65M Sep 28 14:20 pulse-shm-2453895259*
-rwx------ 1 robert robert 65M Sep 28 14:20 pulse-shm-2951274004*
-rwx------ 1 robert robert 65M Sep 28 14:20 pulse-shm-514164094*
Comment 15 robi 2015-09-28 12:22:25 UTC
A minute later the output looks like this again:

drwxr-x--- 2 root   root    40 Sep 28 14:20 firejail/
-rw------- 1 robert robert 25M Sep 28 14:20 i3-log-1608
-rwxr-x--- 1 robert robert  32 Sep 28 14:20 sem.lastpassffsemaphore*

The shm files are gone and pavucontrol can no longer connect to pulseaudio.
Comment 16 David Henningsson 2015-09-28 12:35:57 UTC
This finding gives more questions than answers...

 * When running PA 6.0, are the shm files still there?
 * Why doesn't pa_shm_attach log when it can't open an shm file due to ENOENT or EACCESS?
 * If you disable srbchannel, does it start to work? (Well, yes, pactl does, because it does not need to send audio memblocks, but programs that actually play back audio?)
 
And most importantly...

 * Who murdered the seemingly innocent shm files, leaving no traces?
Comment 17 Tanu Kaskinen 2015-09-28 12:40:21 UTC
Can you postpone the creation of the firejail, i3-log and lastpass files, or do they get created automatically when you log in? If you wait a bit before starting firejail or lastpass after rebooting, can you see any correlation between one of those files appearing under /dev/shm and the pulseaudio files disappearing? Maybe firejail or lastpass do some "cleanup" when they start...
Comment 18 robi 2015-09-28 12:56:13 UTC
Just checked. When running 6.0-2 the shm files do not disappear.

Disable srbchannel? Sorry, no idea how to do that or what it means.


But! Tanu I think you just found the culprit. I've been constantly checking the content of /dev/shm and what I've just found was that no matter what I did, which media I played or when I opened pavucontrol it did not matter. Even playing a youtube video on firefox did not cause it.

What *did* cause it, is the creating of the firejail directory! Firejail is a sandbox program which I use to secure my firefox. I start the whole thing with this command:

firejail --seccomp firefox

If I run firefox without firejail, everything is fine. As soon as I run the whole thing though, the shm files are being deleted. 

The next question is: Why doesn't this happen with 6.0? From what I could gather, it seems like on 6.0 a few files are being deleted, but at least one always remains.
Comment 19 Tanu Kaskinen 2015-09-28 13:28:35 UTC
srbchannel is a more efficient transport for the client-server communication. In 7.0 it's enabled by default, in 6.0 it wasn't. It can be disabled by giving option "srbchannel=no" to module-native-protocol-unix in /etc/pulse/default.pa.

The reason why /dev/shm isn't empty with 6.0 is probably that when clients connect to PulseAudio, each of them will create a file in /dev/shm. So after the files have been deleted, new ones can be created again. With 7.0 that doesn't happen, because the clients can't connect at all. Using srbchannel involves one long-lived file in /dev/shm that all clients use, and if it's deleted, clients can't connect any more.

David: I don't understand either why we don't log failures if they happen due to EACCES or ENOENT. Maybe you'd like to write a patch?

By the way, the issue seems to have already been reported to the firejail developers: https://github.com/netblue30/firejail/issues/69
Comment 20 Tanu Kaskinen 2015-09-28 13:30:09 UTC
Resolving as "notourbug".
Comment 21 Tanu Kaskinen 2015-10-07 07:41:00 UTC
Reopening. I believe this is a PulseAudio bug after all. If a client runs in a sandbox that creates a new pid namespace, pa_shm_cleanup() will remove all shm files, because kill() fails with ESRCH. (I don't actually have empirical proof, but it seems obvious that this will happen.)

I don't know yet how (and if) this can be fixed. memfd support will help, but I think the traditional shm files will be in use for a long time, e.g. on non-Linux systems. (Do non-Linux systems support pid namespaces, though?)
Comment 22 Tanu Kaskinen 2015-12-27 12:28:14 UTC
Not going to be fixed in time for 8.0. I'm removing the release blocker status.
Comment 23 Cris 2016-11-27 18:30:22 UTC
What's the status on this?
Comment 24 Tanu Kaskinen 2016-11-27 19:34:16 UTC
This is fixed for Linux. PulseAudio 9.0 introduced the memfd shared memory mechanism, which can be enabled by putting "enable-memfd = yes" to daemon.conf. PulseAudio 10.0 will have the feature enabled by default. Other operating systems may exhibit the same problem, but I'll close this bug anyway. If this is still a problem for anyone after 9.0, please reopen.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.