Created attachment 123624 [details]
PA version 7.1.2
After a few days running, PA crashes with the attached backtrace and debug log.
In my setup, I use a lot of chromium & GST-pulsesink sink-inputs which are played on the same module-combine-sink which is than playing on a HDMI and analog output.
I can't really make up what is going wrong...
Created attachment 123625 [details]
PA debug log
are you using combine sink of analog and hdmi of different hda controllers
81008.508| 10.000) D: [pulseaudio][modules/module-combine-sink.c:219 adjust_rates()] [alsa_output.pci-0000_01_00.1.hdmi-stereo] total=57.56ms sink=50.10ms
(81008.508| 0.000) D: [pulseaudio][modules/module-combine-sink.c:219 adjust_rates()] [alsa_output.pci-0000_00_1b.0.analog-stereo] total=50.34ms sink=50.32ms
(81008.508| 0.000) I: [pulseaudio][modules/module-combine-sink.c:232 adjust_rates()] [combined] avg total latency is 53.95 msec.
(81008.508| 0.000) I: [pulseaudio][modules/module-combine-sink.c:233 adjust_rates()] [combined] target latency is 50.34 msec.
(81008.508| 0.000) I: [pulseaudio][modules/module-combine-sink.c:260 adjust_rates()] [alsa_output.pci-0000_01_00.1.hdmi-stereo] new rate is 44131 Hz; ratio is 1.001;
I am using just 1 combine sink (on which all sink-inputs are playing). This combine sink is slaving on the 2 physical sinks available (HDMI and analog).
There is no PulseAudio 7.1.2. Could you please try with 8.0 or git master and see if this still occurs?
Raymond, this appears to be unrelated to module-combine. The crash is on the daemon's main thread while dealing with unlinking a client stream.
Yes, correct. The version was 7.1 (installed from Debian package version 7.1-2~bpo8+1).
Now trying with version 8.0-2+b2 from Debian Stretch....
The same crash occurred using version 8.0 (built from GIT repo, tag v8.0, hash 7f6c4e6).
I attached backtraces and debug log.
Created attachment 124074 [details]
Backtraces PA8.0 built from GIT 7f6c4e6
Created attachment 124075 [details]
Debug log PA8.0 built from GIT 7f6c4e6
I found one bug and made a fix for that, but I doubt it's related to this crash. Anyway, if you want to test, you can grab the patch here: https://patchwork.freedesktop.org/patch/89079/
Both backtraces show that a client connection is being torn down, and imported memblocks from the client are being converted into local memblocks. That's where the crash occurs. Both backtraces show that at the same time, a memblock is being freed.
Converting a memblock from imported to local and freeing an imported memblock both decrement the n_imported counter. My guess is that the counter gets decremented twice for the same memblock. There are no safeguards preventing this, as far as I can see.
The memblock type is being read from multiple threads without locking, so when a memblock is converted from imported to local, the memblock's type changes, but other threads may still see the old value. I think we need to add more aggressive locking to the places where we deal with imported memblocks (and maybe other types too, but I haven't yet checked if there are other type transitions than imported -> local).
Thank you for the investigation.
Is there a way to workaround this behavior? Eg. only start/stop streams at certain moments...
One impractical workaround would be to move streams away from the combine sink and wait a bit before they are disconnected. That would ensure that the combine sink doesn't have any references to the client memblocks when the clients disconnect. That's only possible if you have very good control over when the streams disconnect.
This doesn't seem like a regression, but since this is a crasher, I'll mark this tentatively as a 10.0 blocker anyway.
The current status is that I'm looking into the possibility of removing all memblock type changes. It seems nicer than adding more locking to make the type changes safe. There are only two cases where we change memblock types on the fly: we make imported memblocks local when a client disconnects (related to this crash), and alsa-source creates memblocks from alsa memory that is valid only for a short time, so before the memory becomes invalid, the audio has to be copied elsewhere and the memblock type has to be changed.
Making imported memblocks local doesn't seem necessary, because I think the underlying memory area stays accessible as long as needed. Maybe the idea was to make sure that we don't keep old client mempools around in the server for longer than absolutely necessary, but I don't understand why having them around a bit longer would be a problem.
The alsa-source use case seems to be about avoiding one memory copy, if the recorded audio isn't needed after the pa_source_post() call. When dealing with normal client streams, however, the memory has to be copied anyway before it can be sent to the client. I can't think of any use cases where the current approach would save memory copies, except the case where a source runs without anything connected to it, but that's hardly a use case worth optimizing for...
I don't expect fast progress on this, and I worry how I can reach confidence in not breaking anything. The code is complicated, and at least so far I don't feel I understand it well, and I plan to remove big parts of it (like memblock revoking and releasing - those things seem pretty useless to me, but usually things have purpose)...
(In reply to Lode Cools from comment #11)
> Is there a way to workaround this behavior?
I now realized that you can probably get rid of the crashes by setting "enable-shm = no" in /etc/pulse/daemon.conf. The crash happens when making imported memblocks local, but if shm is disabled, there won't be any imported memblocks to convert.
This won't be fixed in 10.0, I'm marking this as a 11.0 blocker instead.
The current status is that I have a bunch of untested patches ready, but I still have to do more work to get rid of all memblock type changes.
Tanu, is this still something we're targetting for 11.0?
I'll remove the blocker status. I still want to get this fixed sooner rather than later, but blocking the release on this doesn't seem like a good idea.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.