Bug 95352 - PA crash on Assertion 'pa_atomic_load(&b->pool->stat.n_imported) > 0
Summary: PA crash on Assertion 'pa_atomic_load(&b->pool->stat.n_imported) > 0
Status: NEW
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-11 15:35 UTC by Lode Cools
Modified: 2017-06-04 22:54 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
PA backtraces (44.51 KB, text/plain)
2016-05-11 15:35 UTC, Lode Cools
Details
PA debug log (1.83 MB, text/plain)
2016-05-11 15:36 UTC, Lode Cools
Details
Backtraces PA8.0 built from GIT 7f6c4e6 (51.79 KB, text/plain)
2016-05-25 07:02 UTC, Lode Cools
Details
Debug log PA8.0 built from GIT 7f6c4e6 (741.85 KB, text/x-log)
2016-05-25 07:03 UTC, Lode Cools
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lode Cools 2016-05-11 15:35:59 UTC
Created attachment 123624 [details]
PA backtraces

PA version 7.1.2

After a few days running, PA crashes with the attached backtrace and debug log.

In my setup, I use a lot of chromium & GST-pulsesink sink-inputs which are played on the same module-combine-sink which is than playing on a HDMI and analog output.

I can't really make up what is going wrong...
Comment 1 Lode Cools 2016-05-11 15:36:40 UTC
Created attachment 123625 [details]
PA debug log
Comment 2 Raymond 2016-05-12 02:38:35 UTC
are you using combine sink of analog and hdmi of different hda controllers

81008.508|  10.000) D: [pulseaudio][modules/module-combine-sink.c:219 adjust_rates()] [alsa_output.pci-0000_01_00.1.hdmi-stereo] total=57.56ms sink=50.10ms 
(81008.508|   0.000) D: [pulseaudio][modules/module-combine-sink.c:219 adjust_rates()] [alsa_output.pci-0000_00_1b.0.analog-stereo] total=50.34ms sink=50.32ms 
(81008.508|   0.000) I: [pulseaudio][modules/module-combine-sink.c:232 adjust_rates()] [combined] avg total latency is 53.95 msec.
(81008.508|   0.000) I: [pulseaudio][modules/module-combine-sink.c:233 adjust_rates()] [combined] target latency is 50.34 msec.
(81008.508|   0.000) I: [pulseaudio][modules/module-combine-sink.c:260 adjust_rates()] [alsa_output.pci-0000_01_00.1.hdmi-stereo] new rate is 44131 Hz; ratio is 1.001;
Comment 3 Lode Cools 2016-05-12 06:18:55 UTC
I am using just 1 combine sink (on which all sink-inputs are playing).  This combine sink is slaving on the 2 physical sinks available (HDMI and analog).
Comment 4 Arun Raghavan 2016-05-12 07:37:50 UTC
There is no PulseAudio 7.1.2. Could you please try with 8.0 or git master and see if this still occurs?

Raymond, this appears to be unrelated to module-combine. The crash is on the daemon's main thread while dealing with unlinking a client stream.
Comment 5 Lode Cools 2016-05-12 08:26:44 UTC
Yes, correct.  The version was 7.1 (installed from Debian package version 7.1-2~bpo8+1).

Now trying with version 8.0-2+b2 from Debian Stretch....
Comment 6 Lode Cools 2016-05-25 07:00:04 UTC
Hi,

The same crash occurred using version 8.0 (built from GIT repo, tag v8.0, hash 7f6c4e6).

I attached backtraces and debug log.
Comment 7 Lode Cools 2016-05-25 07:02:46 UTC
Created attachment 124074 [details]
Backtraces PA8.0 built from GIT 7f6c4e6
Comment 8 Lode Cools 2016-05-25 07:03:32 UTC
Created attachment 124075 [details]
Debug log PA8.0 built from GIT 7f6c4e6
Comment 9 Tanu Kaskinen 2016-05-25 11:29:23 UTC
I found one bug and made a fix for that, but I doubt it's related to this crash. Anyway, if you want to test, you can grab the patch here: https://patchwork.freedesktop.org/patch/89079/
Comment 10 Tanu Kaskinen 2016-05-25 15:39:10 UTC
Both backtraces show that a client connection is being torn down, and imported memblocks from the client are being converted into local memblocks. That's where the crash occurs. Both backtraces show that at the same time, a memblock is being freed.

Converting a memblock from imported to local and freeing an imported memblock both decrement the n_imported counter. My guess is that the counter gets decremented twice for the same memblock. There are no safeguards preventing this, as far as I can see.

The memblock type is being read from multiple threads without locking, so when a memblock is converted from imported to local, the memblock's type changes, but other threads may still see the old value. I think we need to add more aggressive locking to the places where we deal with imported memblocks (and maybe other types too, but I haven't yet checked if there are other type transitions than imported -> local).
Comment 11 Lode Cools 2016-05-27 13:35:29 UTC
Thank you for the investigation.

Is there a way to workaround this behavior?  Eg. only start/stop streams at certain moments...
Comment 12 Tanu Kaskinen 2016-06-02 13:13:19 UTC
One impractical workaround would be to move streams away from the combine sink and wait a bit before they are disconnected. That would ensure that the combine sink doesn't have any references to the client memblocks when the clients disconnect. That's only possible if you have very good control over when the streams disconnect.
Comment 13 Tanu Kaskinen 2016-06-30 16:28:52 UTC
This doesn't seem like a regression, but since this is a crasher, I'll mark this tentatively as a 10.0 blocker anyway.

The current status is that I'm looking into the possibility of removing all memblock type changes. It seems nicer than adding more locking to make the type changes safe. There are only two cases where we change memblock types on the fly: we make imported memblocks local when a client disconnects (related to this crash), and alsa-source creates memblocks from alsa memory that is valid only for a short time, so before the memory becomes invalid, the audio has to be copied elsewhere and the memblock type has to be changed.

Making imported memblocks local doesn't seem necessary, because I think the underlying memory area stays accessible as long as needed. Maybe the idea was to make sure that we don't keep old client mempools around in the server for longer than absolutely necessary, but I don't understand why having them around a bit longer would be a problem.

The alsa-source use case seems to be about avoiding one memory copy, if the recorded audio isn't needed after the pa_source_post() call. When dealing with normal client streams, however, the memory has to be copied anyway before it can be sent to the client. I can't think of any use cases where the current approach would save memory copies, except the case where a source runs without anything connected to it, but that's hardly a use case worth optimizing for...

I don't expect fast progress on this, and I worry how I can reach confidence in not breaking anything. The code is complicated, and at least so far I don't feel I understand it well, and I plan to remove big parts of it (like memblock revoking and releasing - those things seem pretty useless to me, but usually things have purpose)...
Comment 14 Tanu Kaskinen 2016-10-21 11:40:52 UTC
(In reply to Lode Cools from comment #11)
> Is there a way to workaround this behavior?

I now realized that you can probably get rid of the crashes by setting "enable-shm = no" in /etc/pulse/daemon.conf. The crash happens when making imported memblocks local, but if shm is disabled, there won't be any imported memblocks to convert.
Comment 15 Tanu Kaskinen 2016-12-12 18:00:55 UTC
This won't be fixed in 10.0, I'm marking this as a 11.0 blocker instead.

The current status is that I have a bunch of untested patches ready, but I still have to do more work to get rid of all memblock type changes.
Comment 16 Arun Raghavan 2017-05-31 05:50:34 UTC
Tanu, is this still something we're targetting for 11.0?
Comment 17 Tanu Kaskinen 2017-06-04 22:54:58 UTC
Not really...

I'll remove the blocker status. I still want to get this fixed sooner rather than later, but blocking the release on this doesn't seem like a good idea.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.