Bug 90416

Summary:	[abrt] pulseaudio: pa_atomic_load(): pulseaudio killed by SIGSEGV
Product:	PulseAudio	Reporter:	Raman Gupta <rocketraman>
Component:	daemon	Assignee:	pulseaudio-bugs
Status:	RESOLVED FIXED	QA Contact:	pulseaudio-bugs
Severity:	normal
Priority:	medium	CC:	diwic, lennart, wim.taymans
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	88920
Attachments:	check_sink_state.patch Debug log Debug log Fix two crashes abrt dump for SEGFAULT with two_crashes.patch

Description Raman Gupta 2015-05-12 13:13:01 UTC

Changing the card profile via "set-card-profile" causes a crash in the PulseAudio daemon when the echo cancellation module is active for a running program with an open audio stream i.e. the program is started with:

export PULSE_PROP="filter.want=echo-cancel"

If the echo cancellation filter is not requested, then the set-card-profile works fine.

Here is the upstream bug report that has all the abrt attachments: https://bugzilla.redhat.com/show_bug.cgi?id=1198661.

Fedora retrace: https://retrace.fedoraproject.org/faf/reports/586989/

Version-Release number of selected component:
pulseaudio-5.0-25.fc21

Additional info:
reporter:       libreport-2.3.0
backtrace_rating: 4
cmdline:        pulseaudio -D
crash_function: pa_atomic_load
executable:     /usr/bin/pulseaudio
kernel:         3.18.7-200.fc21.x86_64
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 pa_atomic_load at pulsecore/atomic.h:57
 #1 pa_asyncmsgq_send at pulsecore/asyncmsgq.c:143
 #2 pa_sink_input_start_move at pulsecore/sink-input.c:1618
 #3 pa_sink_input_move_to at pulsecore/sink-input.c:1891
 #4 sink_unlink_hook_callback at modules/module-rescue-streams.c:107
 #5 pa_hook_fire at pulsecore/hook-list.c:106
 #6 pa_sink_unlink at pulsecore/sink.c:693
 #7 sink_input_kill_cb at modules/echo-cancel/module-echo-cancel.c:1410
 #8 pa_sink_move_all_finish at pulsecore/sink.c:924
 #9 card_set_profile at modules/alsa/module-alsa-card.c:262

Comment 1 Tanu Kaskinen 2015-05-13 10:12:25 UTC

I'm marking this as a release blocker for now. It sounds like a reproducible crash, so it shouldn't be too hard to get fixed.

Comment 2 Raman Gupta 2015-05-13 11:48:40 UTC

Thank you. Yes it is easily reproducible.

Comment 3 David Henningsson 2015-05-21 12:27:46 UTC

Created attachment 115945 [details] [review]
check_sink_state.patch

Hi,

Could you check if the attached patch resolves the problem?

In case it does not, could you start PulseAudio with log-level debug (e g pulseaudio -vvvv), reproduce the bug with the log running, then attach the resulting log here? Thanks!

Comment 4 Raman Gupta 2015-05-21 15:25:25 UTC

Created attachment 115957 [details]
Debug log

Pulseaudio debug log.

Comment 5 Raman Gupta 2015-05-21 15:25:44 UTC

(In reply to David Henningsson from comment #3)
> Created attachment 115945 [details] [review] [review]
> check_sink_state.patch
> 
> Hi,
> 
> Could you check if the attached patch resolves the problem?
> 
> In case it does not, could you start PulseAudio with log-level debug (e g
> pulseaudio -vvvv), reproduce the bug with the log running, then attach the
> resulting log here? Thanks!

No, pulseaudio is still crashing:

Window 1:

$ pulseaudio -vvvv > pulseaudio.log 2>&1
Segmentation fault (core dumped)

Window 2:

$ pactl set-card-profile alsa_card.pci-0000_00_1b.0 output:analog-surround-40+input:analog-stereo
Connection failure: Connection terminated

Debug log attached.

Comment 6 Raman Gupta 2015-05-21 15:54:11 UTC

(In reply to Raman Gupta from comment #5)
> No, pulseaudio is still crashing:

Standby, I realized I didn't patch PA properly.

Comment 7 Raman Gupta 2015-05-21 16:25:37 UTC

(In reply to Raman Gupta from comment #6)
> (In reply to Raman Gupta from comment #5)
> > No, pulseaudio is still crashing:
> 
> Standby, I realized I didn't patch PA properly.

Nope, segmentation fault again, with the patch. Updated log will be attached.

Comment 8 Raman Gupta 2015-05-21 16:25:59 UTC

Created attachment 115958 [details]
Debug log

Comment 9 David Henningsson 2015-05-22 07:49:47 UTC

Created attachment 115968 [details] [review]
Fix two crashes

New day, new patch, I think this one is more likely to actually fix something...

I wasn't able to reproduce exactly your crash, but I got a similar one that think I found something that could cause it. 

The patch fixes two different crashes
 - one about setting asyncmsgq to NULL (bad idea, instead we just "park it somewhere")
 - one about trying to set the name when impossible to do so accurately (this is the one I got)

Could you try it? And if it still crashes, check if the backtrace is the same or if it perhaps crashed for another reason this time.

Thanks!

Comment 10 Raman Gupta 2015-05-22 15:14:17 UTC

(In reply to David Henningsson from comment #9)
> Created attachment 115968 [details] [review] [review]
> Fix two crashes
> 
> New day, new patch, I think this one is more likely to actually fix
> something...
> 
> I wasn't able to reproduce exactly your crash, but I got a similar one that
> think I found something that could cause it. 
> 
> The patch fixes two different crashes
>  - one about setting asyncmsgq to NULL (bad idea, instead we just "park it
> somewhere")
>  - one about trying to set the name when impossible to do so accurately
> (this is the one I got)
> 
> Could you try it? And if it still crashes, check if the backtrace is the
> same or if it perhaps crashed for another reason this time.

Thanks David for trying to solve this.

I tried the new patch (both with and without the earlier patch). In both cases, PA segfaulted again. The backtrace seems to be different than before though:

https://retrace.fedoraproject.org/faf/reports/668884/

I will attach the abrt data directory for the crash.

Comment 11 Raman Gupta 2015-05-22 15:15:16 UTC

Created attachment 115976 [details]
abrt dump for SEGFAULT with two_crashes.patch

Comment 12 Arun Raghavan 2015-06-06 05:51:45 UTC

Just to summarise the problem, here's what's happening:

0. We're running a playback and capture stream, which caused module-echo-cancel to be autoloaded

1. We disable the card profile on which the echo canceller is running, making the underlying sink go away

2. At the end of the profile change, the m-e-c sink-input gets killed (since it said it could not be moved, which is what we do in the autoload case)

3. This makes us unload the m-e-c module, at which time we try to move the original client stream away

4. At this point, there is no underlying sink for m-e-c, so the asyncmsgq is NULL, and any messages that might be sent during the move will cause a crash (we don't deal with a non-existent asyncmsgq, and trying to use something fake causes sink input state to go out of whack and crash in other places). 

One way to fix this problem, is to not have the echo-cancel sink-input refuse to move (which we do when the module is autoloaded). Instead, we allow the move to happen, and then when it is completed, if we were autoloaded, we unload ourselves.

This means that we'll never be in an inconsistent state with respect to the existence of an underlying sink, which solves the problem at source. On the other hand, this feels a bit ugly as a solution.

I'll hack this up, but I'd like to hear thoughts on this approach if there are any.

Comment 13 Arun Raghavan 2015-06-12 11:23:46 UTC

Pushed a fix for this now, the way I described in comment #12.

Comment 14 Raman Gupta 2015-12-15 22:36:05 UTC

Can you tell me which version this fix will be in? Thanks!

Comment 15 Tanu Kaskinen 2015-12-15 22:53:58 UTC

The fix is in 6.0.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.