Bug 88452 - pulseaudio "killed" my qemu instance (assertion hit)
Summary: pulseaudio "killed" my qemu instance (assertion hit)
Status: RESOLVED FIXED
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 88920
  Show dependency treegraph
 
Reported: 2015-01-15 09:08 UTC by Jiri Slaby
Modified: 2015-07-30 02:54 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Pstream patch (5.03 KB, patch)
2015-01-21 07:18 UTC, David Henningsson
Details | Splinter Review

Description Jiri Slaby 2015-01-15 09:08:25 UTC
$ pulseaudio --version
pulseaudio 5.99.2

This is an upstream clone of:
https://bugzilla.novell.com/show_bug.cgi?id=913202

I switched to pulseaudio 5.99.2 and today I got this from qemu-kvm:
Assertion 'uchunk->index % bq->base == 0' failed at pulsecore/memblockq.c:291, function pa_memblockq_push(). Aborting.

I run qemu as follows:
/usr/bin/qemu-system-x86_64 -machine accel=kvm -k en-us -smp 2 -m 760 -usb -usbdevice tablet -hda /home/new/sles12.img -soundhw hda -net user,hostfwd=tcp::2222-:22 -net nic,model=e1000 -serial pty -balloon virtio
Comment 1 Tanu Kaskinen 2015-01-15 14:46:29 UTC
Is this reproducible? Could you get a backtrace with gdb?

If I want to try to reproduce this myself, where do I get the same sles12.img that you're using?
Comment 2 Jiri Slaby 2015-01-20 10:12:00 UTC
(In reply to Tanu Kaskinen from comment #1)
> Is this reproducible?

Not 100%, but it happens often.

> Could you get a backtrace with gdb?

...

> If I want to try to reproduce this myself, where do I get the same
> sles12.img that you're using?

It does not matter which image you use. Anyway, the version Takashi provided in
https://build.opensuse.org/package/show/home:tiwai:branches:multimedia:libs/pulseaudio

solved the issue. So it is perhaps fixed upstream.
Comment 3 Tanu Kaskinen 2015-01-20 10:18:47 UTC
Thanks, I'll close this bug for now. Please reopen if the issue resurfaces.
Comment 4 Jiri Slaby 2015-01-20 11:52:52 UTC
Oh, it has jsut happened again.


(gdb) where
#0  0x00007fac95bfc0f7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fac95bfd4ca in __GI_abort () at abort.c:78
#2  0x00007fac93b52a2b in pa_memblockq_push (bq=bq@entry=0x7fac400088e0, uchunk=uchunk@entry=0x7fac473fd8a0)
    at pulsecore/memblockq.c:285
#3  0x00007fac93b53ab3 in pa_memblockq_push_align (bq=0x7fac400088e0, chunk=chunk@entry=0x7fac473fd950)
    at pulsecore/memblockq.c:793
#4  0x00007fac9a3ccf07 in pstream_memblock_callback (p=<optimized out>, channel=<optimized out>, offset=0, 
    seek=PA_SEEK_RELATIVE, chunk=0x7fac473fd950, userdata=0x7fac9daadf60) at pulse/context.c:405
#5  0x00007fac93b58d09 in do_read (p=p@entry=0x7fac40000960, re=re@entry=0x7fac40000b28) at pulsecore/pstream.c:854
#6  0x00007fac93b5b43b in do_pstream_read_write (p=0x7fac40000960) at pulsecore/pstream.c:191
#7  0x00007fac93b5b6d8 in srb_callback (srb=<optimized out>, userdata=0x7fac40000960) at pulsecore/pstream.c:227
#8  0x00007fac93b5c04a in srbchannel_rwloop (sr=0x7fac40008010) at pulsecore/srbchannel.c:192
#9  0x00007fac9a3e11c4 in dispatch_pollfds (m=0x7fac9daade30) at pulse/mainloop.c:657
#10 pa_mainloop_dispatch (m=m@entry=0x7fac9daade30) at pulse/mainloop.c:900
#11 0x00007fac9a3e159a in pa_mainloop_iterate (m=m@entry=0x7fac9daade30, block=block@entry=1, retval=retval@entry=0x0)
    at pulse/mainloop.c:931
#12 0x00007fac9a3e1640 in pa_mainloop_run (m=0x7fac9daade30, retval=retval@entry=0x0) at pulse/mainloop.c:946
#13 0x00007fac9a3ef6c3 in thread (userdata=0x7fac9db19bd0) at pulse/thread-mainloop.c:90
#14 0x00007fac93b6a898 in internal_thread_func (userdata=0x7fac9daae4e0) at pulsecore/thread-posix.c:83
#15 0x00007fac9913e3a4 in start_thread (arg=0x7fac473fe700) at pthread_create.c:310
#16 0x00007fac95caba4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) p uchunk->index
$3 = 65426
(gdb) p bq->base
$4 = 4
Comment 5 David Henningsson 2015-01-21 07:18:48 UTC
Created attachment 112587 [details] [review]
Pstream patch
Comment 6 David Henningsson 2015-01-21 07:21:38 UTC
pa_pstream_send_memblock could, at least in theory, split packets in the middle of a frame. The attached patch should fix that.

That said, even if this patch actually fixes the bug we might have a problem with clients and servers if one is older than 6.0 and the other one is 6.0 or newer.
Comment 7 Takashi Iwai 2015-01-21 07:46:45 UTC
FYI, my OBS package is being built with David's new patch now.
Comment 8 Jiri Slaby 2015-01-21 08:20:49 UTC
No hope :(. I rebuilt qemu against pulse, restarted pulseaudio via -k and ran qemu:

(gdb) where
#0  0x00007f1df20fc0f7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f1df20fd4ca in __GI_abort () at abort.c:78
#2  0x00007f1df0052a2b in pa_memblockq_push (bq=bq@entry=0x7f1d9c0088e0, uchunk=uchunk@entry=0x7f1da37fd8a0) at pulsecore/memblockq.c:285
#3  0x00007f1df0053ab3 in pa_memblockq_push_align (bq=0x7f1d9c0088e0, chunk=chunk@entry=0x7f1da37fd950) at pulsecore/memblockq.c:793
#4  0x00007f1df68ccf07 in pstream_memblock_callback (p=<optimized out>, channel=<optimized out>, offset=0, seek=PA_SEEK_RELATIVE, chunk=0x7f1da37fd950, userdata=0x7f1dfa3f6f60)
    at pulse/context.c:405
#5  0x00007f1df0058d09 in do_read (p=p@entry=0x7f1d9c000960, re=re@entry=0x7f1d9c000b28) at pulsecore/pstream.c:858
#6  0x00007f1df005b46b in do_pstream_read_write (p=0x7f1d9c000960) at pulsecore/pstream.c:191
#7  0x00007f1df005b708 in srb_callback (srb=<optimized out>, userdata=0x7f1d9c000960) at pulsecore/pstream.c:227
#8  0x00007f1df005c07a in srbchannel_rwloop (sr=0x7f1d9c008010) at pulsecore/srbchannel.c:192
#9  0x00007f1df68e11c4 in dispatch_pollfds (m=0x7f1dfa3f6e30) at pulse/mainloop.c:657
#10 pa_mainloop_dispatch (m=m@entry=0x7f1dfa3f6e30) at pulse/mainloop.c:900
#11 0x00007f1df68e159a in pa_mainloop_iterate (m=m@entry=0x7f1dfa3f6e30, block=block@entry=1, retval=retval@entry=0x0) at pulse/mainloop.c:931
#12 0x00007f1df68e1640 in pa_mainloop_run (m=0x7f1dfa3f6e30, retval=retval@entry=0x0) at pulse/mainloop.c:946
#13 0x00007f1df68ef6d3 in thread (userdata=0x7f1dfa462bd0) at pulse/thread-mainloop.c:90
#14 0x00007f1df006a8c8 in internal_thread_func (userdata=0x7f1dfa3f74e0) at pulsecore/thread-posix.c:83
#15 0x00007f1df563e3a4 in start_thread (arg=0x7f1da37fe700) at pthread_create.c:310
#16 0x00007f1df21aba4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

I have a core dump of that. So whatever you want, tell me.
Comment 9 David Henningsson 2015-01-21 09:19:05 UTC
Does turning off srbchannel help?

I e, in /etc/pulse/default.pa, change

load-module module-native-protocol-unix

to

load-module module-native-protocol-unix srbchannel=no

...and restart PulseAudio.


In case you get a new backtrace involving "srb_callback" somewhere in that chain, srbchannel is still active.
Comment 10 Jiri Slaby 2015-01-21 09:51:20 UTC
(In reply to David Henningsson from comment #9)
> load-module module-native-protocol-unix srbchannel=no

This seems to be a workaround.
Comment 11 Tanu Kaskinen 2015-01-30 16:53:09 UTC
(In reply to Jiri Slaby from comment #2)
> > If I want to try to reproduce this myself, where do I get the same
> > sles12.img that you're using?
> 
> It does not matter which image you use.

But I don't know where to get suitable images. I've never used qemu. Are operating system installation images (e.g. DVD images, .iso extension) good?
Comment 12 Jiri Slaby 2015-02-02 13:41:16 UTC
(In reply to Jiri Slaby from comment #10)
> (In reply to David Henningsson from comment #9)
> > load-module module-native-protocol-unix srbchannel=no
> 
> This seems to be a workaround.

And this is the default as of rc3. You can close this IIUC.
Comment 13 Arun Raghavan 2015-02-02 13:44:47 UTC
Marked as a 7.0 blocker as I hope we'll be have srbchannel on-by-default by then.
Comment 14 Jiri Slaby 2015-02-02 13:47:04 UTC
(In reply to Tanu Kaskinen from comment #11)
> (In reply to Jiri Slaby from comment #2)
> > > If I want to try to reproduce this myself, where do I get the same
> > > sles12.img that you're using?
> > 
> > It does not matter which image you use.
> 
> But I don't know where to get suitable images. I've never used qemu. Are
> operating system installation images (e.g. DVD images, .iso extension) good?

I think so, you can download any live image and run it as:
qemu -hda /dev/null -cdrom the_live_image.iso -soundhw hda 

This should suffice.
Comment 15 David Henningsson 2015-02-13 12:36:58 UTC
Ok, reproduced here.

First. Qemu asks for a very low latency:

D: [lt-pulseaudio] protocol-native.c: Requested latency=0,01 ms, Received latency=0,50 ms

0,01 ms, that's half (!) of one sample. I wonder why qemu does that...

Even if we add that up to 0,50 ms that's still very low, even if gets a bit larger later due to overruns.

Second. Qemu starts up a recording stream, asking pulseaudio to write packets to it. But it does not release any packets in a timely fashion. So the 128 slots quickly fill up. That's when PulseAudio starts to send audio packets over the srbchannel as a fallback/workaround.

Third. It's when the srbchannel gets full that "interesting" things start to happen. When not the entire packet can be written at a time, it might also be read in chunks, depending on timing. However, if only part of a memblock is available, it is being split up - with no respect to frame boundaries - in do_read (right below /* Frame payload available */ ). This is the main culprit.

Fourth. There is an mcalign layer that can sometimes fix this, but not always. The problem is here that since the queue is so filled up, only some packets get pushed into the queue. If a memblock gets split in two in the previous step, and only the latter of them gets into the queue (e g because it is much smaller than the first), then the latter one will start at an index which is not a frame boundary, and the program will crash.
Comment 16 David Henningsson 2015-02-13 13:08:24 UTC
I think the section below /* Frame payload available */ in do_read is broken w r t what happens if the packet is split up.

Look how this offset parameter does not take re->index into account:

offset = (int64_t) ((((uint64_t) ntohl(re->descriptor[PA_PSTREAM_DESCRIPTOR_OFFSET_HI])) << 32) | (((uint64_t) ntohl(re->descriptor[PA_PSTREAM_DESCRIPTOR_OFFSET_LO]))));

Now, that happens to work if and only if
 1) the seek mode is PA_SEEK_RELATIVE and
 2) the previously packet was successfully pushed into the queue.

If any of these two fails, any memchunk except the first in a split will be pushed into the wrong place in the queue.

Maybe there is some advantage in getting some of the data as quick as possible, but I think this split will be too hairy to sort out correctly. My suggestion is that we do not call receive_memblock_callback until the entire block is received. Opinions?
Comment 17 Raymond 2015-02-17 14:41:02 UTC
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=0e8ae611bda0dfb038da2d865801d8adbd692bad;hp=40a814b0b1789b94c483190b3208729b5182e5bd

seem bug in qemu , you cannot select 5ms when using 44100Hz (i.e. 22.5 samples)

but pulseaudio should check whether the client supply the correct parameter
Comment 18 David Henningsson 2015-03-05 13:57:07 UTC
New patch posted here: http://lists.freedesktop.org/archives/pulseaudio-discuss/2015-March/023311.html

The first one fixes the bug, the second one re-enables srbchannel, the third is just cleanup.

Please test and let me know the result.
Comment 19 Raymond 2015-03-08 02:16:25 UTC
do you mean minreq can contain half frame ?

    /*
-     * qemu audio tick runs at 250 Hz (by default), so processing
-     * data chunks worth 4 ms of sound should be a good fit.
+     * qemu audio tick runs at 100 Hz (by default), so processing
+     * data chunks worth 10 ms of sound should be a good fit.
      */
-    ba.tlength = pa_usec_to_bytes (4 * 1000, &ss);
-    ba.minreq = pa_usec_to_bytes (2 * 1000, &ss);
+    ba.tlength = pa_usec_to_bytes (10 * 1000, &ss);
+    ba.minreq = pa_usec_to_bytes (5 * 1000, &ss);
     ba.maxlength = -1;
     ba.prebuf = -1;

http://freedesktop.org/software/pulseaudio/doxygen/structpa__buffer__attr.html

Playback only: minimum request. 

The server does not request less than minreq bytes from the client, instead waits until the buffer is free enough to request more bytes at once. It is recommended to set this to (uint32_t) -1, which will initialize this to a value that is deemed sensible by the server. This should be set to a value that gives PulseAudio enough time to move the data from the per-stream playback buffer into the hardware playback buffer.
Comment 20 David Henningsson 2015-06-03 07:58:14 UTC
I'll close this bug as "fixed" because I think it is and there has been no reply to comment #18.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.