Bug 84878 - protocol-native: I get endless underruns when I playback when alsa buffer size is small
Summary: protocol-native: I get endless underruns when I playback when alsa buffer siz...
Status: RESOLVED MOVED
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-10 17:23 UTC by Alban Browaeys
Modified: 2018-07-30 09:40 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
aplay out with buffer sizes from 2048 to 8 (by powers of two) (6.33 KB, text/plain)
2014-10-11 23:34 UTC, Alban Browaeys
Details
aplay xrun with pa native under alsa (I removed the -D hw:0,0) (110.04 KB, text/plain)
2014-10-11 23:36 UTC, Alban Browaeys
Details
dump-hw-params both DESCRIPTOR and BURST kernel modes (513 bytes, text/plain)
2015-01-28 16:53 UTC, Alban Browaeys
Details
pcm_avail.c results : results from BURST (70.79 KB, text/plain)
2015-01-28 16:55 UTC, Alban Browaeys
Details
pcm_avail.c results : results from DESCRIPTOR mode (1.09 KB, text/plain)
2015-01-28 16:56 UTC, Alban Browaeys
Details

Description Alban Browaeys 2014-10-10 17:23:35 UTC
On my exynos4 arm board (odroid u2) I cannot playback with most "alsa/oss only" players (moc, alsaplayer, aplay when I set the buffer_size to a small value <= 5000).

Mind if I disable pulseaudio the playback is fine.


At startup underruns occurs and pile up fast. Mind with the workaround told below I also get underruns but they are recovered quite fast . 


I found this to be related to the support or no support of dma residue reporting to alsa. It turns out that pl330 DMAC used on this board has not support for residue. Only granularity descriptor.
I could confirm as I have a set of patch that had this support and the issue vanish if I enable residue support (via granularity burst).
ie at :
https://github.com/prahal/linux/commit/6f6cfc5cc1f795526adb2b730154624639845117
and the three patches before this one.

Logs when granularity is BURST or SEGMENT but not DESCRIPTOR (ie odroid u2 with above patch applied, and all other platform I could get my hand on: two amd64 boxes -one intel the other amd64 - and an x86_32 all behaves the same: minreq is equal prebuf , except a few where minreq is below prebuf ).



PS: PULSE_LATENCY_MSEC=60  workaround the issue, though it looks
like it only does so as it forces the prebuf to tlength (which is always
higher than minreq).


Logs:
- vanilla pulseaudio: broken audio output:
E: [lt-pulseaudio] protocol-native.c: Client requested: maxlength=4194304 bytes tlength=32768 bytes minreq=2048 bytes prebuf=2048 bytes
E: [lt-pulseaudio] protocol-native.c: Client requested: maxlength=23777 ms tlength=185 ms minreq=11 ms prebuf=11 ms
I: [lt-pulseaudio] protocol-native.c: Requested tlength=185,76 ms, minreq=11,61 ms
D: [lt-pulseaudio] protocol-native.c: Early requests mode enabled, configuring sink latency to minreq.
D: [lt-pulseaudio] protocol-native.c: Requested latency=11,61 ms, Received latency=99,95 ms
E: [lt-pulseaudio] protocol-native.c: Client accepted: maxlength=23777 ms tlength=299 ms minreq=99 ms prebuf=11 ms
D: [lt-pulseaudio] memblockq.c: memblockq requested: maxlength=4194304, tlength=52896, base=4, prebuf=2048, minreq=17628 maxrewind=0
D: [lt-pulseaudio] memblockq.c: memblockq sanitized: maxlength=4194304, tlength=52896, base=4, prebuf=2048, minreq=17628 maxrewind=0


- modified pulseaudio: correct audio out.
E: [lt-pulseaudio] protocol-native.c: Client requested: maxlength=4194304 bytes tlength=32768 bytes minreq=2048 bytes prebuf=2048 bytes
E: [lt-pulseaudio] protocol-native.c: Client requested: maxlength=23777 ms tlength=185 ms minreq=11 ms prebuf=11 ms
I: [lt-pulseaudio] protocol-native.c: Requested tlength=185,76 ms, minreq=11,61 ms
D: [lt-pulseaudio] protocol-native.c: Early requests mode enabled, configuring sink latency to minreq.
D: [lt-pulseaudio] protocol-native.c: Requested latency=11,61 ms, Received latency=99,95 ms
E: [lt-pulseaudio] protocol-native.c: Client accepted: maxlength=23777 ms tlength=299 ms minreq=99 ms prebuf=99 ms
D: [lt-pulseaudio] memblockq.c: memblockq requested: maxlength=4194304, tlength=52896, base=4, prebuf=17628, minreq=17628 maxrewind=0
D: [lt-pulseaudio] memblockq.c: memblockq sanitized: maxlength=4194304, tlength=52896, base=4, prebuf=17628, minreq=17628 maxrewind=0


I modified src/pulsecore/protocol-native.c fix_playback_buffer_attr ,I now  set
prebuf to at least minreq (at the end of the function for now).



To sum up there is a way to workaround the issue in the kernel (in avoid no residue , ie granularity DESCRIPTOR). 

Mind the client always request a prebuf equal or greater than minreq. Only when heuristics in fix__playback_buffer_attr increase minreq they do not change prebuf).
Comment 1 Raymond 2014-10-11 00:52:08 UTC
seem your sound card support DMA_RESIDUE_GRANULARITY_SEGMENT if the residue Residue is updated after each successfully completed segment of the transfer

as your sound card does not have hardware register to count the number of dma transfer

https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/tree/include/linux/dmaengine.h


@DMA_RESIDUE_GRANULARITY_SEGMENT: Residue is updated after each successfully
 *  completed segment of the transfer (For cyclic transfers this is after each
 *  period). This is typically implemented by having the hardware generate an
 *  interrupt after each transferred segment and then the drivers updates the
 *  outstanding residue by the size of the segment. Another possibility is if
 *  the hardware supports scatter-gather and the segment descriptor has a field
 *  which gets set after the segment has been completed. The driver then counts
 *  the number of segments without the flag set to compute the residue.
 * @DMA_RESIDUE_GRANULARITY_BURST: Residue is updated after each transferred
 *  burst. This is typically only supported if the hardware has a progress
 *  register of some sort (E.g. a register with the current read/write address
 *  or a register with the amount of bursts/beats/bytes that have been
 *  transferred or still need to be transferred).
Comment 2 Raymond 2014-10-11 07:35:42 UTC
can aplay run without underrun when using hw device ?

for buffer size 4096, 2048, 1024, 512, 256, 128,... 

aplay -v -D hw:0,0 --buffer-size=4096 stereo.wav
Comment 3 Raymond 2014-10-11 16:15:26 UTC
if your sound card can report DMA_RESIDUE_GRANULARITY_BRUST


http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/081501.html

what is the DMA brust size?
Comment 4 Alban Browaeys 2014-10-11 23:34:18 UTC
Created attachment 107726 [details]
aplay out with buffer sizes from 2048 to 8 (by powers of two)

no xrun without pulseaudio.

I only changed the GRANULARITY flag from BURST to DESCRIPTOR to exercise the issue.
Comment 5 Alban Browaeys 2014-10-11 23:36:59 UTC
Created attachment 107728 [details]
aplay xrun with pa native under alsa (I removed the -D hw:0,0)
Comment 6 Alban Browaeys 2014-10-11 23:51:42 UTC
(In reply to Raymond from comment #1)
> seem your sound card support DMA_RESIDUE_GRANULARITY_SEGMENT if the residue
> Residue is updated after each successfully completed segment of the transfer
> 
> as your sound card does not have hardware register to count the number of
> dma transfer
> 
> https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/tree/include/
> linux/dmaengine.h
> 
> 
> @DMA_RESIDUE_GRANULARITY_SEGMENT: Residue is updated after each successfully
>  *  completed segment of the transfer (For cyclic transfers this is after
> each
>  *  period). This is typically implemented by having the hardware generate an
>  *  interrupt after each transferred segment and then the drivers updates the
>  *  outstanding residue by the size of the segment. Another possibility is if
>  *  the hardware supports scatter-gather and the segment descriptor has a
> field
>  *  which gets set after the segment has been completed. The driver then
> counts
>  *  the number of segments without the flag set to compute the residue.
>  * @DMA_RESIDUE_GRANULARITY_BURST: Residue is updated after each transferred
>  *  burst. This is typically only supported if the hardware has a progress
>  *  register of some sort (E.g. a register with the current read/write
> address
>  *  or a register with the amount of bursts/beats/bytes that have been
>  *  transferred or still need to be transferred).

As of now my only clue regarding this is printk output. (In reply to Raymond from comment #3)
> if your sound card can report DMA_RESIDUE_GRANULARITY_BRUST
> 
> 
> http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/081501.
> html
> 
> what is the DMA brust size?

I am still looking for this. Sorry for the lag.
Comment 7 Alban Browaeys 2014-11-27 04:58:08 UTC
Seems the burst size are 32 for playback and 64 bytes for capture from an investigation of the kernel sources. My understanding is also that the source addr register and destination one are byte accurate.
Out of curiosity, why does the burst sizes matter ? The residue accuracy I agree.
Comment 8 Raymond 2014-11-27 11:15:23 UTC
aplay -D hw:0,0 --dump-hw-params any.wav

https://bugs.freedesktop.org/attachment.cgi?id=107918

you can try pcm_avail.c to verify whether your sound card can report dma brust granularity is 8 or 16 frames
Comment 9 Raymond 2015-01-09 03:11:14 UTC
buffe size: 64
Lecture WAVE 'Musique/gillscott.wav' : Signed 16 bit Little Endian, Fréquence 44100 Hz, Stéréo
Hardware PCM card 0 'Odroid-U3' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 44100
  exact rate   : 44100 (44100/1)
  msbits       : 16
  buffer_size  : 128
  period_size  : 64
  period_time  : 1451
  tstamp_mode  : NONE
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 128
  stop_threshold   : 128
  silence_threshold: 0aplay 
  silence_size : 0
  boundary     : 1073741824
  appl_ptr     : 0
  hw_ptr       : 0


it is strange that aplay still using 128 instead of 64 as buffer size
Comment 10 Alban Browaeys 2015-01-28 15:54:52 UTC
I read my initial report and it seems I missed the emphasis on the core of the bug report.

that is: 
"the client always request a prebuf equal or greater than minreq. Only when heuristics in fix__playback_buffer_attr increase minreq they do not change prebuf)."

"I modified src/pulsecore/protocol-native.c fix_playback_buffer_attr ,I now  set
prebuf to at least minreq (at the end of the function for now)."

that is : should the heuristics in fix_playback_buffer_attr check that prebuf is at least minreq ?


I also got lost in explaining my setup : the hardware do support all modes (only the current kernel code does not ). I made a patch to fix that missing bit. This patch lower the criticality of the bug in that pulseaudio recover from the underruns.

That pulseaudio should or should not work when only DESCRIPTOR is supported (as it does not without above heuristics fix in fix_playback_buffer_attr) is an open question which might deserve another report.

NB: I will do the tests you asked for , sorry I missed the mail about the penultimate. Will do both at the same time. Only reading back the report I felt it was too confusing from the start.
Comment 11 Alban Browaeys 2015-01-28 16:53:34 UTC
Created attachment 112914 [details]
dump-hw-params both DESCRIPTOR and BURST kernel modes
Comment 12 Alban Browaeys 2015-01-28 16:55:44 UTC
Created attachment 112915 [details]
pcm_avail.c results : results from BURST

DESCRIPTOR results are same for setup output, only diff is in Playing silence
Comment 13 Alban Browaeys 2015-01-28 16:56:37 UTC
Created attachment 112916 [details]
pcm_avail.c results : results from DESCRIPTOR mode
Comment 14 Raymond 2015-02-02 05:28:25 UTC
(In reply to Alban Browaeys from comment #12)
> Created attachment 112915 [details]
> pcm_avail.c results : results from BURST
> 
> DESCRIPTOR results are same for setup output, only diff is in Playing silence

how do the hardware support both modes ?

it is strange that the avail increment by one in brust mode and not by period size in descriptor mode
Comment 15 Raymond 2015-02-02 05:32:11 UTC
(In reply to Alban Browaeys from comment #13)
> Created attachment 112916 [details]
> pcm_avail.c results : results from DESCRIPTOR mode

buffer_size  : 4096
  period_size  : 1024
  period_time  : 21333
  tstamp_mode  : NONE
  period_step  : 1
  avail_min    : 1024
  period_event : 0
  start_threshold  : 1024
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size : 0
  boundary     : 1073741824
  appl_ptr     : 0
  hw_ptr       : 0
Playing silence
now_us-last_us=140370086	 now_us-start_us=23	 time_us-start_us=14	 time_us-last_time=140370077	 time_us-start_time=5	 Available: 0	 loop iteration: 0
now_us-last_us=20732	 now_us-start_us=25674	 time_us-start_us=25666	 time_us-last_time=20728	 time_us-start_time=25657	 Available: 1024	 loop iteration: 769
now_us-last_us=13	 now_us-start_us=67428	 time_us-start_us=67421	 time_us-last_time=32481	 time_us-start_time=67412	 Available: 3072	 loop iteration: 222


avail should be increment by period size , it is strange that 2048 not in your result



http://www.alsa-project.org/main/index.php/XRUN_Debug
Comment 16 Alban Browaeys 2015-02-05 22:37:01 UTC
(In reply to Raymond from comment #14)
> (In reply to Alban Browaeys from comment #12)
> > Created attachment 112915 [details]
> > pcm_avail.c results : results from BURST
> > 
> > DESCRIPTOR results are same for setup output, only diff is in Playing silence
> 
> how do the hardware support both modes ?

The hardware is burst in that it has register to get the progress (destination and source address registers).
Only that current kernel code does not get their values thus behaves the descriptor way.



> it is strange that the avail increment by one in brust mode and not by
> period size in descriptor mode

Sorry an artefact : I switched only mode from BURST to DESCRIPTOR mode. When I revert the whole changeset avail increment by 1024. 

Playing silence
now_us-last_us=1987690780        now_us-start_us=32      time_us-start_us=20     time_us-last_time=1987690768    time_us-start_time=9    Available: 0    loop iteration: 0
now_us-last_us=27        now_us-start_us=20064   time_us-start_us=20040  time_us-last_time=8     time_us-start_time=20029        Available: 1024         loop iteration: 2117
now_us-last_us=26        now_us-start_us=41395   time_us-start_us=41372  time_us-last_time=8     time_us-start_time=41361        Available: 2048         loop iteration: 4844
now_us-last_us=26        now_us-start_us=62725   time_us-start_us=62702  time_us-last_time=8     time_us-start_time=62691        Available: 3072         loop iteration: 7575


still a bit dubious as to why there is a different behaviour then (after all the only diff is that I compute the residue which should not matter in descriptor mode as not used).
But mind that be it with the full revert or the partial revert both exhibit the same issue this report is about : 
"the client always request a prebuf equal or greater than minreq. Only when heuristics in fix__playback_buffer_attr increase minreq they do not change prebuf)."
Comment 17 Raymond 2015-02-06 02:00:41 UTC
(In reply to Alban Browaeys from comment #12)
> Created attachment 112915 [details]
> pcm_avail.c results : results from BURST
> 
> DESCRIPTOR results are same for setup output, only diff is in Playing silence

if dma brust size is 32 or 64 , it is unlikely the hw_ptr increment by one
Comment 18 Raymond 2015-02-07 15:50:18 UTC
not only hwptr not icrement by dma brust

the time elasped varies from 8 to 18 for one frame which is quite unstable



us-last_us=399781389341	 now_us-start_us=27	 time_us-start_us=18	 time_us-last_time=399781389332	 time_us-start_time=7	 Available: 13	 loop iteration: 0
now_us-last_us=24	 now_us-start_us=51	 time_us-start_us=45	 time_us-last_time=27	 time_us-start_time=34	 Available: 22	 loop iteration: 1
now_us-last_us=19	 now_us-start_us=70	 time_us-start_us=64	 time_us-last_time=19	 time_us-start_time=53	 Available: 29	 loop iteration: 2
now_us-last_us=18	 now_us-start_us=88	 time_us-start_us=82	 time_us-last_time=18	 time_us-start_time=71	 Available: 36	 loop iteration: 3
now_us-last_us=19	 now_us-start_us=107	 time_us-start_us=101	 time_us-last_time=19	 time_us-start_time=90	 Available: 43	 loop iteration: 4
now_us-last_us=18	 now_us-start_us=125	 time_us-start_us=119	 time_us-last_time=18	 time_us-start_time=108	 Available: 50	 loop iteration: 5
now_us-last_us=18	 now_us-start_us=143	 time_us-start_us=137	 time_us-last_time=18	 time_us-start_time=126	 Available: 57	 loop iteration: 6
now_us-last_us=17	 now_us-start_us=160	 time_us-start_us=154	 time_us-last_time=17	 time_us-start_time=143	 Available: 63	 loop iteration: 7
now_us-last_us=18	 now_us-start_us=178	 time_us-start_us=172	 time_us-last_time=18	 time_us-start_time=161	 Available: 70	 loop iteration: 8
now_us-last_us=18	 now_us-start_us=196	 time_us-start_us=190	 time_us-last_time=18	 time_us-start_time=179	 Available: 71	 loop iteration: 9
now_us-last_us=17	 now_us-start_us=213	 time_us-start_us=207	 time_us-last_time=17	 time_us-start_time=196	 Available: 72	 loop iteration: 10
now_us-last_us=18	 now_us-start_us=231	 time_us-start_us=225	 time_us-last_time=18	 time_us-start_time=214	 Available: 73	 loop iteration: 11
now_us-last_us=17	 now_us-start_us=248	 time_us-start_us=242	 time_us-last_time=17	 time_us-start_time=231	 Available: 74	 loop iteration: 12
now_us-last_us=18	 now_us-start_us=266	 time_us-start_us=260	 time_us-last_time=18	 time_us-start_time=249	 Available: 75	 loop iteration: 13
now_us-last_us=17	 now_us-start_us=283	 time_us-start_us=277	 time_us-last_time=17	 time_us-start_time=266	 Available: 76	 loop iteration: 14
now_us-last_us=9	 now_us-start_us=309	 time_us-start_us=304	 time_us-last_time=10	 time_us-start_time=293	 Available: 77	 loop iteration: 16
now_us-last_us=18	 now_us-start_us=327	 time_us-start_us=321	 time_us-last_time=17	 time_us-start_time=310	 Available: 78	 loop iteration: 17
now_us-last_us=9	 now_us-start_us=354	 time_us-start_us=349	 time_us-last_time=11	 time_us-start_time=338	 Available: 79	 loop iteration: 19
now_us-last_us=18	 now_us-start_us=372	 time_us-start_us=366	 time_us-last_time=17	 time_us-start_time=355	 Available: 80	 loop iteration: 20
now_us-last_us=17	 now_us-start_us=389	 time_us-start_us=383	 time_us-last_time=17	 time_us-start_time=372	 Available: 81	 loop iteration: 21
now_us-last_us=8	 now_us-start_us=415	 time_us-start_us=410	 time_us-last_time=9	 time_us-start_time=399	 Available: 82	 loop iteration: 23
now_us-last_us=18	 now_us-start_us=433	 time_us-start_us=427	 time_us-last_time=17	 time_us-start_time=416	 Available: 83	 loop iteration: 24
now_us-last_us=17	 now_us-start_us=450	 time_us-start_us=444	 time_us-last_time=17	 time_us-start_time=433	 Available: 84	 loop iteration: 25
now_us-last_us=9	 now_us-start_us=477	 time_us-start_us=471	 time_us-last_time=9	 time_us-start_time=460	 Available: 85	 loop iteration: 27
now_us-last_us=18	 now_us-start_us=495	 time_us-start_us=489	 time_us-last_time=18	 time_us-start_time=478	 Available: 86	 loop iteration: 28
Comment 19 Alban Browaeys 2015-02-07 22:08:16 UTC
(In reply to Raymond from comment #18)
> not only hwptr not icrement by dma brust
> 
> the time elasped varies from 8 to 18 for one frame which is quite unstable
> 

All of the values varies likewise with or without the change to the kernel. I mean even with vanilla (ie the one were none of the "available" wanted values miss : 0, 1024, 2048, 3072.
min is around 7 and max around 22 in new_us-last_us and time_us-kast_time.

What is wrong with the results ? I mean I do not know what you check for  and why.

about the dma burst: its size is not fixed afaik, the transfer width is. Plus the dma burst if too small a size wil be split into a sequence of single transfers.
Comment 20 Raymond 2015-02-08 00:02:09 UTC

20064   time_us-start_us=20040  time_us-start_time=20029        Available: 1024        
41395   time_us-start_us=41372  time_us-start_time=41361        Available: 2048        
62725   time_us-start_us=62702  time_us-start_time=62691        Available: 3072     

the difference of sucessive time just close to period time 21333
Comment 21 GitLab Migration User 2018-07-30 09:40:04 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/issues/83.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.