Bug 97799 - Discontinuity in the interpolated delay after corking, flushing and uncorking.
Summary: Discontinuity in the interpolated delay after corking, flushing and uncorking.
Status: RESOLVED MOVED
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-13 19:42 UTC by Niklas Haas
Modified: 2018-07-30 09:41 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
regular playback (71.43 KB, image/png)
2016-09-13 19:42 UTC, Niklas Haas
Details
seeking (76.51 KB, image/png)
2016-09-13 19:42 UTC, Niklas Haas
Details
seeking zoom (61.60 KB, image/png)
2016-09-13 19:43 UTC, Niklas Haas
Details
problem example (seeking with video) (167.94 KB, image/png)
2016-09-13 19:44 UTC, Niklas Haas
Details
no interpolation (72.94 KB, image/png)
2016-09-13 20:00 UTC, Niklas Haas
Details
no interpolation seeking (69.66 KB, image/png)
2016-09-13 20:00 UTC, Niklas Haas
Details
alternate timing hack (77.04 KB, image/png)
2016-09-13 20:01 UTC, Niklas Haas
Details
alternate timing hack seeking (81.06 KB, image/png)
2016-09-13 20:01 UTC, Niklas Haas
Details
PA_SEEK_RELATIVE_ON_READ seeking (75.80 KB, image/png)
2016-09-13 20:29 UTC, Niklas Haas
Details

Description Niklas Haas 2016-09-13 19:42:28 UTC
Created attachment 126489 [details]
regular playback

After a long session of debugging in #pulseaudio, I got no further than this result, so I'm submitting it here for reference and to get more opinions:

Basically, I'm running into the issue where the reported delay (as measured by pa_stream_get_latency before and after writing data) suffers from a relatively big discontinuity (about 50-100ms) when seeking in mpv. Seeking in mpv is implemented by resetting the audio device, which in mpv's terms means:

1. corking the audio stream (pa_stream_cork true)
2. flushing the audio device (pa_stream_flush)
3. uncorking the audio stream (pa_stream_cork false)
4. continuing playback

I have attached a number of plots of mpv's internal timing data to help illustrate the problem. The first attachment (“regular playback”) establishes a baseline reading (10 seconds of uninterrupted playback). Some notes:

1. The green line (ao-delay) is the reported latency as measured directly by pa_stream_get_latency.

2. The blue line (ao-dev) is the difference between the latency and where mpv thinks it should be (assuming the audio device plays at a perfectly even rate). Ideally, this line should be exactly 0, and more importantly, this line should be as stable as possible (no jumps, no jittering and no spikes), because it is compared against the video stream's timing to detect A/V desynchronization.

3. While they're not as interesting for this test, the spikes at the top indicate events (rather than values). The curve going up represents an event starting, and the curve going down indicates an event stopping. It's somewhat harder to read the legend, so I'll repeat it here: green = ao-fill (inside “pa_stream_write”), blue = audio (“decoding more audio”), black = audio wait (“audio thread sleeping”), yellow = sleep (“playback thread sleeping”)

4. The latency values are measured with PA_STREAM_INTERPOLATE_TIMING | PA_STREAM_AUTO_TIMING_UPDATE | PA_STREAM_NOT_MONOTONIC, and the target latency (tlength) is 2000ms. Decreasing the tlength to a lower value decreases the magnitude of the discontinuity.

The second attachment (“seeking”) indicates what happens when triggering a seek in mpv. I have triggered two seeks, one at about 3.3s and one at about 6.6s. We can observe:

1. The reported latency (ao-delay) suffers from an upwards discontinuity of about 50ms - going up from the baseline latency of about 1050ms to a value near 1100ms, and then gradually decreasing back down to 1000ms over the course of the next 1-2 seconds.

2. Similarly, the “apparent” position of mpv suffers from a downwards discontinuity from 0ms to about -35ms, which increases to +20ms over the course of the next 1-2 seconds.

The third attachment (“seeking zoom”) is a detail view of the discontinuity as it happens, in case it helps.

In case you're wondering why these discontinues affect playback in mpv, I've included a fourth attachment demonstrating the same effects during regular playback. I've included two graphs - one with a smaller tlength (~200ms iirc) and one with a larger tlength (2000ms). They're a bit more confusing because they also include video stats, but basically the effect that happens is as follows:

1. The user triggers a seek
2. The apparent audio position (ao-dev, green line at the bottom) shoots downwards (as observed earlier)
3. The measured A-V difference (avdiff, black line) spikes downwards as a result (by about 100ms in the tlength=2000ms case)
4. This exceeds mpv's thresholds for acceptable audio delay and triggers the video output to drop several frames to resynchronize audio and video.  (Blue square at the top), thus restoring avdiff to a value around 0.
5. The apparent audio position rapidly approaches its “true” value again over the course of the next 1-2 seconds (again, as observed earlier)
6. This causes the measured av difference (black line) to very rapidly grow upwards again within these 1-2 seconds, triggering several frame drops a long the way (every time it exceeds the acceptable threshold) - these are the clearly visible spikes and corresponding blue frames in the few seconds after the seek.

The only known work-around in mpv is to decrease the pulse buffer size (tlength) to a smaller value, which makes the issue less severe (as seen on the left) but doesn't fully solve it.

It would be great if this could get fixed upstream in PulseAudio so we don't have to hack around it in mpv.
Comment 1 Niklas Haas 2016-09-13 19:42:54 UTC
Created attachment 126490 [details]
seeking
Comment 2 Niklas Haas 2016-09-13 19:43:19 UTC
Created attachment 126491 [details]
seeking zoom
Comment 3 Niklas Haas 2016-09-13 19:44:33 UTC
Created attachment 126492 [details]
problem example (seeking with video)
Comment 4 Niklas Haas 2016-09-13 20:00:11 UTC
In case it helps at all to debug this issue, I've added some more attachments:

“no interpolation”: the same test as “regular playback” but with interpolation disabled. This basically shows pulseaudio's internal timing. You can clearly see that pulseaudio's auto-timing code keeps halving the timing update frequency, up to a maximum of 1.5s

“no interpolation seeking”: same as “no interpolation” but with two seeks inserted. Every seek event resets the timing update frequency.

“alternate timing hack”: In this test, INTERPOLATE_TIMING and AUTO_TIMING_UPDATE were disabled and the call to pa_stream_get_latency was replaced by the following:

1. pa_stream_update_timing_info + pa_stream_get_timing_info
2. latency = pa_bytes_to_usec(write_index - read_index)
3. return latency - transport_usec + sink_usec

Since this code doesn't do any smoothing, the result is much more rough overall, but it's nonetheless fairly continuous.

“alternate timing hack seeking”: same as “alternate timing hack” but with seek events included. There are still some small downwards blips in the reported delay, but these are in sync with the fact that less data has been written (the blue line is stable).

It's possible that I could work around this issue by using the “alternate timing hack” code path and doing smoothing of the “ao-dev” (blue) variable myself, but I would really rather avoid it.
Comment 5 Niklas Haas 2016-09-13 20:00:31 UTC
Created attachment 126493 [details]
no interpolation
Comment 6 Niklas Haas 2016-09-13 20:00:51 UTC
Created attachment 126494 [details]
no interpolation seeking
Comment 7 Niklas Haas 2016-09-13 20:01:10 UTC
Created attachment 126495 [details]
alternate timing hack
Comment 8 Niklas Haas 2016-09-13 20:01:30 UTC
Created attachment 126496 [details]
alternate timing hack seeking
Comment 9 Niklas Haas 2016-09-13 20:27:15 UTC
I did another test, this I changed the way mpv resets the stream. When continuing playback, I varied the pa_seek_mode used for the first pa_stream_write invocation  after a reset.

My results were as follows:

RELATIVE: already observed, upwards discontinuity
ABSOLUTE: completely explodes (massive, 2 second spikes)
RELATIVE_ON_READ: small downwards blip in latency, ao-dev jumps by about 30ms. (attached)
RELATIVE_END: similar to RELATIVE_ON_READ

It's possible that using RELATIVE_ON_READ after a flush could alleviate this issue somewhat. Advice from somebody who knows more about the PulseAudio API and codebase is appreciated.
Comment 10 Niklas Haas 2016-09-13 20:29:06 UTC
Created attachment 126497 [details]
PA_SEEK_RELATIVE_ON_READ seeking
Comment 11 Tanu Kaskinen 2016-09-13 21:40:10 UTC
I don't understand why RELATIVE_ON_READ behaves any differently than RELATIVE in this case. Flushing should make the read and write indexes the same, and the read index shouldn't move before the first write, if you have prebuffering enabled (and the code[1] looks like you do have prebuffering enabled).

[1] https://github.com/mpv-player/mpv/blob/master/audio/out/ao_pulse.c#L463
Comment 12 Niklas Haas 2016-09-13 21:58:53 UTC
I'm not entirely sure in which order what happens, and these things run on different threads. I don't know the way the audio travels up and down the mpv code well enough to answer ordering questions for you, though if it's important I could add a stats event to every call in ao_pulse to get a clear picture of what's going on. (And maybe reproduce it externally)
Comment 13 Tanu Kaskinen 2016-09-13 22:40:52 UTC
(In reply to Niklas Haas from comment #12)
> I'm not entirely sure in which order what happens, and these things run on
> different threads. I don't know the way the audio travels up and down the
> mpv code well enough to answer ordering questions for you, though if it's
> important I could add a stats event to every call in ao_pulse to get a clear
> picture of what's going on. (And maybe reproduce it externally)

Did you interpret my message so that I was asking you questions about the ordering of something? I did no such thing. I just stated that in this specific case it shouldn't matter whether you use RELATIVE or RELATIVE_ON_READ on the first write after seeking, because the read and write indexes of the stream buffer should be the same due to the flush (and prebuffering should ensure that the read index won't move before the first write).

Is there some uncertainty about the ordering of these calls?:

/* writes preceding the seek */
...
pa_stream_write(RELATIVE);
pa_stream_write(RELATIVE);
pa_stream_write(RELATIVE);

/* seek! */
pa_stream_cork(true);
pa_stream_flush();
pa_stream_cork(false);

/* writes after the seek */
pa_stream_write(RELATIVE_ON_READ);
pa_stream_write(RELATIVE);
pa_stream_write(RELATIVE);
...

Reading the ao_pulse code, cork/flush/uncork clearly happen in that order. The only possible uncertainty is that is the RELATIVE_ON_READ write really the first write after the seek.

Assuming that the ordering is as presented above, and the observations tell that there actually is some difference between RELATIVE and RELATIVE_ON_READ, then that indicates a bug in pulseaudio (or my understanding of the code is just wrong).
Comment 14 Niklas Haas 2016-09-13 23:16:03 UTC
> Did you interpret my message so that I was asking you questions about the
> ordering of something? I did no such thing.

No, I was just pointing out that care needs to be taken when making assumptions
about what order various calls happen in. Although I now realize the point you
were making is independent of the order of events because it does not concern
the delay measurement, just the seek mode.

> Reading the ao_pulse code, cork/flush/uncork clearly happen in that order. The
> only possible uncertainty is that is the RELATIVE_ON_READ write really the
> first write after the seek.

Here is my code for reference:

diff --git a/audio/out/ao_pulse.c b/audio/out/ao_pulse.c
index 5a68553..a02ccb3 100644
--- a/audio/out/ao_pulse.c
+++ b/audio/out/ao_pulse.c
@@ -62,6 +62,9 @@ struct priv {
     char *cfg_sink;
     int cfg_buffer;
     int cfg_latency_hacks;
+
+    // XXX testing
+    bool want_reset;
 };
 
 #define GENERIC_ERR_MSG(str) \
@@ -518,9 +521,15 @@ static void cork(struct ao *ao, bool pause)
 static int play(struct ao *ao, void **data, int samples, int flags)
 {
     struct priv *priv = ao->priv;
+    enum pa_seek_mode seekmode = PA_SEEK_RELATIVE;
+    if (priv->want_reset) {
+        seekmode = PA_SEEK_RELATIVE_ON_READ;
+        priv->want_reset = false;
+    }
+
     pa_threaded_mainloop_lock(priv->mainloop);
     if (pa_stream_write(priv->stream, data[0], samples * ao->sstride, NULL, 0,
-                        PA_SEEK_RELATIVE) < 0) {
+                        seekmode) < 0) {
         GENERIC_ERR_MSG("pa_stream_write() failed");
         samples = -1;
     }
@@ -545,6 +554,7 @@ static void reset(struct ao *ao)
         !priv->retval)
         GENERIC_ERR_MSG("pa_stream_flush() failed");
     cork(ao, false);
+    priv->want_reset = true;
 }
 
 // Pause the audio stream by corking it on the server

Based on the way I implemented it, the stream flushing could should affect the
seek mode of exactly the next pa_stream_write and no more. play() itself
will only ever be called by a single thread, so it can't be the case that
two threads enter this code at the exact same time. But I'll add some printf
debugging to this to be sure:

...
relative
relative
relative
relative
relative
want_reset = true
relative_on_read
relative
relative
relative
relative
...

So yes, it happens in exactly that order.

I double, triple and quadruple checked (running the test multiple times for each
case) and confirmed that the result is definitely different: With this patch
applied, I get only small spikes in ao-delay which translate to small
discontinuities of about 10-20ms in ao-dev.

If I take the exact same source code, replace PA_SEEK_RELATIVE_ON_READ by
PA_SEEK_RELATIVE in the branch I added, recompile and rerun the test, I get the
same behavior as in my first post again, which is the same behavior as without
this diff. So it isn't just down to pure chance.
Comment 15 Niklas Haas 2017-09-18 01:15:24 UTC
Providing a small update to this: The mpv option --pulse-latency-hacks=yes, which disables interpolation/smoothing and does it “manually” inside mpv, works around this issue for me. So it's no longer a huge priority. (Although that option is off by default, so other users may still be affected by this bug)
Comment 16 GitLab Migration User 2018-07-30 09:41:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/issues/90.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.