Bug 98030

Summary: Stuttering video playback in totem after update to 1.19-rc1
Product: xorg Reporter: Piotr Drąg <piotrdrag>
Component: Server/GeneralAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: ajax, keithp, mario.kleiner
Version: unspecified   
Hardware: Other   
OS: All   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1381014
Whiteboard:
i915 platform: i915 features:

Description Piotr Drąg 2016-10-03 15:11:30 UTC
After updating to xorg-x11-server-1.19.0-0.1.20160929.fc25 and xorg-x11-drv-nouveau-1.0.13-1.fc25 (from <https://bodhi.fedoraproject.org/updates/FEDORA-2016-a5c3ebe67a>) video playback in totem stutters every few seconds to the point of being unwatchable. It doesn't happen with video playback (e.g. YouTube) in Firefox, and system generally works fine. Downgrading the update "fixes" the problem.


I'm using X11 with nouveau on a GT216M (GeForce GT 240M), no hybrid cards:

Providers: number : 1
Provider 0: id: 0x64 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 3 associated providers: 0 name:nouveau


Hans de Goede in <https://bugzilla.redhat.com/show_bug.cgi?id=1381014> kindly helped to discover that xf86-video-nouveau-1.0.13 does not show this problem with xserver-1.18.4, but does show it with xserver-1.19-rc1, and that using the modesetting driver with xserver-1.19-rc1 makes the problem go away.
Comment 1 Ilia Mirkin 2016-10-03 21:30:08 UTC
Looks like some issue with DRI2. If you enable DRI3 support in your xorg.conf, should work fine (for DRI3 clients). [I realize this is not a fix, but rather a workaround.]
Comment 2 Ilia Mirkin 2016-10-03 21:40:15 UTC
Apparently the issue also exists with modesetting on top of nouveau - i.e. if you run e.g. LIBGL_DRI3_DISABLE=1 glxgears, you'll end up with the same lag. Trying to collect more data to see if this is nouveau-specific or if it's an issue across drivers.
Comment 3 Mario Kleiner 2016-10-13 21:39:49 UTC
Ok, i think this is related: On 1.19-rc1 (current git master) under DRI2, windowed OpenGL apps like glxgears get stuck unless i press keys on the keyboard or move the mouse to create input events. KDE Plasma 5, being OpenGL driven afaik, has the same problem.

If otoh i run applications which use fullscreen kms page-flipped windows, e.g., Gnome shell, or regular fullscreen GL apps, they work fine under DRI2. If i use DRI3/Present everything works fine. This both with a server built to use the new input-threads and also built without input-threads. This happens at least on nouveau-ddx and ati/amdgpu-ddx.

I just retested glxgears, my own app and totem under gdb, and without me providing mouse/keyboard input, they all get stuck in the DRI2GetBuffersWithFormat request to the X-Server, waiting for a reply. That gets called when Mesa needs new renderbuffers for a new frame after a swapbuffers request, e.g., when glxgears or totem calls glClear().

So far so bad. So something gets stuck in the servers dispatch loop, but "external" input events from the kernel (evdev, kms-pageflip completion events?) gets it unstuck?
Comment 4 Michel Dänzer 2016-10-14 03:29:37 UTC
I bisected this to

commit 8f1edf4bd3a1f050ce9eeb5eac45dd1a8f7a6d5e
Author: Keith Packard <keithp@keithp.com>
Date:   Thu May 19 13:59:54 2016 -0700

    dix: Use list for ready clients

    This converts the dispatch loop into using a list of ready clients
    instead of an array. This changes the WaitForSomething API so that it
    notifies DIX when a client becomes ready to read, instead of returning
    the set of ready clients.
Comment 5 Michel Dänzer 2016-10-14 03:31:21 UTC
(In reply to Mario Kleiner from comment #3)
> If otoh i run applications which use fullscreen kms page-flipped windows,
> e.g., Gnome shell, or regular fullscreen GL apps, they work fine under DRI2.

I can confirm that for mutter, but not for e.g. glxgears in fullscreen. Maybe compositors just get unblocked by something else, similar to how input unblocks all affected DRI2 clients.
Comment 6 Chris Wilson 2016-10-14 08:42:01 UTC
The change in 8f1edf4bd3a1f050ce9eeb5eac45dd1a8f7a6d5e is that we no longer wakeup and flush writers when space in the write queue becomes available - instead before we poll() we flush the writers iff NewOutputPending since the last poll(), i.e.

diff --git a/os/WaitFor.c b/os/WaitFor.c
index 8164c30..5b0c342 100644
--- a/os/WaitFor.c
+++ b/os/WaitFor.c
@@ -207,8 +207,7 @@ WaitForSomething(Bool are_ready)
                 (1000000 / MILLI_PER_SECOND);
             wt = &waittime;
         }
-        if (NewOutputPending)
-            FlushAllOutput();
+       FlushAllOutput();
         /* keep this check close to select() call to minimize race */
         if (dispatchException)
             i = -1;
Comment 7 Michel Dänzer 2016-10-14 08:57:43 UTC
(In reply to Chris Wilson from comment #6)
> -        if (NewOutputPending)
> -            FlushAllOutput();
> +       FlushAllOutput();

This doesn't fix the problem for me.
Comment 8 Chris Wilson 2016-10-14 09:17:35 UTC
(In reply to Michel Dänzer from comment #7)
> (In reply to Chris Wilson from comment #6)
> > -        if (NewOutputPending)
> > -            FlushAllOutput();
> > +       FlushAllOutput();
> 
> This doesn't fix the problem for me.

Yeah, because the first thing FlushAllOutput() does is check NewOutputPending. Bleh. :| I'm still pretty certain it is the change in considering when to flush output that causes the event/replies to become stuck.
Comment 9 Mario Kleiner 2016-10-20 00:25:46 UTC
Btw. Intel-ddx under uxa acceleration + DRI2 is also affected.
sna seems to be immune, even with Option TripleBuffer off.
Comment 10 Keith Packard 2016-10-28 15:06:55 UTC
WaitForSomething caches the value of clients_are_ready() as an 'optimization'. When dri2 wakesup clients, it does that by queuing a work proc. When that work proc is executed, the awoken client is now ready, but WaitForSomething is using the stale local value of are_ready. Recompute are_ready after calling ProcessWorkQueue fixes this.

--------------------------------- os/WaitFor.c ---------------------------------
index 7d5aa32..ff1c85e 100644
@@ -204,8 +204,10 @@ WaitForSomething(Bool are_ready)
        crashed connections and the screen saver timeout */
     while (1) {
         /* deal with any blocked jobs */
-        if (workQueue)
+        if (workQueue) {
             ProcessWorkQueue();
+            are_ready = clients_are_ready();
+        }
 
         if (are_ready)
             timeout = 0;
Comment 11 Michel Dänzer 2016-10-31 02:56:45 UTC
Fixed in 1.19 RC2:

commit 9ed5b263542e5245317927828f0515db6c0a54c8
Author: Keith Packard <keithp@keithp.com>
Date:   Fri Oct 28 08:04:43 2016 -0700

    os: Recompute whether any clients are ready after ProcessWorkQueue() (bug 98030)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.