Bug 97299 - Lag on terminal with compositing after 1f6dfc9
Status: RESOLVED DUPLICATE of bug 97914
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Depends on:
Reported: 2016-08-11 14:51 UTC by Andreas Reis
Modified: 2017-03-27 13:38 UTC

screenshot (664.31 KB, image/jpeg)
2016-08-11 23:08 UTC, Andreas Reis
"fprintf + dri3_fence_*" patch (1.38 KB, patch)
2016-08-21 20:30 UTC, Andreas Reis

Description Andreas Reis 2016-08-11 14:51:04 UTC
As written in report 97219, "sna: Only flush GPU bo for a damage event" introduced a lag on my urxvtc.

So far I've mostly (only?) seen it there, and only in mild form: 99% of the time it's full lines that are delayed, and rarely more than ~⅛ of the terminal, at the bottom of the screen, is outdated.

Can't find anything that reliably reproduces it, either. In fact, the moments when content doesn't just lag but diverges are so short that I've been unable to get a proper screenshot. The lag itself can last many seconds, though, if there are no further on-screen changes or input afterwards.

The terminal is set to use compositing for transparency and fading; the relevant lines from .Xdefaults should be:

urxvt*background:      rgba:1111/1111/1111/dddd
URxvt*depth:           32
URxvt.buffered:        true
URxvt*fading:          20%
URxvt*fadeColor:       black
URxvt*shading:         75

An Xorg log is in the report mentioned above. The compositor is compton (https://github.com/chjj/compton); its config file:

backend = "glx";
vsync = "opengl";
glx-no-rebind-pixmap = true;
glx-no-stencil = true;
glx-use-gpushader4 = true;
unredir-if-possible = true;
sw-opti = true;

Occurs on:
* vanilla Xorg 1.18.4 & Arch's kernel 4.7
* Xorg git and drm-intel-nightly 2016y-08m-11d-10h-34m-38s
* libdrm and mesa are current git, as starting Xorg with Arch's mesa 12.0.1-7 somehow disables all Intel hardware acceleration (i.e. also compositing)
* compiler for self-compiled packages was gcc (GCC) 6.1.1 20160807 with -mtune=native -O3 (Haswell)
Comment 1 Chris Wilson 2016-08-11 15:43:17 UTC
To isolate the changes:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 05007bc..4726067 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -17385,7 +17385,7 @@ sna_flush_callback(CallbackListPtr *list, pointer user_data, pointer call_data)
        struct sna *sna = user_data;
-       if (!sna->needs_dri_flush)
+       if (!sna->needs_dri_flush && 0)

should be equivalent to reverting the dri portion. That's worth double-checking to see whether the dri or the shm half is responsible.
Comment 2 Andreas Reis 2016-08-11 16:03:52 UTC
I indeed don't see the lag with the && 0.

My "test" is just messing with the terminal though (compiling stuff while switching between full-terminal mutt & tig on tmux).
Comment 3 Andreas Reis 2016-08-11 23:08:19 UTC
Created attachment 125724

Managed to get a screenshot after all. This is from repeated quick switching between two tmux windows, one running a compilation, the other with tig.
Comment 4 Chris Wilson 2016-08-17 20:43:06 UTC
Tracking through this, the issue really appears to be that in dri2 there is a synchronisation point in compton with X when it acquires the textures for composition, but in dri3 there is no equivalent X11 request. Instead, it queries the XserverRegion corresponding to the damage (rather than tracking the latest damage through DamageNotifyEvents), which means that the rendering in the compositor lags behind the last flush in X when sending the damage event.

As I understand it, it is lacking an appropriate glXWaitX() like:

diff --git a/src/opengl.c b/src/opengl.c
index 5a98f4e..c6cea55 100644
--- a/src/opengl.c
+++ b/src/opengl.c
@@ -1050,6 +1050,8 @@ glx_set_clip(session_t *ps, XserverRegion reg, const reg_data_t *pcache_reg) {
     rects = &rect_blank;
+  glXWaitX();
   if (1 == nrects) {
Comment 5 Andreas Reis 2016-08-18 10:12:12 UTC
Unfortunately, even after applying that patch to compton the lag is still there.

Does seem like it could have gotten less bad, but it's still very visible.
Comment 6 Chris Wilson 2016-08-18 10:18:25 UTC
Yup, in mesa/dri3 there is no synchronisation yet.... It needs something like

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index 2f09431..3315828 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -554,8 +554,17 @@ loader_dri3_wait_x(struct loader_dri3_drawable *draw)
    struct loader_dri3_buffer *front;
    __DRIcontext *dri_context;
-   if (draw == NULL || !draw->have_fake_front)
+   if (draw == NULL)
+      return;
+   if (!draw->have_fake_front) {
+      struct loader_dri3_buffer *back = dri3_back_buffer(draw);
+      dri3_fence_reset(draw->conn, back);
+      dri3_fence_trigger(draw->conn, back);
+      dri3_fence_await(draw->conn, back);
+   }
    front = dri3_fake_front_buffer(draw);
    dri_context = draw->vtable->get_dri_context(draw);

to flesh out glXWaitX().
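
The reset/trigger/await sequence in that hunk relies on the server handling requests in order: a fence trigger queued behind earlier rendering is only reached once that rendering has been processed. The following is a minimal sketch of that idea, assuming a synchronous in-order queue. All `toy_*` names are hypothetical and do not exist in Mesa or the X server; the real helpers are the `dri3_fence_*` functions in loader_dri3_helper.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of an in-order request queue with one fence. */
enum toy_req { TOY_DRAW, TOY_TRIGGER_FENCE };

struct toy_server {
    enum toy_req queue[16];
    size_t       count;      /* requests queued but not yet processed */
    int          drawn;      /* draw requests the server has completed */
    bool         fence_set;  /* fence state as seen by the client */
};

static void toy_queue(struct toy_server *s, enum toy_req r)
{
    assert(s->count < sizeof(s->queue) / sizeof(s->queue[0]));
    s->queue[s->count++] = r;
}

/* reset: clear the fence so a later await cannot see a stale trigger. */
static void toy_fence_reset(struct toy_server *s)
{
    s->fence_set = false;
}

/* trigger: ask the server to set the fence when it reaches this request. */
static void toy_fence_trigger(struct toy_server *s)
{
    toy_queue(s, TOY_TRIGGER_FENCE);
}

/* await: flush the queue, then wait for the fence. The "server" here runs
 * synchronously and strictly in order, so earlier draws complete first. */
static void toy_fence_await(struct toy_server *s)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->queue[i] == TOY_DRAW)
            s->drawn++;
        else
            s->fence_set = true;
    }
    s->count = 0;
    /* In this model, awaiting without a prior trigger is a usage error. */
    assert(s->fence_set);
}
```

Queuing two `TOY_DRAW` requests and then calling reset, trigger and await leaves `drawn` at 2 by the time await returns, which is the guarantee the patch is after for the back buffer.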
Comment 7 Andreas Reis 2016-08-18 10:41:01 UTC
applied to mesa and rebooted – still getting the lag
Comment 8 Chris Wilson 2016-08-18 10:58:19 UTC
Hmm. That *should* do an explicit flush of X rendering before reading from those surfaces.

We can't use xtrace because of DRI3 (so we can't confirm whether compton/mesa is signaling the flush), and in the ddx it is hard to identify who calls the sync-flush, as it is called very regularly.

So printf debugging! First can you please confirm we are hitting the new code in mesa?
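
For confirming that a branch is reached, something along these lines could be dropped into the path in question. It is a sketch only: `trace_hit`, `TRACE_HIT`, `TRACE_LOG_LIMIT` and `patched_path` are hypothetical names, not part of Mesa, and the log is capped because the path may be hit very often.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracing helper; not part of Mesa or any real library. */
#define TRACE_LOG_LIMIT 5u

static unsigned trace_hits;  /* total hits, including those not logged */

static void trace_hit(const char *func, int line)
{
    assert(func != NULL);
    /* Log only the first few hits so a hot path does not flood stderr. */
    if (trace_hits++ < TRACE_LOG_LIMIT)
        fprintf(stderr, "TRACE %s:%d reached\n", func, line);
}

#define TRACE_HIT() trace_hit(__func__, __LINE__)

/* Stand-in for the patched branch whose execution is in question. */
static void patched_path(int have_fake_front)
{
    if (!have_fake_front)
        TRACE_HIT();
}
```

Where stderr ends up depends on how the client was started, so it may be necessary to redirect it to a file explicitly.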
Comment 9 Andreas Reis 2016-08-21 20:30:36 UTC
Created attachment 125937
"fprintf + dri3_fence_*" patch

Sorry for the delay, I was busy. Now I

* applied the attached "fprintf + dri3_fence_*" patch to loader_dri3_helper.c
* recompiled mesa as usual (i.e. no added debug settings, as I hadn't read anywhere that they were needed)
* added the following to my .zprofile (which afterwards calls startx):
    export LIBGL_DEBUG=verbose
    export MESA_DEBUG=1
    export MESA_LOG_FILE=/tmp/mesa.log
* also recompiled compton with the glXWaitX() for good measure
* and restarted.

Nothing. The MESA_LOG_FILE isn't even created. Did I miss a step?

GCC is at today's 20160821 now, xf86-video-intel at 12c14deb (without the && 0).
Comment 10 Chris Wilson 2016-09-09 22:15:15 UTC
glx-no-rebind-pixmap = true;

seems to be the culprit as it disables correct behaviour.
Comment 11 Andreas Reis 2016-09-10 22:58:04 UTC
Does appear to improve things, but unless my eyes deceive me, I'm still seeing some lag, and the contents of two tmux windows overlap while switching between them.

Still using the glXWaitX'ed compton and patched mesa, current git.
Comment 12 Chris Wilson 2016-10-15 00:47:14 UTC

*** This bug has been marked as a duplicate of bug 97914 ***
Comment 13 Jean Delvare 2017-03-27 08:05:26 UTC
Andreas, for completeness, on what hardware do you get this bug?
Comment 14 Andreas Reis 2017-03-27 13:38:51 UTC
Two Haswells.

Actually I'm no longer getting it.

Have been running mesa-git with Chris's "DRI2/DRI3 sync" mesa patch from the bug report that this one has been marked a duplicate of.

compton.conf is just
backend = "glx";
vsync = "opengl-swc";
