Bug 97914 - Redraw lag on Ivy Bridge since 1f6dfc9df678
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Duplicates: 97299 100379 100408
Reported: 2016-09-24 17:06 UTC by John Lindgren
Modified: 2019-11-27 13:46 UTC

Attachments
X.org log (22.80 KB, text/x-log), 2016-09-24 17:06 UTC, John Lindgren
dmesg log (58.97 KB, text/x-log), 2016-09-24 17:06 UTC, John Lindgren
Flush rendering for glXWaitX (8.55 KB, patch), 2016-10-14 22:27 UTC, Chris Wilson
Flush rendering for glXWaitX (10.11 KB, patch), 2016-10-14 22:55 UTC, Chris Wilson
Flush rendering for glXWaitX (dri3) (2.92 KB, patch), 2016-10-14 23:59 UTC, Chris Wilson
Combined patch for mesa 12.0.3 (12.09 KB, patch), 2016-10-15 18:04 UTC, John Lindgren
DRI2/DRI3 sync (13.30 KB, patch), 2016-10-15 19:36 UTC, Chris Wilson
Patch for mesa 12.0.3 (12.63 KB, patch), 2016-10-15 20:26 UTC, John Lindgren
Tod's Xorg.0.log (25.61 KB, text/plain), 2016-10-29 16:59 UTC, Tod Jackson
Tod's glxinfo (25.05 KB, text/plain), 2016-10-29 17:00 UTC, Tod Jackson
Update the drawables once per render (5.23 KB, patch), 2016-11-20 09:25 UTC, Chris Wilson

Description John Lindgren 2016-09-24 17:06:06 UTC
Created attachment 126761 [details]
X.org log

Screen updates are delayed since commit 1f6dfc9df678 in xf86-video-intel.  The problem is most noticeable when scrolling in Leafpad.

My system is a Dell Latitude E5530 (Core i5 3210m) running Arch Linux x86_64 with kernel 4.7.4 and X.org 1.18.4.  I am running XFCE+compton and using DRI2 acceleration (DRI3 has other problems on this system).
Comment 1 John Lindgren 2016-09-24 17:06:35 UTC
Created attachment 126762 [details]
dmesg log
Comment 2 John Lindgren 2016-09-24 17:10:16 UTC
Switching to DRI3 does not make any difference.
Comment 3 Chris Wilson 2016-09-25 07:18:10 UTC
What are your compton settings?
Comment 4 John Lindgren 2016-09-25 14:30:41 UTC
compton --backend glx --vsync opengl-swc --detect-client-opacity
Comment 5 Chris Wilson 2016-09-27 12:11:03 UTC
Installed compton and leafpad on an ivb-celeron with xfce4, kernel/ddx/mesa are from git but an older xorg. Using "compton --backend glx --vsync opengl-swc --detect-client-opacity" seems smooth. Anything else that seems to easily provoke the latency?
Comment 6 John Lindgren 2016-09-27 13:09:36 UTC
Nothing else that is 100% repeatable.  Geany does the same thing occasionally when scrolling.  Menus in Thunar also occasionally show garbage (i.e. black with some random memory patterns) for a second before being painted.
Comment 7 John Lindgren 2016-09-28 19:23:12 UTC
Here is a video to illustrate:
https://app.box.com/s/2eckekxee4lja542yjzs7mnm1ymonp99

To repeat the problem, it's important to be using a traditional stepped scroll wheel (not a smooth-scrolling laptop touchpad) and to have the keyboard cursor offscreen (otherwise the blinking forces the screen to update).
Comment 8 Chris Wilson 2016-09-28 20:42:10 UTC
What's the most recent version of -intel you have tested?
Comment 9 John Lindgren 2016-09-28 20:51:02 UTC
Currently on:

linux 4.7.5-1
xorg-server 1.18.4-1
mesa 12.0.3-1
xf86-video-intel 1:2.99.917+708+g8f33f80-1
Comment 10 Chris Wilson 2016-10-14 18:59:39 UTC
Nearly a month later, I've finally got xfce4 + compton + leafpad set up to reproduce the issue.
Comment 11 Chris Wilson 2016-10-14 20:10:38 UTC
I think I'm starting to understand. It's not a missing wait/flush for the target drawable, but missing flushes on the source textures, i.e.

mesa:
diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index ee05f39..d63c89b 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -1007,8 +1007,7 @@ dri2_bind_tex_image(Display * dpy,
    if (pdraw != NULL) {
       psc = (struct dri2_screen *) base->psc;
 
-      if (!pdp->invalidateAvailable && psc->f &&
-           psc->f->base.version >= 3 && psc->f->invalidate)
+      if (psc->f && psc->f->base.version >= 3 && psc->f->invalidate)
         psc->f->invalidate(pdraw->driDrawable);
 
       if (psc->texBuffer->base.version >= 2 &&
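
Illustrative sketch (not part of the patch): the change above makes every texture bind invalidate the drawable first. The toy model below shows why that matters, under the assumption that the GL client only refreshes its cached view of a pixmap after an invalidate. All names (toy_drawable, toy_invalidate, toy_update_buffers, toy_bind_tex_image) are hypothetical and are not mesa API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in mesa. */
struct toy_drawable {
    unsigned stamp;       /* bumped by every invalidate */
    unsigned seen_stamp;  /* stamp at the last buffer refresh */
    int server_content;   /* what the X server has rendered so far */
    int cached_content;   /* what the GL client would sample */
};

/* Mark the client's cached buffers as out of date. */
static void toy_invalidate(struct toy_drawable *d)
{
    d->stamp++;
}

/* Refresh the cached buffers only if the drawable was invalidated.
 * Returns true when a refresh actually happened. */
static bool toy_update_buffers(struct toy_drawable *d)
{
    if (d->stamp == d->seen_stamp)
        return false;
    d->cached_content = d->server_content;
    d->seen_stamp = d->stamp;
    return true;
}

/* Bind the pixmap as a texture and return the content that would be
 * sampled. With always_invalidate set, the bind forces a refresh, which
 * is the behaviour the diff above switches on unconditionally. */
static int toy_bind_tex_image(struct toy_drawable *d, bool always_invalidate)
{
    if (always_invalidate)
        toy_invalidate(d);
    toy_update_buffers(d);
    assert(d->seen_stamp == d->stamp);
    return d->cached_content;
}
```

In this model, a bind without the forced invalidate keeps returning the old cached content whenever the server has rendered but no invalidate has reached the client, which matches the stale source textures described in this comment.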
Comment 12 Chris Wilson 2016-10-14 22:27:55 UTC
Created attachment 127312 [details] [review]
Flush rendering for glXWaitX
Comment 13 Chris Wilson 2016-10-14 22:55:05 UTC
Created attachment 127314 [details] [review]
Flush rendering for glXWaitX
Comment 14 Chris Wilson 2016-10-14 23:59:12 UTC
Created attachment 127315 [details] [review]
Flush rendering for glXWaitX (dri3)
Comment 15 Chris Wilson 2016-10-15 00:47:14 UTC
*** Bug 97299 has been marked as a duplicate of this bug. ***
Comment 16 John Lindgren 2016-10-15 18:04:58 UTC
Created attachment 127320 [details] [review]
Combined patch for mesa 12.0.3

I backported your two most recent patches to mesa 12.0.3 for testing.  With DRI2, the lag is fixed, but it is still there with DRI3.
Comment 17 Chris Wilson 2016-10-15 18:48:26 UTC
(In reply to John Lindgren from comment #16)
> Created attachment 127320 [details] [review] [review]
> Combined patch for mesa 12.0.3
> 
> I backported your two most recent patches to mesa 12.0.3 for testing.  With
> DRI2, the lag is fixed, but it is still there with DRI3.

Hmm, confirmed that the DRI3 glXWaitX() is insufficient. Time to start double checking it is doing what I think it should be.
Comment 18 Chris Wilson 2016-10-15 19:11:25 UTC
A small bug in the dri3 code meant it failed to set up the sync:

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index f55f766..d192edf 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -1103,9 +1103,9 @@ dri3_update_drawable(__DRIdrawable *driDrawable,
          draw->is_pixmap = true;
          xcb_unregister_for_special_event(draw->conn, draw->special_event);
          draw->special_event = NULL;
-
-         dri3_attach_sync(draw);
       }
+
+      dri3_attach_sync(draw);
    }
    dri3_flush_present_events(draw);
    return true;

but that alone is not enough. The bug stems from rendering done by X between the glXWaitX() at the start of the cycle and the extraction of the damage rectangles. In DRI2 this is papered over by the flush from glXWaitX() being delayed until the texture-from-pixmap operation. Either we need the same invalidation (+ flush) using dri3, or the compositor has to sync again itself. In compton, the issue goes away with:

diff --git a/src/opengl.c b/src/opengl.c
index 5a98f4e..5e38599 100644
--- a/src/opengl.c
+++ b/src/opengl.c
@@ -1041,6 +1041,7 @@ glx_set_clip(session_t *ps, XserverRegion reg, const reg_data_t *pcache_reg) {
   if (!rects) {
     nrects = 0;
     rects = rects_free = XFixesFetchRegion(ps->dpy, reg, &nrects);
+    glXWaitX();
   }
   // Use one empty rectangle if the region is empty
   if (!nrects) {
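
Illustrative sketch (not from the thread): the ordering problem described above can be modelled without X at all. The toy below assumes that damage is reported as soon as X accepts a drawing request, while the rendering itself only reaches the pixmap on a later flush. Every name here is hypothetical; toy_wait_x stands in for glXWaitX() and toy_fetch_damage for XFixesFetchRegion().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: not real Xlib/GLX. Pixmap content is a version number. */
struct toy_server {
    int pixmap_version;  /* what texture-from-pixmap would sample now */
    int queued_renders;  /* rendering accepted by X but not yet flushed */
};

/* A client draws: the server queues the work instead of executing it. */
static void toy_client_draws(struct toy_server *s)
{
    s->queued_renders++;
}

/* Stand-in for glXWaitX(): all queued X rendering lands in the pixmap. */
static void toy_wait_x(struct toy_server *s)
{
    s->pixmap_version += s->queued_renders;
    s->queued_renders = 0;
}

/* Stand-in for fetching the damage region: returns the pixmap version
 * that the reported damage describes, queued rendering included. */
static int toy_fetch_damage(const struct toy_server *s)
{
    return s->pixmap_version + s->queued_renders;
}

/* One compositor paint cycle. Returns true if the sampled content
 * matches the damage that was fetched, i.e. nothing stale was painted. */
static bool toy_paint_cycle(struct toy_server *s, bool wait_after_fetch)
{
    toy_wait_x(s);          /* glXWaitX() at the start of the cycle */
    toy_client_draws(s);    /* X renders inside the racy window */
    int damaged = toy_fetch_damage(s);
    if (wait_after_fetch)
        toy_wait_x(s);      /* the extra glXWaitX() from the diff above */
    int sampled = s->pixmap_version;
    assert(sampled <= damaged);
    return sampled == damaged;
}
```

Calling toy_paint_cycle with wait_after_fetch set to false paints content that is one update behind the damage it was told about, which is the visible lag; with the second wait the two agree.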
Comment 19 Chris Wilson 2016-10-15 19:27:41 UTC
So to make DRI3 do the equivalent serialisation to DRI2 takes something like:

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index d192edf..f370c87 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -1430,6 +1430,15 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
 
       if (!front)
          return false;
+
+      if (draw->is_pixmap && draw->sync_fence) {
+         xshmfence_reset(draw->shm_fence);
+
+         xcb_sync_trigger_fence(draw->conn, draw->sync_fence);
+         xcb_flush(draw->conn);
+
+         xshmfence_await(draw->shm_fence);
+      }
    } else {
       dri3_free_buffers(driDrawable, loader_dri3_buffer_front, draw);
       draw->have_fake_front = 0;
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 8bcdba3..7d0c69e 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -280,6 +280,7 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
 {
    struct gl_framebuffer *fb = dPriv->driverPrivate;
    struct brw_context *brw = pDRICtx->driverPrivate;
+   __DRIscreen *dri_screen = pDRICtx->driScreenPriv;
    struct gl_context *ctx = &brw->ctx;
    struct intel_renderbuffer *rb;
    struct gl_texture_object *texObj;
@@ -294,7 +295,8 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
       return;
 
    intel_update_renderbuffers(pDRICtx, dPriv,
-                              !pDRICtx->driScreenPriv->dri2.useInvalidate);
+                              !dri_screen->dri2.useInvalidate ||
+                             dri_screen->loaderPrivate);
 
    rb = intel_get_renderbuffer(fb, BUFFER_FRONT_LEFT);
    /* If the miptree isn't set, then intel_update_renderbuffers was unable

(on top of the earlier patch)
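
Illustrative sketch (not part of the patch): the reset / trigger / flush / await sequence added to loader_dri3_get_buffers() relies on the X server handling requests in order, so a fence triggered after earlier rendering requests only signals once those requests have been processed. The single-threaded toy below models just that property. The names (toy_conn, toy_send, toy_server_step, toy_sync_before_sampling) are hypothetical and are not xcb or xshmfence API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: an in-order request stream with one fence. */
enum toy_req { TOY_RENDER, TOY_TRIGGER_FENCE };

struct toy_conn {
    size_t head, tail;      /* server consumes at head, client appends at tail */
    int rendered;           /* render requests the server has completed */
    bool fence_triggered;   /* stands in for the shared-memory fence */
    enum toy_req queue[16];
};

/* Client side: append one request to the stream. */
static void toy_send(struct toy_conn *c, enum toy_req r)
{
    assert(c->tail < sizeof c->queue / sizeof c->queue[0]);
    c->queue[c->tail++] = r;
}

/* Server side: handle exactly one request, strictly in submission order. */
static void toy_server_step(struct toy_conn *c)
{
    assert(c->head < c->tail);
    if (c->queue[c->head++] == TOY_RENDER)
        c->rendered++;
    else
        c->fence_triggered = true;
}

/* Mirrors the sequence in the patch above. Returns the number of render
 * requests that were complete by the time the wait finished. */
static int toy_sync_before_sampling(struct toy_conn *c)
{
    c->fence_triggered = false;      /* like xshmfence_reset() */
    toy_send(c, TOY_TRIGGER_FENCE);  /* like xcb_sync_trigger_fence() + xcb_flush() */
    while (!c->fence_triggered)      /* like xshmfence_await() */
        toy_server_step(c);
    return c->rendered;
}
```

If three TOY_RENDER requests are queued before the call, the function only returns after all three have been processed, which is the serialisation the DRI3 path needs before the pixmap is sampled.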
Comment 20 Chris Wilson 2016-10-15 19:36:01 UTC
Created attachment 127321 [details] [review]
DRI2/DRI3 sync
Comment 21 John Lindgren 2016-10-15 20:26:17 UTC
Created attachment 127322 [details] [review]
Patch for mesa 12.0.3

That fixed it, DRI2 and DRI3 are both working now.  Thank you!

Here's a backport of the patch to mesa 12.0.3.
Comment 22 Hans de Goede 2016-10-20 17:51:39 UTC
Hi,

I have confirmation from a Fedora user that the last mesa patch fixes this for them too. Chris, can you please post the patch upstream to the mesa-dev list?

Regards,

Hans
Comment 23 John Lindgren 2016-10-29 02:42:46 UTC
Hi Chris,

Is there anything blocking this patch from going into mesa?
Comment 24 Tod Jackson 2016-10-29 16:02:02 UTC
I applied Chris' patch in comment 20 to mesa git at commit 2a4a86862c949055c71637429f6d5f2e725d07d8 and am still having issues with compton --backend glx --config /dev/null -b

As I'm typing this I can see the blinking cursor in this box disappear for longer than it should, and when I scroll with the mousewheel and then move the mouse cursor up it sort of seems to jump by itself. Going back to xf86-video-intel at 49daf5df124b5ae6c7508e934768c292f4143040 fixes it.

00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09) (prog-if 00 [VGA controller])

Oh, git log is randomly super slow to draw too.

Should I make a separate report? 

(compton gives me so much grief ;-()
Comment 25 Chris Wilson 2016-10-29 16:35:35 UTC
(In reply to Tod Jackson from comment #24)
> I applied Chris' patch in comment 20 to mesa git at commit
> 2a4a86862c949055c71637429f6d5f2e725d07d8 and am still having issues with
> compton --backend glx --config /dev/null -b

Can you please attach your Xorg.log, glxinfo and all of your compton settings? The first two are to make sure you have the patches applied and are up to date, and I need to check that you haven't enabled any of the compton optimisations that break synchronisation.
Comment 26 Tod Jackson 2016-10-29 16:59:26 UTC
Created attachment 127607 [details]
Tod's Xorg.0.log
Comment 27 Tod Jackson 2016-10-29 17:00:00 UTC
Created attachment 127608 [details]
Tod's glxinfo
Comment 28 Tod Jackson 2016-10-29 17:03:49 UTC
I start compton from .ratpoisonrc with the line:
exec compton --backend glx --config /dev/null -b

Its version is  compton-0.1_beta2
Comment 29 Tod Jackson 2016-10-29 17:20:32 UTC
Oh, sorry... reading my own Xorg log, I see I didn't have LD_LIBRARY_PATH set for my src/mesa libdirs... I'm so dumb. Everything seems fine now. I'm surprised everything seemed to work with only LIBGL_DRIVERS_PATH.
Comment 30 John Lindgren 2016-11-19 22:52:20 UTC
Okay, after a month with a patched mesa, I am still seeing issues with delayed updates/corrupt window contents.  Not in leafpad, but with the XFCE system menu.

Examples:
http://i.imgur.com/KhKossJ.jpg
http://i.imgur.com/dt8u3BK.jpg
http://i.imgur.com/kcMVRiI.jpg

As before, downgrading xf86-video-intel to 1-2.99.917+688+g49daf5d-1 seems to fix the problem.
Comment 31 Chris Wilson 2016-11-20 09:17:53 UTC
DRI2 or 3? For DRI2 there is yet another patch required to stop mesa switching render targets midway through its rendering. DRI3 shouldn't suffer the same. But this may be something else entirely (still looks like a DRI issue though).
Comment 32 Chris Wilson 2016-11-20 09:25:17 UTC
Created attachment 128079 [details] [review]
Update the drawables once per render
Comment 33 John Lindgren 2016-11-20 17:56:51 UTC
(In reply to Chris Wilson from comment #32)
> Created attachment 128079 [details] [review] [review]
> Update the drawables once per render

This didn't fix the problem for me.  I am using DRI2.
Comment 34 Chris Wilson 2017-03-24 15:56:29 UTC
*** Bug 100379 has been marked as a duplicate of this bug. ***
Comment 35 Chris Wilson 2017-03-27 08:41:12 UTC
*** Bug 100408 has been marked as a duplicate of this bug. ***
Comment 36 Jean Delvare 2017-03-27 09:16:00 UTC
For reference, I am seeing this bug on SLED 12 SP2, on an Ivy Bridge CPU, running:

kernel-default-4.4.49-92.11.1.x86_64
kernel-firmware-20160516git-19.4.noarch
xorg-x11-server-7.6_1.18.3-64.1.x86_64
Mesa-11.2.1-103.9.x86_64

My laptop is running Gnome 3.20, no compton here. Xorg.0.log says I use DRI2.

I have to admit I'm a bit puzzled. This bug only seems to happen on Ivy Bridge processors, using the updated intel ddx driver. It was reported on different kernel versions, using different desktop environments, and different Mesa versions. However you blame Mesa for it, even though it was working just fine before the intel driver optimization.

Can you explain why other hardware is not affected by the problem? The same optimizations were applied to the ati and amdgpu ddx drivers, shouldn't they hit the same problem in Mesa?

I looked at the Mesa git repository and I can't see any reference to this bug. It does not look like Chris' patches were ever applied there.

It's been almost 8 months since the regression was introduced, what's the plan?
Comment 37 John Lindgren 2017-03-27 13:05:49 UTC
(In reply to Jean Delvare from comment #36)
> I have to admit I'm a bit puzzled. This bug only seems to happen on Ivy
> Bridge processors, using the updated intel ddx driver. ...

#100408 indicates that Haswell is affected as well.

My impression is that most people affected by this bug are switching to the modesetting driver.  I did so a couple of months ago.  Glamor acceleration gives "good enough" 2D performance, and I haven't seen any similar drawing bugs.

I also question blaming Mesa for this since the same version(s) of Mesa is able to run modesetting on the same hardware without issue.  Theoretical questions of "whose fault is it" aside, introducing this optimization/regression into the Intel driver without sufficient testing shows a lack of concern on Intel's part for its customers, and I'll take that into account in my next hardware purchase.
Comment 38 Jean Delvare 2017-03-28 09:32:46 UTC
Indeed Haswell is affected as well ;-)

I just realized that Mesa includes driver-specific code. My previous comment naively assumed that Mesa was hardware-agnostic. Obviously, if Mesa includes intel-specific code, then the bug can indeed be in that part of the stack, and I understand how the issue could not affect other drivers having undergone similar optimizations. Sorry for the incorrect claim.

That being said, given that this is only a performance optimization, and its relevance was estimated before fixing the Mesa side of things, maybe it should be evaluated again with the Mesa changes applied. As I suppose the changes on the Mesa side will slow things down a bit again, it may no longer be worth it.

At any rate, if the changes are still deemed worthwhile, the ones in Mesa should have gone in first. As it stands, the net result for users is a regression.
Comment 39 Andreas Reis 2018-08-08 18:10:28 UTC
Any update on the status of the "Update the drawables once per render" patch / this bug?

I've been using it ever since (it still applies fine to git), because otherwise I continue to see lag in the terminal on both of my Haswell machines.

That was fine until about 1.5 weeks ago, when some change to mesa (I still have a build from two weeks ago showing no such behavior) made the X server stop painting windows, usually within minutes of starting.

(It's fine if I toggle to virtual console and restart X after a pkill, but it'll freeze it entirely if I don't & switch back to the "nonpainting" X instead, no longer reacting to any input.)

I'm not sure if that's actually due to some interaction with the patch, especially since I still don't know exactly how to reproduce it, but it *seems* to go away if the patch is not applied to the build. Then of course the lag returns.
Comment 40 Andreas Reis 2018-09-07 15:20:01 UTC
So after finally getting round to bisecting, the commit after which the freezes occur is:

dri3: For 1.2, use root window instead of pixmap drawable
https://cgit.freedesktop.org/mesa/mesa/commit/?id=03a61b977e1f6adb64658aa059ce53e766ff9ad9

Tested by reverting it from latest mesa master & compiling with the old patch (also using git versions of xserver and xf86-video-intel). No new freezes so far.

(The freezes had continued with having "sna: Disable the reduced flush optimisation" applied. Dunno if it actually does anything, but I've reverted that revert as it doesn't hurt.)
Comment 41 Martin Peres 2019-11-27 13:46:08 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/issues/125

