Bug 95472

Summary: [i915] Feature request: Add support for fencing for PRIME setups
Product: DRI Reporter: Mike Lothian <mike>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: enhancement    
Priority: medium CC: davyaxel0, intel-gfx-bugs, mario.kleiner, michel, mike, pavel.ondracka, shawn.starr
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: ALL i915 features: GEM/Other
Attachments:
Journalctl -f
Add fences to dma-buf shared with amdgpu/radeon/nouveau
dmesg output
dmesg output
dmesg resume fail
gpu crash dump
Force linear-buffer for igfx
Basic pageflip dmabuf-fence tracing for i915 + other dGPU.
Tracing with absolute timestamps, comment in proper line for GPU combo.
reservation object wait on working setup radeon-kms + r600
Absolute timestamp trace on properly working radeon-kms + r600
Absolute timestamp trace on Z-Tearing amdgpu-kms + radeonsi
Fixes prime sync problems for pageflips.
Test patch for prime synced windowed swaps.

Description Mike Lothian 2016-05-18 12:42:20 UTC
This seems to have been outstanding for a number of years; originally I was told it would come with the atomic work, but it doesn't appear to have materialised.

Could we use this bug as a tracker of some kind?

It would be great to eliminate tearing on my Skylake/Tonga laptop
Comment 1 Chris Wilson 2016-08-04 19:44:37 UTC
Halfway there:

commit ad778f8967ea2f0bfda02701f918bcfcd495b721
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 4 16:32:42 2016 +0100

    drm/i915: Export our request as a dma-buf fence on the reservation object
    
    If the GEM objects being rendered with in this request have been
    exported via dma-buf to a third party, hook ourselves into the dma-buf
    reservation object so that the third party can serialise with our
    rendering via the dma-buf fences.
    
    Testcase: igt/prime_busy
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-26-git-send-email-chris@chris-wilson.co.uk


Exporting rendering from i915.ko to others (nouveau, radeon/amdgpu) should work. Importing rendering from others to flip should work (reverse prime + tearfree etc). The last remaining (missing) trick is importing rendering from others into our rendering for updates (windowed swapbuffers; should affect all render offload to a dGPU iirc).
Comment 2 Mike Lothian 2016-08-04 19:46:25 UTC
Great news, let me know when you've completed the second bit and I'll happily test it out for you with amdgpu
Comment 3 Chris Wilson 2016-08-04 19:56:01 UTC
This branch, https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=fence, should have everything in place.
Comment 4 Mike Lothian 2016-08-04 20:04:30 UTC
Is it enabled automatically, or do I need to do something to enable it? Are any userspace components required?
Comment 5 Chris Wilson 2016-08-04 20:21:13 UTC
The fencing in the kernel is enabled (not even opt out). X/Prime has been assuming this works since its inception, but there may still be a bug or two in delaying batches, a missing flush, when surfaces have been passed between drivers. Both DRI2 and DRI3 with xrandr should just work. Outside of X, I'm not sure what the current state is.
Comment 6 Mike Lothian 2016-08-04 21:35:31 UTC
When launching Metro 2033 Redux, where I normally see a lot of tearing, I get a system lock-up with this kernel; drm-next, agd5f-4.9-wip, Linus's tree and intel-nightly all work fine

In the few seconds before the lock-up, the tearing during the intro video looks just as bad
Comment 7 Mike Lothian 2016-08-04 21:37:03 UTC
Created attachment 125545 [details]
Journalctl -f

Nothing obvious when tailing journalctl remotely
Comment 8 Mike Lothian 2016-08-04 21:45:58 UTC
Just to be clear all those messages appear when things do work correctly. If there's anything else I can do to help debug this let me know
Comment 9 Chris Wilson 2016-08-04 22:01:39 UTC
Bisecting that lockup would be invaluable to me.

DRI2 render offload goes something like:

GL client renders to back
 -> DRI2SwapBuffers
    -> X/dgpu CopyRegion2, copy back to front + post damage
       -> DamageEvent
          -> compositor copies on igfx damaged front onto framebuffer and flips
             -> DRI2SwapBuffers
                -> X/igfx flips

(DRI3 is a couple of steps simpler, in that the dgpu copy is done by the GL client and only the X/igfx is involved with handling Present)

The point at which we require the fence is between the copy by the dgpu inside X and the copy by the igfx inside the compositor. The buffer is passed between dgpu and igfx using a dma-buf, which also carries along the implicit fences. So long as the fencing is correct that should be fine.
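
(As a minimal sketch of that implicit-fencing mechanism, not the actual i915/X code, and with illustrative names: the exporting driver attaches its rendering fence to the dma-buf's reservation object, and that fence travels with the buffer to whoever imports it.)

    /* dGPU driver, after queueing the copy into the shared dma-buf (sketch) */
    struct reservation_object *resv = shared_buf->resv; /* shared_buf: struct dma_buf * */

    ww_mutex_lock(&resv->lock, NULL);
    reservation_object_add_excl_fence(resv, render_fence); /* render_fence: the dGPU's write fence */
    ww_mutex_unlock(&resv->lock);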

Q: Is it only the PRIME offload that tears? Does enabling TearFree do anything? (Just trying to ascertain that it is the fence and not the last SwapBuffers that is at fault.)

If you want to look at intel-gpu-tools/tests/prime_vgem, tests/prime_busy should test the basic export/import of fences. I'll patch in some debug messages tomorrow to show whether we are seeing fences in i915 execbuf.
Comment 10 Mike Lothian 2016-08-04 22:37:16 UTC
I'm running a DRI3 setup so I don't need to run any xrandr commands to get prime offload working

I've never needed or used tearfree

The game runs (albeit slowly) on i915 only

Where would be a good point to start bisecting from?
Comment 11 Mike Lothian 2016-08-04 23:39:05 UTC
I've bisected back to:

6e934d43fe55156c68a2e55f1d51f182003b3b0b is the first bad commit
commit 6e934d43fe55156c68a2e55f1d51f182003b3b0b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 6 14:21:07 2016 +0100

    nonblocking-dispatch

:040000 040000 ee65c1addbef2b6965be4bce69c5af051aa83b07 a72d3da61b145d64f7d8da94c6a0f752f80a8f78 M      drivers

Based on:

git bisect start
# bad: [37dbdf6f78b5e33a3d640a431fc077b358654152] in-out-fence
git bisect bad 37dbdf6f78b5e33a3d640a431fc077b358654152
# good: [eebe64201e8d775c63c034265a3d5f2200ae8070] drm/i915: Micro-optimise i915_get_ggtt_vma_pages()
git bisect good eebe64201e8d775c63c034265a3d5f2200ae8070
# bad: [a2dd67f066f2cb854ff7978eac273030b45c1e27] drm/i915: Serialise execbuf operation after a dma-buf reservation object
git bisect bad a2dd67f066f2cb854ff7978eac273030b45c1e27
# good: [28f79ffbce93e805a6cd7d65c1480cb9e84fd3ce] kfence: Extend kfences for listening on DMA fences
git bisect good 28f79ffbce93e805a6cd7d65c1480cb9e84fd3ce
# bad: [6e934d43fe55156c68a2e55f1d51f182003b3b0b] nonblocking-dispatch
git bisect bad 6e934d43fe55156c68a2e55f1d51f182003b3b0b
# good: [735a36acb474134c42ea48ac4871555999cf4f5f] eb-sync
git bisect good 735a36acb474134c42ea48ac4871555999cf4f5f
# first bad commit: [6e934d43fe55156c68a2e55f1d51f182003b3b0b] nonblocking-dispatch
Comment 12 Chris Wilson 2016-08-05 08:09:57 UTC
(In reply to Mike Lothian from comment #11)
> I've bisected back to:
> 
> 6e934d43fe55156c68a2e55f1d51f182003b3b0b is the first bad commit
> commit 6e934d43fe55156c68a2e55f1d51f182003b3b0b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Jul 6 14:21:07 2016 +0100
> 
>     nonblocking-dispatch

Of course that's the key one for being able to wait on fences in the execbuf!

I've split that patch up and pushed the branch out to ~ickle/linux-2.6 #fence

Can you please test

commit c9941749bae4030bbadbe268916c1aa43bf0c130
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 5 09:01:21 2016 +0100

    drm/i915: Drive request submission through fence callbacks

which introduces the callback mechanics

and then

commit 9759f6e63e8fecbe5c1d251586cdfe713d20dce0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 5 09:02:16 2016 +0100

    drm/i915: Nonblocking request submission

which is the first use in anger (and presumably where you start to see the issue).

If you have your phone logged in at the time of the hang, can you get any diagnostics? For convenience, you probably want to set panic-on-oops and reboot-on-panic (with a timeout of 1 or 2s) in the kconfig.
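
(For reference, the relevant kernel config options are:

    CONFIG_PANIC_ON_OOPS=y
    CONFIG_PANIC_TIMEOUT=2

or, equivalently, "oops=panic panic=2" on the kernel command line.)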
Comment 13 Chris Wilson 2016-08-05 09:50:53 UTC
Thinking through the DRI3 chain.

GL/dgpu client renders into back buffer
-> SwapBuffers, GL/dgpu copies into linear buffer (dma-buf pixmap) and calls PresentPixmap
   -> X/igfx either flips the fullscreen dma-buf pixmap, or copies from the dma-buf pixmap into the frontbuffer

Here we should have a write/exclusive fence installed by the client and that should be waited upon by i915.ko inside X. The copy/flip are different call paths inside i915.ko, but both should be waiting for read access on the fence.

Does the behaviour differ if the rendered app is fullscreen (and make sure you have fullscreen unredirection enabled)?
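
(A minimal sketch of that wait, not the exact i915 call sites, with illustrative variable names: before the copy or flip reads the shared dma-buf, the importer waits on the buffer's reservation object, here only on the exclusive/write fence.)

    /* igfx side, before copying from or flipping the shared dma-buf (sketch) */
    long lret;

    lret = reservation_object_wait_timeout_rcu(shared_buf->resv,
                                               false, /* wait_all: exclusive fence only */
                                               false, /* intr */
                                               MAX_SCHEDULE_TIMEOUT);
    WARN_ON(lret < 0);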
Comment 14 Mike Lothian 2016-08-05 11:35:34 UTC
I'll test those when I get home tonight. I did, however, notice this in my logs whilst the laptop wasn't doing anything:

[30445.383085] ------------[ cut here ]------------
[30445.383088] WARNING: CPU: 2 PID: 1 at drivers/gpu/drm/i915/intel_ddi.c:1644 intel_ddi_pre_enable+0x2d8/0x330
[30445.383089] WARN_ON(intel_dp->active_streams != 0)
[30445.383090] Modules linked in:
[30445.383091] CPU: 2 PID: 1 Comm: systemd Tainted: G        W       4.7.0-fence+ #5
[30445.383092] Hardware name: Alienware Alienware 15 R2/Alienware 15 R2, BIOS 1.2.8 01/29/2016
[30445.383093]  0000000000000000 ffffffff81351c5c ffff88017e1ff878 0000000000000000
[30445.383094]  ffffffff8108f1c9 ffff8804b22c4000 ffff88017e1ff8c8 ffff8804b22c2000
[30445.383095]  ffff8804b22d0000 0000000000000000 ffff8804b22c2000 ffffffff8108f23a
[30445.383095] Call Trace:
[30445.383097]  [<ffffffff81351c5c>] ? dump_stack+0x46/0x5a
[30445.383098]  [<ffffffff8108f1c9>] ? __warn+0xb9/0xe0
[30445.383099]  [<ffffffff8108f23a>] ? warn_slowpath_fmt+0x4a/0x50
[30445.383100]  [<ffffffff815aff98>] ? intel_ddi_pre_enable+0x2d8/0x330
[30445.383101]  [<ffffffff8159fce4>] ? intel_set_cpu_fifo_underrun_reporting+0x124/0x2a0
[30445.383103]  [<ffffffff815923fa>] ? haswell_crtc_enable+0x2ca/0x870
[30445.383104]  [<ffffffff8158e91d>] ? intel_atomic_commit_tail+0x7fd/0x1000
[30445.383105]  [<ffffffff8144e519>] ? drm_atomic_helper_swap_state+0x149/0x2f0
[30445.383106]  [<ffffffff8158f511>] ? intel_atomic_commit+0x3f1/0x500
[30445.383108]  [<ffffffff8147308c>] ? drm_atomic_check_only+0x17c/0x600
[30445.383109]  [<ffffffff81452c97>] ? restore_fbdev_mode+0x147/0x270
[30445.383110]  [<ffffffff8147257b>] ? drm_modeset_lock_all_ctx+0x9b/0xb0
[30445.383111]  [<ffffffff814546e9>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x70
[30445.383112]  [<ffffffff81454754>] ? drm_fb_helper_set_par+0x24/0x50
[30445.383114]  [<ffffffff815a845e>] ? intel_fbdev_set_par+0xe/0x60
[30445.383115]  [<ffffffff813a8e3e>] ? fb_set_var+0x1fe/0x3e0
[30445.383116]  [<ffffffff8119f1bc>] ? __slab_free+0x9c/0x320
[30445.383117]  [<ffffffff813a3203>] ? fbcon_blank+0x2b3/0x2f0
[30445.383118]  [<ffffffff81425355>] ? do_unblank_screen+0xc5/0x180
[30445.383119]  [<ffffffff8141bb04>] ? vt_ioctl+0x4c4/0x1220
[30445.383120]  [<ffffffff81416fed>] ? tty_mode_ioctl+0x1ad/0x4b0
[30445.383121]  [<ffffffff814111e5>] ? tty_ioctl+0x315/0xbf0
[30445.383122]  [<ffffffff811cc2db>] ? do_vfs_ioctl+0x8b/0x5c0
[30445.383124]  [<ffffffff8109e6d4>] ? do_sigaction+0x174/0x190
[30445.383125]  [<ffffffff811cc846>] ? SyS_ioctl+0x36/0x70
[30445.383126]  [<ffffffff81001c19>] ? do_syscall_64+0x59/0x140
[30445.383127]  [<ffffffff8108733c>] ? entry_SYSCALL64_slow_path+0x25/0x25
[30445.383133] ---[ end trace 8d21bfa42fc3be8f ]---

Not sure if it's related
Comment 15 Chris Wilson 2016-08-05 11:38:18 UTC
(In reply to Mike Lothian from comment #14)
> I'll test those when I get home tonight, I did however notice this in my
> logs whilst the laptop wasn't doing anything:
> 
> [30445.383085] ------------[ cut here ]------------
> [30445.383088] WARNING: CPU: 2 PID: 1 at
> drivers/gpu/drm/i915/intel_ddi.c:1644 intel_ddi_pre_enable+0x2d8/0x330
> [30445.383089] WARN_ON(intel_dp->active_streams != 0)

> Not sure if it's related

It's unrelated, but new, so I think Ville would be interested in seeing it. Please do open a bug report with the full dmesg.
Comment 16 Chris Wilson 2016-08-05 13:49:16 UTC
Oh, I think I know what the problem is there. I had dropped a patch from that branch, not realising it was required. I've updated #fence; it shouldn't hang now *fingers crossed*
Comment 17 Chris Wilson 2016-08-05 13:57:54 UTC
Oh noes. I don't see anywhere in amdgpu where they attach a fence to the dma-buf, or wait upon it. :|

Are you using drm/amdgpu or drm/radeon?
Comment 18 Chris Wilson 2016-08-05 15:04:21 UTC
Created attachment 125559 [details] [review]
Add fences to dma-buf shared with amdgpu/radeon/nouveau

A patch, compile-tested only, that should add the missing fences required for DRI3.
Comment 19 Mike Lothian 2016-08-05 16:54:20 UTC
The good news is the system has stopped crashing

The bad news is I'm still getting tearing with DRI3
Comment 20 Chris Wilson 2016-08-05 17:14:41 UTC
(In reply to Mike Lothian from comment #19)
> The bad news is I'm still getting tearing with DRI3

Did you try with the patch from https://bugs.freedesktop.org/attachment.cgi?id=125559 ? Be prepared for misery... It may be a bit hopeful.
Comment 21 Mike Lothian 2016-08-05 18:04:36 UTC
Yes, I applied that patch. The tearing is really obvious in the Metro 2033 Redux intro video, and I've also confirmed it in Tomb Raider during gameplay. Would you like me to try it with DRI2 too?
Comment 22 Chris Wilson 2016-08-05 18:10:44 UTC
(In reply to Mike Lothian from comment #21)
> Yes I applied that patch, the tearing is really obvious in Metro 2033 Redux
> intro video, I've also confirmed it in Tomb Raider in the game play. Would
> you like me to try it with DRI2 too?

Only if you are bored. I'm going to try and get my Haswell/nvidia combo working this evening and at least get to the point of being able to trace the fences coming out of nouveau and into i915.
Comment 23 Mike Lothian 2016-08-05 18:13:39 UTC
OK, so I see less tearing with DRI2 but it's still there (this has always been the case). The desktop seems flakier though, with more flickering. It's the first time I've used DRI2 in a while, so it might not be your kernel patches causing this
Comment 24 Chris Wilson 2016-08-05 19:08:53 UTC
Comment on attachment 125559 [details] [review]
Add fences to dma-buf shared with amdgpu/radeon/nouveau

Hmm, fences on the dma-buf are exported from ttm through gem_prime_res_obj which looks sufficient to couple the reservation objects together.
Comment 25 Chris Wilson 2016-08-05 21:11:22 UTC
Hmm, I am definitely seeing fences being passed from nouveau to i915 with DRI3.

Can you please try with something like:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 98294f1..55a54cd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1442,6 +1442,8 @@ static int eb_add_reservations(struct reservation_object *resv,
                        if (!fence_get_rcu(fence))
                                continue;
 
+                       printk(KERN_ERR "third party read fence\n");
+
                        ret = kfence_await_dma_fence(&req->submit, fence,
                                                     GFP_NOWAIT | __GFP_NOWARN);
                        fence_put(fence);
@@ -1452,6 +1454,7 @@ static int eb_add_reservations(struct reservation_object *resv,
 
        fence = rcu_dereference(resv->fence_excl);
        if (fence && !fence_is_i915(fence) && fence_get_rcu(fence)) {
+               printk(KERN_ERR "third party write fence\n");
                ret = kfence_await_dma_fence(&req->submit, fence,
                                             GFP_NOWAIT | __GFP_NOWARN);

just to make sure we are seeing the fences?
Comment 26 Mike Lothian 2016-08-05 23:43:11 UTC
Created attachment 125573 [details]
dmesg output

So plenty of fences according to dmesg but there's still lots of tearing
Comment 27 Chris Wilson 2016-08-06 06:27:54 UTC
Next attempt,

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f72e8e68e394..d29d6824ece3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1447,6 +1447,8 @@ static int eb_add_reservations(struct reservation_object *resv,
                        fence_put(fence);
                        if (ret < 0)
                                goto err;
+
+                       pr_err("third party read fence: %d\n", ret);
                }
        }
 
@@ -1457,6 +1459,8 @@ static int eb_add_reservations(struct reservation_object *resv,
                fence_put(fence);
                if (ret < 0)
                        goto err;
+
+               pr_err("third party write fence: %d\n", ret);
        }
 
        ret = 0;

just to confirm whether the fences are no-ops or not.
Comment 28 Mike Lothian 2016-08-06 14:10:03 UTC
Created attachment 125575 [details]
dmesg output

Here's the output with your two patches
Comment 29 Chris Wilson 2016-08-06 14:22:58 UTC
That tells us that we never wait upon a fence as it is always complete by the time we copy from the shared buffer.
Comment 30 Mike Lothian 2016-08-06 15:29:50 UTC
Hmm, well, I'm guessing something isn't quite right, as I'm seeing really bad tearing

I'm using SNA on the xf86-video-intel driver
Comment 31 Michel Dänzer 2016-08-07 05:50:34 UTC
FWIW, I wouldn't test with DRI2, because that uses a single shared buffer. Even if there is proper synchronization between the dGPU rendering and the iGPU reading its output, I suspect there may be nothing preventing the dGPU from rendering the next frame before the iGPU is done reading the previous one.
Comment 32 Mike Lothian 2016-08-07 07:42:17 UTC
Is there something wrong with the patch that used to be part of the bug? I've still been using that one
Comment 33 Chris Wilson 2016-08-07 11:41:02 UTC
(In reply to Mike Lothian from comment #32)
> Is there something wrong with the patch that used to be part of the bug?
> I've still been using that one

Yes, it should not be required - at best it just adds the fence twice.
Comment 34 Mike Lothian 2016-08-08 20:01:01 UTC
I've retested with your latest updates to the branch; same issue, and the fences always report 0

The tearing I'm seeing looks like an inverted Z, from the top left to the bottom right, with horizontal lines about 1/5 from the top and 1/5 from the bottom of the screen. I sometimes see older frames too, which makes the picture look jerky.
Comment 35 Mike Lothian 2016-08-08 20:14:20 UTC
Created attachment 125609 [details]
dmesg resume fail
Comment 36 Mike Lothian 2016-08-08 20:15:03 UTC
Created attachment 125610 [details]
gpu crash dump
Comment 37 Chris Wilson 2016-08-12 07:17:28 UTC
Refreshed the #fence branch. Still thinking about how to identify where the tears occur. The diagonal tearing is at least indicative of a GPU tear (as opposed to scanout).
Comment 38 Chris Wilson 2016-08-12 18:31:15 UTC
This mesa patch

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index 99e2897..9be5e72 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -167,7 +167,7 @@ loader_dri3_drawable_init(xcb_connection_t *conn,
    draw->vtable = vtable;
    draw->drawable = drawable;
    draw->dri_screen = dri_screen;
-   draw->is_different_gpu = is_different_gpu;
+   draw->is_different_gpu = true;
 
    draw->have_back = 0;
    draw->have_fake_front = 0;

will force every client to use the linear shared buffer + local copy. Can you please test that using DRI_PRIME=0 (i.e. with igfx) and see if it tears? (Obviously there is still a big difference in drivers that might hide any issue, as well as any fence now being native...) On the other hand, if it tears we are closer to finding the right forest.
Comment 39 Mike Lothian 2016-08-13 15:52:52 UTC
With that patch applied Plasma 5 doesn't start. If I install it after Plasma 5 starts then apps won't start
Comment 40 Chris Wilson 2016-08-15 19:58:15 UTC
Created attachment 125805 [details] [review]
Force linear-buffer for igfx

Seems like i965 was missing the blitImage callback.
Comment 41 Mike Lothian 2016-08-16 09:48:01 UTC
Can't currently test your kernel due to this build error:

drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_vma_unbind’:
drivers/gpu/drm/i915/i915_gem.c:2824:23: error: ‘struct drm_i915_gem_object’ has no member named ‘pages’
  if (vma->pages != obj->pages) {
                       ^~
Will try out your mesa patch with my previous build
Comment 42 Mike Lothian 2016-08-16 10:13:01 UTC
So I'm seeing some pretty funky graphical glitches with that mesa patch. The compositor doesn't show things sometimes, especially in Steam. When I open the configuration options the new window is blank; it'll eventually appear, and when it's shut the window behind appears blank. When playing Metro 2033 Redux I'm not getting the tearing, but I do seem to see older frames sometimes, as if things are being presented out of sequence.

This can be seen clearly on the main menu when the woman and boy are pacing backwards and forwards
Comment 43 Chris Wilson 2016-08-16 19:27:12 UTC
There's no synchronisation point in dri3 with X:

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index a55719a..6b5736b 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -1399,6 +1399,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
       buffers->back = back->image;
    }
 
+   loader_dri3_wait_x(draw);
    draw->stamp = stamp;
 
    return true;

brings it into line with dri2.
Comment 44 Axel Davy 2016-08-16 21:08:49 UTC
(In reply to Chris Wilson from comment #43)
> There's no synchronisation point in dri3 with X:
> 
> diff --git a/src/loader/loader_dri3_helper.c
> b/src/loader/loader_dri3_helper.c
> index a55719a..6b5736b 100644
> --- a/src/loader/loader_dri3_helper.c
> +++ b/src/loader/loader_dri3_helper.c
> @@ -1399,6 +1399,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
>        buffers->back = back->image;
>     }
>  
> +   loader_dri3_wait_x(draw);
>     draw->stamp = stamp;
>  
>     return true;
> 
> brings it into line with dri2.

I don't see why waitX is required.

The call only has an impact when a fake front buffer is used.
And when a fake front buffer is used, the same buffer is always used, and it seems to be correctly updated when needed.

(In reply to Mike Lothian from comment #42)
Are you sure the dedicated card was not used for this test?

It looks like the problem is that the compositor copies the damaged window's contents before the blitImage calls are executed on the GPU. Could it be a problem with the flushing behaviour in intel_blit_image?
Comment 45 Chris Wilson 2016-08-17 08:05:48 UTC
(In reply to Axel Davy from comment #44)
> (In reply to Chris Wilson from comment #43)
> > There's no synchronisation point in dri3 with X:
> > 
> > diff --git a/src/loader/loader_dri3_helper.c
> > b/src/loader/loader_dri3_helper.c
> > index a55719a..6b5736b 100644
> > --- a/src/loader/loader_dri3_helper.c
> > +++ b/src/loader/loader_dri3_helper.c
> > @@ -1399,6 +1399,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
> >        buffers->back = back->image;
> >     }
> >  
> > +   loader_dri3_wait_x(draw);
> >     draw->stamp = stamp;
> >  
> >     return true;
> > 
> > brings it into line with dri2.
> 
> I don't see why waitX is required.
> 
> The call has only an impact when a fake front buffer is used.
> And when fake front buffer is used, the same buffer is always used and it
> seems correctly updated when needed.

Because the tiled_buffer is not kept up to date with X.
Comment 46 Mike Lothian 2016-08-17 09:22:28 UTC
Not sure if you want me to test that patch again with the loader_dri3_wait_x change, or just on its own, and with or without PRIME
Comment 47 Michel Dänzer 2016-08-25 05:35:31 UTC
FWIW, this is working correctly between the radeon and amdgpu kernel drivers. So no changes to those drivers or to userspace should be necessary.
Comment 48 Chris Wilson 2016-08-25 05:56:57 UTC
(In reply to Michel Dänzer from comment #47)
> FWIW, this is working correctly between the radeon and amdgpu kernel
> drivers. So no changes to those drivers or to userspace should be necessary.

You really should check the compositors for missing glXWaitX because they've been relying on the implicit flushes from DRI2 (hence the dri3_wait_x hack for refreshing the tiled image from the shared linear buffer for tfp support).
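
(Roughly what that means on the compositor side, sketched rather than prescribed: after the DamageNotify and before sampling the window pixmap via texture-from-pixmap, let GL serialise against X first.)

    /* compositor, on damage to a redirected window (sketch) */
    glXWaitX();                                   /* make sure X's rendering lands first */
    glXBindTexImageEXT(dpy, glx_pixmap,
                       GLX_FRONT_LEFT_EXT, NULL); /* then sample the pixmap as a texture */
    /* ... draw the window quad ... */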
Comment 49 Michel Dänzer 2016-08-25 06:10:28 UTC
(In reply to Chris Wilson from comment #48)
> You really should check the compositors for missing glXWaitX because they've
> been relying on the implicit flushes from DRI2 (hence the dri3_wait_x hack
> for refreshing the tiled image from the shared linear buffer for tfp
> support).

We always flush before Damage events are sent to the compositor, so I'm not sure how that's relevant for this bug report.
Comment 50 Michel Dänzer 2016-08-25 07:07:11 UTC
(In reply to Michel Dänzer from comment #49)
> We always flush before Damage events are sent to the compositor, [...]

Actually, the modesetting driver doesn't do that yet, so it's better not to use that for the secondary GPU for now.
Comment 51 Mike Lothian 2016-08-25 15:49:34 UTC
Just an FYI: the fence and eb-fence branches aren't booting for me
Comment 52 Mike Lothian 2016-08-29 23:53:29 UTC
The intel nightly tree is booting for me again, I'll re-test your branch once it's been re-based
Comment 53 Mike Lothian 2016-08-30 08:53:18 UTC
So some good news: the playing of frames out of order / jerky rendering has gone

I'm still seeing some tearing during the intro video - could this be related to it being a video rather than rendering?
Comment 54 Mario Kleiner 2016-09-04 21:45:04 UTC
To add some additional data.

I tested the current Linux 4.8-rc4 with DRI3/Present + PRIME renderoffload on:

a) Intel Ivybridge desktop + AMD Cedar (Evergreen family), radeon-kms + r600g
b) Intel Haswell desktop + AMD Tonga (VI family), amdgpu-kms + radeonsi.

This was tested with page-flipping, and the ftrace scripts I'll attach shortly.

In case a) page-flipping worked under all loads without any tearing or rendering artifacts. The traces show that i915 properly blocks on the dmabuf-fence / reservation objects for a couple of milliseconds, longer if the rendering load is higher.

This is consistent with other successful tests I did with radeon-kms/r600 on AMD Evergreen + AMD Evergreen, and with Intel + NVIDIA under nouveau. In all cases I got the right behaviour under DRI + PRIME render offload.

E.g. from i915_optimus_tracing.sh:

low load:

 2)               |  intel_mmio_flip_work_func [i915]() {
 2) # 3878.042 us |    reservation_object_wait_timeout_rcu();
 2) # 3882.433 us |  }

Output from tonga_tracing.sh with the proper line uncommented for intel+radeon-kms:

  PTB mainthread-3235  [002] ....  7474.200826: radeon_cs_ioctl <-drm_ioctl
  PTB mainthread-3224  [003] ....  7474.202235: reservation_object_wait_timeout_rcu <-radeon_gem_wait_idle_ioctl
  PTB mainthread-3235  [002] ....  7474.202277: radeon_cs_ioctl <-drm_ioctl
  PTB mainthread-3235  [002] ....  7474.202423: radeon_cs_ioctl <-drm_ioctl
               X-2588  [002] ....  7474.202592: intel_crtc_page_flip <-drm_mode_page_flip_ioctl
     kworker/2:1-47    [002] ....  7474.202606: intel_mmio_flip_work_func <-process_one_work
     kworker/2:1-47    [002] ....  7474.202606: reservation_object_wait_timeout_rcu <-intel_mmio_flip_work_func

==> Blocks as expected for a few msecs.

     kworker/2:1-47    [002] ....  7474.206213: intel_pipe_update_start <-intel_mmio_flip_work_func


Case b), on the other hand: I don't see the pageflip ever blocking on the reservation object, and while rendering looks good under low/medium load, I can get it to show incomplete rendering under high load. Exactly the Z-style tearing pattern one would expect if the Intel GPU flips while the AMD Tonga is still rendering to the dmabuf.

E.g., output from tonga_tracing.sh with line for intel+amdgpu-kms under very high load, when Z-tearing happens:

  PTB mainthread-12766 [001] .... 12538.537618: amdgpu_cs_ioctl <-drm_ioctl
  PTB mainthread-12766 [001] .... 12538.537875: amdgpu_cs_ioctl <-drm_ioctl
  PTB mainthread-12766 [001] .... 12538.537906: amdgpu_cs_ioctl <-drm_ioctl
               X-12250 [000] .... 12538.537935: intel_crtc_page_flip <-drm_mode_page_flip_ioctl
     kworker/0:2-11651 [000] .... 12538.537957: intel_mmio_flip_work_func <-process_one_work
     kworker/0:2-11651 [000] .... 12538.537957: reservation_object_wait_timeout_rcu <-intel_mmio_flip_work_func

==> Falls through although the massive rendering (over)load should cause a ~ 50 msecs delay

     kworker/0:2-11651 [000] .... 12538.537957: intel_pipe_update_start <-intel_mmio_flip_work_func

Given that intel-kms with pageflipping seems to wait just fine on the dmabuf fence both for the old AMD radeon-kms + r600 and for various NVIDIA GPUs under nouveau, but goes wrong with Tonga on amdgpu + radeonsi, it looks as if something is missing in amdgpu-kms or radeonsi?

I'll also try Chris' eb-fence branch to see how it behaves for windowed swaps.
Comment 55 Mario Kleiner 2016-09-04 21:47:39 UTC
Created attachment 126206 [details]
Basic pageflip dmabuf-fence tracing for i915 + other dGPU.
Comment 56 Mario Kleiner 2016-09-04 21:49:54 UTC
Created attachment 126207 [details]
Tracing with absolute timestamps, comment in proper line for GPU combo.
Comment 57 Mario Kleiner 2016-09-04 21:50:53 UTC
Created attachment 126208 [details]
reservation object wait on working setup radeon-kms + r600
Comment 58 Mario Kleiner 2016-09-04 21:51:33 UTC
Created attachment 126209 [details]
Absolute timestamp trace on properly working radeon-kms + r600
Comment 59 Mario Kleiner 2016-09-04 21:52:33 UTC
Created attachment 126210 [details]
Absolute timestamp trace on Z-Tearing amdgpu-kms + radeonsi
Comment 60 Michel Dänzer 2016-09-05 09:05:28 UTC
(In reply to Mario Kleiner from comment #54)
> Given that intel-kms with pageflipping seems to wait just fine on the dmabuf
> fence for both old AMD radeon-kms + r600 and for different NVidia's under
> nouveau, but it goes wrong with Tonga on amdgpu + radeonsi, it looks as if
> something is missing in amdgpu-kms or radeonsi?

I doubt it, since (I just double-checked again) it works correctly with the renderer using amdgpu and Xorg using radeon.

We had similar issues within amdgpu before 1ffd26524380 ("drm/amdgpu: fix waiting for all fences before flipping"). Maybe i915 can learn something from that? Then again, radeon still seems to only consider the exclusive fence.
Comment 61 Michel Dänzer 2016-09-06 10:00:49 UTC
One observation that might explain Mario's results is that intel_mmio_flip_work_func passes false for the wait_all parameter of reservation_object_wait_timeout_rcu. While the radeon driver sets exclusive fences, amdgpu only sets shared ones AFAICT.
Comment 62 Mario Kleiner 2016-09-06 14:21:38 UTC
Indeed, that looks like a good candidate! I'll retest all my Intel + dGPU setups, setting the wait_all flag to true in reservation_object_wait_timeout_rcu().
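
(Illustratively, the kind of change being discussed at the call site comment 61 points to; a sketch, not necessarily the attached patch verbatim:)

    In intel_mmio_flip_work_func():

-       WARN_ON(reservation_object_wait_timeout_rcu(resv, false, false,
-                                                   MAX_SCHEDULE_TIMEOUT) < 0);
+       /* wait_all = true: also wait on shared fences (amdgpu only installs those) */
+       WARN_ON(reservation_object_wait_timeout_rcu(resv, true, false,
+                                                   MAX_SCHEDULE_TIMEOUT) < 0);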
Comment 63 Mario Kleiner 2016-09-08 00:17:27 UTC
Created attachment 126288 [details] [review]
Fixes prime sync problems for pageflips.
Comment 64 Mario Kleiner 2016-09-08 00:23:40 UTC
So Michel was right, and with the attached patch, page-flipped render offload works nicely on Intel Haswell + AMD Tonga under amdgpu-kms, with flips properly waiting on the dmabuf fence for longer-running rendering. I also retested Intel Ivybridge + AMD Cedar under radeon-kms and Intel Ironlake + GeForce 330M under nouveau-kms to make sure it doesn't cause other problems.

Maybe we can still get this into Linux 4.8-rc6? That would be splendid.

I'll also attach a hacky patch on top of Chris' eb-fences branch which seems to remove the Z-tearing for windowed copy-swaps - and replaces it with the regular horizontal tearing we always have with DRI3/Present, even for a single GPU, because we are too slow to get blitting started ahead of scanout.
Comment 65 Mario Kleiner 2016-09-08 00:27:54 UTC
Created attachment 126289 [details] [review]
Test patch for prime synced windowed swaps.
Comment 66 Chris Wilson 2016-09-09 19:21:54 UTC
FWIW, waiting on external reservation_objects is now in drm-intel-nightly. This should do the right thing with nouveau, but amdgpu only uses shared fences, so we are not yet waiting on writes from amdgpu. If we can't rely on implicit fencing from dmabuf, we will have to wait until we have explicit fencing between amdgpu/intel. *sigh*
Comment 67 Mike Lothian 2016-09-12 12:06:49 UTC
I'm a little unclear: should Mario's patch be applied to the eb-fence branch, or is it just a workaround while another solution is being sought?
Comment 68 Mike Lothian 2016-09-20 18:54:18 UTC
I've traced the tearing without PRIME back to this commit:

05a76d3d6ad1ee9f9814f88949cc9305fc165460 is the first bad commit 
commit 05a76d3d6ad1ee9f9814f88949cc9305fc165460 
Author: Lyude <cpaul@redhat.com> 
Date:   Wed Aug 17 15:55:57 2016 -0400 

   drm/i915/skl: Ensure pipes with changed wms get added to the state 
    
   If we're enabling a pipe, we'll need to modify the watermarks on all 
   active planes. Since those planes won't be added to the state on 
   their own, we need to add them ourselves. 
    
   Signed-off-by: Lyude <cpaul@redhat.com> 
   Reviewed-by: Matt Roper <matthew.d.roper@intel.com> 
   Cc: stable@vger.kernel.org 
   Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> 
   Cc: Daniel Vetter <daniel.vetter@intel.com> 
   Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> 
   Cc: Hans de Goede <hdegoede@redhat.com> 
   Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 
   Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-6-git-send-email-cpaul@redhat.com

Which is unrelated to your work
Comment 69 Mike Lothian 2016-11-03 13:24:51 UTC
With Mario's patch https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html I now have tear-free gaming

Chris, Mario: thanks for all your hard work
