| Summary: | [i915] Feature request: Add support for fencing for PRIME setups | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Mike Lothian <mike> |
| Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | enhancement | | |
| Priority: | medium | CC: | davyaxel0, intel-gfx-bugs, mario.kleiner, michel, mike, pavel.ondracka, shawn.starr |
| Version: | DRI git | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | | | |
| i915 platform: | ALL | i915 features: | GEM/Other |
| Attachments: | | | |
Description: Mike Lothian, 2016-05-18 12:42:20 UTC
Halfway there:

commit ad778f8967ea2f0bfda02701f918bcfcd495b721
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 4 16:32:42 2016 +0100

    drm/i915: Export our request as a dma-buf fence on the reservation object

    If the GEM objects being rendered with in this request have been
    exported via dma-buf to a third party, hook ourselves into the dma-buf
    reservation object so that the third party can serialise with our
    rendering via the dma-buf fences.

    Testcase: igt/prime_busy
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-26-git-send-email-chris@chris-wilson.co.uk

Exporting rendering from i915.ko to others (nouveau, radeon/amdgpu) should work. Importing rendering from others to flip should work (reverse prime + tearfree etc). The last remaining (missing) trick is importing rendering from others into our rendering for updates (windowed swapbuffers; should affect all render offload to a dGPU, iirc).

Great news, let me know when you've completed the second bit and I'll happily test it out for you with amdgpu.

This branch, https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=fence, should have everything in place.

Is it enabled automatically, or do I need to do something to enable it? Are any userspace components required?

The fencing in the kernel is enabled (not even opt-out). X/Prime has been assuming this works since its inception, but there may still be a bug or two in delaying batches, or a missing flush, when surfaces have been passed between drivers. Both DRI2 and DRI3 with xrandr should just work. Outside of X, I'm not sure what the current state is.

When I launch Metro 2033 Redux, where I normally see a lot of tearing, I get a system lock-up with this kernel; drm-next, agd5f-4.9-wip, Linus's tree and intel-nightly all work fine. In the few seconds before the lockup it looks just as bad for tearing during the video intro.

Created attachment 125545 [details]
Journalctl -f
Nothing obvious when tailing journalctl remotely
Just to be clear, all those messages appear when things do work correctly. If there's anything else I can do to help debug this, let me know.

Bisecting that lockup would be invaluable to me.

DRI2 render offload goes something like: GL client renders to back -> DRI2SwapBuffers -> X/dgpu CopyRegion2, copy back to front + post damage -> DamageEvent -> compositor copies on igfx the damaged front onto the framebuffer and flips -> DRI2SwapBuffers -> X/igfx flips. (DRI3 is a couple of steps simpler, in that the dgpu copy is done by the GL client and only the X/igfx is involved with handling Present.)

The point at which we require the fence is between the copy by the dgpu inside X and the copy by the igfx inside the compositor. The buffer is passed between dgpu and igfx using a dma-buf, which also carries along the implicit fences. So long as the fencing is correct, that should be fine.

Q: Is it only the PRIME offload that tears? Does enabling TearFree do anything? (Just trying to ascertain that it is the fence and not the last SwapBuffers that is at fault.)

If you want to look at intel-gpu-tools, tests/prime_vgem and tests/prime_busy should test the basic export/import of fences. I'll patch in some debug messages tomorrow to show whether we are seeing fences in i915 execbuf.

I'm running a DRI3 setup, so I don't need to run any xrandr commands to get prime offload working. I've never needed or used TearFree. The game runs (albeit slowly) on i915 only. Where would be a good point to start bisecting from?

I've bisected back to:

6e934d43fe55156c68a2e55f1d51f182003b3b0b is the first bad commit
commit 6e934d43fe55156c68a2e55f1d51f182003b3b0b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 6 14:21:07 2016 +0100

    nonblocking-dispatch

:040000 040000 ee65c1addbef2b6965be4bce69c5af051aa83b07 a72d3da61b145d64f7d8da94c6a0f752f80a8f78 M	drivers

Based on:

git bisect start
# bad: [37dbdf6f78b5e33a3d640a431fc077b358654152] in-out-fence
git bisect bad 37dbdf6f78b5e33a3d640a431fc077b358654152
# good: [eebe64201e8d775c63c034265a3d5f2200ae8070] drm/i915: Micro-optimise i915_get_ggtt_vma_pages()
git bisect good eebe64201e8d775c63c034265a3d5f2200ae8070
# bad: [a2dd67f066f2cb854ff7978eac273030b45c1e27] drm/i915: Serialise execbuf operation after a dma-buf reservation object
git bisect bad a2dd67f066f2cb854ff7978eac273030b45c1e27
# good: [28f79ffbce93e805a6cd7d65c1480cb9e84fd3ce] kfence: Extend kfences for listening on DMA fences
git bisect good 28f79ffbce93e805a6cd7d65c1480cb9e84fd3ce
# bad: [6e934d43fe55156c68a2e55f1d51f182003b3b0b] nonblocking-dispatch
git bisect bad 6e934d43fe55156c68a2e55f1d51f182003b3b0b
# good: [735a36acb474134c42ea48ac4871555999cf4f5f] eb-sync
git bisect good 735a36acb474134c42ea48ac4871555999cf4f5f
# first bad commit: [6e934d43fe55156c68a2e55f1d51f182003b3b0b] nonblocking-dispatch

(In reply to Mike Lothian from comment #11)
> I've bisected back to:
>
> 6e934d43fe55156c68a2e55f1d51f182003b3b0b is the first bad commit
> commit 6e934d43fe55156c68a2e55f1d51f182003b3b0b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Wed Jul 6 14:21:07 2016 +0100
>
> nonblocking-dispatch

Of course, that's the key one for being able to wait on fences in the execbuf!
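[Editor's note] For readers following along, here is a minimal, hedged sketch of what "the dma-buf carries along the implicit fences" means in practice on the importing (igfx) side, using the 4.7/4.8-era reservation-object API. The helper name and error handling are illustrative only and are not code from the branch under discussion:

```c
#include <linux/reservation.h>
#include <linux/fence.h>
#include <linux/slab.h>

/*
 * Hypothetical illustration of the importer side of the implicit-fencing
 * handoff: before the iGPU reads from the shared dma-buf, it waits for
 * every fence the dGPU driver left on the buffer's reservation object
 * (the exclusive write fence plus any shared read fences).
 */
static int example_wait_on_implicit_fences(struct reservation_object *resv)
{
	struct fence *excl;
	struct fence **shared;
	unsigned int count, i;
	int ret;

	/* Snapshot the fences currently attached to the reservation object. */
	ret = reservation_object_get_fences_rcu(resv, &excl, &count, &shared);
	if (ret)
		return ret;

	if (excl) {
		fence_wait(excl, false);	/* wait for the dGPU's write */
		fence_put(excl);
	}

	for (i = 0; i < count; i++) {
		fence_wait(shared[i], false);	/* and for any readers */
		fence_put(shared[i]);
	}
	kfree(shared);

	return 0;
}
```

In the offload chains described above, this kind of wait is what the kernel performs implicitly on behalf of X and the compositor whenever they touch the shared buffer; the exclusive/shared distinction becomes important later in this thread.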
I've split that patch up and pushed the branch out to ~ickle/linux-2.6 #fence.

Can you please test

commit c9941749bae4030bbadbe268916c1aa43bf0c130
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 5 09:01:21 2016 +0100

    drm/i915: Drive request submission through fence callbacks

which introduces the callback mechanics, and then

commit 9759f6e63e8fecbe5c1d251586cdfe713d20dce0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 5 09:02:16 2016 +0100

    drm/i915: Nonblocking request submission

which is the first use in anger (and presumably where you start to see the issue).

If you have your phone logged in at the time of the hang, can you get any diagnostics? For convenience, you probably want to set panic-on-oops and reboot-on-panic (with a timeout of 1 or 2s) in the kconfig.

Thinking through the DRI3 chain: GL/dgpu client renders into back buffer -> SwapBuffers, GL/dgpu copies into linear buffer (dma-buf pixmap) and calls PresentPixmap -> X/igfx either flips the fullscreen dma-buf pixmap, or copies from the dma-buf pixmap into the frontbuffer.

Here we should have a write/exclusive fence installed by the client, and that should be waited upon by i915.ko inside X. The copy/flip are different call paths inside i915.ko, but both should be waiting for read access on the fence.

Does the behaviour differ if the rendered app is fullscreen (and make sure you have fullscreen unredirection enabled)?

I'll test those when I get home tonight. I did however notice this in my logs whilst the laptop wasn't doing anything:

[30445.383085] ------------[ cut here ]------------
[30445.383088] WARNING: CPU: 2 PID: 1 at drivers/gpu/drm/i915/intel_ddi.c:1644 intel_ddi_pre_enable+0x2d8/0x330
[30445.383089] WARN_ON(intel_dp->active_streams != 0)
[30445.383090] Modules linked in:
[30445.383091] CPU: 2 PID: 1 Comm: systemd Tainted: G W 4.7.0-fence+ #5
[30445.383092] Hardware name: Alienware Alienware 15 R2/Alienware 15 R2, BIOS 1.2.8 01/29/2016
[30445.383093] 0000000000000000 ffffffff81351c5c ffff88017e1ff878 0000000000000000
[30445.383094] ffffffff8108f1c9 ffff8804b22c4000 ffff88017e1ff8c8 ffff8804b22c2000
[30445.383095] ffff8804b22d0000 0000000000000000 ffff8804b22c2000 ffffffff8108f23a
[30445.383095] Call Trace:
[30445.383097] [<ffffffff81351c5c>] ? dump_stack+0x46/0x5a
[30445.383098] [<ffffffff8108f1c9>] ? __warn+0xb9/0xe0
[30445.383099] [<ffffffff8108f23a>] ? warn_slowpath_fmt+0x4a/0x50
[30445.383100] [<ffffffff815aff98>] ? intel_ddi_pre_enable+0x2d8/0x330
[30445.383101] [<ffffffff8159fce4>] ? intel_set_cpu_fifo_underrun_reporting+0x124/0x2a0
[30445.383103] [<ffffffff815923fa>] ? haswell_crtc_enable+0x2ca/0x870
[30445.383104] [<ffffffff8158e91d>] ? intel_atomic_commit_tail+0x7fd/0x1000
[30445.383105] [<ffffffff8144e519>] ? drm_atomic_helper_swap_state+0x149/0x2f0
[30445.383106] [<ffffffff8158f511>] ? intel_atomic_commit+0x3f1/0x500
[30445.383108] [<ffffffff8147308c>] ? drm_atomic_check_only+0x17c/0x600
[30445.383109] [<ffffffff81452c97>] ? restore_fbdev_mode+0x147/0x270
[30445.383110] [<ffffffff8147257b>] ? drm_modeset_lock_all_ctx+0x9b/0xb0
[30445.383111] [<ffffffff814546e9>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x70
[30445.383112] [<ffffffff81454754>] ? drm_fb_helper_set_par+0x24/0x50
[30445.383114] [<ffffffff815a845e>] ? intel_fbdev_set_par+0xe/0x60
[30445.383115] [<ffffffff813a8e3e>] ? fb_set_var+0x1fe/0x3e0
[30445.383116] [<ffffffff8119f1bc>] ? __slab_free+0x9c/0x320
[30445.383117] [<ffffffff813a3203>] ? fbcon_blank+0x2b3/0x2f0
[30445.383118] [<ffffffff81425355>] ? do_unblank_screen+0xc5/0x180
[30445.383119] [<ffffffff8141bb04>] ? vt_ioctl+0x4c4/0x1220
[30445.383120] [<ffffffff81416fed>] ? tty_mode_ioctl+0x1ad/0x4b0
[30445.383121] [<ffffffff814111e5>] ? tty_ioctl+0x315/0xbf0
[30445.383122] [<ffffffff811cc2db>] ? do_vfs_ioctl+0x8b/0x5c0
[30445.383124] [<ffffffff8109e6d4>] ? do_sigaction+0x174/0x190
[30445.383125] [<ffffffff811cc846>] ? SyS_ioctl+0x36/0x70
[30445.383126] [<ffffffff81001c19>] ? do_syscall_64+0x59/0x140
[30445.383127] [<ffffffff8108733c>] ? entry_SYSCALL64_slow_path+0x25/0x25
[30445.383133] ---[ end trace 8d21bfa42fc3be8f ]---

Not sure if it's related.

(In reply to Mike Lothian from comment #14)
> I'll test those when I get home tonight, I did however notice this in my
> logs whilst the laptop wasn't doing anything:
>
> [30445.383085] ------------[ cut here ]------------
> [30445.383088] WARNING: CPU: 2 PID: 1 at
> drivers/gpu/drm/i915/intel_ddi.c:1644 intel_ddi_pre_enable+0x2d8/0x330
> [30445.383089] WARN_ON(intel_dp->active_streams != 0)
>
> Not sure if its related

It's unrelated, but new, so I think Ville would be interested in seeing it. Please do open a bug report with the full dmesg.

Oh, I think I know what the problem is there. I had dropped a patch from that branch not realising it was required. Updated #fence; it shouldn't hang now, *fingers crossed*.

Oh noes. I don't see anywhere in amdgpu where they attach a fence to the dma-buf, or wait upon it. :| Are you using drm/amdgpu or drm/radeon?

Created attachment 125559 [details] [review]
Add fences to dma-buf shared with amdgpu/radeon/nouveau

One compile-only tested patch that should add the missing fences required for DRI3.

The good news is the system has stopped crashing. The bad news is I'm still getting tearing with DRI3.

(In reply to Mike Lothian from comment #19)
> The bad news is I'm still getting tearing with DRI3

Did you try with the patch from https://bugs.freedesktop.org/attachment.cgi?id=125559 ? Be prepared for misery... It may be a bit hopeful.

Yes, I applied that patch; the tearing is really obvious in the Metro 2033 Redux intro video, and I've also confirmed it in Tomb Raider gameplay. Would you like me to try it with DRI2 too?

(In reply to Mike Lothian from comment #21)
> Yes I applied that patch, the tearing is really obvious in Metro 2033 Redux
> intro video, I've also confirmed it in Tomb Raider in the game play. Would
> you like me to try it with DRI2 too?

Only if you are bored. I'm going to try and get my Haswell/nvidia combo working this evening and at least get to the point of being able to trace the fences coming out of nouveau and into i915.

OK, so I see less tearing with DRI2 but it's still there (this has always been the case though). The desktop seems more flakey though, with more flickering. It's the first time I've used DRI2 in a while, so it might not be your kernel patches that are causing this.

Comment on attachment 125559 [details] [review]
Add fences to dma-buf shared with amdgpu/radeon/nouveau

Hmm, fences on the dma-buf are exported from ttm through gem_prime_res_obj, which looks sufficient to couple the reservation objects together.

Hmm, I am definitely seeing fences being passed from nouveau to i915 with DRI3.
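[Editor's note] To make "attach a fence to the dma-buf" concrete, here is a minimal, hedged sketch of what the exporting (dGPU) driver needs to do after submitting rendering that writes into a shared buffer. The function name and call site are illustrative, 4.7-era reservation-object API only; this is not the content of attachment 125559:

```c
#include <linux/reservation.h>
#include <linux/fence.h>

/*
 * Hypothetical exporter-side hook, called after a command submission that
 * writes into a buffer shared via dma-buf/PRIME.  Publishing the fence on
 * the buffer's reservation object is what lets the importing driver
 * (here i915) serialise against the dGPU's rendering.
 */
static void example_publish_write_fence(struct reservation_object *resv,
					struct fence *render_done)
{
	/* An exclusive fence marks a write; readers must wait for it. */
	reservation_object_add_excl_fence(resv, render_done);
}
```

For read access, the analogue is reservation_object_reserve_shared() followed by reservation_object_add_shared_fence(); the exclusive-versus-shared distinction turns out to matter later in this thread, because amdgpu was publishing only shared fences.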
Can you please try with something like:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 98294f1..55a54cd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1442,6 +1442,8 @@ static int eb_add_reservations(struct reservation_object *resv,
 			if (!fence_get_rcu(fence))
 				continue;

+			printk(KERN_ERR "third party read fence\n");
+
 			ret = kfence_await_dma_fence(&req->submit, fence,
 						     GFP_NOWAIT | __GFP_NOWARN);
 			fence_put(fence);
@@ -1452,6 +1454,7 @@ static int eb_add_reservations(struct reservation_object *resv,
 	fence = rcu_dereference(resv->fence_excl);
 	if (fence && !fence_is_i915(fence) && fence_get_rcu(fence)) {
+		printk(KERN_ERR "third party write fence\n");
 		ret = kfence_await_dma_fence(&req->submit, fence,
 					     GFP_NOWAIT | __GFP_NOWARN);

just to make sure we are seeing the fences?

Created attachment 125573 [details]
dmesg output
So, plenty of fences according to dmesg, but there's still lots of tearing.
Next attempt,

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f72e8e68e394..d29d6824ece3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1447,6 +1447,8 @@ static int eb_add_reservations(struct reservation_object *resv,
 			fence_put(fence);
 			if (ret < 0)
 				goto err;
+
+			pr_err("third party read fence: %d\n", ret);
 		}
 	}

@@ -1457,6 +1459,8 @@ static int eb_add_reservations(struct reservation_object *resv,
 		fence_put(fence);
 		if (ret < 0)
 			goto err;
+
+		pr_err("third party write fence: %d\n", ret);
 	}

 	ret = 0;

just to confirm whether the fences are no-ops or not.

Created attachment 125575 [details]
dmesg output
Here's the output with your two patches
That tells us that we never wait upon a fence, as it is always complete by the time we copy from the shared buffer.

Hmm, well, I'm guessing something isn't quite right, as I'm seeing really bad tearing. I'm using SNA on the xf86-video-intel driver.

FWIW, I wouldn't test with DRI2, because that uses a single shared buffer. Even if there is proper synchronization between the dGPU rendering and the iGPU reading its output, I suspect there may be nothing preventing the dGPU from rendering the next frame before the iGPU is done reading the previous one.

Is there something wrong with the patch that used to be part of the bug? I've still been using that one.

(In reply to Mike Lothian from comment #32)
> Is there something wrong with the patch that used to be part of the bug?
> I've still been using that one

Yes, it should not be required - at best it just adds the fence twice.

I've retested with your latest updates to the branch; same issue, and the fences always report 0.

The tearing I'm seeing looks like an inverted Z, running from the top left to the bottom right, with horizontal lines about 1/5 from the top and 1/5 from the bottom of the screen. I sometimes see older frames too, which makes the picture look jerky.
dmesg resume fail
Created attachment 125610 [details]
gpu crash dump
Refreshed the #fence branch. Still thinking about how to identify where the tears occur. The diagonal tearing is at least indicative of a GPU tear (as opposed to scanout). This mesa patch diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c index 99e2897..9be5e72 100644 --- a/src/loader/loader_dri3_helper.c +++ b/src/loader/loader_dri3_helper.c @@ -167,7 +167,7 @@ loader_dri3_drawable_init(xcb_connection_t *conn, draw->vtable = vtable; draw->drawable = drawable; draw->dri_screen = dri_screen; - draw->is_different_gpu = is_different_gpu; + draw->is_different_gpu = true; draw->have_back = 0; draw->have_fake_front = 0; will force every client to use the linear shared buffer + local copy. Can you please test that using DRI_PRIME=0 (i.e. with igfx) and see if it tears? (Obviously there is still a big difference in drivers that might hide any issue, as well as any fence now being native...) On the other hand, if it tears we are closer to finding the right forest. With that patch applied Plasma 5 doens't start. If I install it after Plasma 5 starts then apps won't start Created attachment 125805 [details] [review] Force linear-buffer for igfx Seems like i965 was missing the blitImage callback. Can't currently test your kernel due to this build error: drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_vma_unbind’: drivers/gpu/drm/i915/i915_gem.c:2824:23: error: ‘struct drm_i915_gem_object’ has no member named ‘pages’ if (vma->pages != obj->pages) { ^~ Will try out your mesa patch with my previous build So I'm seeing some pretty funky graphical glitches with that mesa patch. The compositor doesn't show things sometimes, especially in steam. So when I open the configuration options the new window is blank. It'll eventually appear, when it's shut the window behind appears blank. When playing metro 2033 redux I'm not getting the tearing but I do seem to see older frames sometimes like things are being presented out of sequence This can be seen clearly on the main menu when the woman and boy are pacing backwards and forwards There's no synchronisation point in dri3 with X: diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c index a55719a..6b5736b 100644 --- a/src/loader/loader_dri3_helper.c +++ b/src/loader/loader_dri3_helper.c @@ -1399,6 +1399,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable, buffers->back = back->image; } + loader_dri3_wait_x(draw); draw->stamp = stamp; return true; brings it into line with dri2. (In reply to Chris Wilson from comment #43) > There's no synchronisation point in dri3 with X: > > diff --git a/src/loader/loader_dri3_helper.c > b/src/loader/loader_dri3_helper.c > index a55719a..6b5736b 100644 > --- a/src/loader/loader_dri3_helper.c > +++ b/src/loader/loader_dri3_helper.c > @@ -1399,6 +1399,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable, > buffers->back = back->image; > } > > + loader_dri3_wait_x(draw); > draw->stamp = stamp; > > return true; > > brings it into line with dri2. I don't see why waitX is required. The call has only an impact when a fake front buffer is used. And when fake front buffer is used, the same buffer is always used and it seems correctly updated when needed. (In reply to Mike Lothian from comment #42) Are you sure the dedicated card was not used for this test ? It looks like the problem is the compositor copies the damaged windows contents before the blitImage calls are executed on the gpu. Could it be a problem with the flushing behaviour in intel_blit_image ? 
(In reply to Axel Davy from comment #44) > (In reply to Chris Wilson from comment #43) > > There's no synchronisation point in dri3 with X: > > > > diff --git a/src/loader/loader_dri3_helper.c > > b/src/loader/loader_dri3_helper.c > > index a55719a..6b5736b 100644 > > --- a/src/loader/loader_dri3_helper.c > > +++ b/src/loader/loader_dri3_helper.c > > @@ -1399,6 +1399,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable, > > buffers->back = back->image; > > } > > > > + loader_dri3_wait_x(draw); > > draw->stamp = stamp; > > > > return true; > > > > brings it into line with dri2. > > I don't see why waitX is required. > > The call has only an impact when a fake front buffer is used. > And when fake front buffer is used, the same buffer is always used and it > seems correctly updated when needed. Because the tiled_buffer is not kept to date with X. Not sure if you want me to test that patch again with the loader_dri3_wait_x or just do it on it's own and with or without prime FWIW, this is working correctly between the radeon and amdgpu kernel drivers. So no changes to those drivers or to userspace should be necessary. (In reply to Michel Dänzer from comment #47) > FWIW, this is working correctly between the radeon and amdgpu kernel > drivers. So no changes to those drivers or to userspace should be necessary. You really should check the compositors for missing glXWaitX because they've been relying on the implicit flushes from DRI2 (hence the dri3_wait_x hack for refreshing the tiled image from the shared linear buffer for tfp support). (In reply to Chris Wilson from comment #48) > You really should check the compositors for missing glXWaitX because they've > been relying on the implicit flushes from DRI2 (hence the dri3_wait_x hack > for refreshing the tiled image from the shared linear buffer for tfp > support). We always flush before Damage events are sent to the compositor, so I'm not sure how that's relevant for this bug report. (In reply to Michel Dänzer from comment #49) > We always flush before Damage events are sent to the compositor, [...] Actually, the modesetting driver doesn't do that yet, so it's better not to use that for the secondary GPU for now. Just and FYI, the fence and eb-fence branches aren't booting for me The intel nightly tree is booting for me again, I'll re-test your branch once it's been re-based So some good new, the playing frames out of order / jerky rendering has gone I'm still seeing some tearing during the intro video - could this be related to it being a video rather than rendering? To add some additional data. I tested the current Linux 4.8-rc4 with DRI3/Present + PRIME renderoffload on: a) Intel Ivybridge desktop + AMD Cedar (Evergreen family), radeon-kms + r600g b) Intel Haswell desktop + AMD Tonga (VI family), amdgpu-kms + radeonsi. This was tested with page-flipping, and the ftrace scripts i'll attach shortly. In case a) page-flipping worked under all loads without any tearing or rendering artifacts. The traces show that i915 properly blocks on the dmabuf-fence / reservation objects for a couple of milliseconds, longer if the rendering load is higher. This is consistent with other successful tests i did with radeon-kms/r600 on AMD evergreen + AMD evergreen, and with Intel + Nvidia under nouveau. In all cases i got the right behaviour under DRI + Prime renderoffload. E.g. 
from i915_optimus_tracing.sh (low load):

 2)               |  intel_mmio_flip_work_func [i915]() {
 2) # 3878.042 us |    reservation_object_wait_timeout_rcu();
 2) # 3882.433 us |  }

Output from tonga_tracing.sh with the proper line uncommented for intel + radeon-kms:

 PTB mainthread-3235  [002] .... 7474.200826: radeon_cs_ioctl <-drm_ioctl
 PTB mainthread-3224  [003] .... 7474.202235: reservation_object_wait_timeout_rcu <-radeon_gem_wait_idle_ioctl
 PTB mainthread-3235  [002] .... 7474.202277: radeon_cs_ioctl <-drm_ioctl
 PTB mainthread-3235  [002] .... 7474.202423: radeon_cs_ioctl <-drm_ioctl
 X-2588               [002] .... 7474.202592: intel_crtc_page_flip <-drm_mode_page_flip_ioctl
 kworker/2:1-47       [002] .... 7474.202606: intel_mmio_flip_work_func <-process_one_work
 kworker/2:1-47       [002] .... 7474.202606: reservation_object_wait_timeout_rcu <-intel_mmio_flip_work_func
 ==> Blocks as expected for a few msecs.
 kworker/2:1-47       [002] .... 7474.206213: intel_pipe_update_start <-intel_mmio_flip_work_func

In case b), otoh, I don't see the pageflip ever blocking on the reservation object, and while rendering looks good under low/medium load, I can get it to show incomplete rendering under high load - exactly the Z-style tearing pattern one would expect if the Intel GPU flips while the AMD Tonga is still rendering to the dmabuf. E.g., output from tonga_tracing.sh with the line for intel + amdgpu-kms under very high load, when Z-tearing happens:

 PTB mainthread-12766 [001] .... 12538.537618: amdgpu_cs_ioctl <-drm_ioctl
 PTB mainthread-12766 [001] .... 12538.537875: amdgpu_cs_ioctl <-drm_ioctl
 PTB mainthread-12766 [001] .... 12538.537906: amdgpu_cs_ioctl <-drm_ioctl
 X-12250              [000] .... 12538.537935: intel_crtc_page_flip <-drm_mode_page_flip_ioctl
 kworker/0:2-11651    [000] .... 12538.537957: intel_mmio_flip_work_func <-process_one_work
 kworker/0:2-11651    [000] .... 12538.537957: reservation_object_wait_timeout_rcu <-intel_mmio_flip_work_func
 ==> Falls through although the massive rendering (over)load should cause a ~50 msecs delay
 kworker/0:2-11651    [000] .... 12538.537957: intel_pipe_update_start <-intel_mmio_flip_work_func

Given that intel-kms with pageflipping seems to wait just fine on the dmabuf fence for both the old AMD radeon-kms + r600 and for different NVidia cards under nouveau, but goes wrong with Tonga on amdgpu + radeonsi, it looks as if something is missing in amdgpu-kms or radeonsi?

I'll also try Chris' eb-fence branch to see how it behaves for windowed swaps.

Created attachment 126206 [details]
Basic pageflip dmabuf-fence tracing for i915 + other dGPU.
Created attachment 126207 [details]
Tracing with absolute timestamps, comment in proper line for GPU combo.
Created attachment 126208 [details]
reservation object wait on working setup radeon-kms + r600
Created attachment 126209 [details]
Absolute timestamp trace on properly working radeon-kms + r600
Created attachment 126210 [details]
Absolute timestamp trace on Z-Tearing amdgpu-kms + radeonsi
(In reply to Mario Kleiner from comment #54)
> Given that intel-kms with pageflipping seems to wait just fine on the dmabuf
> fence for both old AMD radeon-kms + r600 and for different NVidia's under
> nouveau, but it goes wrong with Tonga on amdgpu + radeonsi, it looks as if
> something is missing in amdgpu-kms or radeonsi?

I doubt it, since (I just double-checked again) it works correctly with the renderer using amdgpu and Xorg using radeon.

We had similar issues within amdgpu before 1ffd26524380 ("drm/amdgpu: fix waiting for all fences before flipping"). Maybe i915 can learn something from that? Then again, radeon still seems to only consider the exclusive fence.

One observation that might explain Mario's results is that intel_mmio_flip_work_func passes false for the wait_all parameter of reservation_object_wait_timeout_rcu. While the radeon driver sets exclusive fences, amdgpu only sets shared ones AFAICT.

Indeed, that looks like a good candidate! I'll retest all my Intel + dGPU setups, setting the wait_all flag to true in reservation_object_wait_timeout_rcu().

Created attachment 126288 [details] [review]
Fixes prime sync problems for pageflips.

So Michel was right, and with the attached patch, page-flipped render offload works nicely on Intel Haswell + AMD Tonga under amdgpu-kms, with flips properly waiting on the dmabuf fence for longer-running rendering. I also retested Intel Ivybridge + AMD Cedar under radeon-kms and Intel Ironlake + GeForce 330M under nouveau-kms to make sure it doesn't cause other problems. Maybe we can still get this into Linux 4.8-rc6? That would be splendid.

I'll also attach a hacky patch on top of Chris' eb-fences branch which seems to remove the Z-tearing for windowed copy-swaps - and replaces it with the regular horizontal tearing we always have with DRI3/Present, even for a single GPU, because we are too slow to get blitting started ahead of scanout.

Created attachment 126289 [details] [review]
Test patch for prime synced windowed swaps.

Fwiw, waits for external reservation_objects are now in drm-intel-nightly. This should do the right thing with nouveau, but amdgpu only uses shared fences, so we are not yet waiting on writes from amdgpu. If we can't rely on implicit fencing from dmabuf, we will have to wait until we have explicit fencing between amdgpu/intel. *sigh*

I'm a little unclear: should Mario's patch be applied to the eb-fence branch, or is that just a workaround and another solution is being sought?

I've traced back the tearing without prime to this commit:

05a76d3d6ad1ee9f9814f88949cc9305fc165460 is the first bad commit
commit 05a76d3d6ad1ee9f9814f88949cc9305fc165460
Author: Lyude <cpaul@redhat.com>
Date:   Wed Aug 17 15:55:57 2016 -0400

    drm/i915/skl: Ensure pipes with changed wms get added to the state

    If we're enabling a pipe, we'll need to modify the watermarks on all
    active planes. Since those planes won't be added to the state on
    their own, we need to add them ourselves.

    Signed-off-by: Lyude <cpaul@redhat.com>
    Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
    Cc: stable@vger.kernel.org
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
    Cc: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-6-git-send-email-cpaul@redhat.com

Which is unrelated to your work.

With Mario's patch https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html I now have tear-free gaming. Chris, Mario: thanks for all your hard work.
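[Editor's note] For context on the wait_all observation above, here is a minimal, hedged sketch of the kind of change being discussed. It is not the actual attachment 126288 or the dri-devel patch linked above; the surrounding code is an assumption modelled on the 4.8-era i915 mmio-flip path, with only the wait_all argument changed from false to true so that shared fences (the only kind amdgpu publishes) are waited on as well:

```c
#include <linux/dma-buf.h>
#include <linux/reservation.h>
#include <linux/sched.h>

/*
 * Hypothetical excerpt of the flip worker: before the display flips to a
 * framebuffer backed by an imported dma-buf, wait for the fences on its
 * reservation object.  With wait_all = false only the exclusive fence is
 * considered, so amdgpu's shared-only fences were ignored and the flip
 * could overtake the dGPU's rendering (the Z-tearing seen above).
 */
static long example_wait_before_flip(struct dma_buf *dmabuf)
{
	return reservation_object_wait_timeout_rcu(dmabuf->resv,
						   true,  /* wait_all (was false) */
						   false, /* not interruptible */
						   MAX_SCHEDULE_TIMEOUT);
}
```

The alternative design mentioned in the thread is for the exporter to publish an exclusive fence for writes (as radeon does), in which case wait_all = false would also suffice; the wait_all change fixes the importer side regardless of which convention the exporter follows.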