Bug 99507 - Corrupted frame contents with Vulkan version of DOTA2, Talos Principle and Sascha Willems' demos when they're run Vsynched in fullscreen
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2017-01-23 15:20 UTC by Eero Tamminen
Modified: 2018-10-03 15:24 UTC (History)


Attachments
Photo of the problem (64.65 KB, image/jpeg)
2017-01-23 15:20 UTC, Eero Tamminen
Video of issue (1.57 MB, video/mp4)
2018-09-24 00:29 UTC, Kevin S

Description Eero Tamminen 2017-01-23 15:20:44 UTC
Created attachment 129109
Photo of the problem

Sascha Willems' demos:
  https://github.com/SaschaWillems/Vulkan

Just got support for running in fullscreen, at monitor's native resolution:
  https://github.com/SaschaWillems/Vulkan/issues/268

However, at least with Ubuntu 16.04 on SKL GT2, with the latest git versions of the Vulkan loader & Mesa, frame content is slightly corrupted when the tests are running Vsynched, for example:
  ./raytracing -fullscreen -vsync

(By default the demos are currently vsynched in fullscreen, but it may change, hence the use of -vsync option.)

When I tried running the demos without Vsync (= no "-vsync" option, MAILBOX mode commented out from vulkanswapchain.hpp), i.e. so that the X server copies the frame content to the screen, the output looks fine.  Screenshots (= copies of the frame) also look fine.

I don't see it in all demos, but that could be a fluke.  It was visible at least in these:
- computecullandlod (very faintly)
- computenbody (very clearly)
- dynamic uniform buffers
- gears (occasionally)
- geometryshader
- indirectdraw
- multithreading
- particlefire (in fire)
- pushconstants
- raytracing (compute shader)
- specializationconstants
- texture
- texture3d
- texturearray
- texturecubemap (very faintly)
- texturemipmapgen (very faintly)
Comment 1 Eero Tamminen 2017-02-01 09:47:00 UTC
Same issue is also clearly visible in DOTA2 when one enables Vsync.


Btw: This could be a different issue... When starting DOTA2 Mechanics tutorial, Luna's face animation (at the bottom of screen) has occasional garbage.  Depending on the rendering quality settings, it may be visible only with vsync disabled, but otherwise rendering options (besides face animation setting) don't affect it.  It's not visible with other game characters, only with Luna.
Comment 2 Eero Tamminen 2017-02-01 10:20:47 UTC
Neither of those problems is visible with the DOTA2 GL renderer.
Comment 3 Nanley Chery 2017-02-14 23:09:34 UTC
I was able to reproduce the corruption on X11, but not on Wayland. Is this your experience as well? The demos in which I could very clearly see the corruption were:

- raytracing
- computenbody
Comment 4 Eero Tamminen 2017-02-15 14:36:30 UTC
(In reply to Nanley Chery from comment #3)
> I was able to reproduce the corruption on X11, but not on Wayland. Is this
> your experience as well?

I'm seeing the issue with X11, I haven't built things for Wayland.  Before doing that, I'd like to know how you're running things under Wayland...

* Have you built Wayland versions of the Vulkan loader, Mesa and Sascha Willems' demos?  I.e. are you running the tests with XWayland, or pure Wayland?  (The former would hide the issue.)

* If pure Wayland, are you running demos under gnome-shell, Weston, or something else?  I.e. are the demos composited to fullscreen, which would also invalidate your test results (copying the FB contents instead of flipping hides the issue)?

At least the Ubuntu 16.04 version of gnome-shell composites fullscreen windows.  The Weston I built from git a few months ago didn't composite them (when they were in native monitor resolution)...
Comment 5 Eero Tamminen 2017-02-15 15:38:15 UTC
I built everything required for Wayland.

-> There's no fullscreen support for Wayland in Sascha Willems' demos, so that's an invalid test for this bug.

Only when the test is fullscreen, in native display resolution, and Vsynched will it be flipped to the screen.  Everything else involves some kind of copy, either by X (when things aren't Vsynched or in native resolution) or by the compositor, which will hide the issue.
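
The conditions in the previous paragraph can be written down as a small predicate. This is only an illustrative sketch: the struct and function names are hypothetical and do not come from the X server or Mesa; it simply encodes the conditions as stated in this comment.

```c
#include <stdbool.h>

/* Hypothetical description of how a frame reaches the screen.
 * Field names are illustrative only, not X server or Mesa names. */
struct present_state {
    bool fullscreen;         /* window covers the whole screen */
    bool native_resolution;  /* window size equals the monitor mode */
    bool vsynced;            /* presentation is synchronized to vblank */
    bool composited;         /* a compositor (or XWayland) copies the frame */
};

/* Returns true when the frame is expected to be page-flipped directly to
 * the display, which is the only case where the corruption is visible. */
bool frame_is_flipped(const struct present_state *s)
{
    return s->fullscreen && s->native_resolution && s->vsynced &&
           !s->composited;
}
```

Any state for which this returns false implies a copy somewhere on the way to the screen, so a clean result in that state says nothing about this bug.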


PS. Getting Sascha's demos to actually present frames on Wayland required:
-               wl_display_dispatch(display);
+               wl_display_dispatch_pending(display);

in VulkanExampleBase::renderLoop().  Didn't you have that issue?
Comment 6 Nanley Chery 2017-02-15 18:10:19 UTC
My mistake, I forgot about XWayland. I was actually using that and not pure Wayland.
Comment 7 Eero Tamminen 2017-02-16 10:37:18 UTC
Is Vulkan using PTE MOCS for buffers that are potentially display surfaces, like GL side does in brw_update_renderbuffer_surface() & brw_blorp_init()?


The Talos Principle also has the same problem in Vsynched (monitor native resolution) fullscreen.  Note that you apparently need to restart the game for Vsync to take effect (toggling it doesn't enable the Apply option).
Comment 8 Jason Ekstrand 2017-02-18 06:27:43 UTC
(In reply to Eero Tamminen from comment #7)
> Is Vulkan using PTE MOCS for buffers that are potentially display surfaces,
> like GL side does in brw_update_renderbuffer_surface() & brw_blorp_init()?

Good question.  Those errors definitely look like a caching problem though I'm a bit surprised I've never noticed it. :(

Looking at anv_private.h, on BDW, we're using the equivalent of GEN8_MOCS_WB for everything, not GEN8_MOCS_PTE.  Skylake is the same story (only with the GEN9 equivalents).  That's most likely the problem.  Thanks for pointing it out!

I see a couple of options here.  One would be to do the same thing as the GL driver does and use PTE for all render targets.  The other would be to only use PTE if the is_scanout flag is set on the BO.  In the end, I'm not sure there's actually a huge difference between the two.  We would also need to update the mocs values we pass to BLORP to also use PTE for render targets.
Comment 9 Eero Tamminen 2017-02-20 08:50:34 UTC
PTE uses/takes settings from page table, and that should be already correctly set up (unless there's a kernel bug); uncached for things that need it and WB otherwise.

If you switch to PTE setting and WB isn't used where it should, I'll see it from resulting perf changes.
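
A simplified model of the behaviour discussed in comments 8 and 9, as a hedged sketch: the enum and function names below are hypothetical, and the two-level cacheability is a deliberate simplification rather than the hardware encoding. It only illustrates why an explicit WB setting prevents the kernel from making a scanout buffer uncached, while PTE defers to the page table.

```c
/* Illustrative cacheability levels; not the hardware encodings. */
enum cache_mode { CACHE_UNCACHED, CACHE_WRITE_BACK };

/* What userspace requests through MOCS, reduced to the two choices
 * discussed in this bug. */
enum mocs_choice { MOCS_CHOICE_WB, MOCS_CHOICE_PTE };

/* With an explicit WB request the page-table value is ignored; with PTE
 * the kernel-controlled page-table value decides. */
enum cache_mode effective_cache_mode(enum mocs_choice mocs,
                                     enum cache_mode page_table_mode)
{
    return mocs == MOCS_CHOICE_PTE ? page_table_mode : CACHE_WRITE_BACK;
}
```

In this model, a scanout buffer that the kernel marked uncached still ends up write-back when WB is requested, which matches the caching problem suspected above.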
Comment 10 Eero Tamminen 2017-04-05 15:55:17 UTC
(In reply to Eero Tamminen from comment #1)
> Btw: This could be a different issue... When starting DOTA2 Mechanics
> tutorial, Luna's face animation (at the bottom of screen) has occasional
> garbage.

This doesn't happen anymore, so it's separate from the issue in this bug.

Jason, is somebody looking/testing the MOCS stuff?
Comment 11 Eero Tamminen 2017-07-14 15:13:36 UTC
This still happens.  Chris' patch from bug 101571 doesn't help.
Comment 12 Tapani Pälli 2017-11-03 11:29:40 UTC
Same artifacts can be seen also on Android with these demos:

- raytracing
- computenbody
Comment 13 Jason Ekstrand 2017-12-20 21:10:04 UTC
Is this still an issue?
Comment 14 Eero Tamminen 2017-12-21 12:51:39 UTC
Tested Mesa git with latest drm-tip kernel on BDW GT2 & SKL GT4e, and Mesa git with Ubuntu 16.04 kernel on KBL GT3e.

I didn't see the issue with BDW GT2, but I saw it still on SKL GT4e & KBL GT3e.  It's fairly visible in Vulkan Multithreading demo.

Raytracing demo, where this was most visible, wasn't testable because of bug 104338. 

The Subpasses demo also has some flickering, which could be due to this same issue.


Issue is much harder to see in DOTA2 now.  To see it now, in addition to:
* Setting gfx options to highest (to slow things down)
* Setting resolution to monitor native resolution, fullscreen and enable Vsync (= page flipping requirements)

You need to scroll through the game area (e.g. all edges) while looking at the solid colored text message bars to see the issue.  It still happens only with Vsync enabled (=flipping), and is now a rare glitch.

I wasn't able to reproduce the issue anymore with Talos (or Sam3:BFE) Vulkan rendering.
Comment 15 Kevin S 2018-09-24 00:29:52 UTC
Created attachment 141705
Video of issue

This is still a major/severe issue for me, affecting many apps that use Vulkan. It's extremely noticeable in RetroArch on an Intel HD 520, where the entire top half of the screen glitches away.

I'd be happy to help debug or provide any assistance in getting this resolved.
Comment 16 Eero Tamminen 2018-09-24 08:19:41 UTC
(In reply to Kevin S from comment #15)
> This is still a major/severe issue for me, affecting many apps that use
> Vulkan. It's extremely noticeable in RetroArch on an Intel HD 520, where the
> entire top half of the screen glitches away.
> 
> I'd be happy to help debug or provide any assistance in getting this
> resolved.

On your SKL GT2, does this also happen only when using fullscreen with Vsync, i.e. does the problem go away if you disable either Vsync or fullscreen mode?
Comment 17 Denis 2018-09-25 10:02:53 UTC
Hello Eero.
I am trying to reproduce this with Dota2 now. Also I checked raytracing app:

>./raytracing --fullscreen --vsync
It works fine for me on:

OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
  v: 4.5 Mesa 18.3.0-develgit-3a9f628
vulkaninfo | grep 'apiVersion'
	apiVersion     = 0x401050  (1.1.80)
4.17.0-041700-generic
Ubuntu 16.04

I don't see any artifacts. Could you please check it also?
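
For reference, the apiVersion value in the output above can be decoded by hand. The sketch below assumes the bit layout used by the VK_VERSION_MAJOR, VK_VERSION_MINOR and VK_VERSION_PATCH macros; the function names are hypothetical stand-ins so that the example does not need the Vulkan headers.

```c
#include <stdint.h>

/* Same packing as the VK_VERSION_* macros:
 * major in bits 22 and up, minor in bits 12..21, patch in bits 0..11. */
uint32_t version_major(uint32_t v) { return v >> 22; }
uint32_t version_minor(uint32_t v) { return (v >> 12) & 0x3ffu; }
uint32_t version_patch(uint32_t v) { return v & 0xfffu; }
```

Applied to 0x401050 this gives major 1, minor 1 and patch 0x50 = 80, which agrees with the "(1.1.80)" printed by vulkaninfo.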
Comment 18 Kevin S 2018-09-30 19:55:33 UTC
(In reply to Eero Tamminen from comment #16)
> (In reply to Kevin S from comment #15)
> > This is still a major/severe issue for me, affecting many apps that use
> > Vulkan. It's extremely noticeable in RetroArch on an Intel HD 520, where the
> > entire top half of the screen glitches away.
> > 
> > I'd be happy to help debug or provide any assistance in getting this
> > resolved.
> 
> Does on your SKL GT2 this also happen only when using fullscreen with Vsync
> i.e. if you disable either Vsync or fullscreen mode, the problem goes away?

Correct. Running either application in windowed mode, or disabling Vsync, solves the problem. Unfortunately I get a huge amount of tearing without Vsync enabled.
Comment 19 Eero Tamminen 2018-10-03 08:05:14 UTC
(In reply to Denis from comment #17)
> I am trying to reproduce this with Dota2 now. Also I checked raytracing app:
> 
> ./raytracing --fullscreen --vsync

Correct option for Vsync in these demos is "-vsync", see:
  https://github.com/SaschaWillems/Vulkan/blob/master/base/vulkanexamplebase.cpp#L673

FYI: option parsing in these demos is *really* broken, see:
  https://github.com/SaschaWillems/Vulkan/issues/468

(Feel free to send Sascha a patch to fix it... You could get part of it from this old comment: https://github.com/SaschaWillems/Vulkan/issues/269#issuecomment-276947651)
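
One way to make a mistyped option such as "--vsync" visible, instead of having it silently ignored, is to reject every argument that is not recognized. The following is a minimal sketch of that idea and not the demos' actual parser: the struct, the function name and the reduced option set (only the two options used in this bug) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical option set, limited to the options mentioned in this bug. */
struct demo_options {
    bool fullscreen;
    bool vsync;
};

/* Parses argv[1..argc-1]. Returns 0 if every argument was understood,
 * otherwise the index of the first unrecognized argument. */
int parse_demo_options(int argc, const char *const *argv,
                       struct demo_options *out)
{
    out->fullscreen = false;
    out->vsync = false;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-fullscreen") == 0)
            out->fullscreen = true;
        else if (strcmp(argv[i], "-vsync") == 0)
            out->vsync = true;
        else
            return i; /* "--vsync" lands here instead of being ignored */
    }
    return 0;
}
```

A caller would print the offending argument and exit when the return value is non-zero, so a run that is not actually Vsynched cannot be mistaken for a valid test.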


> It works fine for me on:
> 
> OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
>   v: 4.5 Mesa 18.3.0-develgit-3a9f628
> vulkaninfo | grep 'apiVersion'
> 	apiVersion     = 0x401050  (1.1.80)
> 4.17.0-041700-generic
> Ubuntu 16.04

Before concluding that it works, please make sure that:
* Program is really Vsynched (FPS equals monitor refresh rate)
* Demo window is NOT composited
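
The first check can be made mechanical by comparing the FPS reported by the demo with the monitor refresh rate. This is a rough sketch only: the function name and the relative tolerance parameter are assumptions, and how the two rates are obtained is left to the tester.

```c
#include <stdbool.h>

/* Returns true when the measured frame rate is within a relative
 * tolerance of the refresh rate, e.g. rel_tolerance = 0.02 for 2 %. */
bool looks_vsynced(double measured_fps, double refresh_hz,
                   double rel_tolerance)
{
    double diff = measured_fps - refresh_hz;
    if (diff < 0.0)
        diff = -diff;
    return diff <= rel_tolerance * refresh_hz;
}
```

A demo reporting several hundred FPS on a 60 Hz monitor fails this check, which means it is not Vsynched and its frames are not being flipped.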


> I don't see any artifacts. Could you please check it also?

Yes, and it's *much* more visible and worse than when I reported this.  Much more visible also in other demos I tested (except for Triangle).

One reason for this could be that I'm running it now with X server git version where I've enabled modifier support (-> end-to-end render buffer compression).
Comment 20 Jason Ekstrand 2018-10-03 12:30:08 UTC
The problem here is that we aren't properly allowing the kernel to disable caching on scanout buffers.  At first, it was only really noticeable on BDW because there we weren't disabling eDRAM which is a very large cache and so the amount of data missing was substantial.  You could still notice on SKL+ but it was much more minor.  With modifiers, however, the result is now massive corruption.  I've got a patch for the bug which no one has felt like reviewing.
Comment 21 Eero Tamminen 2018-10-03 15:13:04 UTC
Jason's patch here fixes the corruption:
  https://patchwork.freedesktop.org/patch/254424/
Comment 22 Jason Ekstrand 2018-10-03 15:13:19 UTC
This is fixed by the following commit in master:

commit 7a89a0d9edae638e68e4b4ee8e0cbb34baa9c080
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jul 9 14:21:33 2018 -0700

    anv: Use separate MOCS settings for external BOs
    
    On Broadwell and above, we have to use different MOCS settings to allow
    the kernel to take over and disable caching when needed for external
    buffers.  On Broadwell, this is especially important because the kernel
    can't disable eLLC so we have to do it in userspace.  We very badly
    don't want to do that on everything so we need separate MOCS for
    external and internal BOs.
    
    In order to do this, we add an anv-specific BO flag for "external" and
    use that to distinguish between buffers which may be shared with other
    processes and/or display and those which are entirely internal.  That,
    together with an anv_mocs_for_bo helper lets us choose the right MOCS
    settings for each BO use.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507
    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

I'm sorry this took so absurdly long to resolve. :-(  It wasn't until I saw the corruption with CCS that it really became easy to triage.
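
The approach described in the commit message can be sketched roughly as follows. Only the idea (a per-BO "external" flag selecting between two MOCS values) comes from the commit message; every identifier below is a hypothetical stand-in, the numeric MOCS values are left to the caller, and the real helper in anv may look different.

```c
#include <stdint.h>

/* Hypothetical flag bit marking a BO that may be shared with other
 * processes and/or the display. */
#define SKETCH_BO_EXTERNAL (1u << 0)

struct sketch_bo {
    uint32_t flags;
};

struct sketch_device {
    uint32_t default_mocs;   /* for BOs that are entirely internal */
    uint32_t external_mocs;  /* lets the kernel control caching */
};

/* Picks the MOCS value for one use of a BO. */
uint32_t sketch_mocs_for_bo(const struct sketch_device *dev,
                            const struct sketch_bo *bo)
{
    return (bo->flags & SKETCH_BO_EXTERNAL) ? dev->external_mocs
                                            : dev->default_mocs;
}
```

Keeping two values means that internal buffers stay fully cached, while only buffers that can end up on screen or in another process pay the cost of the more conservative setting.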

