Created attachment 129109
Photo of the problem
Sascha Willems' demos:
Just got support for running in fullscreen, at monitor's native resolution:
However, at least with Ubuntu 16.04 on SKL GT2, with the latest git versions of the Vulkan loader & Mesa, frame content is slightly corrupted when tests are running Vsynched, for example:
./raytracing -fullscreen -vsync
(By default the demos are currently vsynched in fullscreen, but it may change, hence the use of -vsync option.)
When I tried running the demos without Vsync (= no "-vsync" option, MAILBOX mode commented out from vulkanswapchain.hpp), i.e. X server will make a copy of the frame content to screen, output looks fine. Screenshots (=copy of the frame) also look fine.
I don't see it in all demos, but that could be a fluke. It was visible at least in these:
- computecullandlod (very faintly)
- computenbody (very clearly)
- dynamic uniform buffers
- gears (occasionally)
- particlefire (in fire)
- raytracing (compute shader)
- texturecubemap (very faintly)
- texturemipmapgen (very faintly)
Same issue is also clearly visible in DOTA2 when one enables Vsync.
Btw: This could be a different issue... When starting DOTA2 Mechanics tutorial, Luna's face animation (at the bottom of screen) has occasional garbage. Depending on the rendering quality settings, it may be visible only with vsync disabled, but otherwise rendering options (besides face animation setting) don't affect it. It's not visible with other game characters, only with Luna.
Neither of those problems is visible with the DOTA2 GL renderer.
I was able to reproduce the corruption on X11, but not on Wayland. Is this your experience as well? The demos in which I could very clearly see the corruption were:
(In reply to Nanley Chery from comment #3)
> I was able to reproduce the corruption on X11, but not on Wayland. Is this
> your experience as well?
I'm seeing the issue with X11, I haven't built things for Wayland. Before doing that, I'd like to know how you're running things under Wayland...
* Have you built Wayland versions of Vulkan loader, Mesa and Sascha Willems' demos? I.e. are you running the tests with XWayland, or pure Wayland? (former would hide the issue)
* If pure Wayland, are you running demos under gnome-shell, Weston, or something else? I.e. are the demos composited to fullscreen, which would also invalidate your test results (copying the FB contents instead of flipping hides the issue)?
At least the Ubuntu 16.04 version of gnome-shell will composite fullscreen windows. The Weston I built from git a few months ago didn't composite them (when they were in native monitor resolution)...
I built everything required for Wayland.
-> There's no fullscreen support for Wayland in Sascha Willems' demos, so that's an invalid test for this bug.
Only when the test is fullscreen, in native display resolution, and Vsynched, will it be flipped to the screen. Everything else will have some kind of copy, either by X (when things aren't Vsynched or in native resolution) or by the compositor, which will hide the issue.
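The flip-versus-copy conditions described above can be written down as a small predicate. This is only a sketch of the reasoning in this comment: the struct and function names are hypothetical and do not come from X, Mesa or the demos.

```c
#include <stdbool.h>

/* Hypothetical description of how a window is presented.
 * None of these names exist in X, Mesa or the demos. */
struct present_state {
    bool fullscreen;        /* window covers the whole output */
    bool native_resolution; /* window size matches the monitor mode */
    bool vsynched;          /* presentation waits for vertical blank */
    bool composited;        /* a compositor redraws the window contents */
};

/* Returns true when the frame is expected to be page-flipped to the
 * screen, and false when some kind of copy (by X or by a compositor)
 * is expected instead. Only the flipped case shows the corruption. */
static bool expect_page_flip(const struct present_state *s)
{
    return s->fullscreen && s->native_resolution &&
           s->vsynched && !s->composited;
}
```

A test setup is only valid for this bug when `expect_page_flip()` would return true for it, which is why XWayland and composited fullscreen windows hide the problem.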
PS. Getting Sascha's demos to actually present frames on Wayland required:
in VulkanExampleBase::renderLoop(). Didn't you have that issue?
My mistake, I forgot about XWayland. I was actually using that and not pure Wayland.
Is Vulkan using PTE MOCS for buffers that are potentially display surfaces, like GL side does in brw_update_renderbuffer_surface() & brw_blorp_init()?
The Talos Principle also has the same problem in Vsynched (monitor native resolution) fullscreen. Note that you apparently need to restart the game for Vsync to take effect (toggling it doesn't enable the Apply option).
(In reply to Eero Tamminen from comment #7)
> Is Vulkan using PTE MOCS for buffers that are potentially display surfaces,
> like GL side does in brw_update_renderbuffer_surface() & brw_blorp_init()?
Good question. Those errors definitely look like a caching problem though I'm a bit surprised I've never noticed it. :(
Looking at anv_private.h, on BDW, we're using the equivalent of GEN8_MOCS_WB for everything, not GEN8_MOCS_PTE. Skylake is the same story (only with the GEN9 equivalents). That's most likely the problem. Thanks for pointing it out!
I see a couple of options here. One would be to do the same thing as the GL driver does and use PTE for all render targets. The other would be to only use PTE if the is_scanout flag is set on the BO. In the end, I'm not sure there's actually a huge difference between the two. We would also need to update the mocs values we pass to BLORP to also use PTE for render targets.
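To make the two options concrete, here is a minimal sketch of both selection policies. It is not anv code: the enum values are symbolic stand-ins rather than the real GEN8/GEN9 MOCS encodings, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Symbolic stand-ins, NOT the real GEN8/GEN9 MOCS values. */
enum sketch_mocs {
    SKETCH_MOCS_WB,  /* always write-back cached */
    SKETCH_MOCS_PTE  /* take cacheability from the page table entry */
};

/* Hypothetical summary of what is known about a surface's BO. */
struct sketch_surface {
    bool is_render_target;
    bool is_scanout;
};

/* Option 1: like the GL driver, use PTE for every render target. */
static enum sketch_mocs mocs_all_render_targets(const struct sketch_surface *s)
{
    return s->is_render_target ? SKETCH_MOCS_PTE : SKETCH_MOCS_WB;
}

/* Option 2: use PTE only when the BO is flagged as scanout. */
static enum sketch_mocs mocs_scanout_only(const struct sketch_surface *s)
{
    return s->is_scanout ? SKETCH_MOCS_PTE : SKETCH_MOCS_WB;
}
```

The two policies differ only for render targets that are never scanned out: option 1 gives them PTE, option 2 keeps them write-back.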
PTE uses/takes settings from page table, and that should be already correctly set up (unless there's a kernel bug); uncached for things that need it and WB otherwise.
If you switch to PTE setting and WB isn't used where it should, I'll see it from resulting perf changes.
(In reply to Eero Tamminen from comment #1)
> Btw: This could be a different issue... When starting DOTA2 Mechanics
> tutorial, Luna's face animation (at the bottom of screen) has occasional
This doesn't happen anymore, so it's separate from the issue in this bug.
Jason, is somebody looking/testing the MOCS stuff?
This still happens. Chris' patch from bug 101571 doesn't help.
Same artifacts can be seen also on Android with these demos:
Is this still an issue?
Tested Mesa git with latest drm-tip kernel on BDW GT2 & SKL GT4e, and Mesa git with Ubuntu 16.04 kernel on KBL GT3e.
I didn't see the issue with BDW GT2, but I saw it still on SKL GT4e & KBL GT3e. It's fairly visible in Vulkan Multithreading demo.
Raytracing demo, where this was most visible, wasn't testable because of bug 104338.
Subpasses demo has also some flickering which could be due to this same issue.
Issue is much harder to see in DOTA2 now. To see it now, in addition to:
* Setting gfx options to highest (to slow things down)
* Setting resolution to monitor native resolution, fullscreen and enable Vsync (= page flipping requirements)
You need to scroll through the game area (e.g. all edges) while looking at the solid colored text message bars to see the issue. It still happens only with Vsync enabled (=flipping), and is now a rare glitch.
I wasn't able to reproduce the issue anymore with Talos (or Sam3:BFE) Vulkan rendering.
Created attachment 141705
Video of issue
This is still a major/severe issue for me, affecting many apps that use Vulkan. It's extremely noticeable in RetroArch on an Intel HD 520, where the entire top half of the screen glitches away.
I'd be happy to help debug or provide any assistance in getting this resolved.
(In reply to Kevin S from comment #15)
> This is still a major/severe issue for me, affecting many apps that use
> Vulkan. It's extremely noticeable in RetroArch on an Intel HD 520, where the
> entire top half of the screen glitches away.
> I'd be happy to help debug or provide any assistance in getting this
On your SKL GT2, does this also happen only when using fullscreen with Vsync, i.e. if you disable either Vsync or fullscreen mode, does the problem go away?
I am trying to reproduce this with Dota2 now. Also I checked raytracing app:
>./raytracing --fullscreen --vsync
It works fine for me on:
OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
v: 4.5 Mesa 18.3.0-develgit-3a9f628
vulkaninfo | grep 'apiVersion'
apiVersion = 0x401050 (1.1.80)
I don't see any artifacts. Could you please check it also?
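For reference, the raw apiVersion value reported above can be unpacked with the standard Vulkan version layout (major in bits 22 and up, minor in bits 12 to 21, patch in bits 0 to 11). The struct and function names in this sketch are hypothetical; only the bit layout comes from Vulkan.

```c
#include <stdint.h>

/* Hypothetical container for a decoded Vulkan version number. */
struct sketch_vk_version {
    uint32_t major_ver;
    uint32_t minor_ver;
    uint32_t patch_ver;
};

/* Unpack a Vulkan apiVersion value using the same bit layout as the
 * VK_API_VERSION_MAJOR/MINOR/PATCH macros. */
static struct sketch_vk_version decode_api_version(uint32_t v)
{
    struct sketch_vk_version out;
    out.major_ver = (v >> 22) & 0x7fu;
    out.minor_ver = (v >> 12) & 0x3ffu;
    out.patch_ver = v & 0xfffu;
    return out;
}
```

Calling `decode_api_version(0x401050)` gives major 1, minor 1 and patch 80, matching the "(1.1.80)" that vulkaninfo prints next to the hex value.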
(In reply to Eero Tamminen from comment #16)
> (In reply to Kevin S from comment #15)
> > This is still a major/severe issue for me, affecting many apps that use
> > Vulkan. It's extremely noticeable in RetroArch on an Intel HD 520, where the
> > entire top half of the screen glitches away.
> > I'd be happy to help debug or provide any assistance in getting this
> > resolved.
> Does on your SKL GT2 this also happen only when using fullscreen with Vsync
> i.e. if you disable either Vsync or fullscreen mode, the problem goes away?
Correct. Running either application in windowed mode, or disabling Vsync, solves the problem. Unfortunately I get a huge amount of tearing without Vsync enabled.
(In reply to Denis from comment #17)
> I am trying to reproduce this with Dota2 now. Also I checked raytracing app:
> ./raytracing --fullscreen --vsync
Correct option for Vsync in these demos is "-vsync", see:
FYI: option parsing in these demos is *really* broken, see:
(Feel free to send Sascha a patch to fix it... You could get part of it from this old comment: https://github.com/SaschaWillems/Vulkan/issues/269#issuecomment-276947651)
> It works fine for me on:
> OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
> v: 4.5 Mesa 18.3.0-develgit-3a9f628
> vulkaninfo | grep 'apiVersion'
> apiVersion = 0x401050 (1.1.80)
> Ubuntu 16.04
Before concluding that it works, please make sure that:
* Program is really Vsynched (FPS equals monitor refresh rate)
* Demo window is NOT composited
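The first check can be done by comparing the FPS the demo reports with the monitor refresh rate. A minimal sketch of that comparison follows; the function name and the relative tolerance parameter are hypothetical, and both numbers have to be obtained separately (for example from the demo's FPS counter and the display mode).

```c
#include <stdbool.h>

/* Returns true when the measured frame rate is within a relative
 * tolerance of the monitor refresh rate, which is what a Vsynched
 * run should show. A frame rate far above the refresh rate means
 * the program is not actually Vsynched. */
static bool fps_matches_refresh(double measured_fps, double refresh_hz,
                                double tolerance)
{
    double diff = measured_fps - refresh_hz;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance * refresh_hz;
}
```

With a 2% tolerance, 59.9 FPS on a 60 Hz monitor passes, while 143 FPS on the same monitor does not.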
> I don't see any artifacts. Could you please check it also?
Yes, and it's *much* more visible and worse than when I reported this. Much more visible also in other demos I tested (except for Triangle).
One reason for this could be that I'm running it now with X server git version where I've enabled modifier support (-> end-to-end render buffer compression).
The problem here is that we aren't properly allowing the kernel to disable caching on scanout buffers. At first, it was only really noticeable on BDW because there we weren't disabling eDRAM which is a very large cache and so the amount of data missing was substantial. You could still notice on SKL+ but it was much more minor. With modifiers, however, the result is now massive corruption. I've got a patch for the bug which no one has felt like reviewing.
Jason's patch here fixes the corruption:
This is fixed by the following commit in master:
Author: Jason Ekstrand <firstname.lastname@example.org>
Date: Mon Jul 9 14:21:33 2018 -0700
anv: Use separate MOCS settings for external BOs
On Broadwell and above, we have to use different MOCS settings to allow
the kernel to take over and disable caching when needed for external
buffers. On Broadwell, this is especially important because the kernel
can't disable eLLC so we have to do it in userspace. We very badly
don't want to do that on everything so we need separate MOCS for
external and internal BOs.
In order to do this, we add an anv-specific BO flag for "external" and
use that to distinguish between buffers which may be shared with other
processes and/or display and those which are entirely internal. That,
together with an anv_mocs_for_bo helper lets us choose the right MOCS
settings for each BO use.
Reviewed-by: Lionel Landwerlin <email@example.com>
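The approach in the commit message can be illustrated roughly as below. This is a sketch under stated assumptions and not the actual patch: the flag bit, the struct layout, the enum values and the helper's signature are all hypothetical; only the idea of an "external" BO flag plus a per-BO MOCS helper comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Symbolic stand-ins, NOT real hardware MOCS values. */
enum ext_sketch_mocs {
    EXT_SKETCH_MOCS_INTERNAL, /* fully cached, driver-private BOs */
    EXT_SKETCH_MOCS_EXTERNAL  /* lets the kernel control caching */
};

/* Hypothetical driver-specific flag marking a BO as external. */
#define EXT_SKETCH_BO_EXTERNAL (1u << 0)

/* Hypothetical BO record; the real anv BO struct looks different. */
struct ext_sketch_bo {
    uint32_t flags;
};

/* Pick the MOCS setting for one BO: buffers that may be shared with
 * other processes and/or the display get the external setting, and
 * everything else keeps the internal one. */
static enum ext_sketch_mocs ext_sketch_mocs_for_bo(const struct ext_sketch_bo *bo)
{
    if (bo->flags & EXT_SKETCH_BO_EXTERNAL)
        return EXT_SKETCH_MOCS_EXTERNAL;
    return EXT_SKETCH_MOCS_INTERNAL;
}
```

Keeping the decision in one helper means each place that emits surface state only has to ask which MOCS applies to the BO it is using.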
I'm sorry this took so absurdly long to resolve. :-( It wasn't until I saw the corruption with CCS that it really became easy to triage.