Bug 111568 - Image corruption with vkCmdCopyImage into I915_FORMAT_MOD_Y_TILED_CCS-tiled swapchain image
Summary: Image corruption with vkCmdCopyImage into I915_FORMAT_MOD_Y_TILED_CCS-tiled s...
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel (show other bugs)
Version: 19.1
Hardware: x86-64 (AMD64) Linux (All)
: not set not set
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-06 05:38 UTC by Felix
Modified: 2019-09-16 17:32 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Vulkan Api trace showing a wronly copied imaged. (606.59 KB, application/octet-stream)
2019-09-06 05:38 UTC, Felix
Details
Java sources for reproducing example (8.65 KB, application/x-compressed-tar)
2019-09-06 08:34 UTC, Felix
Details
denys_result_of_app_running (44.98 KB, image/png)
2019-09-06 09:45 UTC, Denis
Details
incorrect screenshot (10.65 KB, image/png)
2019-09-06 10:50 UTC, Lionel Landwerlin
Details

Description Felix 2019-09-06 05:38:08 UTC
Created attachment 145271 [details]
Vulkan Api trace showing a wronly copied imaged.

I am using a laptop with: "Intel(R) HD Graphics P630 (Kaby Lake GT2)"

When using Vulkan with gonme-wayland the Xwayland server(version 1.20.4) reports compatibility with all the DRM modifiers, implemented in mesa:
      DRM_FORMAT_MOD_LINEAR,
      I915_FORMAT_MOD_X_TILED,
      I915_FORMAT_MOD_Y_TILED,
      I915_FORMAT_MOD_Y_TILED_CCS,

The mesa intel vulkan driver reports the following two formats for support with an X11-swapchain (created through xwayland):
 VK_FORMAT_B8G8R8A8_UNORM
 VK_FORMAT_B8G8R8A8_SRGB

In https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/intel/isl/isl_format.c we can see that only the UNORM-format supports CCS compression on my hardware (I have devinfo->gen=9)
   SF(  Y,   Y,   x,   Y,   Y,   Y,   Y,   x,  60,  70,   x,  90,   B8G8R8A8_UNORM)
   SF(  Y,   Y,   x,   x,   Y,   Y,   x,   x,   x,   x,   x, 100,   B8G8R8A8_UNORM_SRGB)

When the application now chooses "VK_FORMAT_B8G8R8A8_UNORM" as swapchain image format, the WSI-part offers all 4 DRM-modifiers to anv_image.c which chooses I915_FORMAT_MOD_Y_TILED_CCS based on this scoring function:
https://gitlab.freedesktop.org/mesa/mesa/blob/55c912883c9b3624ee060fe1a0232cf71e329d80/src/intel/vulkan/anv_image.c#L524

The application creates a second (normal, LINEAR-tiled, same VkFormat (of VK_FORMAT_B8G8R8A8_UNORM))-image with the same size as the swapchain-image and loads data into it my memory-mapping it (I chose an all-white image). Afterwards it copies the image data into the swapchain image with `VkCmdCopyImage`.

When this picture is presented however it is corrupted. (I see only a few white dots on black background). I presume this is caused by the code that does `VkCmdCopyImage` not correctly handling conversion based on CCS-tiling.

I have created a small Java-LWJGL-Vulkan application that deterministically reproduces the problem and recorded a Vulkan apitrace (attached). If you would prefer the java-Application for testing I could provide it.
Comment 1 Lionel Landwerlin 2019-09-06 08:15:19 UTC
Please share you application, it would be greatly helpful to reproduce.
Thanks!
Comment 2 Denis 2019-09-06 08:31:42 UTC
Hello Felix, I tried your apitrace and got this error (tested on X and wayland sessions):

[den@den-pc ~]$ /home/den/repository/apitrace/build32/apitrace replay ~/Downloads/wrong_compression.trace
apitrace: /home/den/repository/apitrace/lib/trace/trace_file_brotli.cpp:113: virtual size_t BrotliFile::rawRead(void*, size_t): Assertion `result == BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT || result == BROTLI_DECODER_RESULT_SUCCESS' failed.
Aborted (core dumped)

I used "repository" version and manually built apitrace versions. Both are crashing.
So maybe java app would be helpful, yes
Comment 3 Felix 2019-09-06 08:34:34 UTC
Created attachment 145273 [details]
Java sources for reproducing example
Comment 4 Felix 2019-09-06 08:40:56 UTC
@Denis: the problem is that "apitrace" is AFAIK not available for Vulkan. The trace that I recorded is from these tools: https://github.com/LunarG/VulkanTools/blob/master/vktrace/vktrace.md#vkreplay

To compile you require the lwjgl-components "core", "glfw" and "vulkan".
Comment 5 Denis 2019-09-06 08:44:59 UTC
(In reply to Felix from comment #4)
> @Denis: the problem is that "apitrace" is AFAIK not available for Vulkan.
> The trace that I recorded is from these tools:
> https://github.com/LunarG/VulkanTools/blob/master/vktrace/vktrace.md#vkreplay
> 
> To compile you require the lwjgl-components "core", "glfw" and "vulkan".

...
yes, my bad, sorry. Was messed up by the trace extension (*.trace, not *.vktrace). Thank you
Comment 6 Denis 2019-09-06 09:44:35 UTC
hm, vktrace didn't provide me any rendered window. Compiled from source file (it would be really helpful to provide, how to compile it :) ) generates this output:

[den@den-pc 1]$ java -classpath ".:/home/den/Downloads/lwjgl-release-3.2.3-custom/*" ExampleImage
Got: 1 devices
Rendering/Presenting on: Intel(R) HD Graphics 620 (Kaby Lake GT2)
Swapchain Images: 2
Done
59
61
61
60
61


I see small window with a white squad. What should be expected result?
Comment 7 Denis 2019-09-06 09:45:45 UTC
Created attachment 145275 [details]
denys_result_of_app_running

upd - added my current result
Comment 8 Lionel Landwerlin 2019-09-06 10:50:53 UTC
Created attachment 145276 [details]
incorrect screenshot

This is what I get.
Comment 9 Denis 2019-09-06 11:44:21 UTC
hmm, if it will be taken into account, that "correct" result - on my screenshot (1 white squad), and wrong - on Lionel's, then here is a bisect result

[den@den-pc mesa]$ git bisect good
c80c08e226033e9e33abdca43e02e7f8c845ae0a is the first bad commit
commit c80c08e226033e9e33abdca43e02e7f8c845ae0a
Author: Daniel Stone <daniels@collabora.com>
Date:   Thu Jun 8 17:24:30 2017 +0100

    vulkan/wsi/x11: Add support for DRI3 v1.2
    
    Adds support for multiple planes and buffer modifiers.
    
    v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
    v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
         versions

 configure.ac                    |   3 +-
 meson.build                     |   2 +-
 src/vulkan/wsi/wsi_common_x11.c | 178 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 164 insertions(+), 19 deletions(-)
Comment 10 Lionel Landwerlin 2019-09-06 11:50:30 UTC
(In reply to Denis from comment #9)
> hmm, if it will be taken into account, that "correct" result - on my
> screenshot (1 white squad), and wrong - on Lionel's, then here is a bisect
> result
> 
> [den@den-pc mesa]$ git bisect good
> c80c08e226033e9e33abdca43e02e7f8c845ae0a is the first bad commit
> commit c80c08e226033e9e33abdca43e02e7f8c845ae0a
> Author: Daniel Stone <daniels@collabora.com>
> Date:   Thu Jun 8 17:24:30 2017 +0100
> 
>     vulkan/wsi/x11: Add support for DRI3 v1.2
>     
>     Adds support for multiple planes and buffer modifiers.
>     
>     v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
>     v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
>          versions
> 
>  configure.ac                    |   3 +-
>  meson.build                     |   2 +-
>  src/vulkan/wsi/wsi_common_x11.c | 178
> ++++++++++++++++++++++++++++++++++++----
>  3 files changed, 164 insertions(+), 19 deletions(-)

That's an odd bisect, because my reproduction happens on Wayland.
Comment 11 Denis 2019-09-06 12:02:24 UTC
(In reply to Lionel Landwerlin from comment #10)
> (In reply to Denis from comment #9)
> > hmm, if it will be taken into account, that "correct" result - on my
> > screenshot (1 white squad), and wrong - on Lionel's, then here is a bisect
> > result
> > 
> > [den@den-pc mesa]$ git bisect good
> > c80c08e226033e9e33abdca43e02e7f8c845ae0a is the first bad commit
> > commit c80c08e226033e9e33abdca43e02e7f8c845ae0a
> > Author: Daniel Stone <daniels@collabora.com>
> > Date:   Thu Jun 8 17:24:30 2017 +0100
> > 
> >     vulkan/wsi/x11: Add support for DRI3 v1.2
> >     
> >     Adds support for multiple planes and buffer modifiers.
> >     
> >     v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
> >     v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
> >          versions
> > 
> >  configure.ac                    |   3 +-
> >  meson.build                     |   2 +-
> >  src/vulkan/wsi/wsi_common_x11.c | 178
> > ++++++++++++++++++++++++++++++++++++----
> >  3 files changed, 164 insertions(+), 19 deletions(-)
> 
> That's an odd bisect, because my reproduction happens on Wayland.

I double-checked my results so I am pretty sure that it is the first bad commit.
I also used wayland session for bisection (under the X bug can't be reproduced). https://imgur.com/a/htBV9WO here is a gitk screenshot from bisection. I exchanged libs using VK_ICD_FILENAMES variable
Comment 12 Felix 2019-09-06 12:12:39 UTC
Yes, that's the same commit that I also got while bisecting.

Regarding X vs Wayland: The application always connects via X (either directly or via Xwayland), so from mesa-view it is always the X11-wsi-part. The difference between native X and Xwayland (on my system) is that native X reports:
      DRM_FORMAT_MOD_LINEAR,
      I915_FORMAT_MOD_X_TILED,
      I915_FORMAT_MOD_Y_TILED,
as supported drm modifiers and Xwayland reports:
      DRM_FORMAT_MOD_LINEAR,
      I915_FORMAT_MOD_X_TILED,
      I915_FORMAT_MOD_Y_TILED,
      I915_FORMAT_MOD_Y_TILED_CCS,

Because I915_FORMAT_MOD_Y_TILED_CCS is missing in "native" X, the error does not occur there.

Yes, you got perfect screenshots of "expected" vs "actual" behaviour (Denis having "expected"=good and Lionel having "actual"=bad)
Comment 13 Lionel Landwerlin 2019-09-06 12:22:28 UTC
(In reply to Denis from comment #11)
> (In reply to Lionel Landwerlin from comment #10)
> > (In reply to Denis from comment #9)
> > > hmm, if it will be taken into account, that "correct" result - on my
> > > screenshot (1 white squad), and wrong - on Lionel's, then here is a bisect
> > > result
> > > 
> > > [den@den-pc mesa]$ git bisect good
> > > c80c08e226033e9e33abdca43e02e7f8c845ae0a is the first bad commit
> > > commit c80c08e226033e9e33abdca43e02e7f8c845ae0a
> > > Author: Daniel Stone <daniels@collabora.com>
> > > Date:   Thu Jun 8 17:24:30 2017 +0100
> > > 
> > >     vulkan/wsi/x11: Add support for DRI3 v1.2
> > >     
> > >     Adds support for multiple planes and buffer modifiers.
> > >     
> > >     v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
> > >     v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
> > >          versions
> > > 
> > >  configure.ac                    |   3 +-
> > >  meson.build                     |   2 +-
> > >  src/vulkan/wsi/wsi_common_x11.c | 178
> > > ++++++++++++++++++++++++++++++++++++----
> > >  3 files changed, 164 insertions(+), 19 deletions(-)
> > 
> > That's an odd bisect, because my reproduction happens on Wayland.
> 
> I double-checked my results so I am pretty sure that it is the first bad
> commit.
> I also used wayland session for bisection (under the X bug can't be
> reproduced). https://imgur.com/a/htBV9WO here is a gitk screenshot from
> bisection. I exchanged libs using VK_ICD_FILENAMES variable

I compiled Mesa at the previous commit :

commit 7258be91c59b20a6479b3b0d750ff8b4a32cf032 (HEAD)
Author: Dylan Baker <dylan@pnwbakers.com>
Date:   Fri Mar 2 09:57:54 2018 -0800

    autotools: include all meson.build files

I'm still reproducing the issue unfortunately.
Comment 14 Denis 2019-09-06 13:29:22 UTC
(In reply to Lionel Landwerlin from comment #13)
> (In reply to Denis from comment #11)
> > (In reply to Lionel Landwerlin from comment #10)
> > > (In reply to Denis from comment #9)
> > > > hmm, if it will be taken into account, that "correct" result - on my
> > > > screenshot (1 white squad), and wrong - on Lionel's, then here is a bisect
> > > > result
> > > > 
> > > > [den@den-pc mesa]$ git bisect good
> > > > c80c08e226033e9e33abdca43e02e7f8c845ae0a is the first bad commit
> > > > commit c80c08e226033e9e33abdca43e02e7f8c845ae0a
> > > > Author: Daniel Stone <daniels@collabora.com>
> > > > Date:   Thu Jun 8 17:24:30 2017 +0100
> > > > 
> > > >     vulkan/wsi/x11: Add support for DRI3 v1.2
> > > >     
> > > >     Adds support for multiple planes and buffer modifiers.
> > > >     
> > > >     v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
> > > >     v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
> > > >          versions
> > > > 
> > > >  configure.ac                    |   3 +-
> > > >  meson.build                     |   2 +-
> > > >  src/vulkan/wsi/wsi_common_x11.c | 178
> > > > ++++++++++++++++++++++++++++++++++++----
> > > >  3 files changed, 164 insertions(+), 19 deletions(-)
> > > 
> > > That's an odd bisect, because my reproduction happens on Wayland.
> > 
> > I double-checked my results so I am pretty sure that it is the first bad
> > commit.
> > I also used wayland session for bisection (under the X bug can't be
> > reproduced). https://imgur.com/a/htBV9WO here is a gitk screenshot from
> > bisection. I exchanged libs using VK_ICD_FILENAMES variable
> 
> I compiled Mesa at the previous commit :
> 
> commit 7258be91c59b20a6479b3b0d750ff8b4a32cf032 (HEAD)
> Author: Dylan Baker <dylan@pnwbakers.com>
> Date:   Fri Mar 2 09:57:54 2018 -0800
> 
>     autotools: include all meson.build files
> 
> I'm still reproducing the issue unfortunately.

Strange. Can you test this commit? I had this version already built, and exactly on it I saw correct behavior. So I bisected between mesa-18.0.0 and mesa-master
>OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.0.0 (git-dceb1ce807)


>Because I915_FORMAT_MOD_Y_TILED_CCS is missing in "native" X, the error does not occur there.
I remember that this parameter was discussed in this ticket https://bugs.freedesktop.org/show_bug.cgi?id=111140
But it was related to i965 and fix was landed not into mesa... maybe this may help somehow...
Comment 15 Felix 2019-09-06 15:26:04 UTC
I have just re-validated the bisect result on my system (just to be safe). Using the java-example on gnome-wayland+xwayland:
7258be91c59b20a6479b3b0d750ff8b4a32cf032 good
c80c08e226033e9e33abdca43e02e7f8c845ae0a bad
So I can confirm that the found commit caused the regression.
Comment 16 Jason Ekstrand 2019-09-06 17:22:34 UTC
Somewhere, we're missing a CCS partial resolve.  Are you 100% sure that the app is transitioning the image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR prior to calling vkQueuePresent?
Comment 17 Felix 2019-09-06 19:38:21 UTC
I am as sure as I can be by reading the source. That's how I build the command buffer I am creating per frame:

- swapchain image: -> VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
- image to be displayed: -> VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL
- vkCmdCopyImage
- image to be displayed: -> VK_IMAGE_LAYOUT_GENERAL
- swapchain image: -> VK_IMAGE_LAYOUT_PRESENT_SRC_KHR

vkQueueSubmit(the buffer)
vkQueuePresent(...)

So I am as sure as I can be that the app transitions the image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR prior to calling vkQueuePresent.

If you want, you can verify the ImageBarriers yourself in ExampleImage.java:167
Comment 18 Felix 2019-09-14 13:12:47 UTC
I just tested a newer gnome (3.33.92) and the issue still persists. Someone else also tested with (3.34.0) in which the issue still seems to be reproducible.

I also tested another wayland-compositor: sway (1.1.1) which seems to have the same issue. As far as I can tell sway always renders the image via accessing it as texture and does never use glGetTexImage (https://github.com/swaywm/wlroots/blob/734c64a6cc005f2c20df48af8079538a8c4cbd39/render/gles2/renderer.c#L156 ) so the solution from from #111140 does not seem sufficient.
Comment 19 Jason Ekstrand 2019-09-16 15:12:29 UTC
Sorry it took me so long to get around to looking at this.  Your image barrier on the WSI image is from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.  Any image layout transition with VK_IMAGE_LAYOUT_UNDEFINED tells the driver "Feel free to throw away my image contents" which is exactly what our driver is doing. :-)
Comment 20 Felix 2019-09-16 17:32:31 UTC
Ooh.. what a silly mistake..
And I left out the important part in the description of what my sample is doing.
Thanks for looking at the source code and finding the problem.

I've changed the image barrier in the sample (and also in the larger application where the issue first occurred) and everything looks fine now.

Thank you very much for debugging this problem. Sorry for the wrong bug report.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.