Bug 111140 - Buffer corruption with Chromium on gnome-shell (wayland) after taking a screenshot
Summary: Buffer corruption with Chromium on gnome-shell (wayland) after taking a scree...
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-15 22:52 UTC by Lyude Paul
Modified: 2019-09-25 20:33 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Nested shell repro (1022.70 KB, image/png)
2019-07-16 11:26 UTC, Lionel Landwerlin
Details

Description Lyude Paul 2019-07-15 22:52:29 UTC
So: while viewing a video in Chromium in full screen mode, taking a screenshot of _only_ the window (using Alt+PrintScr) on gnome-shell on Wayland causes some rather hilarious buffer corruption. From the looks of it, only one of the two buffers Xwayland is using get corrupted, causing a rather jarring flickering effect. See the example video here: https://people.freedesktop.org/~lyudess/archive/07-15-2019/VID_20190712_172307.mp4

So far I've managed to confirm this is indeed a hardware acceleration issue, as disabling hardware acceleration seems to fix the issue. Additionally, this issue seems to be specific to intel - I haven't managed to reproduce it with AMD. Note however, the only environment I've managed to reproduce this in is gnome-shell 3.32.2 on Wayland.

The trigger for this seems to be:

- Start playing a video with chromium
- Put the window into full screen mode
- Grab a screenshot of the window using alt + PrintScr (grabbing the whole desktop or a portion of it won't work)
- Shield your eyes as your screen starts flickering

There's also a downstream bug for this on Fedora 30 now opened:

https://bugzilla.redhat.com/show_bug.cgi?id=1729613

Hopefully this doesn't end up being some random gnome-shell or Xwayland issue…
Comment 1 Lionel Landwerlin 2019-07-16 06:15:17 UTC
Reproduced.
From the pattern, it looks like it could be compression/modifier issue.
I wonder if mutter disables it to take the screenshot and then somewhere the state gets messed up.
Comment 2 Lionel Landwerlin 2019-07-16 11:26:33 UTC
Created attachment 144798 [details]
Nested shell repro

Reproduced in nested mode. That rules out any display stuff.
Comment 3 Denis 2019-07-16 11:46:21 UTC
hi, I also able to reproduce this issue. Unfortunately that's not regression so couldn't find "workable" mesa version.
Comment 4 Denis 2019-07-16 12:45:38 UTC
additional info found - works fine with iris driver (system 19.1.1 mesa, manjaro)
Comment 5 Lionel Landwerlin 2019-07-16 13:20:59 UTC
On my side I can see that one of the image is switched from using modifier to not using the anymore (likely when the image is resolved to get the data).
But chromium likely renders to that buffer still in compressed mode, which is why I think this leads to the issue.

I'm guessing the texture/miptree needs to be destroyed, but that's not something the driver can necessarily do on its own, because his view of the texture is consistent.

Denis : do you know if iris is using modifiers in that case?

I wonder if the other drivers don't see the issue because they're not using them.
Comment 6 Lionel Landwerlin 2019-07-16 14:32:16 UTC
Right mutter/cogl ends up calling glGetTexImage() which transfer one of the image of the chromium set of images into a non compressed texture.

I need to read more GL spec to figure out what should actually happen on the driver side.
One way mutter could workaround this would be to use the get_image_via_offscreen() path in meta_shaped_texture_get_image() so that the state of the image is not affected.
Comment 7 Denis 2019-07-16 14:34:47 UTC
>Denis : do you know if iris is using modifiers in that case?

Not sure how to answer correctly on this question, but you lead me to check and compare these modifiers in iris and i965 => /src/mesa/drivers/dri/i965/intel_screen.c

Commenting this fixes the issue for i965 
>{ .modifier = I915_FORMAT_MOD_Y_TILED_CCS , .since_gen = 9 }

Maybe this may help you.
Comment 8 Lionel Landwerlin 2019-07-16 15:14:16 UTC
So looking at the way mutter bind the dmabuf shared with chromium to a 2D texture, it does so through cogl_egl_texture_2d_new_from_image().
That function seems to rely on GL_OES_EGL_image_external extension : https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import.txt

The extension has this bit in the spec :


    There is no support for most of the functions that manipulate
    other texture targets (e.g. you cannot use gl*Tex*Image*() functions with
    TEXTURE_EXTERNAL_OES).

That makes me think what mutter is doing is illegal.
Comment 9 Jonas Ådahl 2019-07-16 16:54:40 UTC
(In reply to Lionel Landwerlin from comment #8)
> So looking at the way mutter bind the dmabuf shared with chromium to a 2D
> texture, it does so through cogl_egl_texture_2d_new_from_image().
> That function seems to rely on GL_OES_EGL_image_external extension :
> https://www.khronos.org/registry/EGL/extensions/EXT/
> EGL_EXT_image_dma_buf_import.txt
> 
> The extension has this bit in the spec :
> 
> 
>     There is no support for most of the functions that manipulate
>     other texture targets (e.g. you cannot use gl*Tex*Image*() functions with
>     TEXTURE_EXTERNAL_OES).
> 
> That makes me think what mutter is doing is illegal.

Does gl*Tex*Image() here imply glGetTexImage() too then you mean? Does reading from it mean "manipulate"?

Either way, I created https://gitlab.gnome.org/GNOME/mutter/merge_requests/687 that should avoid reading directly from it. It didn't reproduce with or without it, so if anyone that can reproduce it can give it a test I'd appreciate it.
Comment 10 Olivier Fourdan 2019-07-16 17:24:56 UTC
FTR, this is the same as https://gitlab.freedesktop.org/xorg/xserver/issues/545
Comment 11 Olivier Fourdan 2019-07-16 17:27:07 UTC
(In reply to Jonas Ådahl from comment #9)
> Either way, I created
> https://gitlab.gnome.org/GNOME/mutter/merge_requests/687 that should avoid
> reading directly from it. It didn't reproduce with or without it, so if
> anyone that can reproduce it can give it a test I'd appreciate it.

I tested and confirm https://gitlab.gnome.org/GNOME/mutter/merge_requests/687 fixes both https://gitlab.freedesktop.org/xorg/xserver/issues/545 *and* the screenshot issue for me.
Comment 12 Kenneth Graunke 2019-07-16 18:02:00 UTC
(In reply to Lionel Landwerlin from comment #5)
> Denis : do you know if iris is using modifiers in that case?

iris does not use the Y+CCS modifier yet.  It uses X/Y/linear modifiers.  @jljusten is working on adding Y+CCS support.  So it probably works there simply because we're not using any modifiers with compression.
Comment 13 Lionel Landwerlin 2019-07-16 18:45:02 UTC
(In reply to Kenneth Graunke from comment #12)
> (In reply to Lionel Landwerlin from comment #5)
> > Denis : do you know if iris is using modifiers in that case?
> 
> iris does not use the Y+CCS modifier yet.  It uses X/Y/linear modifiers. 
> @jljusten is working on adding Y+CCS support.  So it probably works there
> simply because we're not using any modifiers with compression.

Thanks, I was going to test this tomorrow :)
Comment 14 Lionel Landwerlin 2019-07-16 19:14:31 UTC
(In reply to Jonas Ådahl from comment #9)
> (In reply to Lionel Landwerlin from comment #8)
> > So looking at the way mutter bind the dmabuf shared with chromium to a 2D
> > texture, it does so through cogl_egl_texture_2d_new_from_image().
> > That function seems to rely on GL_OES_EGL_image_external extension :
> > https://www.khronos.org/registry/EGL/extensions/EXT/
> > EGL_EXT_image_dma_buf_import.txt
> > 
> > The extension has this bit in the spec :
> > 
> > 
> >     There is no support for most of the functions that manipulate
> >     other texture targets (e.g. you cannot use gl*Tex*Image*() functions with
> >     TEXTURE_EXTERNAL_OES).

Duh! I pointed to the wrong spec...
Here is the good one : https://www.khronos.org/registry/OpenGL/extensions/OES/OES_EGL_image_external.txt

> > 
> > That makes me think what mutter is doing is illegal.
> 
> Does gl*Tex*Image() here imply glGetTexImage() too then you mean? Does
> reading from it mean "manipulate"?

My reading of manipulate goes for both read/write operations.
Essentially anything that could alter the metadata associated with the texture (size, format and those hidden compression settings).

I agree that the spec is a bit vague on this point and I'm not 100% sure what was the intent.
I could image we could allow glGet*Image* in the driver but that would essentially be replicating what you would otherwise do in the application (e.g. blit to an offscreen buffer and read that one back).

The problem is that the driver does a bunch of tracking on what you do with the texture and puts it in the most sensible disposition for what it's used.
Once that goes out of sync with what the other processes have, we're in trouble.

> 
> Either way, I created
> https://gitlab.gnome.org/GNOME/mutter/merge_requests/687 that should avoid
> reading directly from it. It didn't reproduce with or without it, so if
> anyone that can reproduce it can give it a test I'd appreciate it.

That fixes the problem on my side too. Thanks!
Comment 15 Jonas Ådahl 2019-07-16 20:26:46 UTC
(In reply to Lionel Landwerlin from comment #14)

...

> I could image we could allow glGet*Image* in the driver but that would
> essentially be replicating what you would otherwise do in the application
> (e.g. blit to an offscreen buffer and read that one back).

FWIW, OES_EGL_image_external (used with EGLStreams) has similar limitations. Other than that, from a Wayland buffer point of view, there are other things that mandate drawing via an offscreen to get usable output (buffer transforms, viewports etc), so it's not like there shouldn't be paths in the application that handles these kind of situations either way.

> 
> The problem is that the driver does a bunch of tracking on what you do with
> the texture and puts it in the most sensible disposition for what it's used.
> Once that goes out of sync with what the other processes have, we're in
> trouble.

I don't think it's worth the trouble to let the driver handle if it would just complicate things.

> 
> > 
> > Either way, I created
> > https://gitlab.gnome.org/GNOME/mutter/merge_requests/687 that should avoid
> > reading directly from it. It didn't reproduce with or without it, so if
> > anyone that can reproduce it can give it a test I'd appreciate it.
> 
> That fixes the problem on my side too. Thanks!

Thanks for testing! Thanks to Olivier too :)
Comment 16 Daniel van Vugt 2019-07-19 04:50:23 UTC
Fix landed in mutter, scheduled for release in version 3.33.4.
Comment 17 Lionel Landwerlin 2019-07-19 05:19:17 UTC
I would like to keep this open.
I've been tried to make this work with i965.
It's really not clear to me what we can and cannot do with texture bound to external images.

I'll be pretty much the same thing than what mutter did for fixing this bug.
Comment 18 GitLab Migration User 2019-09-25 20:33:49 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1819.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.