Bug 78190

Summary:	protocol: Reporting damage in surface coordinates doesn't work with EGL
Product:	Wayland	Reporter:	Jason Ekstrand <jason>
Component:	wayland	Assignee:	Wayland bug list <wayland-bugs>
Status:	RESOLVED FIXED	QA Contact:
Severity:	major
Priority:	medium
Version:	unspecified
Hardware:	All
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	83092
Attachments:	Weston texture upload problem demonstration

Description Jason Ekstrand 2014-05-02 14:15:33 UTC

Currently, the wl_surface protocol requires damage to be reported in surface coordinates. However, since EGL is unaware of any buffer-to-surface transformations that may have been applied through set_buffer_transform, set_buffer_scale, or wl_viewport, it cannot report damage correctly. In short:

1) The EGL implementation has no way of knowing about buffer transformations.

2) The EGL implementation is responsible to call wl_surface.damage inside of eglSwapBuffers. Because it doesn't know about buffer transformations, it can't damage a rectangle the same size as the surface.

3) Mesa currently damages an area of INT32_MAX x INT32_MAX to deal with this. However, in the version 1.0 of the specification, the surface size was always the same as buffer size. Some EGL implementations (including older versions of mesa) rely on this and only damage the buffer size. Therefore, damage is wrong if the surface is rotated by 90 or 270 degrees.

4) Clients have no way to determine if the EGL stack they are using does max damage or simply damages the buffer area. Therefore, they cannot assume that the EGL implementation will do the right thing. (Especially since this used to be perfectly Ok.) This also means that they can't detect whether or not surface transformations are useable.

It turns out that damage isn't an issue in the case where we have an integer buffer_scale or a transform that's a multiple of 180 degress. This is because, in these cases, the buffer size is at least as large as the surface size and damaging bigger isn't a problem. Using wl_viewport to scale up a surface, however, does coause a problem.

As far as I see it, this leaves us with four options:

1) Clients call wl_surface.damage manually right before eglSwapBuffers. The surface would always get damaged by eglSwapBuffers as well. While the surface would probably get over-damaged, it would at least ensure enough damage.

2) We add an EGL extension for every type of surface transformation.

3) We say that any EGL implementation that does not damage an area of INT32_MAX x INT32_MAX is broken and hope they change.

4) We change the protocol so damage is in buffer coordinates.

Option 1 breaks the whole idea of eglSwapBuffers and hiding stuff inside of EGLSurface and is really messy. Option 2 is bad because we don't want to wait on EGL implementations before clients can use new protocol bits.

In my initial discussions with Kristian on IRC, he was a much bigger fan of 3 and trying to hack around things in clients. Initially, we thought that we could just avoid 90 and 270 degree rotations unless we have EGL_EXT_swap_buffers_with_damage. The idea was that eglSwapBuffersWithDamageEXT would simply pass the damage rectangles through to the compositor so it is then up to the client to hand it the right damage and buffer transforms would then be 100% ok.

Unfortunately, EGL_EXT_swap_buffers_with_damage is even more broken than eglSwapBuffers in some sense. When the EGL_EXT_swap_buffers_with_damage extension was written the authors, trying to stay consistent with the rest of OpenGL, chose to make the coordinates of the damage rectangles relative to the lower-left corner of the surface rather than the upper-left. This means that the EGL implementation has to vertically flip all of the damage rectangles. Normally this is a trivial operation. However, if the EGL implementation does not know the proper height of the surface, the computed rectangles will be wrong. This means that, not only is eglSwapBuffers broken, but eglSwapBuffersWithDamage is broken even worse.

Unless we start re-writing EGL extensions, I think we're stuck with option 4.

Comment 1 Jason Ekstrand 2014-05-02 14:15:54 UTC

The following e-mail thread contains some discussion on the issue:

http://lists.freedesktop.org/archives/wayland-devel/2014-February/013479.html

Comment 2 Pekka Paalanen 2014-05-05 07:32:31 UTC

Jason, that is a very good analysis of the problems, and I agree with your rationale.

If we choose to make damage in buffer coordinates, that will affect the definition of the Presentation extension, and also the core protocol changes it requires. Therefore Presentation cannot land until this issue is solved.

A fifth option came to my mind: keep all the Wayland semantics as is, but add a function to wayland-egl API:

	void wl_egl_window_set_damage_size(struct wl_egl_window *egl_window, int width, int height)

This call would set the damage rectangle, that eglSwapBuffers() normally uses, and the size with which swapbuffers-with-damage uses to flip the rectangles.

It seems to me like this would solve the immediate issue.

However, I don't know if we can change the wayland-egl API at all. It has absolutely no versioning, not on header level nor runtime. Build-time version checks could be done by checking libwayland-client version for the header, and the version reported by wayland-egl.pc (provided by the libwayland-egl.so implementation) for the library version, but there is no way to check at runtime.

<rant>
What's worse, wayland-egl.pc from Mesa reports the Mesa version. Any other EGL implementation likely reports their own version, making the version vendor dependent instead of API dependent. (The same problem applies to egl.pc, gl.pc, glesv1_cm.pc and glesv2.pc.) I have no idea why these .pc files cannot just report the real API version, e.g. EGL 1.4. If someone wants to check the specific implementation's version, there should be an implementation-specific .pc file for that, e.g. mesa3d.pc, since obviously the build is already implementation dependent.
</rant>

Comment 3 Jason Ekstrand 2014-05-05 14:07:50 UTC

Pekka,
Yes, as you mentioned, the EGL api is not versioned, so we can't really change it; at least not easily.  Also, it has the same problem as some of the other options of simply pushing the problem back into EGL.  If we do that, then we are waiting on EGL implementations to fix our problem for us and some of them probably never will.  Also, thanks to the lack of versioning, the clien't can't realy know if the EGL implementation supports it.

Also, for what it's worth, I don't think making eglSwapBuffersWithDamageEXT take damage in surface coordinates makes sense.  It intentionally uses the same coordinate system as glViewport, glReadPixels, and friends which are all inherently in buffer coordinates (or flipped buffer coordinates if you prefer).  It's an EGL call and EGL doesn't know about surface transforms.  Having eglSwapBuffersWithDamageEXT work in flipped window-system-specific surface coordiantes just adds to the proliferation of coordinate systems.

Comment 4 Jason Ekstrand 2014-05-05 14:10:11 UTC

One other idea which I hate to even mention (since it's such a hack) would be to add a damage_done request to wl_surface that tells the compositor to ignore any subsequent damage requests until the commit.  This would allow the client to submit a bunch of damage requests, call damage_done, and then eglSwapBuffers.  The bad damage from eglSwapBuffers would get ignored by the compositor and it would just use the damage from the client.

Like I said, this is a TERRIBLE HACK.  I am including it mainly for completeness of the discussion.  Please ignore it.

Comment 5 Pekka Paalanen 2014-05-06 05:51:36 UTC

Jason, again, you make good points.

We have painted ourselves into a corner, and to me it seems like option 4 "damage in buffer coordinates" would be the best of bad options as far as I can imagine.

Comment 6 Jason Ekstrand 2014-05-07 02:39:17 UTC

The next question, that comes to my mind is how to define damage in certain edge-cases. In particular, what happens if the buffer_transform, buffer_scale, or viewport is changed? If you attach with a non-zero (dx, dy), how is damage treated? My inclination is to say that damage is relative to the previously attached buffer with no respect to any transforms. In order to analyze this, let's look at a few case-studies.

1) A client re-paints the same image only rotated to handle a rotated output. In this case, the difference between the two buffers will most likely be almost total (unless the drawn image has some symmetry somewhere). If it's an SHM buffer, we will have to either do a full texture upload or some sort of CPU-side transformation. However, from a surface perspective, the contents haven't changed at all and the compositor doesn't need to repaint. I think the solution here is that clients shouldn't be changing the buffer_transform all that often and that the one-time cost of a full repaint is ok. Also, the client can wait until something actually changes on its surface before repainting everything transformed.

2) A video client adjusts wl_viewport on a surface and re-attaches the previous buffer. Here having damage relative to the last-attached buffer saves us. We can avoid the texture upload and the compositor will no that, because the wl_viewport was changed, that it needs to repaint that portion of the screen.

3) A client resizing from the top or left. Previously, if the contents in the lower-right didn't change, the client could only damage the upper-left edge that is moving. With buffer-coordinates, you only really have resizing from the lower-right, so you'll have full damage in this case. If it's a DRM buffer, you're going to have to re-allocate and the client will have to do a full repaint to resize anyway. The cost of the compositor repainting the entire surface will be relatively small compared to this. On the other hand, if it's an SHM buffer, the compositor is either going to have to do something very clever or it will have to blow the whole texture and re-upload because there's no way to tell OpenGL to take the current texture contents and shift them down and to the right.

One thing that keeps coming up in every example I can think of is texture upload for SHM buffers. If the damage is in surface coordinates, the compositor has properly track all of the damage and where it maps to on the buffer and what has happened with the buffer in order to figure out the right portions of the image to re-upload. If damage is in buffer-coordinates relative to the previously attached buffer, this is a non-issue. My little weston-damage client didn't demonstrate this, but I'm pretty sure that Weston is broken with respect to suddenly changing buffer_transform and not reporting damage.

Comment 7 Jason Ekstrand 2014-05-07 03:27:27 UTC

Created attachment 98597 [details]
Weston texture upload problem demonstration

This video demonstrates the issues with weston's texture upload if the buffer_transform is changed but the surface is only partially damaged.

Comment 8 Pekka Paalanen 2014-05-07 07:21:27 UTC

Tricky.

Ok, this needs to be thought out from the very beginning.

Postulate: damage needs to be reported in buffer coordinates.

Consequences:

1. Definition: damage describes where the new buffer content differs from the old buffer content. This is the "opposite" of the current protocol definition.

2. The damage must be in the new buffer's coordinate system. Actually, the coordinate systems are always equivalent and only the damage clipping rectangle may differ, as we ignore all transformations and dx,dy of attach which affects the surface, not the buffers.

3. Simple resizing from bottom/right must at least add damage to be bottom/right part that is new (anchor in top-left corner). If the size shrinks and the new content is identical to the old content on the overlapping part, no damage is needed.

4. Simple resizing from top/left requires full damage, as the contents in the buffer coordinate system are all moving (anchor in the bottom-left corner).

5. A new buffer with a new buffer transform, even if it the end result in surface space would be identical, will require full damage. From buffer point of view, all the contents changes wrt. the old buffer.

6. If the buffer contents stay the same, but wl_viewport is changed, the client is not required to send any damage.

Those are the unavoidable consequences for the damage that needs to be reported by a client, AFAICS.

Consequences 3 and 4 are actually dependent on the anchor, not which side is resizing. (Did I just reinvent "window gravity"??)

It also has further consequences on the compositor. It is no longer the client's responsibility to compute the damage in surface coordinates. Therefore the compositor must implicitly add damage in its scene graph when:
- dx,dy of attach changes
- buffer transformation changes
- wl_viewport changes
These changes will change the surface's appearance even if the client does not change the buffer content.

Damage on the buffer is a completely different thing than a change (damage) in surface content. Clients will no longer specify where the surface's appearance changes, but where the buffer contents change.

We lose some corner-case optimizations, and we gain some other corner-case optimizations, but I think none of those cases matter, because they are not common. E.g. changing buffer transformation does not happen often, and resizing a surface very often does require a full redraw, that is full damage, anyway. It becomes trivial to maintain a copy of the buffer contents (glTexSubImage), but it becomes more complicated to compute the damage on screen (or does it?).

With these definitions, I think the logic is quite obvious after all. Can you find a flaw in my reasoning?

Jason, when I read your #6, I didn't feel like I understood it. Now when I did the thinking myself, it seems like I came to the exact same idea and conclusions as you did. Right? :-)

Btw, with the current protocol, if a client changes buffer transformation but issues no damage and no new buffer, it is a client bug. The client is currently responsible to set the damage where the surface appearance changes.

Comment 9 Jason Ekstrand 2014-05-07 14:21:45 UTC

Pekka, Yes, I think we're 100% on the same page here. The compositor will have to have a bit more logic to properly handle damage when things change.

What we have going on here is that there are two different types of damage in play: buffer damage and surface damage. Buffer damage is what has changed between the previous buffer and the current one. Surface damage is the the portion of the surface (buffer transformations included) that has changed. Buffer damage is used for SHM texture upload whild surface damage is used to allow the compositor to only partially repaint.

In the steady state (when the buffer-to-surface transformation matrix isn't changing) it is fairly easy to derive one from the other. As soon as some part of the transformation changes, we have an edge-case and it becomes difficult to accurately determine one from the other. Fortunately, we should be running in the steady-state most of the time so this should only ever be a problem if we are changing the transform, scale, or viewport. Most of such changes will require full damage anyway.

It's worth noting that buffer damage doesn't matter for EGL surfaces or for SHM when using the pixman renderer. This is because there is no texture upload and we are directly copying from the client buffer into the output buffer. The only thing that matters in these cases is the compositor's ability to only partially repaint. However, as I mentioned above, I don't think it makes much sense for eglSwapBuffersWithDamageEXT to accept damage in anything other than buffer coordinates because EGL only knows about buffer coordinates.

Also, resizing from the top-left really isn't any different than resizing from the bottom-right from a buffer damage perspective. Most clients will resize exactly the same from one corner or from the other so, if the menu bar, decorations, etc. don't change, there could be an upper-left rectangle that's not damaged in either case. The only difference is that the compositor has to fully repaint the surface in upper-left case because the undamaged part isn't pinned. Oh, well.

Another option that I don't like would be to have a wl_surface.surface_damage request that explicitly sets the surface damage and let wl_surface.damage set the buffer damage. As I said, I don't like this option and would rather we just let the compositor infer surface damage from buffer damage.

Comment 10 Pekka Paalanen 2014-10-06 08:30:53 UTC

My recent reasoning about interface version backward compatibility is an argument for not changing wl_surface.damage semantics in a version bump. Instead, we should add a new request.

See: http://lists.freedesktop.org/archives/wayland-devel/2014-October/017687.html

"Don't modify old protocol just by version, it won't be backward compatible." seems like a good mantra to me.

Comment 11 Jason Ekstrand 2014-10-06 19:31:39 UTC

(In reply to Pekka Paalanen from comment #10)
> My recent reasoning about interface version backward compatibility is an
> argument for not changing wl_surface.damage semantics in a version bump.
> Instead, we should add a new request.
> 
> See:
> http://lists.freedesktop.org/archives/wayland-devel/2014-October/017687.html
> 
> "Don't modify old protocol just by version, it won't be backward
> compatible." seems like a good mantra to me.

The only reason why changing damage coordinates would work is precicely because it's not backwards compatable.  Backwards is broken.  If we add a new request, then EGL implementations will continue to happily report damage in buffer coordinates to the request that needs surface coordinates and ignore the one that takes buffer coordinates.

Comment 12 Pekka Paalanen 2014-10-07 09:42:29 UTC

(In reply to Jason Ekstrand from comment #11)
> The only reason why changing damage coordinates would work is precicely
> because it's not backwards compatable.  Backwards is broken.  If we add a
> new request, then EGL implementations will continue to happily report damage
> in buffer coordinates to the request that needs surface coordinates and
> ignore the one that takes buffer coordinates.

You want to fix EGL while breaking everything else, e.g. wl_shm-based apps/toolkits that already implement HiDPI support. It is debatable how much the everything else matters in practice, but it would also be breaking our API/protocol stability promise.

The Wayland spec for damage is not broken in itself. It is the EGL API that makes using it always correctly impossible, because it prevents the code inside libEGL that uses the Wayland protocol to have all the necessary information that *is* present in the client. I wouldn't say EGL is broken here either, it's just the combination.

To recap, we may currently have two kinds of EGL implementations:
a) ones that rely on wl_surface v1, and pass buffer size to damage, and
b) ones that use always "infinite" damage, because it is guaranteed to at least work.

Right?

If an application uses buffer_transform/scale, or wl_viewport, case a) is currently broken.

If we changed the meaning of wl_surface.damage coordinates, case a) would become correct, but all non-EGL apps using wl_viewport or buffer_transform/scale would break. Case b) would just continue being sub-optimal and working.

What I'm proposing means the broken case a) will remain broken. It should be a known quantity by now, especially due to the case b) workaround. However, existing apps would not be affected: they continue to be correct or broken just like they used to. Yet, EGL implementations that are actually maintained would be able to implement the correct semantics for damage by using the new wl_surface.buffer_damage when it's available.

I have no idea how many cases of a) there are, especially ones that combine it with buffer_transform/scale or wl_viewport. I would hope it is small. I also do not know how much would break in non-EGL apps if we changed the wl_surface.damage semantics, but I would guess it is more than the EGL cases that would get magically fixed by it.

That is why I would prefer keeping the existing and known behaviour as it is, and offer an opt-in for doing the optimal, correct thing.

Comment 13 Daniel Stone 2014-10-07 10:51:49 UTC

Of the EGL implementations in the wild that I know about, Mesa always sends infinite co-ordinates, and Mali doesn't implement SwapBuffersWithDamage, so it's a moot point. Vivante don't implement SwapBuffersWithDamage either.

There are various single-platform-only implementations around for PowerVR SGX/RGX, some of which do send buffer co-ordinates for damage (I wrote one of them), but as they're generally part of the platform and need updating anyway, rather than distributed to the public at large, I really don't mind breaking 
those.

The Raspberry Pi driver is broken, but that's still in an unmerged branch on GitHub, since it was done on top of the old shim driver; the new driver anholt is writing is based on Mesa, so again will send infinite co-ordinates.

So yes, I do agree with Pekka: I don't really see the upside to breaking backwards compatibility, when it benefits approximately no-one.

Comment 14 Jason Ekstrand 2014-10-08 08:08:14 UTC

I was under the impression that we had a pile of unfixable never-going-to-bee-patched EGL implementations floating around that we needed to work around.  If that's not a huge issue than adding a buffer_damage request is fine

Comment 15 Daniel Stone 2014-10-08 11:07:52 UTC

(In reply to Jason Ekstrand from comment #14)
> I was under the impression that we had a pile of unfixable
> never-going-to-bee-patched EGL implementations floating around that we
> needed to work around.  If that's not a huge issue than adding a
> buffer_damage request is fine

Not really, no. The only one which fits that bill (implements SwapBuffersWithDamage and is known to not always send infinite damage) is PowerVR. However, the PVR implementations are third-party, not distributed by Imagination themselves, so I think breaking those is completely fine: we're not going to break anything by upgrading that no-one can fix.

Even then, the client winsys implementation on PowerVR can be implemented by external modules, as it provides an API that lets you compile externally, even when you don't have the DDK source.

Comment 16 Jason Ekstrand 2014-10-08 12:11:53 UTC

(In reply to Daniel Stone from comment #15)
> (In reply to Jason Ekstrand from comment #14)
> > I was under the impression that we had a pile of unfixable
> > never-going-to-bee-patched EGL implementations floating around that we
> > needed to work around.  If that's not a huge issue than adding a
> > buffer_damage request is fine
> 
> Not really, no. The only one which fits that bill (implements
> SwapBuffersWithDamage and is known to not always send infinite damage) is
> PowerVR. However, the PVR implementations are third-party, not distributed
> by Imagination themselves, so I think breaking those is completely fine:
> we're not going to break anything by upgrading that no-one can fix.

What about just eglSwapBuffers with finite damage?  Even those are broken because, if you do a 90-degree transformation on a non-square surface, your damage will not fill the surface.  For instance, on a 300x400 buffer with a surface transform of 90, you have a surface that's 400x300 and since you damage the buffer size, it gets clipped to 300x300.

Comment 17 Derek Foreman 2015-11-12 19:05:00 UTC

Before actually reading this ticket I stepped in this with both feet:
http://lists.freedesktop.org/archives/wayland-devel/2015-November/025302.html

It seems like there's a consensus that a new request to provide damage in buffer co-ordinates is reasonable, so I'll start a weston implementation.

Comment 18 Derek Foreman 2016-02-11 17:23:57 UTC

I've posted a patch for mesa that uses damage_buffer when available (using Jason's proxy version stuff to query it):

https://lists.freedesktop.org/archives/mesa-dev/2016-February/107312.html

Though, does some specification document somewhere need refinement to indicate we now expect these damage rectangles to be in buffer co-ordinates?

Comment 19 Yong Bakos 2016-04-21 23:20:35 UTC

Derek, did you mean documentation in Wayland/Weston? If so, I can take a stab at it.

Comment 20 Jason Ekstrand 2018-01-17 04:35:31 UTC

This landed in core Wayland a while ago but no one ever closed the bug.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.