98731 – EGL client behaviour too undefined

Bug 98731 - EGL client behaviour too undefined

Summary: EGL client behaviour too undefined

Status:	RESOLVED FIXED

Alias:	None

Product:	Wayland
Classification:	Unclassified
Component:	wayland (show other bugs)
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Wayland bug list
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2016-11-15 03:59 UTC by Jonas Ådahl
Modified:	2018-06-04 06:56 UTC (History)
CC List:	7 users (show)

See Also:	98766
i915 platform:
i915 features:

Attachments

Description Jonas Ådahl 2016-11-15 03:59:21 UTC

The behaviour of EGL clients is too undefined. For example when exactly a client can expect wl_surface.attach+wl_surface.commit is not known, making it hard to implement a well behaving client correctly. This is especially problematic in combination with xdg_shell with version >= unstable v6, since it is required that the first buffer attached to a surface must comply to the requirements of the initial configure event.

For example, an application that does



surface = wl_compositor_create_surface();
egl_window = wl_egl_window_create(surface, some_width, some_height);
xdg_surface = xdg_shell_get_xdg_surface(surface);

--> on configure

wl_egl_window_resize(egl_window, actual_width, actual_height);
.. draw things ..
eglSwapBuffers()




may work on some EGL implementations, but not on others.

Some uncertainties that needs to be clarified:

1) When can wl_surface.attach be expected to take place
2) When can wl_surface.commit be expected to take place
3) What is the dimension of the wl_buffer used in 1) and 2)



Some requirements that needs to be met in order to guarantee a correct initial frame (as required by the current xdg_shell version):

a) It must be possible to know when the buffer of the first frame is attached and committed so that it is possible to set up the associated state (i.e. windew geometry, input region, opaque region etc)
b) It must be possible to know when buffers of subsequent frames are attached and commited so that is possible to set up the same set of associated state (i.e. window geometry and input region when a surface is resized, or opaque region when the content changes).

Comment 1 Pekka Paalanen 2016-11-15 08:00:36 UTC

Where should this specification live?

Do we need the same for Vulkan?

Comment 2 Jonas Ådahl 2016-11-15 09:12:33 UTC

(In reply to Pekka Paalanen from comment #1)
> Where should this specification live?

Good question. I don't think we mention any of wayland-egl.h in any extension specification. At least not in https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_platform_wayland.txt

Would it be enough to just add it to wayland-egl.h?

> 
> Do we need the same for Vulkan?

The Vulkan version of wayland-egl.h, VK_KHR_wayland_surface, already some documententation about this:

"... Also, calling vkQueuePresentKHR will result in Vulkan sending wl_surface.commit requests to the underlying wl_surface of each VkSwapchainKHR objects referenced by pPresentInfo. Therefore, if the application wishes to synchronize any window changes with a particular frame, such requests must be sent to the Wayland display server prior to calling vkQueuePresentKHR."

Given the wording about being able to synchronize any window changes, one could probably assume that "wl_surface.commit" will ONLY be sent as a result of calling VkQueuePresentKHR. As the buffer allocation is more explicit in Vulkan, I assume those issues as well are not present there.

Comment 3 Pekka Paalanen 2016-11-15 09:34:38 UTC

I think we already have all the answers, we really just need to agree on the place to document it all.

wayland-egl-core.h, living in Wayland repository, would be a very convenient place at least for starters. We could polish the EGL spec there in the public without the overhead of going through e.g. Khronos. Once it's good, we could see if we need to get it stamped by Khronos in e.g. EGL_KHR_platform_wayland.

Should we also have EGL implementors' notes? In the same place or a different place? E.g. explaining how you define your own private Wayland protocol extensions. Maybe that should be a chapter in the Wayland docbook instead?

Or put everything in the docbook?

Comment 4 Jonas Ådahl 2016-11-15 09:50:14 UTC

(In reply to Pekka Paalanen from comment #3)
> I think we already have all the answers, we really just need to agree on the
> place to document it all.
> 
> wayland-egl-core.h, living in Wayland repository, would be a very convenient
> place at least for starters. We could polish the EGL spec there in the
> public without the overhead of going through e.g. Khronos. Once it's good,
> we could see if we need to get it stamped by Khronos in e.g.
> EGL_KHR_platform_wayland.

Sounds good to me.

> 
> Should we also have EGL implementors' notes? In the same place or a
> different place? E.g. explaining how you define your own private Wayland
> protocol extensions. Maybe that should be a chapter in the Wayland docbook
> instead?

Sounds like the docbook makes more sense for that, and just leave wayland-egl.h (and later EGL_KHR_platform_wayland) for specifying expected behaviour.

> 
> Or put everything in the docbook?

The docbook IMHO is more inaccessible (harder to find, and doesn't contain any API specification so far), so I'd put my vote on wayland-egl.h for now.

Comment 5 Daniel Stone 2016-11-15 10:31:49 UTC

(In reply to Jonas Ådahl from comment #0)
> Some uncertainties that needs to be clarified:
>
> 1) When can wl_surface.attach be expected to take place
> 2) When can wl_surface.commit be expected to take place

The only way to make this sane (especially with subsurfaces!) is: strictly inside eglSwapBuffers (i.e. before it returns), and at absolutely no other time whatsoever. Anything else is going to cause huge issues.

> 3) What is the dimension of the wl_buffer used in 1) and 2)

The semantics that ~everyone has settled on that I'm aware of are, that of the last wl_egl_window_resize call before the first (E)GL call made after SwapBuffers. e.g.:
  swap_A = eglSwapBuffers()
  wl_egl_window_resize(size_A)
  wl_egl_window_resize(size_B)
  eglQuerySurface(EGL_BUFFER_AGE);
  wl_egl_window_resize(size_C)
  glClear()
  swap_B = eglSwapBuffers()
  glDrawArrays()
  swap_C = eglSwapBuffers();

Swap B will have size B, and swap C will have size C.
 
> Some requirements that needs to be met in order to guarantee a correct
> initial frame (as required by the current xdg_shell version):
> 
> a) It must be possible to know when the buffer of the first frame is
> attached and committed so that it is possible to set up the associated state
> (i.e. windew geometry, input region, opaque region etc)
> b) It must be possible to know when buffers of subsequent frames are
> attached and commited so that is possible to set up the same set of
> associated state (i.e. window geometry and input region when a surface is
> resized, or opaque region when the content changes).

Or correctly synchronise subsurface drawing ...

I think documenting it upstream in Wayland and then taking a subset of this text through Khronos would be the best idea.

Comment 6 Daniel Stone 2016-11-15 10:42:07 UTC

CCed a couple of the non-Mesa drivers; I think the only two we're now missing are Imagination and Vivante.

Comment 7 James Jones 2016-11-15 16:57:46 UTC

I think eglSwapBuffers() (and similarly, vkQueuePresent) is specifically too late for us, and that's why this issue was raised, but I'll leave the details to Miguel (Also CC'd now).

Comment 8 Miguel A. Vico 2016-11-15 18:51:39 UTC

For EGLStreams, eglSwapBuffers() is too late because we rely on wl_surface.attach + wl_surface.commit to have the compositor attach a consumer to the stream so that the client can attach the EGLSurface producer that will be returned to the application as the EGL platform window surface.

These first wl_surface.attach + wl_surface.commit don't encode any "new buffer content available" message though, which basically means we are abusing these interfaces.

We will extend our wayland protocol to add the appropriate interfaces for setting EGLStreams up without relying on wl_surface.attach + wl_surface.commit.


However, I don't think forcing a wl_surface.attach + wl_surface.commit to strictly happen inside eglSwapBuffers() is correct, and will cause other interaction issues with, for instance, EGL_NV_stream_fifo_synchronous.

We already have some use cases where fifo_synchronous is used to defer wl_surface.attach + wl_surface.damage + wl_surface.commit until the frame is finished.

I think those are legitimate use cases that we must take into account.

Comment 9 Pekka Paalanen 2016-11-16 08:38:58 UTC

There is one very strict requirement we cannot loosen:

The application MUST be in TOTAL control of when requests to a wl_surface happen.

This applies specifically to wl_surface.commit, because the application must know exactly when it latches in the new window state. If the application is not in total control of commits, it cannot be sure what the window state is, and therefore it cannot reliably drive the GUI.

This also applies to wl_surface.attach, because the application must be able to call wl_surface.commit to change the window state irrespective of EGL workings. If EGL does a wl_surface.attach at arbitrary times, the application may latch in the buffer without EGL expecting it.

The fundamental assumption made when designing all Wayland protocol extensions is that the application is in control of wl_surface.commit, and therefore all extensions that need atomic semantics tie to the wl_surface.commit.

Window state and window content go hand-in-hand, because they need to be kept in sync. Particularly during window state changes, which usually require a re-draw, committing the newly drawn buffer must happen in the exact same commit as latching the new window state. Otherwise the application will be unreliable and/or glitch visually.

Therefore it is unacceptable for EGL to:
- send wl_surface.attach and/or wl_surface.commit asynchronously
- send wl_surface.attach and/or wl_surface.commit from inside API calls that have not been defined to do so
- not send wl_surface.attach and/or wl_surface.commit from an API call that is defined to do so

The above applies also to damage requests and pretty much all (state changing) requests on protocol objects the application can access (mainly wl_surface).

With all the above restrictions, we are more or less free to define the EGL operation. In practice this means we have to define exactly which EGL API calls emit which Wayland requests.

However, the current practice for a long time has been that calling eglSwapBuffers() calls wl_surface attach, damage and commit. If we change that, we break existing applications.

If you cannot strictly work with these restrictions then Miguel is on the right track: add private protocol extension that does what you need, rather than trying to bend the Wayland core protocol specification. If you need to tell something to the server-side EGL at client EGLSurface creation time, use a private extension. If you need to do attach+commit-like operations async or before eglSwapBuffers, do it in your private extension and use the wl_buffer only as a marker/reference of when one should really take the newly drawn image into use (whether or not the image is ready [*]).

[*] Waiting for images to become ready to use must be up to the compositor, because it needs to sync the window state with it. If you really want to postpone the wl_surface.commit with the new image in the client, then you also need to synchronously block the client. Otherwise the window state updates will be messed up. As for waiting for client rendering completion in the compositor, I hope we get a nice solution with explicit fences: the compositor can check the fence to decide when to execute the commit and avoid stalling the whole compositor by using an older frame until the new one is ready.

The important thing is that the application knows exactly which image is (going) on-screen, that window state matches the image, and that we do not mess up the window state set-up with unexpected or missing requests. And of course the compositor needs to be able to also deliver these promises.

Focusing only on the steady-state frame streaming when no window state is changed and optimizing that while sacrificing atomicity of state updates is not ok (even if it is all that games care about). Everything in Wayland revolves around strictly ordered sequences of requests.

Comment 10 Pekka Paalanen 2016-11-16 08:47:29 UTC

Btw. the requirement of window state and content synchronization is what made the queueing part of bug #83092 very difficult to design and later I gave up on it to get at least the feedback part merged.

Comment 11 Pekka Paalanen 2016-11-16 10:01:27 UTC

If it is impossible to support atomic state changes and have optimized performance, maybe we'd need two modes then: normal and streaming.

In normal mode (default) EGL operation is deterministic and window state changes can be executed atomically. This is what I described in comment 9.

In streaming mode, EGL operation can asynchronous and non-deterministic, and window state changes are "forbidden". The application cannot identify which frame is going to the screen or is being displayed at any time. Presentation extension is useless because there is no guarantee which frame submission it corresponds to. This mode must be explicitly opt-in, so probably it would be an EGL/Wayland extension with API to switch the mode.

I presume only games would ever be interested in the streaming mode, as they tend to repaint regardless of the compositor frame cycle anyway and do not care which frame(s) happen to end up on screen. I cannot imagine anything else to be like that. E.g. video players care about audio/video sync, so they care about which frame was shown when.

I wouldn't want to go in this direction, though. It seems like a step backwards, towards unpredictable behaviour.

Also "modes" are hard, we already learnt that with the sub-surface extension which has "synchronized" and "desynchronized" modes. It's a pain to spec and implement.

With the streaming mode there is also a risk of wanting "just one more thing supported while streaming", like first a frame callback, then Presentation feedback, size change, and then it's a never-ending slippery slope of extensions duplicating what Wayland itself already had...

Comment 12 Daniel Stone 2016-11-16 10:06:44 UTC

(In reply to Miguel A. Vico from comment #8)
> However, I don't think forcing a wl_surface.attach + wl_surface.commit to
> strictly happen inside eglSwapBuffers() is correct, and will cause other
> interaction issues with, for instance, EGL_NV_stream_fifo_synchronous.
> 
> We already have some use cases where fifo_synchronous is used to defer
> wl_surface.attach + wl_surface.damage + wl_surface.commit until the frame is
> finished.
> 
> I think those are legitimate use cases that we must take into account.

I agree, but I don't think making posting asynchronous is legitimate. It will provably break things like subsurfaces, and can get you into deadlock situations. For better or worse, this is the protocol we have, and we can't break it with asynchronous submission. Vivante does asynchronous submission today, and it bites.

The concerns in fifo_synchronous are legitimate, but I'd prefer to see them addressed in the compositor. In order to minimise latency, we do want to submit the buffers as early as possible regardless of the transport (Streams or no), but we don't necessarily want to block the compositor on the slowest of all clients.

We can solve this generically - including for clients using explicit fencing - by giving the compositor fences to poll on and allowing it to make a decision in its repaint loop as to which buffer to use. I'd suggest the best mechanism would be to allow the compositor to extract an EGLSyncKHR object which will trigger when the fence is signalled (i.e. buffer ready for immediate use), and then being able to export that to a native sync FD which the compositor can then poll on.

This is more work than deferring to an asynchronous { glFinish(); wl_suface_commit(); } thread, but does have the advantage of not breaking things, and works for more usecases than just Streams.

Comment 13 Miguel A. Vico 2016-11-17 23:03:28 UTC

I'd really like to see fence support added into the compositor so we can stop committing surfaces asynchronously on the client side.

However, I think we should keep discussion about that in a separate bug, as the issue is different than the one tracking in this bug.

I filed https://bugs.freedesktop.org/show_bug.cgi?id=98766 for that purpose.

Comment 14 Daniel Stone 2018-06-04 06:56:45 UTC

I'm going to close this bug oug on the grounds that the behaviour _is_ very well defined, it just doesn't work for EGLStreams.

There are two follow-ups to this:
Document current implicit-synchronisation behaviour and client/compositor responsibilities - https://bugs.freedesktop.org/show_bug.cgi?id=97379
Support explicit synchronisation via dma-fence FDs - https://bugs.freedesktop.org/show_bug.cgi?id=98766

If there are any further issues to deal with, I think filing new bugs is easier than adding more discussion to this one, though a note added at the end of here would probably be helpful.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.