Bug 97353

Summary: Wayland lacks cross-process synchronisation
Product: Wayland Reporter: Tomek Bury <tomek.bury>
Component: wayland Assignee: Wayland bug list <wayland-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium CC: dancol
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Tomek Bury 2016-08-15 14:32:59 UTC
Currently the Wayland synchronisation model is built around undefined behaviour, which happens to work correctly with the FOSS drivers but isn't guaranteed by the GL/EGL specifications. In other words, a perfectly valid GL/EGL driver may not work with Wayland.

GL/EGL guarantees read-write synchronisation within a single context and provides tools to synchronise multiple contexts within a single process, but there are no guarantees or methods for achieving cross-process synchronisation, except for Android fences, which are Android-specific.

The only spec-compliant solution at the moment would be blocking the CPU on the compositor side (eglWaitClient() or glFinish()) before a buffer release event is sent to the client. Alternatively, Wayland could create and mandate an extension similar to EGL_ANDROID_native_fence_sync, where EGLSyncKHR objects can be created from and converted to cross-process integer descriptors that are easy to send between client and compositor over the Wayland protocol.

https://www.khronos.org/registry/egl/extensions/ANDROID/EGL_ANDROID_native_fence_sync.txt

Basing Wayland on unspecified behaviour, where an implementation detail of one specific driver happens to deliver the desired result, causes grief to other driver maintainers for whom such behaviour isn't easy to achieve, and limits Wayland adoption.
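A minimal sketch of the descriptor-based idea from the description above, written in plain POSIX C rather than EGL. It is a toy model under stated assumptions: the `toy_fence` type and every `toy_*` function are hypothetical names invented for illustration, and a pipe stands in for the kernel fence object that an extension such as EGL_ANDROID_native_fence_sync would expose as an integer descriptor. It only shows the shape of the mechanism: one process signals, another waits on a file descriptor, and nobody has to block in glFinish().

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical toy fence: NOT an EGL or kernel sync object. The read end of a
 * pipe plays the role of the integer descriptor that a consumer waits on. */
struct toy_fence {
    int wait_fd;   /* handed to the side that has to wait */
    int signal_fd; /* kept by the side that does the work */
};

/* Returns 0 on success, -1 on failure. */
static int toy_fence_create(struct toy_fence *f)
{
    int fds[2];

    assert(f);
    if (pipe(fds) != 0)
        return -1;
    f->wait_fd = fds[0];
    f->signal_fd = fds[1];
    return 0;
}

/* Marks the work as finished: the wait descriptor becomes readable. */
static int toy_fence_signal(const struct toy_fence *f)
{
    const char done = 1;

    return write(f->signal_fd, &done, 1) == 1 ? 0 : -1;
}

/* Waits up to timeout_ms (0 = just check, -1 = block forever).
 * Returns 1 if signalled, 0 on timeout, -1 on error. The byte is never
 * consumed, so the fence stays signalled for later waiters. */
static int toy_fence_wait(int wait_fd, int timeout_ms)
{
    struct pollfd p = { .fd = wait_fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, timeout_ms);

    if (n < 0)
        return -1;
    return (n == 1 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Forks a "client" that signals the fence while the parent "compositor"
 * waits on it. Returns what the parent observed (1 = signalled). */
static int toy_fence_cross_process_demo(void)
{
    struct toy_fence f;
    pid_t pid;
    int result;

    if (toy_fence_create(&f) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(toy_fence_signal(&f) == 0 ? 0 : 1); /* child: finish "rendering" */

    result = toy_fence_wait(f.wait_fd, 5000);     /* parent: wait for the child */
    waitpid(pid, NULL, 0);
    close(f.wait_fd);
    close(f.signal_fd);
    return result;
}
```

In this sketch the two processes share the descriptor through fork(); a real client and compositor are unrelated processes and would have to send the descriptor over their socket instead.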
Comment 1 Daniel Stone 2016-08-15 15:59:29 UTC
Yes, it's under discussion that we would provide something similar to what you describe with native_fence_sync and a release event. For the moment, implicit sync is mandatory, and explicit fencing support is WIP.

Of the available stacks, Wayland EGL support exists on Mesa, Mali, PowerVR and Vivante. The only one which lacks a kernel dependency model providing the right semantics is Vivante, but as it does not implement any kind of kernel dependency management, it already has to execute the equivalent of glFinish() inside eglSwapBuffers(), so is already broken to begin with (in many other ways besides).

NVIDIA have their own implementation based on EGLStreams, however they also do not see this as a barrier to implementing the traditional Wayland EGL / GBM platform.

'a perfectly valid GL/EGL driver', in the context of Wayland, includes supporting these semantics. The driver must already implement support for libwayland-egl, wl_surface, etc; this is just one more thing. There is no such thing as a 'valid EGL driver' which magically works on all window systems.

Do you know of any other stacks which have difficulty implementing this?

tl;dr: I don't believe this is a major issue, but you're right that we should implement support for explicit fencing more generally, and we will.
Comment 2 Tomek Bury 2016-08-15 16:43:13 UTC
As far as I can tell, several 3D drivers put read-write synchronisation in user space, so it works across contexts within one process but not across processes. Nokia had this problem; the Broadcom driver is another example.

It's not impossible to implement, but the only implementation I can think of that doesn't involve blocking the CPU in the compositor is rather ugly: create a parallel synchronisation mechanism and deal with two of them: the regular one in user space for most things, and a Wayland-specific one in the kernel, with an extra kernel round-trip for each read from or write to a wl_buffer.
Comment 3 Tomek Bury 2016-08-15 16:46:57 UTC
By the way, is there any spec and/or deadline for the explicit fencing support or is it still in its early stages?
Comment 4 Kristian Høgsberg 2016-08-15 17:35:34 UTC
(In reply to Tomek Bury from comment #2)
> As far as I can tell several 3D drivers put read-write synchronisation in
> user space so it works across contexts within one process but not across
> processes. Nokia had this problem, Broadcom driver is another example.
> 
> It's not impossible to implement but the only possible implementation that
> doesn't involve blocking CPU in the compositor I can think of is rather
> ugly: create a parallel synchronisation mechanism and deal with 2 of them:
> regular one in user space for most things and Wayland special in kernel with
> an extra kernel round-trip for each read from or write to wl_buffer.

The wl_drm interface isn't supposed to be the only buffer sharing interface. Similar to how drivers can define their own buffer sharing mechanism for X, a driver can implement its own Wayland interface for sharing buffers.

For driver architectures that require userspace synchronization, you would define your own buffer sharing interface and pass along fences for each shared buffer.
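As a rough illustration of "passing along a fence for each shared buffer", the sketch below sends a buffer id together with a file descriptor over a Unix domain socket using SCM_RIGHTS, which is the standard POSIX mechanism for moving descriptors between processes. The message layout and the `toy_send_buffer_with_fence` / `toy_recv_buffer_with_fence` names are assumptions made up for this example; it is not wl_drm or any real driver protocol.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union toy_fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Sends a (hypothetical) buffer id plus one fence descriptor.
 * Returns 0 on success, -1 on failure. */
static int toy_send_buffer_with_fence(int sock, unsigned int buffer_id,
                                      int fence_fd)
{
    struct iovec iov = { .iov_base = &buffer_id, .iov_len = sizeof(buffer_id) };
    union toy_fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fence_fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* payload is an array of descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fence_fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(buffer_id) ? 0 : -1;
}

/* Receives one message. Stores the buffer id in *buffer_id and returns the
 * received descriptor (a new fd in this process), or -1 on failure. */
static int toy_recv_buffer_with_fence(int sock, unsigned int *buffer_id)
{
    struct iovec iov = { .iov_base = buffer_id, .iov_len = sizeof(*buffer_id) };
    union toy_fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*buffer_id))
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor referring to the same underlying object, so whatever the sender later does to that object (for instance, signalling it) is visible on the receiving side.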
Comment 5 Daniel Stone 2016-08-16 11:58:34 UTC
(In reply to Tomek Bury from comment #2)
> As far as I can tell several 3D drivers put read-write synchronisation in
> user space so it works across contexts within one process but not across
> processes. Nokia had this problem, Broadcom driver is another example.

Nokia didn't have this problem in PowerVR; PVR has had kernel-side sync objects, portable between contexts, for a long time. And the ability to use them as fences for submission.

> It's not impossible to implement but the only possible implementation that
> doesn't involve blocking CPU in the compositor I can think of is rather
> ugly: create a parallel synchronisation mechanism and deal with 2 of them:
> regular one in user space for most things and Wayland special in kernel with
> an extra kernel round-trip for each read from or write to wl_buffer.

I'm not sure what you mean here. If you can build these things into the kernel, then why would you not just build implicit fencing in the first place? Conceptually it's no leap at all from what Android and ChromeOS already require today.
Comment 6 Tomek Bury 2016-08-16 16:13:14 UTC
I meant Sailfish OS compositor and Jolla (ex. Nokia) - they did have this problem. Sorry for the confusion.

Now, bear in mind that the "implicit fencing" requirement is an implicit requirement itself. It's never spelled out amongst Wayland requirements AFAIK. EGL/GLES2, shared buffers and a couple of extensions are mentioned but implicit sync is not. Please point me to this requirement if it's written somewhere. If it's not, then perhaps this should be a bug report.

As for building implicit sync into the driver: it's rather expensive, risky and utterly pointless to significantly change the architecture and rewrite core parts of a working driver just to have implicit cross-process synchronisation on every platform when Wayland is the only platform that assumes it. Android requires EXPLICIT sync and that's a completely different beast.
Comment 7 Tomek Bury 2016-08-16 16:51:28 UTC
(In reply to Kristian Høgsberg from comment #4)
> For driver architectures that require userspace synchronization, you would
> define your own buffer sharing interface and pass along fences for each
> shared buffer.

I'd love to, but there are no hook points available for the return path (compositor to client).

Handing a buffer from client to compositor is easy because the client calls eglSwapBuffers(). The implementation of eglSwapBuffers() can take care of client-to-compositor synchronisation and make sure that the compositor doesn't read the buffer before the client has finished writing.

The hard part is the compositor-to-client direction. The compositor implementation doesn't call any GLES or EGL function to inform the driver that it's about to release the shared buffer. How is the driver supposed to know that the compositor is releasing a buffer? How is the client supposed to wait for the compositor before overwriting an already released buffer? Where do you propose to create in the compositor-side fence?
Comment 8 Tomek Bury 2016-08-16 16:51:56 UTC
(In reply to Kristian Høgsberg from comment #4)
> For driver architectures that require userspace synchronization, you would
> define your own buffer sharing interface and pass along fences for each
> shared buffer.

I'd love to, but there are no hook points available for the return path (compositor to client).

Handing a buffer from client to compositor is easy because the client calls eglSwapBuffers(). The implementation of eglSwapBuffers() can take care of client-to-compositor synchronisation and make sure that the compositor doesn't read the buffer before the client has finished writing.

The hard part is the compositor-to-client direction. The compositor implementation doesn't call any GLES or EGL function to inform the driver that it's about to release the shared buffer. How is the driver supposed to know that the compositor is releasing a buffer? How is the client supposed to wait for the compositor before overwriting an already released buffer? Where do you propose to create in the compositor-side fence?
Comment 9 Tomek Bury 2016-08-16 17:01:52 UTC
(In reply to Tomek Bury from comment #8)
> (In reply to Kristian Høgsberg from comment #4)

Not sure why it got posted twice (and both times with a typo).
I meant: Where do you propose to create the compositor-side fence? I.e. fence that marks the end of compositor-side reads from the buffer.
Comment 10 Kristian Høgsberg 2016-08-16 17:14:56 UTC
(In reply to Tomek Bury from comment #9)
> (In reply to Tomek Bury from comment #8)
> > (In reply to Kristian Høgsberg from comment #4)
> 
> Not sure why it got posted twice (and both times with a typo).
> I meant: Where do you propose to create the compositor-side fence? I.e.
> fence that marks the end of compositor-side reads from the buffer.

Again, this is implicit (and it really should've been documented better, sorry), but the idea is that the compositor communicates its intention to texture from the wl_buffer by creating the EGLImage for the wl_buffer. Once it's done, it destroys the EGLImage. So once the EGLImage is created, you have to wait on the client-rendering-finished fence before first use. Once the EGLImage is destroyed, you signal the compositor-texturing-finished fence.
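The lifecycle described in this comment can be sketched as a small state model. This is an illustration only: the `toy_shared_buffer` struct and the `toy_*` functions are hypothetical, plain booleans stand in for real fences, and the code assumes a compositor that creates and destroys the image around each use, as the comment describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-buffer driver state; plain flags stand in for fences. */
struct toy_shared_buffer {
    bool client_rendering_finished;     /* signalled by the client side */
    bool compositor_texturing_finished; /* signalled when the image dies */
    bool image_alive;
};

static void toy_buffer_init(struct toy_shared_buffer *b)
{
    b->client_rendering_finished = false;
    b->compositor_texturing_finished = true; /* nobody is reading yet */
    b->image_alive = false;
}

/* Client side: rendering into the buffer has completed. */
static void toy_client_rendering_done(struct toy_shared_buffer *b)
{
    b->client_rendering_finished = true;
}

/* Hook 1: creating the image announces that the compositor intends to
 * texture from the buffer, so the compositor-side fence becomes pending. */
static void toy_image_create(struct toy_shared_buffer *b)
{
    assert(!b->image_alive);
    b->image_alive = true;
    b->compositor_texturing_finished = false;
}

/* Before first use the driver has to wait on the client's fence.
 * Returns true if texturing may start now. */
static bool toy_image_can_texture(const struct toy_shared_buffer *b)
{
    return b->image_alive && b->client_rendering_finished;
}

/* Hook 2: destroying the image signals the compositor-side fence. */
static void toy_image_destroy(struct toy_shared_buffer *b)
{
    assert(b->image_alive);
    b->image_alive = false;
    b->compositor_texturing_finished = true;
}

/* Client side: may the buffer be overwritten yet? */
static bool toy_client_can_overwrite(const struct toy_shared_buffer *b)
{
    return b->compositor_texturing_finished;
}
```

The model makes the dependency explicit: both fences hang off image creation and destruction, so a compositor that never destroys the image never signals the second one.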
Comment 11 Tomek Bury 2016-08-16 17:40:13 UTC
(In reply to Kristian Høgsberg from comment #10)
> Again, this is implicit (and it really should've been documented better,
> sorry), but the idea is that the compositor communicates its intention to
> texture from the wl_buffer by creating the EGLImage for the wl_buffer. Once
> it's done, it destroys the EGLImage. So once the EGLImage is created you
> have to wait on the client-rendering-finished fence before first use. Once
> the EGLImage is destroyed you signal the compositor-texturing-finished fence.

That's not the case, at least in the latest compositor. The compositor now keeps EGL images for the lifetime of a client and assumes that writes from the client and reads from the compositor will be implicitly interlocked. The eglImageCreate() and eglImageDestroy() calls happen only once per buffer, so they can't serve as a driver hook point for creating fences.
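To make the objection concrete, the toy comparison below counts how often the two image hooks would fire over a number of frames for two compositor strategies. Both functions and the `toy_hook_counts` struct are hypothetical; no real compositor code is modelled, only the counting argument made in this comment.

```c
#include <assert.h>

/* How often the two hypothetical driver hooks fire. */
struct toy_hook_counts {
    int image_creates;
    int image_destroys;
};

/* Compositor A: wraps each attached buffer in a fresh image and destroys it
 * after texturing, so both hooks fire once per frame. */
static struct toy_hook_counts toy_recreate_every_frame(int frames)
{
    struct toy_hook_counts c = { 0, 0 };

    assert(frames >= 0);
    for (int i = 0; i < frames; i++) {
        c.image_creates++;  /* driver could wait on the client's fence here */
        c.image_destroys++; /* driver could signal the compositor's fence here */
    }
    return c;
}

/* Compositor B: creates one image and reuses it for the client's lifetime. */
static struct toy_hook_counts toy_cache_image(int frames)
{
    struct toy_hook_counts c = { 0, 0 };

    assert(frames >= 0);
    if (frames == 0)
        return c;
    c.image_creates = 1;  /* first frame only */
    /* Later frames texture from the cached image: no hook runs. */
    c.image_destroys = 1; /* when the client goes away */
    return c;
}
```

With the cached strategy, every frame after the first has no create or destroy call for a driver to attach a fence to, which is the gap this comment points out.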
Comment 12 Kristian Høgsberg 2016-08-16 17:53:54 UTC
(In reply to Tomek Bury from comment #11)
> (In reply to Kristian Høgsberg from comment #10)
> > Again, this is implicit (and it really should've been documented better,
> > sorry), but the idea is that the compositor communicates its intention to
> > texture from the wl_buffer by creating the EGLImage for the wl_buffer. Once
> > it's done, it destroys the EGLImage. So once the EGLImage is created you
> > have to wait on the client-rendering-finished fence before first use. Once
> > the EGLImage is destroyed you signal the compositor-texturing-finished fence.
> 
> That's not the case, at least in the latest compositor. Now compositor keeps
> EGL images for the lifetime of a client and assumes that writes from client
> and reads from the compositor will be implicitly interlocked, the
> eglImageCreate() and eglImageDestroy() happens only once per buffer and
> can't be a driver hook point to create fences.

Which compositor is this? From a quick look at weston, it looks like it unrefs the EGLImages when a new wl_buffer is attached, but maybe there's some subtlety in the ref-counting there.
Comment 13 Daniel Stone 2016-08-16 18:34:36 UTC
(In reply to Kristian Høgsberg from comment #12)
> (In reply to Tomek Bury from comment #11)
> > That's not the case, at least in the latest compositor. Now compositor keeps
> > EGL images for the lifetime of a client and assumes that writes from client
> > and reads from the compositor will be implicitly interlocked, the
> > eglImageCreate() and eglImageDestroy() happens only once per buffer and
> > can't be a driver hook point to create fences.
> 
> Which compositor is this? From a quick look at weston, it looks like it
> unrefs the EGLImages when a new wl_buffer is attached, but maybe there's
> some subtlety in the ref-counting there.

I'd equally consider any compositor which doesn't do this to be broken. Weston to the best of my knowledge (and a quick check) does destroy and recreate.

You're very right that this should be documented better. I'm not sure if the Khronos specs are the best place, or a document in the Wayland repository itself. Can we take this bug as one request for explicit fencing support (being actively pursued), and another one to document the EGL platform requirements for both driver implementations and compositors?
Comment 14 Tomek Bury 2016-08-16 19:18:29 UTC
(In reply to Daniel Stone from comment #13)
> (In reply to Kristian Høgsberg from comment #12)
> > Which compositor is this?

QtWayland as of Qt 5.7.

Back in the Qt 5.4 days (ish) that compositor was actually creating and destroying EGL images every frame, but those days are gone. The latest and greatest creates an EGL image once and reuses it for the lifetime of a client.

> > unrefs the EGLImages when a new wl_buffer is attached, but maybe there's
> > some subtlety in the ref-counting there.

I don't know, although the Qt implementation also uses ref counting to decide when to send a wl_buffer_release.

> I'd equally consider any compositor which doesn't do this to be broken.
> Weston to the best of my knowledge (and a quick check) does destroy and
> recreate.

Again, I don't know. I didn't see anywhere such requirement or promise that EGL image wrapping wl_buffer is guaranteed to be destroyed before buffer release. If that was the case "release" would be a responsibility of a platform implementation, the same way the "attach" is. To my untrained eye wl_buffer_release() and eglImageDestroy() over 2 independent and asynchronous channels looks like asking for trou^H^H^H^H race condition.

> You're very right that this should be documented better. I'm not sure if the
> Khronos specs are the best place, or a document in the Wayland repository
> itself. Can we take this bug as one request for explicit fencing support
> (being actively pursued), and another one to document the EGL platform
> requirements for both driver implementations and compositors?

Thanks. Would you like me to create 2 dependent Bugzilla tickets?
Comment 15 Tomek Bury 2016-08-16 19:31:04 UTC
With regard to documenting requirements: the EGL spec explicitly limits sharing and synchronisation to a single "address space", so the Wayland requirement for cross-process synchronisation needs an extension if you want to go with Khronos.
Comment 16 Daniel Stone 2016-08-16 19:56:20 UTC
(In reply to Tomek Bury from comment #14)
> (In reply to Daniel Stone from comment #13)
> > (In reply to Kristian Høgsberg from comment #12)
> > > Which compositor is this?
> 
> QtWayland as of Qt 5.7.

OK, that's demonstrably broken, and I believe on some real-world drivers as well. Regardless of whether or not it's documented, it just won't work ...

> > I'd equally consider any compositor which doesn't do this to be broken.
> > Weston to the best of my knowledge (and a quick check) does destroy and
> > recreate.
> 
> Again, I don't know. I didn't see anywhere such requirement or promise that
> EGL image wrapping wl_buffer is guaranteed to be destroyed before buffer
> release. If that was the case "release" would be a responsibility of a
> platform implementation, the same way the "attach" is. To my untrained eye
> wl_buffer_release() and eglImageDestroy() over 2 independent and
> asynchronous channels looks like asking for trou^H^H^H^H race condition.

The wl_display is one communication channel, and EGL implementations can specify their own extensions to communicate over that. wl_drm is one, mali_buffer_sharing is another; anyone could create their own extension which posted a fence event back when an EGLImage created from a wl_buffer was destroyed. This would be guaranteed to be delivered in order.
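The ordering argument can be illustrated with a single FIFO standing in for the one connection. This is a sketch under assumptions: the event types, the `toy_connection` queue and all `toy_*` functions are invented for the example, and it assumes the compositor destroys the image (which posts the driver's private fence event) before it sends the release.

```c
#include <assert.h>
#include <stddef.h>

/* Two kinds of events travelling over the same (toy) connection. */
enum toy_event_type {
    TOY_EVENT_FENCE,  /* hypothetical driver-private event */
    TOY_EVENT_RELEASE /* stands in for the buffer release event */
};

struct toy_event {
    enum toy_event_type type;
    unsigned int buffer_id;
};

#define TOY_QUEUE_CAPACITY 16

/* One connection = one FIFO, so events arrive in the order they were posted. */
struct toy_connection {
    struct toy_event events[TOY_QUEUE_CAPACITY];
    size_t head;
    size_t count;
};

static void toy_connection_init(struct toy_connection *c)
{
    c->head = 0;
    c->count = 0;
}

/* Appends an event. Returns 0 on success, -1 if the queue is full. */
static int toy_post(struct toy_connection *c, enum toy_event_type type,
                    unsigned int buffer_id)
{
    size_t tail;

    if (c->count == TOY_QUEUE_CAPACITY)
        return -1;
    tail = (c->head + c->count) % TOY_QUEUE_CAPACITY;
    c->events[tail].type = type;
    c->events[tail].buffer_id = buffer_id;
    c->count++;
    return 0;
}

/* Removes the oldest event. Returns 0 on success, -1 if the queue is empty. */
static int toy_dispatch(struct toy_connection *c, struct toy_event *out)
{
    if (c->count == 0)
        return -1;
    *out = c->events[c->head];
    c->head = (c->head + 1) % TOY_QUEUE_CAPACITY;
    c->count--;
    return 0;
}

/* Compositor side: the image is destroyed first (driver posts its fence
 * event), then the compositor posts the release for the same buffer. */
static int toy_compositor_done_with_buffer(struct toy_connection *c,
                                           unsigned int buffer_id)
{
    if (toy_post(c, TOY_EVENT_FENCE, buffer_id) != 0)
        return -1;
    return toy_post(c, TOY_EVENT_RELEASE, buffer_id);
}
```

Because both events share one queue, the client always sees the fence event before the release for the same buffer, as long as the compositor really does destroy the image first.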

> > You're very right that this should be documented better. I'm not sure if the
> > Khronos specs are the best place, or a document in the Wayland repository
> > itself. Can we take this bug as one request for explicit fencing support
> > (being actively pursued), and another one to document the EGL platform
> > requirements for both driver implementations and compositors?
> 
> > Thanks. Would you like me to create 2 dependent Bugzilla tickets?

Sure!

(In reply to Tomek Bury from comment #15)
> With regards to documenting requirements: EGL spec explicitly limits sharing
> and synchronisation to a single "address space" so Wayland requirement for
> cross-process synchronisation needs an extension if you want to go with
> Khronos.

It doesn't give you standardised cross-process primitives, but EGL platforms and winsys mostly require working across process boundaries. It's entirely legitimate to place requirements on how they operate, without having to specify an entire generic mechanism for external fencing.
Comment 17 Tomek Bury 2016-08-17 10:02:13 UTC
(In reply to Daniel Stone from comment #16)
> It doesn't give you standardised cross-process primitives, but EGL platforms
> and winsys mostly require working across process boundaries. 

Definitely not EGL:

"2.3 Direct Rendering and Address Spaces

EGL is assumed to support only direct rendering, unlike similar APIs such as GLX. EGL objects and related context state cannot be used outside of the address space in which they are created. In a single-threaded environment, each process has its own address space. In a multi-threaded environment, all threads may share the same virtual address space; however, this capability is not required, and implementations may choose to restrict their address space to be per-thread even in an environment supporting multiple application threads.

Context state, including both the client and server state of OpenGL and OpenGL ES contexts, exists in the client’s address space; this state cannot be shared by a client in another process.

Support of indirect rendering (in those environments where this concept makes sense) may have the effect of relaxing these limits on sharing. However, such support is beyond the scope of this document."

> It's entirely
> legitimate to place requirements on how they operate, without having to
> specify an entire generic mechanism for external fencing.

I fully agree. Perhaps the sync requirement should go here:

https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_platform_wayland.txt
https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_platform_wayland.txt

and here:

https://cgit.freedesktop.org/mesa/mesa/tree/docs/specs/WL_bind_wayland_display.spec

(BTW. shouldn't this be submitted to Khronos?)
Comment 18 Tomek Bury 2016-08-17 11:31:49 UTC
(In reply to Daniel Stone from comment #16)
> (In reply to Tomek Bury from comment #14)
> > Thanks. Would you like me to create 2 dependent Bugzilla tickets?
> 
> Sure!

Done, see bug 97379 and bug 97380.
I didn't add any dependencies between this report and the new ones, but feel free to link them one way or another if that works better for you.
Comment 19 Daniel Stone 2016-08-17 14:02:48 UTC
(In reply to Tomek Bury from comment #17)
> (In reply to Daniel Stone from comment #16)
> > It doesn't give you standardised cross-process primitives, but EGL platforms
> > and winsys mostly require working across process boundaries. 
> 
> Definitely not EGL:
> 
> "2.3 Direct Rendering and Address Spaces
> 
> EGL is assumed to support only direct rendering, unlike similar APIs such as
> GLX. EGL objects and related context state cannot be used outside of the
> address space in which they are created. In a single-threaded environment,
> each process has its own address space. In a multi-threaded environment, all
> threads may share the same virtual address space; however, this capability
> is not required, and implementations may choose to restrict their address
> space to be per-thread even in an environment supporting multiple
> application threads.
> 
> Context state, including both the client and server state of OpenGL and
> OpenGL ES contexts, exists in the client’s address space; this state cannot
> be shared by a client in another process.
> 
> Support of indirect rendering (in those environments where this concept
> makes sense) may have the effect of relaxing these limits on sharing.
> However, such support is beyond the scope of this document."

Right, indirect rendering is something else entirely: where the state and commands are serialised over the wire to be executed by another process. Buffer exchange between multiple processes, and synchronisation between them (cf. eglWaitNative), is an explicit goal of EGL.

> > It's entirely
> > legitimate to place requirements on how they operate, without having to
> > specify an entire generic mechanism for external fencing.
> 
> I fully agree. Perhaps the sync requirement should go here:
> 
> https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_platform_wayland.txt
> https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_platform_wayland.txt
> 
> and here:
> 
> https://cgit.freedesktop.org/mesa/mesa/tree/docs/specs/WL_bind_wayland_display.spec
> 
> (BTW. shouldn't this be submitted to Khronos?)

I don't know if those specs are the right place to do it, for various reasons - mainly that there's not much precedent. I'm trying to work this out over the next day or two though.

Thanks a lot for bringing this up, and for filing the additional bugs - I'll close this one out and we can chase the other two up. The documentation will move relatively quickly I expect, but the explicit fencing work will probably not progress much until XDC, at the end of September.
Comment 20 Tomek Bury 2016-08-17 14:41:30 UTC
(In reply to Daniel Stone from comment #19)
> Buffer exchange between multiple processes, and synchronisation between them
> (cf. eglWaitNative), is an explicit goal of EGL.

No, it isn't. The single "address space" limit is all over EGL spec. 

The eglWaitNative is a counterpart of eglWaitGL. It's only to synchronise access from inside and outside GL(ES), not across processes.
Comment 21 Daniel Stone 2016-08-17 16:25:07 UTC
(In reply to Tomek Bury from comment #20)
> (In reply to Daniel Stone from comment #19)
> > Buffer exchange between multiple processes, and synchronisation between them
> > (cf. eglWaitNative), is an explicit goal of EGL.
> 
> No, it isn't. The single "address space" limit is all over EGL spec. 
> 
> The eglWaitNative is a counterpart of eglWaitGL. It's only to synchronise
> > access from inside and outside GL(ES), not across processes.

EGL was mostly written by copying GLX and changing a few terms around. eglWaitNative is the direct analog of glXWaitX (the documentation of which explicitly mentions XSync), and the required implementation of eglWaitNative on X11 is glXWaitX, which necessarily involves crossing a process boundary.

EGL objects - as visible to clients - are not shareable across process contexts. But this does not mean that EGL calls are required to never do anything which could cross a process boundary, because this means that EGL could never work on Wayland, X11, Mir, Android, OS X / iOS, or Windows.

Implementing support for any winsys apart from fbdev or GBM necessarily imposes requirements on cross-process co-ordination; this includes exporting synchronisation objects between processes. If EGL banned cross-process fencing, then glFinish would be mandatory as a part of eglSwapBuffers. This is clearly not the case.
Comment 22 Daniel Stone 2016-08-17 16:26:54 UTC
(In reply to Tomek Bury from comment #20)
> The eglWaitNative is a counterpart of eglWaitGL. It's only to synchronise
> access from inside and outside GL(ES), not across processes.

Specifically, this is wrong. eglWaitClient is glXWaitGL (note the reference to glFinish); eglWaitNative is glXWaitX.
Comment 23 Tomek Bury 2016-08-17 17:14:17 UTC
(In reply to Daniel Stone from comment #21)
> EGL was mostly written by copying GLX and changing a few terms around.
> eglWaitNative is the direct analog of glXWaitX (the documentation of which
> explicitly mentions XSync), and the required implementation of eglWaitNative
> on X11 is glXWaitX, which necessarily involves crossing a process boundary.
A spec that EGL borrows from mentions XSync, therefore EGL mandates cross-process interoperability? Wow, that's a hell of a stretch :D

> EGL objects - as visible to clients - are not shareable across process
> contexts. But this does not mean that EGL calls are required to never do
> anything which could cross a process boundary [...]
I never said that. I'm only saying that cross-process stuff is explicitly outside of EGL scope. As far as core EGL is concerned everything is inside one process. Wayland needs more therefore Wayland needs extension(s) to expose any cross-process functionality to callers.


> Implementing support for any winsys apart from fbdev or GBM necessarily
> imposes requirements on cross-process co-ordination; this includes exporting
> synchronisation objects between processes. 
Yes.

> If EGL banned cross-process fencing [...]
But it doesn't ban cross-process and it doesn't mandate cross-process, it only says "no cross-process out of the box, write an extension or look elsewhere if you need it".
Comment 24 Tomek Bury 2016-08-17 17:24:33 UTC
(In reply to Daniel Stone from comment #22)
> (In reply to Tomek Bury from comment #20)
> > The eglWaitNative is a counterpart of eglWaitGL. It's only to synchronise
> > access from inside and outside GL(ES), not across processes.
> 
> Specifically, this is wrong. eglWaitClient is glXWaitGL (note the reference
> to glFinish); eglWaitNative is glXWaitX.
Yes, EGL is a more generic version of GLX here. Still, nothing says cross-process. It only says that GL and X draw commands, even if issued in sequence from a single thread, aren't guaranteed to execute in that sequence, so explicit synchronisation points are required. Also, neither the GLX calls nor their EGL counterparts imply blocking the calling thread, while glFinish does.
