Bug 75303

Summary:	Protocol: wl_buffer.release is racy
Product:	Wayland	Reporter:	Pekka Paalanen <ppaalanen>
Component:	wayland	Assignee:	Wayland bug list <wayland-bugs>
Status:	RESOLVED MOVED	QA Contact:
Severity:	normal
Priority:	medium	CC:	jadahl, reveman
Version:	unspecified
Hardware:	All
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	83092

Description Pekka Paalanen 2014-02-21 06:59:37 UTC

Consider a client doing this:
1. surface1.attach(buffer1)
2. surface1.commit
3. surface2.attach(buffer1)
4. surface2.commit

Then the client receives buffer1.release.

It is ambiguous, whether the release corresponds to step 2 or step 4. That is, the client cannot know if the latter commit still holds the buffer reserved in the server. The server may have had time to process and release the buffer before step 4, in which case the buffer would be reserved again after step 4.

The problem is the same also, if surface1 and surface2 were both just surface1. If the compositor has time to repaint in between the commits, the client may get a release it most likely interprets as the release corresponding to step 4, which would be wrong.

How should this be resolved?

A suggestion from Jason Ekstrand was to modify the protocol to guarantee one release event for each commit that makes the buffer reserved. This would allow clients to use simple reference counting.

Another possibility would be to require, that clients must not commit a wl_buffer that is already reserved by the server. However, this seems quite limiting, especially with the coming presentation extension that allows queueing an arbitrary number of updates which could reference the same buffer more than once (for e.g. simple cyclic animations like spinners or cursors).

Comment 1 Jason Ekstrand 2014-02-28 17:15:31 UTC

Pekka already noted my suggestion of reference counting.  However, this raises the issue of compatibility between clients and servers.  In particular, if clients start to implement buffer reference counting, then we need some sort of a guarantee by the server that it will aid in the reference counting.  Otherwise, in the scenario Pekka describes, the client will wait for two releases and may only get one
It's worth noting that the usual case of waiting for wl_buffer.release before attaching again is the same regardless of reference counting because it means the server has at most one reference.  The issue comes in the other cases where the server may have more references.

One option to solve this would be to add a comment to wl_surface.attach and bump the wl_surface version.  Technically, one would like this to be part of wl_buffer.  However, wl_buffer is typically implemented by either libwayland's SHM implementation or the system EGL implementation and those never touch wl_buffer.release except for EGL receiving client-side.  There is a slight issue with EGL not knowing the wl_surface version, however EGL usually has one buffer pool per surface and commits one buffer at a time.  I think we'll probably be ok here.

Comment 2 Kristian Høgsberg 2014-04-04 21:47:57 UTC

(In reply to comment #0)
> Consider a client doing this:
> 1. surface1.attach(buffer1)
> 2. surface1.commit
> 3. surface2.attach(buffer1)
> 4. surface2.commit
> 
> Then the client receives buffer1.release.
> 
> It is ambiguous, whether the release corresponds to step 2 or step 4. That
> is, the client cannot know if the latter commit still holds the buffer
> reserved in the server. The server may have had time to process and release
> the buffer before step 4, in which case the buffer would be reserved again
> after step 4.
> 
> The problem is the same also, if surface1 and surface2 were both just
> surface1. If the compositor has time to repaint in between the commits, the
> client may get a release it most likely interprets as the release
> corresponding to step 4, which would be wrong.
> 
> How should this be resolved?
> 
> A suggestion from Jason Ekstrand was to modify the protocol to guarantee one
> release event for each commit that makes the buffer reserved. This would
> allow clients to use simple reference counting.

We do that now.  Every attach+commit is followed by a release.  If a client attaches a buffer to two surfaces, it needs to receive two release events before it can use the buffer again without causing artifacts.

Comment 3 Pekka Paalanen 2014-04-07 06:14:07 UTC

(In reply to comment #2)
> We do that now.  Every attach+commit is followed by a release.  If a client
> attaches a buffer to two surfaces, it needs to receive two release events
> before it can use the buffer again without causing artifacts.

No, I do not think the code in Weston works like that at all.

We have exactly one struct weston_buffer per each wl_buffer protocol object, right? Struct weston_buffer has one reference counter. When that reference counter hits zero, wl_buffer.release event is sent.

Struct weston_surface and others use struct weston_buffer_reference to hold several references to various wl_buffers (weston_buffers). We do not have per-surface reference counters, and weston_buffer_reference() function makes no difference between different struct weston_surfaces.

The function weston_buffer_from_resource() does not create a new struct weston_buffer *every* time it is called; it checks if the destroy listener on the wl_buffer already exists, and if it does, it returns the existing weston_buffer. So to me it looks like we never have more than one weston_buffer (and a reference counter) per wl_buffer.

Therefore I cannot see how attaching the same wl_buffer to two wl_surfaces could generate more than one wl_buffer.release, unless the compositor releases before processing the second attach+commit, which would imply at least a compositor repaint in between.

The Presentation extension will add another case. A client may queue the same wl_buffer multiple times to the same wl_surface. That is not merged yet, but it's a use case that should to be kept in mind.

At the very least, if you can show that Weston actually follows the every attach+commit is eventually replied with a wl_buffer.release scheme, then we still need to fix the protocol specification in wayland.xml to explicitly require this. The current protocol specification does not require clients to count the commits and releases but implies that the server handles all that.

Comment 4 Bill Spitzak 2015-09-17 01:24:46 UTC

Could the buffer release include some kind of serial number indicating which attach was the last one done to the buffer?

Comment 5 Pekka Paalanen 2015-09-17 06:10:15 UTC

(In reply to Bill Spitzak from comment #4)
> Could the buffer release include some kind of serial number indicating which
> attach was the last one done to the buffer?

We cannot change the signature of the release event, and we cannot add another event to wl_buffer because wl_buffer has several unrelated factories, which makes extending wl_buffer interface in particular impossible. Using serials would also require extending wl_surface.attach somehow carry a serial in the first place.

Client side reference counting, that is, for every committed attach there will be a release, seems like the preferred solution to me at the moment.

I've forgot the details related to the possibility of implementing wl_proxy_get_version(), but from what I recall, it should be perfectly doable since we already have runtime object version inheritance in libwayland-client (don't we?).

Comment 6 Jonas Ådahl 2015-09-17 06:57:38 UTC

Why can't we just let wl_buffer.release simply be a notification about the buffer being reusable, meaning,

After a client commits a surface with a wl_buffer@1 attached, it should be consider busy until it receives the release event. If the client attaches the same wl_buffer@1 to multiple surfaces, its still up to the server to let the client know when the buffer is no longer used, and may be reused by the client.

I.e.

1. surface1.attach(buffer1)
2. surface1.commit  (client should consider buffer1 busy until further notice)
3. surface2.attach(buffer1)
4. surface2.commit  (this deosn't change the buffer1 busy state)

..would result in a single buffer1.release() event being emitted whenever the server doesn't use buffer1 for neither surfaces. This is how I'd interpret the current wording in the specification as well; being more or less orthogonal to commits and frame callbacks.

Or is there something I'm missing? Why would it be important to get release events per commits?

Comment 7 Pekka Paalanen 2015-09-17 10:53:13 UTC

(In reply to Jonas Ådahl from comment #6)
> Why can't we just let wl_buffer.release simply be a notification about the
> buffer being reusable

Because it has exactly the race explained in the description of this bug. This example applies particular to Weston with gl-renderer and a client using wl_shm.

Case A:

client                messagebuffer              server

surface1.attach(B) ->
surface1.commit    ->
                                        -> surface1.attach(B)
                                        -> surface1.commit
surface2.attach(B) ->
surface2.commit    ->
                                        -> surface2.attach(B)
                                        -> surface2.commit
                                           repaint, GL texture damage flush
                                        <- B.release
B.release          <-

B is free.


Case B:

surface1.attach(B) ->
surface1.commit    ->
                                        -> surface1.attach(B)
                                        -> surface1.commit
                                           repaint, GL texture damage flush
                                        <- B.release
surface2.attach(B) ->
surface2.commit    ->
                                        -> surface2.attach(B)
                                        -> surface2.commit
B.release          <-

B is busy on surface2.


From the client point of view, cases A and B are equivalent, yet the buffer is free in one and busy in the other.

Comment 8 Jonas Ådahl 2015-09-17 12:06:24 UTC

Ok, yes I see it now.(In reply to Pekka Paalanen from comment #7)
> (In reply to Jonas Ådahl from comment #6)
> > Why can't we just let wl_buffer.release simply be a notification about the
> > buffer being reusable
> 
> Because it has exactly the race explained in the description of this bug.
> This example applies particular to Weston with gl-renderer and a client
> using wl_shm.

Yes, I see it now. Thanks for the more detailed explanation.

(In reply to Pekka Paalanen from comment #5)
> Client side reference counting, that is, for every committed attach there
> will be a release, seems like the preferred solution to me at the moment.

How do you expect the client to get the new semantics (one release per commit) or do you mean to change it for everyone?

> 
> I've forgot the details related to the possibility of implementing
> wl_proxy_get_version(), but from what I recall, it should be perfectly
> doable since we already have runtime object version inheritance in
> libwayland-client (don't we?).

I don't think we do. The client needs to track this information itself. The client just passes the version to the wl_registry.bind request and the server uses that to adapt what it sends etc. We can't add it to wl_proxy because sometimes (wl_buffer, wl_sync), the wl_proxy that is created does not inherit the version of the parent proxy. Digging some, I found one field name "version" (in wl_interface), but that is just the version that was in the XML file, hard coded in the generated -protocol.c file.

Comment 9 Pekka Paalanen 2015-09-17 12:32:01 UTC

(In reply to Jonas Ådahl from comment #8)
> Ok, yes I see it now.(In reply to Pekka Paalanen from comment #7)
> (In reply to Pekka Paalanen from comment #5)
> > Client side reference counting, that is, for every committed attach there
> > will be a release, seems like the preferred solution to me at the moment.
> 
> How do you expect the client to get the new semantics (one release per
> commit) or do you mean to change it for everyone?

For everyone, yes.

I would hope that attaching the same wl_buffer to multiple surfaces is rare enough that we can get away with it. Committing the same wl_buffer to a surface before waiting for a release or a frame callback is I hope equally rare. (If you wait for frame callback, you're probably going to draw, so using a busy buffer is a mistake to begin with. If you wait for a release, there is no race.)

Since you cannot get the wl_buffer out of EGL, EGL should never hit a case where the client-side reference count would be more than one.

Also judging by krh's comment, it was probably expected to work in refcount manner to begin with, we (probably just me) just screwed up the spec and Weston.

Bumping the wl_surface version for this is IMO a backup plan, in case we can't just change the semantics for everyone.

For now, this is a quite theoretical race, which is why I think we can solve this now. With Presentation queueing that would change, so this must be solved before queueing can get further.

Comment 10 Jonas Ådahl 2015-09-17 12:43:34 UTC

(In reply to Pekka Paalanen from comment #9)
> (In reply to Jonas Ådahl from comment #8)
> > Ok, yes I see it now.(In reply to Pekka Paalanen from comment #7)
> > (In reply to Pekka Paalanen from comment #5)
> > > Client side reference counting, that is, for every committed attach there
> > > will be a release, seems like the preferred solution to me at the moment.
> > 
> > How do you expect the client to get the new semantics (one release per
> > commit) or do you mean to change it for everyone?
> 
> For everyone, yes.
> 
> I would hope that attaching the same wl_buffer to multiple surfaces is rare
> enough that we can get away with it. Committing the same wl_buffer to a
> surface before waiting for a release or a frame callback is I hope equally
> rare. (If you wait for frame callback, you're probably going to draw, so
> using a busy buffer is a mistake to begin with. If you wait for a release,
> there is no race.)
> 
> Since you cannot get the wl_buffer out of EGL, EGL should never hit a case
> where the client-side reference count would be more than one.
> 
> Also judging by krh's comment, it was probably expected to work in refcount
> manner to begin with, we (probably just me) just screwed up the spec and
> Weston.
> 
> Bumping the wl_surface version for this is IMO a backup plan, in case we
> can't just change the semantics for everyone.
> 
> For now, this is a quite theoretical race, which is why I think we can solve
> this now. With Presentation queueing that would change, so this must be
> solved before queueing can get further.

Reading between the lines in the description, one could assume only one surface was considered, since it only talks about that scenario, so I agree it should be a reasonable change to do.

The only issue would be that it's hard for the client (especially EGL) to know when it can rely on the multi-surface case to work properly, especially if it's not possible to get the version of the bound (wl_surface) proxy (or the newest version supported by the server). I experimented once with implementing a "wl_version" interface to solve a similar issue (request version information from the server given an existing object) without needing to change any client API, but feels a bit hacky to need to do such a roundtrip.

Comment 11 Pekka Paalanen 2015-09-17 12:58:40 UTC

(In reply to Jonas Ådahl from comment #10)
> The only issue would be that it's hard for the client (especially EGL) to
> know when it can rely on the multi-surface case to work properly, especially
> if it's not possible to get the version of the bound (wl_surface) proxy (or
> the newest version supported by the server).

EGL should never hit this. App cannot get the wl_buffer used by EGL, and EGL won't push the same wl_buffer to multiple surfaces or without waiting for release.

How to know that the refcounting actually works, that's an important question. For EGL it should be irrelevant, but other libraries might perhaps care. Looking at the wl_proxy_get_version() issue would be well worth it. FWIW, I don't think we have any case where inheritance wouldn't work. For things like wl_callback that can never be bumped beyond version 1 it doesn't really matter if get_version() returns 5. It's still larger than 1.

Comment 12 Daniel Stone 2018-06-04 08:24:28 UTC

At this point, I think it would be more simple to document it as a limitation that attaching the same wl_buffer to two wl_surfaces has inherent race conditions and is thus a bad idea, and move on with our lives.

EGL clients can't ever access the buffer. dmabuf clients can just create a new buffer referring to the same storage and do refcounting client side; ditto SHM clients. So do we have a good usecase for multi-attach with a hard limit of one buffer object?

Comment 13 Pekka Paalanen 2018-06-04 10:46:41 UTC

That's actually a good point that even wl_shm clients can simply create multiple wl_buffers for the same memory.

Yeah, I'm fine with just documenting the current limitation and not solving the issue by complicating protocol.

Comment 14 Jonas Ådahl 2018-06-04 11:31:51 UTC

FWIW, mutter has implemented the one-release-per-attach semantics, which I've read as the correct way. Is it now meant to be illegal to have a single buffer attached at multiple surfaces?

Comment 15 Pekka Paalanen 2018-06-04 11:45:38 UTC

I think the proposal is to make it undefined.

You mean one release per attach+commit in mutter, right?

So Weston and Mutter implement the opposite behaviours. This means that if a client ever depended on either behaviour, it would malfunction on the other.

Comment 16 Pekka Paalanen 2018-06-04 11:49:55 UTC

Ahumm... wl_cursor_image_get_buffer() always returns the same wl_buffer. Not sure there is a way in the API to get separate wl_buffers, and multiple wl_surfaces for the same image is very possible by having multiple wl_seats.

Comment 17 Daniel Stone 2018-06-04 12:15:24 UTC

(In reply to Pekka Paalanen from comment #15)
> I think the proposal is to make it undefined.

That's my suggestion.

(In reply to Pekka Paalanen from comment #16)
> Ahumm... wl_cursor_image_get_buffer() always returns the same wl_buffer. Not
> sure there is a way in the API to get separate wl_buffers, and multiple
> wl_surfaces for the same image is very possible by having multiple wl_seats.

Ouch. We should definitely fix that.

Comment 18 GitLab Migration User 2018-06-08 23:50:27 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/wayland/wayland/issues/46.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.