91750 – Wayland may have interface bug when multi-threads programing.

Bug 91750 - Wayland may have interface bug when multi-threads programing.

Summary: Wayland may have interface bug when multi-threads programing.

Status:	RESOLVED DUPLICATE of bug 91273

Alias:	None

Product:	Wayland
Classification:	Unclassified
Component:	wayland (show other bugs)
Version:	unspecified
Hardware:	All All

Importance:	medium critical
Assignee:	Wayland bug list
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-08-25 07:01 UTC by Yujie Shen
Modified:	2015-08-27 06:38 UTC (History)
CC List:	2 users (show)

See Also:
i915 platform:
i915 features:

Attachments
wayland multi-threads interface bug (36.00 KB, text/plain) 2015-08-25 07:01 UTC, Yujie Shen	Details
View All

Description Yujie Shen 2015-08-25 07:01:07 UTC

Created attachment 117908 [details]
wayland multi-threads interface bug

Wayland may have interface bug when multi-threads programming.
This is a very serious bug in every version of wayland. 
It only occurs when multi-threaded programming,but it may cause an "S" level Error(screen freeze or program crash).
Briefly,a wl_proxy's property is still being modifying in a thread,while another thread may have sent it to server.Namely,Any modification of a generated callback wl_proxy may be ineffective,as other thread may have read and dispatched its corresponding events.
I have a detailed description of the problem and my solution in the attachment.
Please refer the attachment for detail.
If you have any question with the bug,pls feel free to contact me.
syjman@gmail.com

Comment 1 Yujie Shen 2015-08-25 07:08:00 UTC

The attachment "wayland multi-threads interface bug" is a excel file.
It can not be displayed correctly by click the link.
Please save the attachment link to to your local computer and check it.

Comment 2 Jonas Ådahl 2015-08-25 07:18:19 UTC

Hi and thanks for the report. This seems to be a duplicate of bug 91273, so closing this one. The problem can be worked around by using the current API to block dispatching on other threads (see the last paragraph of (omment 2 of bug 91273 <https://bugs.freedesktop.org/show_bug.cgi?id=91273#c2>. You will need this until we decide what API we want to introduce fix the issue properly.

*** This bug has been marked as a duplicate of bug 91273 ***

Comment 3 Yujie Shen 2015-08-25 07:57:36 UTC

(In reply to Jonas Ådahl from comment #2)
> Hi and thanks for the report. This seems to be a duplicate of bug 91273, so
> closing this one. The problem can be worked around by using the current API
> to block dispatching on other threads (see the last paragraph of (omment 2
> of bug 91273 <https://bugs.freedesktop.org/show_bug.cgi?id=91273#c2>. You
> will need this until we decide what API we want to introduce fix the issue
> properly.
> 
> *** This bug has been marked as a duplicate of bug 91273 ***

Thank you.
But,this bug is totally different from bug 91273.
Bug 91273 reports a bug that 2 threads polling on same display fd in both threads.
My bug,namely Bug 91750,reports a bug that a wl_proxy's property is still being modifying in a thread,while another thread may have sent it to server.
Please kindly check the attachment"wayland multi-threads interface bug"again.

Here is a piece of code to reappear the bug.

int
main(int argc, char **argv)
{
..
       //sub-thread
	pthread_t id;
	pthread_create(&id, NULL, wl_display_roundtrip, &wl_display);
...
	//main_thread
	while (1) {
        ...
        wl_surface_attach();
	window->callback = wl_surface_frame(window->surface);
//surface_frame.callback event may have been sent and dispatched by wl_display_roundtrip in sub-thread,before wl_proxy_set_queue in main_thread. 
	wl_proxy_set_queue(window->callback,queue2);
	wl_callback_add_listener(window->callback, &frame_listener, window);
	wl_surface_commit(window->surface);
	}
...
	return 0;
}

Comment 4 Jonas Ådahl 2015-08-25 08:16:56 UTC

(In reply to Yujie Shen from comment #3)
> (In reply to Jonas Ådahl from comment #2)
> > Hi and thanks for the report. This seems to be a duplicate of bug 91273, so
> > closing this one. The problem can be worked around by using the current API
> > to block dispatching on other threads (see the last paragraph of (omment 2
> > of bug 91273 <https://bugs.freedesktop.org/show_bug.cgi?id=91273#c2>. You
> > will need this until we decide what API we want to introduce fix the issue
> > properly.
> > 
> > *** This bug has been marked as a duplicate of bug 91273 ***
> 
> Thank you.
> But,this bug is totally different from bug 91273.
> Bug 91273 reports a bug that 2 threads polling on same display fd in both
> threads.
> My bug,namely Bug 91750,reports a bug that a wl_proxy's property is still
> being modifying in a thread,while another thread may have sent it to server.
> Please kindly check the attachment"wayland multi-threads interface bug"again.

I still think this is the same issue, just that wl_display_roundtrip_queue() exposes it. What should happen is that wl_display_roundtrip_queue constructs and sets correct queue atomically, and because it fails to do this, the race you describe happens (the proxy ending up on the default queue, being accessed from the wrong thread).

The current work-around you can do is, with the code below as base.

...
	wl_surface_attach();
	wl_display_prepare_read(display);
	window->callback = wl_surface_frame(window->surface);
	/* surface_frame.callback event will not be queued nor dispatched,
         * at this point, because the "prepare_read" call above blocks other
	 * threads from it.because
	 */
	wl_proxy_set_queue(window->callback,queue2);
	wl_callback_add_listener(window->callback, &frame_listener, window);
	wl_display_cancel_read(display); /* This will effectively unblock other dispatching threads. */
	wl_surface_commit(window->surface);


The difference to your example below is the wl_display_prepare_read(display) before creating the proxy, and wl_display_cancel_read(display) after setting the queue. The effect is that the other thread will not dispatch it, since it will only the dispatch events on the queue associated with it.

Now, wl_display_roundtrip_queue() is a function that currently exposes the bug, but I'm not sure what unforseen side effects would be to add a work-around like the above, and whether it is a good idea to do it. The obvious side effect would be stalling other dispatcher threads a bit more than before.

Pekka, any opinions whether we should temporarly "fix" wl_display_roundtrip_queue() until we introduce some new API?

Comment 5 Pekka Paalanen 2015-08-25 08:54:36 UTC

(In reply to Jonas Ådahl from comment #4)
> The difference to your example below is the wl_display_prepare_read(display)
> before creating the proxy, and wl_display_cancel_read(display) after setting
> the queue. The effect is that the other thread will not dispatch it, since
> it will only the dispatch events on the queue associated with it.
> 
> Now, wl_display_roundtrip_queue() is a function that currently exposes the
> bug, but I'm not sure what unforseen side effects would be to add a
> work-around like the above, and whether it is a good idea to do it. The
> obvious side effect would be stalling other dispatcher threads a bit more
> than before.
> 
> Pekka, any opinions whether we should temporarly "fix"
> wl_display_roundtrip_queue() until we introduce some new API?

It would only fix the roundtrip and none of the other problems. IMHO it would be a wasted effort, time would be better spent solving the underlying problem.

Using wl_display_prepare_read() is problematic, because it requires the wl_event_queue to be empty. You need a dispatch loop to drain the wl_event_queue until prepare_read() succeeds. Maybe that is not *that* bad since wl_display_roundtrip_queue() *is* a dispatching function.

However, wl_display_roundtrip_queue() explicitly documents itself to be incompatible with prepare_read() because roundtrip_queue() uses wl_display_dispatch_queue(). I really do not want to review whether we could break that rule in this specific case or not.

Because of the above, using wl_display_roundtrip{,_queue}() in a program using the same wl_display from multiple threads is a mistake to begin with.

All this makes me wonder if we could rewrite the roundtrip functions on top the prepare_read() API, and just deprecate wl_display_dispatch{,_queue}() completely. Multi-threaded programs using 
wl_display_dispatch{,_queue}() are broken to start with, and single-threaded programs using wl_display_dispatch{,_queue}() cannot use the mutually exclusive APIs at the same time by definition.

Yes, Mesa EGL Wayland platform implementation is broken. It uses wl_display_dispatch_queue().

Comment 6 Jonas Ådahl 2015-08-26 02:48:20 UTC

Yujie: as indirectly stated by Pekka, my example was incomplete. You would also need to dispatch pending events before the wl_display_prepare_read(display) call. This makes it, ignoring error handling, more or less:

	wl_surface_attach();
	while (wl_display_prepare_read_queue(display, queue) < 0)
		wl_display_dispatch_queue_pending(display, queue);
	window->callback = wl_surface_frame(window->surface);
	wl_proxy_set_queue(window->callback,queue2);
	wl_callback_add_listener(window->callback, &frame_listener, window);
	wl_display_cancel_read(display);
	wl_surface_commit(window->surface);

This has the side effect that events already in the queue will be always dispatched before the frame is requested. This may have unwanted effects, so please consider whether this is Ok in your architecture. If it is not, you need to make sure that the above code is done after all events on the queue has been dispatched.

Comment 7 Yujie Shen 2015-08-27 06:04:56 UTC

(In reply to Jonas Ådahl from comment #6)
> Yujie: as indirectly stated by Pekka, my example was incomplete. You would
> also need to dispatch pending events before the
> wl_display_prepare_read(display) call. This makes it, ignoring error
> handling, more or less:
> 
> 	wl_surface_attach();
> 	while (wl_display_prepare_read_queue(display, queue) < 0)
> 		wl_display_dispatch_queue_pending(display, queue);
> 	window->callback = wl_surface_frame(window->surface);
> 	wl_proxy_set_queue(window->callback,queue2);
> 	wl_callback_add_listener(window->callback, &frame_listener, window);
> 	wl_display_cancel_read(display);
> 	wl_surface_commit(window->surface);
> 
> This has the side effect that events already in the queue will be always
> dispatched before the frame is requested. This may have unwanted effects, so
> please consider whether this is Ok in your architecture. If it is not, you
> need to make sure that the above code is done after all events on the queue
> has been dispatched.

Thanks for your kindly help.
By adding 
>while (wl_display_prepare_read_queue(display, queue2) < 0)
>		wl_display_dispatch_queue_pending(display, queue2);
>wl_surface_commit(window->surface);
it works in my case.
but, I still don't understand why it requires the wl_event_queue to be empty??
I'm trying to summary the restrictions of wayland when multi-thread programming.
May I know if you have any idea about it.

Comment 8 Jonas Ådahl 2015-08-27 06:14:19 UTC

(In reply to Yujie Shen from comment #7)
> (In reply to Jonas Ådahl from comment #6)
> > Yujie: as indirectly stated by Pekka, my example was incomplete. You would
> > also need to dispatch pending events before the
> > wl_display_prepare_read(display) call. This makes it, ignoring error
> > handling, more or less:
> > 
> > 	wl_surface_attach();
> > 	while (wl_display_prepare_read_queue(display, queue) < 0)
> > 		wl_display_dispatch_queue_pending(display, queue);
> > 	window->callback = wl_surface_frame(window->surface);
> > 	wl_proxy_set_queue(window->callback,queue2);
> > 	wl_callback_add_listener(window->callback, &frame_listener, window);
> > 	wl_display_cancel_read(display);
> > 	wl_surface_commit(window->surface);
> > 
> > This has the side effect that events already in the queue will be always
> > dispatched before the frame is requested. This may have unwanted effects, so
> > please consider whether this is Ok in your architecture. If it is not, you
> > need to make sure that the above code is done after all events on the queue
> > has been dispatched.
> 
> Thanks for your kindly help.
> By adding 
> >while (wl_display_prepare_read_queue(display, queue2) < 0)
> >		wl_display_dispatch_queue_pending(display, queue2);
> >wl_surface_commit(window->surface);
> it works in my case.
> but, I still don't understand why it requires the wl_event_queue to be
> empty??
> I'm trying to summary the restrictions of wayland when multi-thread
> programming.
> May I know if you have any idea about it.

It is a requirement by wl_display_prepare_read_queue() that to succeed the queue has to be empty and it needs to succeed in order to block the other threads from dispatching. You should put it before wl_surface_frame() though, because it is between wl_surface_frame() and wl_proxy_set_queue() the race can happen, and don't forget to call wl_display_cancel_read().

Earlier today I wrote a summary about the known threading issues in the API. Please see bug 91769.

Comment 9 Yujie Shen 2015-08-27 06:28:43 UTC

(In reply to Jonas Ådahl from comment #8)
> (In reply to Yujie Shen from comment #7)
> > (In reply to Jonas Ådahl from comment #6)
> > > Yujie: as indirectly stated by Pekka, my example was incomplete. You would
> > > also need to dispatch pending events before the
> > > wl_display_prepare_read(display) call. This makes it, ignoring error
> > > handling, more or less:
> > > 
> > > 	wl_surface_attach();
> > > 	while (wl_display_prepare_read_queue(display, queue) < 0)
> > > 		wl_display_dispatch_queue_pending(display, queue);
> > > 	window->callback = wl_surface_frame(window->surface);
> > > 	wl_proxy_set_queue(window->callback,queue2);
> > > 	wl_callback_add_listener(window->callback, &frame_listener, window);
> > > 	wl_display_cancel_read(display);
> > > 	wl_surface_commit(window->surface);
> > > 
> > > This has the side effect that events already in the queue will be always
> > > dispatched before the frame is requested. This may have unwanted effects, so
> > > please consider whether this is Ok in your architecture. If it is not, you
> > > need to make sure that the above code is done after all events on the queue
> > > has been dispatched.
> > 
> > Thanks for your kindly help.
> > By adding 
> > >while (wl_display_prepare_read_queue(display, queue2) < 0)
> > >		wl_display_dispatch_queue_pending(display, queue2);
> > >wl_surface_commit(window->surface);
> > it works in my case.
> > but, I still don't understand why it requires the wl_event_queue to be
> > empty??
> > I'm trying to summary the restrictions of wayland when multi-thread
> > programming.
> > May I know if you have any idea about it.
> 
> It is a requirement by wl_display_prepare_read_queue() that to succeed the
> queue has to be empty and it needs to succeed in order to block the other
> threads from dispatching. You should put it before wl_surface_frame()
> though, because it is between wl_surface_frame() and wl_proxy_set_queue()
> the race can happen, and don't forget to call wl_display_cancel_read().
> 
> Earlier today I wrote a summary about the known threading issues in the API.
> Please see bug 91769.
Thanks.
I know it is a requirement by  wl_display_prepare_read_queue() .
I mean why it is designed as that, only the queue is empty then it block the other
threads from dispatching. It seems that wl_display_prepare_read_queue can also be designed as it still can block other threads from dispatching when the queue is not empty.

Comment 10 Jonas Ådahl 2015-08-27 06:38:18 UTC

(In reply to Yujie Shen from comment #9)
> (In reply to Jonas Ådahl from comment #8)
> > (In reply to Yujie Shen from comment #7)
> > 
> > It is a requirement by wl_display_prepare_read_queue() that to succeed the
> > queue has to be empty and it needs to succeed in order to block the other
> > threads from dispatching. You should put it before wl_surface_frame()
> > though, because it is between wl_surface_frame() and wl_proxy_set_queue()
> > the race can happen, and don't forget to call wl_display_cancel_read().
> > 
> > Earlier today I wrote a summary about the known threading issues in the API.
> > Please see bug 91769.
> Thanks.
> I know it is a requirement by  wl_display_prepare_read_queue() .
> I mean why it is designed as that, only the queue is empty then it block the
> other
> threads from dispatching. It seems that wl_display_prepare_read_queue can
> also be designed as it still can block other threads from dispatching when
> the queue is not empty.

I can't remember if any reason has been explicitly mentioned, but considering the regular use case (not this work around) where one is expected to actually poll() the fd and then read and dispatch events, one should dispatch the already queued events *before* starting to wait indefinitely for new input. The empty-queue requirement enforces the caller not to leave events ready for dispatching in the queue indefinitely.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.