103538 – vkDestroySwapchain causes deadlock with X11

Summary: vkDestroySwapchain causes deadlock with X11

Status:	RESOLVED MOVED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Vulkan/Common
Version:	git
Hardware:	Other All

Importance:	medium normal
Assignee:	mesa-dev

Reported:	2017-11-01 20:34 UTC by maister
Modified:	2019-09-18 18:13 UTC
CC List:	5 users

Description maister 2017-11-01 20:34:34 UTC

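When using the X11 backend in GLFW on RADV (and Anvil, for that matter), vkDestroySwapchainKHR() deadlocks when tearing down the device: x11_swapchain_destroy() blocks in pthread_join() waiting for the swapchain's queue thread, which is itself stuck in xcb_wait_for_special_event() (see the backtraces below). vkDeviceWaitIdle() has been called beforehand.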

thread 1:
#0  0x00007ffff747743d in pthread_join () from /usr/lib/libpthread.so.0
#1  0x00007ffff4968fa0 in x11_swapchain_destroy (anv_chain=0x555556680b50, pAllocator=0x55555654eac8) at wsi/wsi_common_x11.c:1088
#2  0x00007ffff4957cdc in radv_DestroySwapchainKHR (_device=0x55555654eac0, _swapchain=0x555556680b50, pAllocator=0x0) at radv_wsi.c:418
#3  0x00007fffeee27eff in ?? () from /usr/lib/libVkLayer_unique_objects.so

thread 2:
#0  0x00007ffff747c38d in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007ffff4a48b90 in cnd_wait (cond=0x5555564bdd18, mtx=0x5555564bdcf0) at ../../include/c11/threads_posix.h:159
#2  0x00007ffff4a49283 in util_queue_thread_func (input=0x5555564be310) at u_queue.c:171
#3  0x00007ffff4a48aa5 in impl_thrd_routine (p=0x5555564be330) at ../../include/c11/threads_posix.h:87
#4  0x00007ffff747608a in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007ffff68c424f in clone () from /usr/lib/libc.so.6

thread 3:
#0  0x00007ffff68b9d4b in poll () from /usr/lib/libc.so.6
#1  0x00007ffff65b18e0 in ?? () from /usr/lib/libxcb.so.1
#2  0x00007ffff65b3779 in xcb_wait_for_special_event () from /usr/lib/libxcb.so.1
#3  0x00007ffff4968a14 in x11_manage_fifo_queues (state=0x555556680b50) at wsi/wsi_common_x11.c:936
#4  0x00007ffff747608a in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007ffff68c424f in clone () from /usr/lib/libc.so.6
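The shape of the hang in these backtraces can be modeled with plain pthreads. This is a toy sketch, not Mesa's code: `fifo_worker()` and `join_would_hang()` are hypothetical names, a condition variable stands in for the X server's present events, and `fifo_worker()` plays the role of x11_manage_fifo_queues() blocked in xcb_wait_for_special_event(). Because the window is gone, the event never arrives, so a plain pthread_join() (as in x11_swapchain_destroy()) would never return; the sketch uses a bounded wait only so it can report the hang and then clean up after itself.

```c
/* Toy model of the hang (hypothetical names, not Mesa's implementation). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t event_cv = PTHREAD_COND_INITIALIZER; /* "present event" */
static pthread_cond_t exit_cv = PTHREAD_COND_INITIALIZER;  /* worker finished */
static bool event_arrived;
static bool worker_exited;

/* Stands in for x11_manage_fifo_queues(): waits with no timeout and no
 * shutdown check, like xcb_wait_for_special_event(). */
static void *fifo_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!event_arrived)
        pthread_cond_wait(&event_cv, &lock);
    worker_exited = true;
    pthread_cond_signal(&exit_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns true if the worker is still stuck after timeout_ms, i.e. a
 * plain pthread_join() at that point would block indefinitely. */
bool join_would_hang(int timeout_ms) {
    pthread_t t;
    event_arrived = false;
    worker_exited = false;
    int rc_create = pthread_create(&t, NULL, fifo_worker, NULL);
    assert(rc_create == 0);
    (void)rc_create;

    /* The window is already destroyed, so no event is ever delivered.
     * Wait for the worker with a deadline instead of joining blindly. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += (long)timeout_ms * 1000000L;
    deadline.tv_sec += deadline.tv_nsec / 1000000000L;
    deadline.tv_nsec %= 1000000000L;

    pthread_mutex_lock(&lock);
    int rc = 0;
    while (!worker_exited && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&exit_cv, &lock, &deadline);
    bool stuck = !worker_exited;

    /* Deliver the event by hand so the demo itself can terminate. */
    event_arrived = true;
    pthread_cond_signal(&event_cv);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return stuck;
}
```

Calling `join_would_hang(200)` returns true: after 200 ms the worker is still waiting for an event that nobody will send.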

Comment 1 maister 2017-11-01 20:34:54 UTC

Running Wayland on GNOME 3 (so the X11 backend goes through Xwayland), if that helps.

Comment 2 Henri Verbeet 2017-11-03 16:06:22 UTC

That backtrace looks a lot like the issue for which I originally wrote https://patchwork.freedesktop.org/patch/183215/. Unfortunately, it looks like the proper fix is harder than that patch.

Comment 3 maister 2018-01-07 10:42:31 UTC

I've now also seen this issue on Xorg, and observed it with the new open-source AMD driver as well. Could it be an Xorg bug?

Comment 4 Emil Velikov 2018-01-16 17:17:36 UTC

Care to do a simple crucible test [1]? Devs might be more tempted if there's a simple reproducer ;-)

[1] https://cgit.freedesktop.org/mesa/crucible/

Comment 5 Henri Verbeet 2018-01-16 17:28:06 UTC

(In reply to Emil Velikov from comment #4)
> Care to do a simple crucible test [1]? Devs might be more tempted if there's
> a simple reproducer ;-)
> 
I'd be tempted, but isn't crucible headless? This is a WSI issue.

Comment 6 Emil Velikov 2018-01-16 18:28:00 UTC

Hmm, it seems to be; thanks for the correction. Might be worth poking the devs about whether they're OK with the idea of having WSI tests.

Comment 7 maister 2018-01-17 07:11:16 UTC

This might not be a bug after all. The app destroyed the X window before tearing down the swapchain, which is bogus and probably where the deadlock comes from. The deadlock went away once the teardown was done in the proper order.

Comment 8 Henri Verbeet 2018-01-17 09:45:17 UTC

(In reply to maister from comment #7)
> This might not be a bug after all. The app destroyed the X window before
> tearing down the swapchain, which is bogus, and probably where the deadlock
> comes from.
It's what triggers the deadlock, yes, but I also think that's a scenario that's supposed to work.

Comment 9 Chad Versace 2018-01-17 21:24:47 UTC

No, that's not guaranteed to work. If the app destroys the X11 window before calling vkDestroySwapchainKHR, then the X connection may recycle the window's XID between the two calls, causing havoc. The manpage for XDestroyWindow says this: "The window should never be referenced again" after destruction.

Even though that XDestroyWindow-before-vkDestroySwapchainKHR is not guaranteed to work, deadlock is still an undesirable outcome. Maybe there is a way for Mesa to avoid the deadlock.
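The XID hazard described in comment 9 can be illustrated with a toy ID table. It is a deliberately simplified, hypothetical allocator (`alloc_id()`, `free_id()` and `stale_id_owner()` are made-up names) that hands a freed ID straight back out; real XID allocation need not reuse an ID this eagerly, but the point is the same: once the window is destroyed, the ID the swapchain still holds may end up naming an unrelated window.

```c
/* Toy model of XID reuse (hypothetical allocator, not libX11/libxcb). */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_IDS 8

typedef struct {
    bool in_use[MAX_IDS];
    const char *owner[MAX_IDS];
} id_table;

/* Hands out the lowest free ID, so a freed ID is reused immediately. */
static uint32_t alloc_id(id_table *t, const char *owner) {
    for (uint32_t i = 0; i < MAX_IDS; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            t->owner[i] = owner;
            return i;
        }
    }
    return UINT32_MAX; /* table full */
}

static void free_id(id_table *t, uint32_t id) {
    assert(id < MAX_IDS && t->in_use[id]);
    t->in_use[id] = false;
    t->owner[id] = NULL;
}

/* Returns what the swapchain's saved window ID refers to after the
 * window is destroyed and some other window is created. */
const char *stale_id_owner(void) {
    id_table t;
    memset(&t, 0, sizeof t);
    uint32_t game_window = alloc_id(&t, "game window");
    uint32_t swapchain_copy = game_window; /* swapchain keeps the ID */
    free_id(&t, game_window);              /* XDestroyWindow() */
    alloc_id(&t, "unrelated dialog");      /* new window gets the same ID */
    return t.owner[swapchain_copy];
}
```

Here `stale_id_owner()` returns "unrelated dialog": any request the swapchain later sends with its saved ID would land on the new window.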

Comment 10 Henri Verbeet 2018-01-18 11:56:51 UTC

(In reply to Chad Versace from comment #9)
> No, that's not guaranteed to work. If the app destroys the X11 window before
> calling vkDestroySwapchainKHR, then the X connection may recycle the
> window's XID between the two calls, causing havoc.
Yeah, there's that. Ultimately it's not up to the application whether the X11 window goes away or not though. If the window manager is nice it'll send a WM_DELETE_WINDOW before the fact, but that's up to the window manager; the application may only get a DestroyNotify after the window is already gone. I think it would be unfortunate if the Vulkan WSI was unable to handle that scenario.

As far as the Vulkan spec is concerned, in the version of the spec I have here (1.0.68) there doesn't seem to be any language specifically about the lifetime of the xcb/X11 window. The closest reference I found was "Several WSI functions return VK_ERROR_SURFACE_LOST_KHR if the surface becomes no longer available." in 30.2.10, "Platform-Independent Information", which seems to suggest that's how the Vulkan WSI should handle the native window going away, but it hardly sounds like any kind of guarantee. The spec doesn't explicitly require the native window to stay available either though.

> Even though that XDestroyWindow-before-vkDestroySwapchainKHR is not
> guaranteed to work, deadlock is still an undesirable outcome. Maybe there is
> a way for Mesa to avoid the deadlock.
https://patchwork.freedesktop.org/patch/183215/ has some discussion about the issue. In short, though, the xcb_wait_for_special_event() call in x11_manage_fifo_queues() may never return if the window was destroyed before the present request completed. Avoiding that by using xcb_poll_for_special_event() instead risks introducing random stuttering. I think the suggested xcb_wait_for_special_event_with_timeout() would work, but that would require introducing new XCB API, which is a bit more effort than I personally care for at the moment.
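A minimal sketch of why a timeout-based wait would avoid the hang, under loud assumptions: the proposed xcb_wait_for_special_event_with_timeout() is not existing XCB API, so `wait_for_event_with_timeout()` below is a hypothetical stand-in built on pthread_cond_timedwait(), and `struct chain`, `chain_init()` and `chain_destroy()` are made-up names rather than Mesa's. The queue thread wakes at least every 50 ms to re-check a shutdown flag, so the destroy path can join it even though no event is ever delivered.

```c
/* Timeout-based queue thread (hypothetical names, not Mesa's code). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct chain {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    bool event_pending; /* a present-complete event was delivered */
    bool shutting_down; /* set by the destroy path */
    int events_handled;
    pthread_t thread;
};

/* Stand-in for a wait-with-timeout API. Called with c->lock held.
 * Returns true if an event arrived, false on timeout. */
static bool wait_for_event_with_timeout(struct chain *c, long timeout_ms) {
    struct timespec dl;
    clock_gettime(CLOCK_REALTIME, &dl);
    dl.tv_nsec += timeout_ms * 1000000L;
    dl.tv_sec += dl.tv_nsec / 1000000000L;
    dl.tv_nsec %= 1000000000L;

    int rc = 0;
    while (!c->event_pending && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&c->cv, &c->lock, &dl);
    bool got = c->event_pending;
    c->event_pending = false;
    return got;
}

static void *queue_thread(void *arg) {
    struct chain *c = arg;
    pthread_mutex_lock(&c->lock);
    while (!c->shutting_down) {
        /* The timeout guarantees shutting_down is re-checked regularly,
         * so a missing event can no longer block this thread forever. */
        if (wait_for_event_with_timeout(c, 50))
            c->events_handled++;
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

void chain_init(struct chain *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->event_pending = false;
    c->shutting_down = false;
    c->events_handled = 0;
    int rc = pthread_create(&c->thread, NULL, queue_thread, c);
    assert(rc == 0);
    (void)rc;
}

/* Destroy path: sets the flag but never signals the condition variable,
 * mirroring a destroyed window whose event never arrives. */
void chain_destroy(struct chain *c) {
    pthread_mutex_lock(&c->lock);
    c->shutting_down = true;
    pthread_mutex_unlock(&c->lock);
    pthread_join(c->thread, NULL); /* returns within roughly one timeout */
    pthread_cond_destroy(&c->cv);
    pthread_mutex_destroy(&c->lock);
}
```

The destroy path deliberately does not signal the condition variable; the timeout alone is what lets pthread_join() return. The trade-off is that the thread wakes periodically even when idle, and the timeout bounds how long teardown can take.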

Comment 11 GitLab Migration User 2019-09-18 18:13:02 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further on the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/176.
