Bug 106753 - [Firefox/Wayland] multithread deadlock at eglSwapBuffers() / wl_display_dispatch_queue()
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: EGL/Wayland
Version: 18.0
Hardware: Other All
Importance: medium normal
Assignee: Wayland bug list
QA Contact: mesa-dev
Reported: 2018-05-31 08:48 UTC by Martin Stransky
Modified: 2018-05-31 09:57 UTC

Attachments
firefox WAYLAND_DEBUG=1 log (21.57 KB, text/plain)
2018-05-31 09:19 UTC, Martin Stransky

Description Martin Stransky 2018-05-31 08:48:19 UTC
When eglSwapBuffers() is called from a non-main thread, but on the same display that the Gtk+ main loop operates on, eglSwapBuffers() freezes in wl_display_poll()/poll() on the display fd.

Mozilla bugzilla bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1464823

backtrace of the lock (Compositor thread):
#0  0x00007ffff6da6589 in poll () at /lib64/libc.so.6
#1  0x00007ffff28776a9 in poll (__timeout=-1, __nfds=1, __fds=0x7fffc6f65ce0) at /usr/include/bits/poll2.h:46
#2  0x00007ffff28776a9 in wl_display_poll (display=display@entry=0x7ffff6a61440, events=events@entry=1)
    at src/wayland-client.c:1717
#3  0x00007ffff287913c in wl_display_dispatch_queue (display=0x7ffff6a61440, queue=0x7fffb49ec380)
    at src/wayland-client.c:1790
#4  0x00007fffc19650bb in  () at /lib64/libEGL_mesa.so.0
#5  0x00007fffc19560ea in eglSwapBuffers () at /lib64/libEGL_mesa.so.0
#6  0x00007fffe18010bd in mozilla::gl::GLLibraryEGL::fSwapBuffers(void*, void*) const (this=0x7fffeb8c5780 <mozilla::gl::sEGLLibrary>, dpy=0x7fffc2db2800, surface=0x7fffade57000) at /home/komat/tmp676-trunk-gtk3/src2/gfx/gl/GLLibraryEGL.h:229
#7  0x00007fffe1811ad8 in mozilla::gl::GLContextEGL::SwapBuffers() (this=0x7fffacc87000)
    at /home/komat/tmp676-trunk-gtk3/src2/gfx/gl/GLContextProviderEGL.cpp:501
#8  0x00007fffe18e4dc2 in mozilla::layers::CompositorOGL::EndFrame() (this=0x7fffade6d400)
    at /home/komat/tmp676-trunk-gtk3/src2/gfx/layers/opengl/CompositorOGL.cpp:1771
[...]
Comment 1 Jonas Ådahl 2018-05-31 08:58:57 UTC
I suspect it's not a bug, but rather that you're using eglSwapInterval(>0) and waiting for a frame callback that might not arrive.

Could you attach the output of reproducing the issue with WAYLAND_DEBUG=1 set?
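
The suspected behaviour can be illustrated with a toy model. This is only a sketch, not Mesa's code: the class and method names are made up for illustration, and it assumes that with a swap interval above zero each swap requests a frame callback that the next swap waits on, while swap interval 0 does not wait on a frame callback at all.

```python
class ToyEglSurface:
    """Toy model of frame-callback throttling; not Mesa's actual code."""

    def __init__(self, swap_interval):
        self.swap_interval = swap_interval
        self.waiting_for_frame = False

    def frame_done(self):
        # Stands in for the compositor delivering wl_callback.done.
        self.waiting_for_frame = False

    def swap_buffers(self):
        # A real implementation would block in poll() here instead of returning.
        if self.waiting_for_frame:
            return "would block"
        if self.swap_interval > 0:
            # Stands in for requesting wl_surface.frame before the commit.
            self.waiting_for_frame = True
        return "swapped"


surface = ToyEglSurface(swap_interval=1)
print(surface.swap_buffers())  # swapped
print(surface.swap_buffers())  # would block
```

In this model the second swap only proceeds after frame_done() is called, which mirrors the hang in the backtrace if the compositor never sends the callback.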
Comment 2 Martin Stransky 2018-05-31 09:07:00 UTC
Sure, here is the log:

[New Thread 0x7fffc284a700 (LWP 14998)]
Attempting load of libEGL.so
warning: Loadable section ".note.gnu.property" outside of ELF segments
Missing separate debuginfo for /lib64/libglapi.so.0
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/0d/4f60e608c20f6ec43d455caa77cc5e015f7a9d.debug
[3038562.485]  -> wl_display@1.get_registry(new id wl_registry@50)
[3038562.498]  -> wl_display@1.sync(new id wl_callback@51)
[3038562.867] wl_display@1.delete_id(51)
[3038562.877] wl_registry@50.global(1, "wl_drm", 2)
[3038562.884]  -> wl_registry@50.bind(1, "wl_drm", 2, new id [unknown]@52)
[3038562.912] wl_registry@50.global(2, "wl_compositor", 4)
[3038562.923] wl_registry@50.global(3, "wl_shm", 1)
[3038562.929] wl_registry@50.global(4, "wl_output", 2)
[3038562.956] wl_registry@50.global(5, "zxdg_output_manager_v1", 1)
[3038562.982] wl_registry@50.global(6, "wl_data_device_manager", 3)
[3038563.010] wl_registry@50.global(7, "gtk_primary_selection_device_manager", 1)
[3038563.035] wl_registry@50.global(8, "wl_subcompositor", 1)
[3038563.045] wl_registry@50.global(9, "xdg_wm_base", 1)
[3038563.055] wl_registry@50.global(10, "zxdg_shell_v6", 1)
[3038563.065] wl_registry@50.global(11, "wl_shell", 1)
[3038563.088] wl_registry@50.global(12, "gtk_shell1", 2)
[3038563.112] wl_registry@50.global(13, "zwp_pointer_gestures_v1", 1)
[3038563.122] wl_registry@50.global(14, "zwp_tablet_manager_v2", 1)
[3038563.132] wl_registry@50.global(15, "wl_seat", 5)
[3038563.141] wl_registry@50.global(16, "zwp_relative_pointer_manager_v1", 1)
[3038563.151] wl_registry@50.global(17, "zwp_pointer_constraints_v1", 1)
[3038563.161] wl_registry@50.global(18, "zxdg_exporter_v1", 1)
[3038563.170] wl_registry@50.global(19, "zxdg_importer_v1", 1)
[3038563.180] wl_registry@50.global(20, "zwp_linux_dmabuf_v1", 3)
[3038563.190]  -> wl_registry@50.bind(20, "zwp_linux_dmabuf_v1", 3, new id [unknown]@53)
[3038563.203] wl_registry@50.global(21, "zwp_keyboard_shortcuts_inhibit_manager_v1", 1)
[3038563.213] wl_registry@50.global(22, "gtk_text_input_manager", 1)
[3038563.224] wl_callback@51.done(14511)
[3038563.230]  -> wl_display@1.sync(new id wl_callback@51)
[3038563.390] wl_display@1.delete_id(51)
[3038563.396] wl_drm@52.device("/dev/dri/card1")
[3038563.447]  -> wl_drm@52.authenticate(11)
[3038563.454] wl_drm@52.format(808669761)
[3038563.460] wl_drm@52.format(808669784)
[3038563.464] wl_drm@52.format(875713089)
[3038563.468] wl_drm@52.format(875713112)
[3038563.472] wl_drm@52.format(909199186)
[3038563.478] wl_drm@52.format(961959257)
[3038563.482] wl_drm@52.format(825316697)
[3038563.486] wl_drm@52.format(842093913)
[3038563.492] wl_drm@52.format(909202777)
[3038563.498] wl_drm@52.format(875713881)
[3038563.502] wl_drm@52.format(842094158)
[3038563.505] wl_drm@52.format(909203022)
[3038563.510] wl_drm@52.format(1448695129)
[3038563.515] wl_drm@52.capabilities(1)
[3038563.521] zwp_linux_dmabuf_v1@53.format(875713089)
[3038563.528] zwp_linux_dmabuf_v1@53.modifier(875713089, 0, 0)
[3038563.539] zwp_linux_dmabuf_v1@53.modifier(875713089, 16777216, 1)
[3038563.549] zwp_linux_dmabuf_v1@53.modifier(875713089, 16777216, 2)
[3038563.558] zwp_linux_dmabuf_v1@53.modifier(875713089, 16777216, 4)
[3038563.567] zwp_linux_dmabuf_v1@53.format(875713112)
[3038563.572] zwp_linux_dmabuf_v1@53.modifier(875713112, 0, 0)
[3038563.580] zwp_linux_dmabuf_v1@53.modifier(875713112, 16777216, 1)
[3038563.591] zwp_linux_dmabuf_v1@53.modifier(875713112, 16777216, 2)
[3038563.599] zwp_linux_dmabuf_v1@53.modifier(875713112, 16777216, 4)
[3038563.607] zwp_linux_dmabuf_v1@53.format(808669761)
[3038563.611] zwp_linux_dmabuf_v1@53.modifier(808669761, 0, 0)
[3038563.621] zwp_linux_dmabuf_v1@53.modifier(808669761, 16777216, 1)
[3038563.631] zwp_linux_dmabuf_v1@53.modifier(808669761, 16777216, 2)
[3038563.639] zwp_linux_dmabuf_v1@53.format(909199186)
[3038563.644] zwp_linux_dmabuf_v1@53.modifier(909199186, 0, 0)
[3038563.654] zwp_linux_dmabuf_v1@53.modifier(909199186, 16777216, 1)
[3038563.665] zwp_linux_dmabuf_v1@53.modifier(909199186, 16777216, 2)
[3038563.674] wl_callback@51.done(14511)
[3038563.682]  -> wl_display@1.sync(new id wl_callback@51)
[3038563.797] wl_display@1.delete_id(51)
[3038563.810] wl_drm@52.authenticated()
[3038563.815] wl_callback@51.done(14511)
Initializing context 0x7fffbef10120 surface 0x7fffc3ed1000 on display 0x7fffc2ec8000
warning: Loadable section ".note.gnu.property" outside of ELF segments
[14914, Compositor] WARNING: robust_buffer_access_behavior marked as unsupported: file /home/komat/tmp676-trunk-gtk3/src2/gfx/gl/GLContextFeatures.cpp, line 915
[14914, Compositor] WARNING: Robustness supported, strategy is not LOSE_CONTEXT_ON_RESET!: file /home/komat/tmp676-trunk-gtk3/src2/gfx/gl/GLContext.cpp, line 1024
[14914, Compositor] WARNING: robustness marked as unsupported: file /home/komat/tmp676-trunk-gtk3/src2/gfx/gl/GLContextFeatures.cpp, line 915
[3038617.273]  -> wl_surface@40.frame(new id wl_callback@51)
[3038617.298]  -> zwp_linux_dmabuf_v1@53.create_params(new id zwp_linux_buffer_params_v1@54)
[3038617.314]  -> zwp_linux_buffer_params_v1@54.add(fd 40, 0, 0, 5120, 16777216, 4)
[3038617.339]  -> zwp_linux_buffer_params_v1@54.add(fd 41, 1, 5079040, 256, 16777216, 4)
[3038617.351]  -> zwp_linux_buffer_params_v1@54.create_immed(new id wl_buffer@55, 1280, 964, 875713089, 0)
[3038617.369]  -> zwp_linux_buffer_params_v1@54.destroy()
[3038617.375]  -> wl_surface@40.attach(wl_buffer@55, 0, 0)
[3038617.383]  -> wl_surface@40.damage(0, 0, 2147483647, 2147483647)
[3038618.014]  -> wl_surface@40.commit()
[3038618.345]  -> wl_surface@39.attach(wl_buffer@49, 0, 0)
[3038618.367]  -> wl_surface@39.set_buffer_scale(1)
[3038618.373]  -> wl_surface@39.damage(0, 0, 1280, 964)
[3038618.386]  -> xdg_toplevel@44.set_min_size(0, 0)
[3038618.393]  -> xdg_toplevel@44.set_max_size(0, 0)
[3038618.400]  -> xdg_surface@43.set_window_geometry(0, 0, 1280, 964)
[3038618.413]  -> wl_compositor@4.create_region(new id wl_region@56)
[3038618.419]  -> wl_region@56.add(7, 0, 1266, 7)
[3038618.464]  -> wl_region@56.add(0, 7, 1280, 957)
[3038618.475]  -> wl_surface@39.set_opaque_region(wl_region@56)
[3038618.501]  -> wl_region@56.destroy()
[3038618.508]  -> wl_compositor@4.create_region(new id wl_region@57)
[3038618.516]  -> wl_region@57.add(-10, -10, 1300, 984)
[3038618.530]  -> wl_surface@39.set_input_region(wl_region@57)
[3038618.536]  -> wl_region@57.destroy()
[3038635.245] wl_display@1.delete_id(54)
[3038635.263] wl_display@1.delete_id(56)
[3038635.268] wl_display@1.delete_id(57)

You can also easily reproduce that on Fedora Firefox builds:

1) take any firefox-60.0.1-4 build from koji
https://koji.fedoraproject.org/koji/packageinfo?packageID=37
(be sure it's -4; the -5 builds have a workaround for it)

2) set webgl.force-enabled and layers.acceleration.force-enabled to true in about:config

3) run "firefox-wayland" from a console
Comment 3 Martin Stransky 2018-05-31 09:09:44 UTC
Btw: You may want to look at the "Compositor" thread (usually no. 26), which runs the Mesa GL rendering.
Comment 4 Jonas Ådahl 2018-05-31 09:14:25 UTC
Is that really the whole log? It seems like a lot of messages are missing. Also, when attaching logs, *attach* them as attachments; don't paste them inline.
Comment 5 Martin Stransky 2018-05-31 09:19:23 UTC
Created attachment 139884 [details]
firefox WAYLAND_DEBUG=1 log

Sorry, the full log is attached now.
Comment 6 Jonas Ådahl 2018-05-31 09:55:49 UTC
So what it looks like is that:

1. Firefox creates two surfaces, wl_surface@39 and wl_surface@40
2. Firefox turns wl_surface@40 into a desynchronized subsurface on top of wl_surface@39
3. Firefox turns wl_surface@39 into an xdg_toplevel and commits the initial (empty) state
4. The compositor sends the configure event to the xdg_toplevel
5. Firefox replies immediately with ack_configure() without attaching a buffer

This will probably confuse any compositor: you just mapped a window and were asked to configure it (draw the first frame), but no content was posted even though the configure event was acknowledged.

6. Firefox asks for a frame callback, attaches a new buffer to the subsurface and commits it

When Firefox tries to do eglSwapBuffers() again, it waits for the frame callback, but that will never happen because the toplevel was never mapped, meaning the subsurface never has a chance to be displayed, meaning the frame callback will never be invoked.
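
A frame callback that is requested but never answered can be spotted mechanically in a WAYLAND_DEBUG=1 log. The following is a minimal sketch, assuming the line format seen in the log pasted in this report; the function name and regular expressions are illustrative, not part of any Wayland tooling.

```python
import re

# Request lines look like: "[ts]  -> wl_surface@40.frame(new id wl_callback@51)"
FRAME_REQUEST = re.compile(
    r"->\s+(wl_surface@\d+)\.frame\(new id wl_callback@(\d+)\)"
)
# Event lines look like: "[ts] wl_callback@51.done(14511)"
CALLBACK_DONE = re.compile(r"^\[[\d.\s]+\]\s+wl_callback@(\d+)\.done\(")


def pending_frame_callbacks(log_lines):
    """Return {callback_id: surface} for frame callbacks that never got done."""
    pending = {}
    for line in log_lines:
        request = FRAME_REQUEST.search(line)
        if request:
            pending[int(request.group(2))] = request.group(1)
            continue
        done = CALLBACK_DONE.match(line.strip())
        if done:
            # Object ids are reused (e.g. by wl_display.sync callbacks),
            # so only clear ids that belong to a tracked frame request.
            pending.pop(int(done.group(1)), None)
    return pending


# Lines taken from the log in this report.
excerpt = [
    "[3038563.815] wl_callback@51.done(14511)",
    "[3038617.273]  -> wl_surface@40.frame(new id wl_callback@51)",
    "[3038617.375]  -> wl_surface@40.attach(wl_buffer@55, 0, 0)",
    "[3038618.014]  -> wl_surface@40.commit()",
    "[3038635.245] wl_display@1.delete_id(54)",
]
print(pending_frame_callbacks(excerpt))  # {51: 'wl_surface@40'}
```

The earlier done event for id 51 belongs to a sync callback that used the same id, which is why events are only matched against frame requests seen before them.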

The xdg_surface interface states the following:

      For an xdg_surface to be mapped by the compositor, the following
      conditions must be met:
      (1) the client has assigned an xdg_surface-based role to the surface
      (2) the client has set and committed the xdg_surface state and the
          role-dependent state to the surface
      (3) the client has committed a buffer to the surface

The third condition here was never met.
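
The three quoted conditions can be written as a small check. This is a sketch of the rule as quoted above, not compositor code; the class, field, and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class XdgSurfaceState:
    has_role: bool          # (1) an xdg_surface-based role, e.g. xdg_toplevel
    state_committed: bool   # (2) xdg_surface and role-dependent state committed
    buffer_committed: bool  # (3) a buffer was committed to the surface


def unmet_map_conditions(state):
    """List which of the three quoted conditions are not yet satisfied."""
    unmet = []
    if not state.has_role:
        unmet.append(1)
    if not state.state_committed:
        unmet.append(2)
    if not state.buffer_committed:
        unmet.append(3)
    return unmet


# wl_surface@39 as described in this analysis: toplevel role assigned,
# initial state committed and configure acked, but no buffer committed.
toplevel = XdgSurfaceState(
    has_role=True, state_committed=True, buffer_committed=False
)
print(unmet_map_conditions(toplevel))  # [3]
```

An empty list would mean the surface can be mapped; here the toplevel stays unmapped, so its subsurface is never shown either.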

This raises the question, what is the intended content of the subsurface and what is the intended content of the toplevel?
Comment 7 Jonas Ådahl 2018-05-31 09:57:55 UTC
Closing this bug here for now, feel free to CC me on the Firefox bug to continue.

