Bug 72612 - weston-subsurfaces does not render/display
Summary: weston-subsurfaces does not render/display
Status: VERIFIED FIXED
Alias: None
Product: Wayland
Classification: Unclassified
Component: weston (show other bugs)
Version: unspecified
Hardware: Other All
: medium major
Assignee: Wayland bug list
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-11 19:00 UTC by U. Artie Eoff
Modified: 2014-01-21 21:52 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Weston startup log (3.60 KB, text/plain)
2013-12-11 19:00 UTC, U. Artie Eoff
Details

Description U. Artie Eoff 2013-12-11 19:00:59 UTC
Created attachment 90622 [details]
Weston startup log

The weston-subsurfaces client does not render/display in the Weston compositor using DRM or X11 backend with the gl-renderer.  The weston-subsurfaces client logs the following to the console:

Chosen EGL config details:
 RGBA bits: 8 8 8 8
 swap interval range: 0 - 1

----

wayland (master) heads/master-0-g1a58c7f
drm (master) libdrm-2.4.50-0-g4c5de72
mesa (master) heads/master-0-g1e71493
libva (master) heads/master-0-g73a11b3
intel-driver (master) heads/master-0-g9d0bd94
weston (master) heads/master-0-g1aaf3e4
Comment 1 Pekka Paalanen 2013-12-12 06:36:39 UTC
Strange, I am using the same weston revision and it works just fine. But my Mesa is a bit old, Mesa 10.0.0-devel (git-3785fe2). x11-backend with gl-renderer.

Eh, and trying to interactively resize it, Weston segfaulted somewhere inside the i965 Mesa driver, but that's another story.

Are you using cairo-glesv2 or cairo-image for toytoolkit? If it is cairo-image, I would expect all except perhaps the gl widget to work just fine, unless the problem is actually in Weston-Mesa interaction.
Comment 2 U. Artie Eoff 2013-12-12 12:55:48 UTC
(In reply to comment #1)
> Strange, I am using the same weston revision and it works just fine. But my
> Mesa is a bit old, Mesa 10.0.0-devel (git-3785fe2). x11-backend with
> gl-renderer.
> 

I'll try with an older Mesa and report back...

FWIW, during Weston 1.3 release cycle testing I used Mesa 9.2 and subsurfaces was working fine.

> Eh, and trying to interactively resize it, Weston segfaulted somewhere
> inside the i965 Mesa driver, but that's another story.
> 

Ugh, not good.  I use i965 driver, too.

> Are you using cairo-glesv2 or cairo-image for toytoolkit? If it is
> cairo-image, I would expect all except perhaps the gl widget to work just
> fine, unless the problem is actually in Weston-Mesa interaction.

I'm using cairo-glesv2 for toytk.
Comment 3 U. Artie Eoff 2013-12-12 15:13:49 UTC
weston-subsurfaces works fine with:

mesa (10.0) heads/10.0-0-gd0f606f
mesa (9.2) heads/9.2-0-gebe8e9b

So this is likely isolated to a regression in the master branch of Mesa.
Comment 4 Neil Roberts 2013-12-16 18:44:37 UTC
I think I understand what's going on here. I'm not really sure what to do about it though. It looks like the bug is triggered by my Mesa patch to make it block for the frame callback for SwapInterval(1) in get_back_bo instead of in eglSwapBuffers:

http://cgit.freedesktop.org/mesa/mesa/commit/?id=25cc889004aad6d1cab9edd76db89

weston-subsurfaces creates 2 subsurfaces. One of them is a regular widget which is drawn with cairo-glesv2. The other is custom EGL widget which renders a triangle with GL calls. The triangle widget is set to use eglSwapInterval(0). However, because the triangle widget is created using the toy toolkit, it actually also gets a secret cairo surface which will also use EGL to draw. Both of these cairo surfaces will be using eglSwapInterval(1) so they will install a frame callback after each buffer swap.

When drawing window_flush in window.c is called to swap all of the buffers for the sub-surfaces. It will first swap the secret cairo surface for the triangle and then it will swap the other cairo surface. When Cairo tries to call swap buffers it will first ‘acquire’ the GL context. Acquiring the context calls eglMakeCurrent with whatever surface Cairo last used. After acquiring the context it will then call eglMakeCurrent again with the actual surface that it's going to swap.

The problem is that calling eglMakeCurrent also calls get_back_bo. That means that when swapping the other cairo surface, it will also end up calling eglMakeCurrent and thus get_back_bo for the hidden triangle surface. The hidden triangle surface has a frame callback installed so get_back_bo will end up blocking. The triangle surface is a synchronized subsurface so the frame callback event is not going to be delivered until the parent surface also commits. However, that is never going to happen because weston-subsurfaces tries to swap the subsurfaces before it swaps the main surface so it just ends up in a deadlock.

So there are three weird things going on here. Fixing any one of these would make the problem go away:

1. The toy toolkit shouldn't be making a secret extra Cairo surface for the triangle widget.

2. Cairo shouldn't be redundantly rebinding the old surface before switching to a new one.

3. Mesa probably shouldn't be calling get_back_bo just because eglMakeCurrent is called on a surface. (Although maybe there is a good reason for that, I don't know).

But really if we fixed any of these then we'd still have a core problem that Mesa can block indefinitely when synchronized subsurfaces are involved. I suppose we had that problem before my patch as well, but it was just a bit harder to trigger because you'd have to try to swap the subsurface twice before swapping the parent surface.
Comment 5 Pekka Paalanen 2013-12-17 08:31:42 UTC
The triangle code sets up its own frame callback to "independently" drive the animation.

There should not be a secret cairo EGL surface for the triangle widget. If there is, I think that should be fixed first. I guess it just hasn't been an issue until now.

The interactions and deadlock avoidance with EGL vs. synchronized sub-surfaces has always been delicate. First it was only the frame callback, but now you have the wl_buffer.releases as well.

I believe it is a fact, that synchronized sub-surfaces will require one more buffer in the rotation than what a normal surface does, at least if they run "independently". The interactions are hard to follow, unfortunately.

I think I have assumed, that eglSwapInterval(0) would cause the EGL implementation to never stall indefinitely, but is that a feasible assumption?

Btw. how about --with-cairo=image?

I am using --with-cairo=image, so that's another possible reason why I might never see the problem.
Comment 6 Neil Roberts 2013-12-19 16:34:37 UTC
> There should not be a secret cairo EGL surface for the triangle widget. If
> there is, I think that should be fixed first. I guess it just hasn't been an
> issue until now.

I posted a patch to the mailing list to fix this:

http://lists.freedesktop.org/archives/wayland-devel/2013-December/012650.html

However it turns out I was wrong and that doesn't fix the problem. The problem still happens even if you pass -n to disable the triangle surface.

The sequence of drawing without the triangle surface goes like this:

1. draw main surface
2. draw subsurface
3. swap subsurface
4. swap main surface

As mentioned in comment #1, the first thing Cairo does when flushing (AKA swapping) the surface is to ‘acquire’ the GL context. After acquiring the context it will bind the context again with the EGL surface for the Cairo surface that is to be swapped. Acquiring the context binds whatever surface Cairo used last. At step 4 the last surface that was bound is the subsurface. That has a frame callback installed following the swap in step 3 which means that when eglMakeCurrent is called it will block in get_back_bo. This will block until the main surface is swapped which will never happen so it blocks forever.

> I think I have assumed, that eglSwapInterval(0) would cause the EGL
> implementation to never stall indefinitely, but is that a feasible
> assumption?

Yes, that is correct. With eglSwapInterval(0) Mesa should only ever block on a sync request and there should be no reason why that wouldn't come instantly. This problem isn't really related to the eglSwapInterval(0) changes but instead it is triggered by blocking on the frame callback in get_back_bo instead of in eglSwapBuffers. That was a separate change independent of eglSwapInterval to fix the 3 buffers problem.

> Btw. how about --with-cairo=image?

I haven't tested it but I assume using the image backend would fix the problem because that doesn't have the mess with rebinding the EGL context.

I think the best fix would be to make Mesa not call get_back_bo from eglMakeCurrent. The patch that added this call is this one:

http://cgit.freedesktop.org/mesa/mesa/commit/?h=db9c151d

I tried recklessly removing the call to intel_prepare_render to see if the situation has changed since 2010 but it breaks quite catastrophically. It looks like making this change would be a fair bit of work.
Comment 7 Kristian Høgsberg 2014-01-18 00:59:16 UTC
I'll post this to mesa-dev for review, fixes the problem here:

commit fad655f0ce208ae985c62c936d26e20d8a22f897
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Fri Jan 17 16:55:31 2014 -0800

    i965: Only update renderbuffers on initial intelMakeCurrent
    
    We call intel_prepare_render() in intelMakeCurrent() to make sure we have
    renderbuffers before calling _mesa_make_current().  The only reason we
    do this is so that we can have valid defaults for width and height.
    If we already have buffers for the drawable we're making current, we
    don't need to do this.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=72540
    https://bugs.freedesktop.org/show_bug.cgi?id=72612
    
    Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 78c06fc..b23cdef 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -911,6 +911,7 @@ intelMakeCurrent(__DRIcontext * driContextPriv,
    if (driContextPriv) {
       struct gl_context *ctx = &brw->ctx;
       struct gl_framebuffer *fb, *readFb;
+      struct intel_renderbuffer *rb = NULL;
 
       if (driDrawPriv == NULL && driReadPriv == NULL) {
          fb = _mesa_get_incomplete_framebuffer();
@@ -918,6 +919,7 @@ intelMakeCurrent(__DRIcontext * driContextPriv,
       } else {
          fb = driDrawPriv->driverPrivate;
          readFb = driReadPriv->driverPrivate;
+         rb = intel_get_renderbuffer(fb, BUFFER_BACK_LEFT);
          driContextPriv->dri2.draw_stamp = driDrawPriv->dri2.stamp - 1;
          driContextPriv->dri2.read_stamp = driReadPriv->dri2.stamp - 1;
       }
@@ -929,7 +931,12 @@ intelMakeCurrent(__DRIcontext * driContextPriv,
       intel_gles3_srgb_workaround(brw, fb);
       intel_gles3_srgb_workaround(brw, readFb);
 
-      intel_prepare_render(brw);
+      if (rb && !rb->mt) {
+         /* If we don't have buffers for the drawable yet, force a call to
+          * getbuffers here so we can have a default drawable size. */
+         intel_prepare_render(brw);
+      }
+
       _mesa_make_current(ctx, fb, readFb);
    } else {
       _mesa_make_current(NULL, NULL, NULL);
Comment 8 Kristian Høgsberg 2014-01-20 06:48:13 UTC
Pushed to mesa master, closing this.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.