Bug 72863 - Weston-simple-egl fullscreening causes client application flicker
Summary: Weston-simple-egl fullscreening causes client application flicker
Status: VERIFIED WONTFIX
Alias: None
Product: Mesa
Classification: Unclassified
Component: EGL/Wayland
Version: 10.0
Hardware: Other All
Importance: medium normal
Assignee: Wayland bug list
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-19 06:07 UTC by Anu Reddy
Modified: 2014-03-20 22:04 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Anu Reddy 2013-12-19 06:07:29 UTC
Steps:
1. Launch Weston
2. Launch the Weston terminal: weston-terminal
3. Run simple-egl: weston-simple-egl
4. Press <F11> to switch weston-simple-egl to fullscreen mode.

Watch the spinning RGB triangle: it flickers, and simple-egl does not render smoothly in fullscreen mode. Switch back out of fullscreen mode using <F11>: the client window no longer flickers and the RGB triangle spins smoothly.


Environment :

Kernel:3.9.5-301.fc19.x86_64
wayland (HEAD) 1.3.91-0-g01bde63
drm (HEAD) libdrm-2.4.50-0-g4c5de72
mesa (HEAD) remotes/origin/10.0-0-g6f7da01
libva (HEAD) libva-1.2.1-0-g88ed1eb
intel-driver (HEAD) 1.2.1-0-g8f306e3
weston (HEAD) remotes/origin/master-0-gdf42a80
efl (HEAD) remotes/origin/efl-1.8-0-g90c2320
elementary (HEAD) remotes/origin/elementary-1.8-0-ge077db6
wayland-fits (HEAD) remotes/origin/HEAD-0-gcd75d94
Comment 1 U. Artie Eoff 2013-12-19 15:17:54 UTC
Were you able to capture this issue on video?

I also wanted to note that when I saw this happening, it does not appear to have any relation to bug 72854 as one might initially conclude from the description.  The triangle flickering was quite severe and continuous, IIRC.
Comment 2 U. Artie Eoff 2013-12-19 20:39:35 UTC
This issue is not observed with mesa (master) heads/master-0-ga9bf599 (eglSwapInterval 0 or 1).  Only occurs with mesa <= 10.0.x.

When I activate video capture (wcap or libva), the issue disappears also.
Comment 3 U. Artie Eoff 2014-01-08 20:15:07 UTC
Using -s command line option makes this issue disappear, too... that is, 16 bpp EGL config makes it go away.
Comment 4 U. Artie Eoff 2014-01-09 16:23:14 UTC
Only happens when using composite bypass.
Comment 5 Neil Roberts 2014-01-17 17:24:53 UTC
I think I understand why this is happening. The client sits in a loop doing two things: rendering, then swapping the buffers. For the sake of this explanation, the only thing we care about in the rendering part is that it calls get_back_bo to get a new back buffer. While swapping it does two important things: it waits for a frame callback and attaches a buffer. After the client attaches a buffer, the compositor goes off and asynchronously does a render. That render has no effect on the client until the client waits for the frame callback, at which point it might receive a buffer release event, so for the sake of argument we can imagine the compositor's rendering only happens during the wait.
Mesa 10.0 has slots for three buffers which we will call 1, 2 and 3. These are created on demand but for the example we can imagine they always exist. The lifecycle of the buffers is like this:

At first all of the buffers are available to the client.

 *Client side*         *Compositor side*
  Free: 1 2 3           Attached to surface:
  Rendering:            Next scanout:
                        current scanout:

== get_back_bo:

The client will start rendering its first frame and call get_back_bo to grab the first free buffer.

 *Client side*         *Compositor side*
  Free: 2 3             Attached to surface:
  Rendering: 1          Next scanout:
                        current scanout:

== wait:

Before swapping, the client will wait for a frame callback. There hasn't been a frame yet so this does nothing.

 *Client side*         *Compositor side*
  Free: 2 3             Attached to surface:
  Rendering: 1          Next scanout:
                        current scanout:

== attach:

The client will then attach the buffer it rendered to and the compositor will go off and do a render itself. The compositor was previously sitting idle so it will do this render immediately without waiting for a page flip.

 *Client side*         *Compositor side*
  Free: 2 3             Attached to surface: 1
  Rendering:            Next scanout:
                        current scanout:

== get_back_bo:

The client will now grab another buffer to start rendering the second frame.

 *Client side*         *Compositor side*
  Free: 3               Attached to surface: 1
  Rendering: 2          Next scanout:
                        current scanout:

== wait:

Before swapping it will wait for the frame callback from the first frame. At this point we can assume the compositor has finished the render which means it will have queued a page flip with the first buffer.

 *Client side*         *Compositor side*
  Free: 3               Attached to surface: 1
  Rendering: 2          Next scanout: 1
                        current scanout:


== attach:

Now the client attaches the second frame...

 *Client side*         *Compositor side*
  Free: 3               Attached to surface: 2
  Rendering:            Next scanout: 1
                        current scanout:

== get_back_bo:

and grabs a buffer for its third frame.

 *Client side*         *Compositor side*
  Free:                 Attached to surface: 2
  Rendering: 3          Next scanout: 1
                        current scanout:

== wait:

The client now waits for the frame callback. The compositor will only send this once it has finished rendering and it will only do that after the page flip for the previous frame has completed. The compositor is still holding onto both buffers so no release event is sent.

 *Client side*         *Compositor side*
  Free:                 Attached to surface: 2
  Rendering: 3          Next scanout: 2
                        current scanout: 1

== attach:

Now the client attaches its third frame and the compositor is still holding on to all three buffers.

 *Client side*         *Compositor side*
  Free:                 Attached to surface: 3
  Rendering:            Next scanout: 2
                        current scanout: 1

== get_back_bo:

The client will now try to grab a buffer for the fourth frame, but this is not possible because they are all locked. get_back_bo will return -1 and all of the rendering will be discarded.

 *Client side*         *Compositor side*
  Free:                 Attached to surface: 3
  Rendering: -1         Next scanout: 2
                        current scanout: 1

== wait:

Again we will wait until the compositor has finished painting frame three before attaching frame four. This entails waiting for the page flip from frame 2, which will cause a release event to be sent for the first frame. The client will catch this before attaching its buffer.

 *Client side*         *Compositor side*
  Free: 1               Attached to surface: 3
  Rendering: -1         Next scanout: 3
                        current scanout: 2

== attach:

eglSwapBuffers calls get_back_bo a final time just before attaching a buffer. At this point there is a free buffer available so we immediately attach it. The buffer hasn't been rendered to so it still contains the image for frame 1.

 *Client side*         *Compositor side*
  Free:                 Attached to surface: 1
  Rendering:            Next scanout: 3
                        current scanout: 2

== get_back_bo:

The client is once again out of buffers, so get_back_bo will fail again. It is now back in the same state it was in before, so the loop should just continue like that.

 *Client side*         *Compositor side*
  Free:                 Attached to surface: 1
  Rendering: -1         Next scanout: 3
                        current scanout: 2
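
The walkthrough above can be replayed with a small state machine. The following Python sketch is only a toy model under the same simplifying assumptions as the explanation (three slots that always exist, compositor work only becoming visible during the wait). Apart from get_back_bo, every class, method and field name is made up for illustration, and none of this is Mesa or Weston code.

```python
# Toy model of the walkthrough above. Every name is hypothetical;
# this is not Mesa or Weston code.

class BufferModel:
    def __init__(self, slots=3):
        self.free = list(range(1, slots + 1))  # buffers the client can draw to
        self.rendering = None        # client back buffer; -1 = get_back_bo failed
        self.attached = None         # buffer attached to the surface
        self.next_scanout = None     # buffer queued for the next page flip
        self.current_scanout = None  # buffer currently on screen
        self.frame_pending = False   # a frame callback is outstanding
        self.stale_attaches = 0      # frames attached without being rendered

    def get_back_bo(self):
        # Keep the current back buffer if there is one; otherwise take the
        # lowest-numbered free buffer, or record a failure as -1.
        if self.rendering in (None, -1):
            self.rendering = self.free.pop(0) if self.free else -1

    def wait(self):
        # Wait for the frame callback of the previous attach, if any.
        if not self.frame_pending:
            return
        old = self.current_scanout
        if self.next_scanout is not None:
            self.current_scanout = self.next_scanout  # page flip completes
        if old is not None and old not in (self.current_scanout, self.attached):
            self.free.append(old)                     # release event arrives
            self.free.sort()
        self.next_scanout = self.attached             # repaint queues a flip
        self.frame_pending = False

    def attach(self):
        # eglSwapBuffers retries get_back_bo once just before attaching.
        if self.rendering == -1:
            self.get_back_bo()
            if self.rendering == -1:
                return                   # still nothing to attach
            self.stale_attaches += 1     # buffer still holds an old frame
        self.attached, self.rendering = self.rendering, None
        self.frame_pending = True

    def state(self):
        return (tuple(self.free), self.rendering, self.attached,
                self.next_scanout, self.current_scanout)


def trace(steps, slots=3):
    """Run the get_back_bo / wait / attach loop and return each state."""
    model = BufferModel(slots)
    order = (model.get_back_bo, model.wait, model.attach)
    rows = []
    for i in range(steps):
        step = order[i % 3]
        step()
        rows.append((step.__name__, model.state()))
    return rows


if __name__ == "__main__":
    for name, (free, rendering, attached, nxt, cur) in trace(13):
        print(f"{name:12} free={free} rendering={rendering} "
              f"attached={attached} next={nxt} current={cur}")
```

Calling trace(13) steps through the same thirteen stages as the tables above, ending in the state where get_back_bo has failed for the second time. From there on the model attaches an unrendered buffer on every swap, which does not account for the occasional correct frame seen in practice.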

So basically I think this is just a symptom of the problem described in this message on the mailing list:

http://lists.freedesktop.org/archives/wayland-devel/2013-December/012456.html

The core problem is that if a client doesn't wait for a frame callback before starting to render then it will end up using one more buffer than it should need. In that email the client isn't fullscreen so the client should only need two buffers and ends up using three. In this case the client is fullscreen so it should need three buffers but actually needs four.

The solution on master is to also wait for the frame callback in get_back_bo, which effectively forces the client to wait for a frame callback before rendering. We could backport this patch to the Mesa 10.0 branch to fix the problem, but it might not be worth it as it's not totally risk-free. On the other hand, I think this isn't really a new bug; it has just been exposed by the recent change to weston-simple-egl that stopped it from waiting for a frame callback before rendering. In that case you could argue we can just leave the bug unfixed in Mesa 10.0.
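
To make the buffer arithmetic concrete, here is a rough Python sketch that counts how many frames end up being attached without having been rendered, for a given number of slots and for either ordering. It uses a simplified model of a fullscreen client whose buffers are scanned out directly. The function name, its parameters and the compositor model are all assumptions for illustration, not the actual Mesa patch.

```python
# Hypothetical model: counts frames whose rendering was discarded.
# Not Mesa code; the compositor is reduced to flip / release / repaint.

def count_stale_frames(slots, wait_before_render, frames=30):
    """Return how many frames were attached from an unrendered buffer."""
    free = list(range(1, slots + 1))
    attached = next_scanout = current = None
    frame_pending = False
    stale = 0

    def wait_for_frame_callback():
        nonlocal next_scanout, current, frame_pending
        if not frame_pending:
            return
        old = current
        if next_scanout is not None:
            current = next_scanout          # queued page flip completes
        if old is not None and old not in (current, attached):
            free.append(old)                # compositor releases old scanout
            free.sort()
        next_scanout = attached             # repaint queues the attached buffer
        frame_pending = False

    for _ in range(frames):
        if wait_before_render:
            wait_for_frame_callback()       # master: get_back_bo waits first
        back = free.pop(0) if free else -1  # get_back_bo when rendering starts
        wait_for_frame_callback()           # swap waits (no-op if already done)
        if back == -1:
            stale += 1                      # this frame's rendering was lost
            if not free:
                continue                    # no buffer at all; skip the attach
            back = free.pop(0)              # retry returns an unrendered buffer
        attached = back
        frame_pending = True
    return stale


if __name__ == "__main__":
    print("3 slots, render before wait:", count_stale_frames(3, False))
    print("3 slots, wait before render:", count_stale_frames(3, True))
    print("4 slots, render before wait:", count_stale_frames(4, False))
```

In this model, three slots with the Mesa 10.0 ordering lose every frame after the third, while either waiting before rendering or adding a fourth slot avoids the failures entirely.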

The story I described above has one plot hole: weston-simple-egl does manage to get some of the frames to render, and approximately every third frame looks correct. I'm not quite sure why this is happening, so perhaps my explanation isn't the full picture.
Comment 6 U. Artie Eoff 2014-03-20 22:02:56 UTC
Closing as won't fix for mesa <= 10.0 since it works fine in mesa 10.1 (which is released now) and backporting to 10.0 would be too risky.

