Bug 98833 - [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen frame updates
Summary: [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen fram...
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: EGL/Wayland (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Wayland bug list
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 98471
  Show dependency treegraph
 
Reported: 2016-11-23 17:35 UTC by Eero Tamminen
Modified: 2017-05-19 09:34 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Patch to add frame swap delay option to weston-simple-egl (1.32 KB, patch)
2016-11-28 13:59 UTC, Eero Tamminen
Details | Splinter Review
Patch to add frame swap delay option to weston-simple-egl (1.51 KB, text/plain)
2017-01-16 12:45 UTC, Eero Tamminen
Details

Description Eero Tamminen 2016-11-23 17:35:40 UTC
Most of SynMark2 v7.0.1 (internal test-suite) Wayland version tests' fullscreen frame updates start to be shown in random order after following commit:
------------------
commit 9ca6711faa03a9527c0946f2489e42cd9a62235c
Author:     Daniel Stone <daniels@collabora.com>
AuthorDate: Wed Jun 1 09:59:06 2016 +0100
Commit:     Daniel Stone <daniels@collabora.com>
CommitDate: Thu Nov 10 10:25:03 2016 +0000

    Revert "wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers"
    
    This reverts commit 25cc889004aad6d1cab9edd76db898658e347b97, though
    since the code has changed, it was applied manually.
    
    The intent of moving blocking from SwapBuffers to get_back_bo, was to
    avoid unnecessary triple-buffering by ensuring that the compositor had
    fully processed the previous frame before we started rendering. This
    means that the only time we would have to resort to triple-buffering
    would be when the buffer is directly scanned out, thus saving an extra
    buffer for composition anyway.
    
    The 'repaint window' changes introduced in Weston since then, however,
    have narrowed the window of time between the frame event being sent and
    the repaint loop needing to conclude, to 7ms by default, in order to
    reduce latency. This means however that blocking in get_back_bo gives a
    maximum of 7ms for the entire GL submission to begin and complete.
    
    Not only this, but if a client is using buffer_age to avoid full
    repaints, the buffer-age request will stall in get_back_bo until the
    frame callback completes, meaning that the client cannot even calculate
    the repaint area before the 7ms window.
    
    The combination of the two meant that WebKit-GTK+ was failing to
    achieve full framerate on a Minnowboard, due to spending a great deal of
    its time attempting to query the age of the next buffer before redraw.
    
    Revert to the previous behaviour of allowing rendering to begin but
    delaying SwapBuffers, unless and until we can find a more gentle
    behaviour.

    Signed-off-by: Daniel Stone <daniels@collabora.com>
    Reviewed-by: Jonas Ådahl <jadahl@gmail.com>
    Reviewed-by: Derek Foreman <derekf@osg.samsung.com>
    Tested-by: Derek Foreman <derekf@osg.samsung.com>
    Cc: Kristian Høgsberg <krh@bitplanet.net>
------------------

Frame updates work fine with 12.0.4 & 13.0.1 (and Ubuntu Mesa 11.2), but not after above commit.

Test suite has options for using single, double or triple buffering and toggling Vsync, but these didn't have any effect.

Only if the tests are run in non-native (fullscreen) resolution, or as windowed, the problem isn't visible.

I wasn't able to reproduce the issue with (Ubuntu) Glmark2 Wayland version, and didn't have any other test-cases at hand.

So far I've tested this only on SKL, with Ubuntu 16.04, using its default kernel (i.e. don't know whether kernel affects it).
Comment 1 Daniel Stone 2016-11-23 18:21:37 UTC
Which compositor is this? I wouldn't be at all surprised if this was Weston being broken, but I'm happy to give it a closer look. If you can reproduce this with something (anything) publicly available as well, that would obviously also be a massive help ...
Comment 2 Pekka Paalanen 2016-11-24 08:21:27 UTC
(In reply to Eero Tamminen from comment #0)
> Test suite has options for using single, double or triple buffering and
> toggling Vsync, but these didn't have any effect.

Just curious, how do you actually choose between single, double or triple buffering? I thought EGL didn't give any explicit control on buffering to the app, and setting EGL_RENDER_BUFFER to EGL_SINGLE_BUFFER makes no sense on Wayland. How do you control Vsync? We don't have any API on Wayland to turn vsync off, as in, enable visible tearing.

I could understand better if you explained what you use in EGL API terms instead, since AFAIK there is no way to explicitly choose a "buffering scheme", EGL made sure to abstract that away from the app.
Comment 3 Eero Tamminen 2016-11-24 09:42:56 UTC
Before checking other things, I'm going to try whether this happens also with latest kernel and on something else than SKL.


(In reply to Daniel Stone from comment #1)
> Which compositor is this? I wouldn't be at all surprised if this was Weston
> being broken, but I'm happy to give it a closer look.

This was with Weston, both the 1.9 version in Ubuntu 16.04 and version I built from git.

Could you propose what else I should test besides Weston?  It should be something that can be tested with reasonable effort on Ubuntu (16.04).

(Gnome on Wayland option in lightdm just crashes, I'll check whether things work better with Gdm.)


> If you can reproduce this with something (anything) publicly available as well, that would obviously also be a massive help ...

I guess this isn't something that would show up with just normal desktop 2D usage of 3D driver, as otherwise somebody would already have done a bug about it.

Are there other 3D tests than Glmark2 which work on Wayland?   Preferably something made for GLES 3.x, GL 3.x or GL 4.x, not old GL / GLES 2.0 stuff like Glmark2, and preferably something already available on Ubuntu so that they can be checked quickly...


(In reply to Pekka Paalanen from comment #2)
> (In reply to Eero Tamminen from comment #0)
> > Test suite has options for using single, double or triple buffering and
> > toggling Vsync, but these didn't have any effect.
[...]
> I could understand better if you explained what you use in EGL API terms
> instead, since AFAIK there is no way to explicitly choose a "buffering
> scheme", EGL made sure to abstract that away from the app.

They were just options in the test-suite, I'll check whether they actually should do something on EGL/Wayland (test-suite supports also other APIs &  OSes, so it's possible that all options aren't supported on all platforms).


Apitrace replay didn't work on a trace, do I need to do something specific to get Apitrace to replay traces on Wayland?

(Apitrace changes some things, so I don't trust trace dump to be reliable in telling how program works until I've seen issue also on trace replay.)
Comment 4 Pekka Paalanen 2016-11-24 10:09:01 UTC
> (In reply to Pekka Paalanen from comment #2)
> > (In reply to Eero Tamminen from comment #0)
> > > Test suite has options for using single, double or triple buffering and
> > > toggling Vsync, but these didn't have any effect.
> [...]
> > I could understand better if you explained what you use in EGL API terms
> > instead, since AFAIK there is no way to explicitly choose a "buffering
> > scheme", EGL made sure to abstract that away from the app.
> 
> They were just options in the test-suite, I'll check whether they actually
> should do something on EGL/Wayland (test-suite supports also other APIs & 
> OSes, so it's possible that all options aren't supported on all platforms).
> 
> 
> Apitrace replay didn't work on a trace, do I need to do something specific
> to get Apitrace to replay traces on Wayland?
> 
> (Apitrace changes some things, so I don't trust trace dump to be reliable in
> telling how program works until I've seen issue also on trace replay.)

Sorry, no idea about apitrace.

Since it's a buffer posting issue instead of a rendering issue, I thought it might be fairly easy to write a minimal program reproducing the problem if we knew how your test suite uses EGL. It shouldn't matter at all which flavour of GL was used, except for portability. I'm not aware of an existing test program like that, so Weston repository could probably use one similar to presentation-shm which can run in one of several different operating modes and report timings.
Comment 5 Eero Tamminen 2016-11-24 17:19:00 UTC
Happens both on BDW & SKL, both with Ubuntu 16.04 kernel & latest kernel from drm nightly git.

Not sure whether I have time to write test program, but at least I can provide EGL/GL API calls from apitrace dump.
Comment 6 Eero Tamminen 2016-11-25 16:20:49 UTC
Wrong frame ordering issue doesn't happen with (Ubuntu 16.04 version of) Gnome Wayland, only with Weston.

The difference between them is that gnome-shell does seem to do compositing as gnome-shell is doing buffer swaps while the tests are running, and Weston doesn't (there are no GL draw calls either).  I.e. (Ubuntu 16.04 version of) Gnome Wayland is NOT a valid test-case for this bug.

(In the worst case, for one test gnome-shell managed to do updates only at 15 FPS while test minimum was >120 FPS.  That was very visible and another reason why I looked into frame updates.)
Comment 7 Eero Tamminen 2016-11-25 16:23:28 UTC
(In reply to Eero Tamminen from comment #3)
> They were just options in the test-suite, I'll check whether they actually
> should do something on EGL/Wayland (test-suite supports also other APIs & 
> OSes, so it's possible that all options aren't supported on all platforms).

-> Frame buffer count option has no effect at all in EGL/Wayland, VSync option maps to eglSwapInterval() call.


Is there something minimal like glxgears for Wayland, which sources I could easily modify to do screen updates similarly to the test-suite I'm using?
Comment 8 Pekka Paalanen 2016-11-28 08:37:46 UTC
(In reply to Eero Tamminen from comment #7)
> -> Frame buffer count option has no effect at all in EGL/Wayland, VSync
> option maps to eglSwapInterval() call.

That makes sense.

> Is there something minimal like glxgears for Wayland, which sources I could
> easily modify to do screen updates similarly to the test-suite I'm using?

Yup, there is
https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-egl.c
It's a whole program almost in one source file.

You might want to change the redraw() function to not use a clock but rather a frame counter to advance the animation, otherwise racing at high rates the difference between consecutive frames might be too small to notice.

Maybe it can already reproduce the problem when you run it fullscreen, opaque, and swapinterval=0?

I see there are some left-overs of frame callback handling, but it does not actually use frame callbacks manually. I was going to say you'd need to remove that logic, but it's already gone. All throttling is left for eglSwapBuffers().
Comment 9 Eero Tamminen 2016-11-28 13:59:51 UTC
Created attachment 128240 [details] [review]
Patch to add frame swap delay option to weston-simple-egl

(In reply to Pekka Paalanen from comment #8)
> Maybe it can already reproduce the problem when you run it fullscreen,
> opaque, and swapinterval=0?

That wasn't enough, nor was making exactly the same (E)GL calls.

What was needed, was just using slower frame update rate.  The problem is quite visible when the test goes 60-100 FPS.

Attached patch adds delay option to weston-simple-egl.  To see the problem, try with something like:
  ./weston-simple-egl -b -f -o -d 15500
Comment 10 Eero Tamminen 2017-01-04 11:05:50 UTC
Ping... Were you able to reproduce the issue with patched weston-simple-egl?
Comment 11 Pekka Paalanen 2017-01-13 15:16:23 UTC
Hi, sorry, just back from holidays. The patch looks fine so would be nice to have that on wayland-devel@ mailing list if you didn't send it already. I think it'd be an ok addition to simple-egl.

Does it matter if the delay is before or after swapbuffers? I would slightly prefer it before swapbuffers so it would emulate time spent in drawing. That could be the rationale in the commit message, too.

I haven't had a chance to test yet.
Comment 12 Eero Tamminen 2017-01-16 12:45:27 UTC
Created attachment 128981 [details]
Patch to add frame swap delay option to weston-simple-egl
Comment 13 Eero Tamminen 2017-01-16 12:46:10 UTC
(In reply to Pekka Paalanen from comment #11)
> Hi, sorry, just back from holidays. The patch looks fine so would be nice to
> have that on wayland-devel@ mailing list if you didn't send it already. I
> think it'd be an ok addition to simple-egl.

I'm not on wayland list, and rather not subscribe on one more.  I don't need attributions, so feel free to apply it as you see fit.


> Does it matter if the delay is before or after swapbuffers? I would slightly
> prefer it before swapbuffers so it would emulate time spent in drawing.

As expected, it doesn't matter.  Attached is updated patch.
Comment 14 Pekka Paalanen 2017-02-20 14:12:37 UTC
(In reply to Eero Tamminen from comment #9)
> Created attachment 128240 [details] [review] [review]
> Patch to add frame swap delay option to weston-simple-egl
> 
> That wasn't enough, nor was making exactly the same (E)GL calls.
> 
> What was needed, was just using slower frame update rate.  The problem is
> quite visible when the test goes 60-100 FPS.
> 
> Attached patch adds delay option to weston-simple-egl.  To see the problem,
> try with something like:
>   ./weston-simple-egl -b -f -o -d 15500

Yes! I just installed Mesa git master and using Weston git master with your delay patch, and it produces a mighty glitchy animation indeed.

I am on "GL renderer: Mesa DRI Intel(R) Haswell Desktop".
Comment 15 Eero Tamminen 2017-02-20 14:55:22 UTC
Ok, as it happens on HSW, BDW & SKL, I would say that it's a generic problem.  

Is there any Mesa driver which non-vsynched frame updates are not broken by the indicated Mesa commit?
Comment 16 Eero Tamminen 2017-05-10 14:12:05 UTC
(In reply to Pekka Paalanen from comment #14)
> Yes! I just installed Mesa git master and using Weston git master with your
> delay patch, and it produces a mighty glitchy animation indeed.
> 
> I am on "GL renderer: Mesa DRI Intel(R) Haswell Desktop".

(In reply to Eero Tamminen from comment #15)
> Ok, as it happens at least on HSW, BDW & SKL, I would say that it's a generic
> problem.  
> 
> Is there any Mesa driver which non-vsynched frame updates are not broken by
> the indicated Mesa commit?

I.e. is there any reason why Daniel's revert should not be reverted?
Comment 17 Daniel Stone 2017-05-16 09:31:45 UTC
I've been looking at this over the past couple of days, but haven't quite been able to figure it out. Hopefully I can crack it today.
Comment 18 Daniel Stone 2017-05-16 10:03:00 UTC
Right, here we are:
https://lists.freedesktop.org/archives/mesa-dev/2017-May/155791.html
Comment 19 Daniel Stone 2017-05-18 14:13:30 UTC
(In reply to Daniel Stone from comment #18)
> Right, here we are:
> https://lists.freedesktop.org/archives/mesa-dev/2017-May/155791.html

Eero, are you able to test this?
Comment 20 Eero Tamminen 2017-05-19 07:43:11 UTC
(In reply to Daniel Stone from comment #19)
> (In reply to Daniel Stone from comment #18)
> > Right, here we are:
> > https://lists.freedesktop.org/archives/mesa-dev/2017-May/155791.html
> 
> Eero, are you able to test this?

I may have again time for this next month, not currently.
Comment 21 Daniel Stone 2017-05-19 09:34:14 UTC
No problem, I've pushed with Pekka's ack in the meantime. Please reopen if this is still broken for you later. Thanks for the really detailed report!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.