Bug 35930 - flickering in many OpenGL applications with composition manager
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: All Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Duplicates: 35877 37769
Reported: 2011-04-03 08:53 UTC by Bryan Cain
Modified: 2011-09-02 09:54 UTC

Attachments
DDX: Implement pageflip completion event handling. (11.39 KB, patch)
2011-08-22 17:54 UTC, Mario Kleiner
DDX: Update front buffer pixmap and names before exchanging buffers (3.36 KB, patch)
2011-08-22 17:55 UTC, Mario Kleiner
DDX: Fixes to swap scheduling, especially for copy-swaps. (4.68 KB, patch)
2011-08-22 17:56 UTC, Mario Kleiner
disable swaps of of-screen pixmaps (892 bytes, patch)
2011-08-23 02:40 UTC, maximlevitsky

Description Bryan Cain 2011-04-03 08:53:16 UTC
When using Compiz with Nouveau on an NVA5 (GeForce GT 330M), there is an effect which can best be described as "flickering".  It does not happen with Metacity, and I'm not sure whether it happens with compositing window managers other than Compiz.

In the main menus of both SuperTux 0.3.3 and SuperTuxKart, it appears as flickering menu and sprite textures.  In Alien Arena, when the game is paused, it looks like parts of the 3D scene are "bleeding through" the pause menu in some frames.  I've done a lot of investigation into the causes and symptoms of this issue in the past few days, so please bear with me as I explain what I've found.

My initial theory was that things were being drawn in the wrong order, causing, for example, the SuperTux background to be drawn over the menu.  I later found this not to be the case, since inserting a manual glFlush() after drawing each sprite in SuperTux 0.3.3 made no difference.  This theory also couldn't explain why the textures at the bottom of the screen flicker much more often in the SuperTux 0.3.3 credits, or why the main menu of SuperTux 0.1.3 (with OpenGL enabled) is unaffected.

I then remembered that all of the affected games were rendering on top of a previous buffer, and realized that the issue is that the window manager is displaying buffers before the rendering into them is finished.

After some investigation, I've confirmed that the renderer is being properly flushed by glXSwapBuffers, and that calling glFinish() right before swapping the buffers makes no difference.

So my current theory is that one of the following is happening:
* SwapBuffers, using page flipping, is swapping to the wrong buffer - the one that is about to be rendered into (sounds unlikely, and I'm unsure of whether it's possible with DRI2 for such an issue to affect only one driver)
* glFlush and glFinish (nv50_flush/nouveau_fence_wait) are not doing their job properly (more likely)

I believe these adequately explain the following symptoms I'm experiencing:
* flickering textures in SuperTux 0.3.3 (the background is rendered before the sprites, menu, etc.)
* why the SuperTux 0.1.3 main menu is unaffected in OpenGL mode (it has a very slow framerate)
* why the issue went away in SuperTux 0.3.3 when I introduced a manual delay of 30 milliseconds after each buffer swap
* why parts of the 3D scene in Alien Arena sometimes "bleed through" the pause screen
* why the sprites at the bottom of the screen flicker more often in the SuperTux credits (they are rendered from top to bottom)
* why glFlush and glFinish do not make a difference
Comment 1 maximlevitsky 2011-04-03 08:59:06 UTC
KWin 4 is affected as well.
In fact, SuperTuxKart doesn't show this problem under Compiz here, but under KWin 4 it flickers badly.
Comment 2 maximlevitsky 2011-05-17 12:54:57 UTC
*** Bug 35877 has been marked as a duplicate of this bug. ***
Comment 3 maximlevitsky 2011-05-17 12:57:12 UTC
In fact, the exact same issue happens in kwin as well.
In addition to that, often the game itself plays just fine, but once you exit it, the composition manager starts to flicker badly in pretty much the same way.
Comment 4 Bryan Cain 2011-05-17 14:11:01 UTC
How did you narrow the issue down to Mesa?
Comment 5 maximlevitsky 2011-05-17 15:56:24 UTC
Well, that happens with compiz, so it's 3D related.
So, as a good citizen, I set the component as the nouveau wiki suggests for 3D bugs.
This didn't use to happen in the distant past, so a bisect could be possible
(except that it is sucky to search the libdrm commit tree for the commit that will work with a specific mesa commit).
Comment 6 Mario Kleiner 2011-05-18 13:28:48 UTC
From my understanding of the code in the ddx and the way dri2 swap scheduling/completion is supposed to work, it could be that you're hitting a limitation of the current nouveau ddx when pageflipping is enabled for bufferswaps:

<http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/tree/src/nouveau_dri2.c#n272>

-> For page-flipped bufferswaps, which are on by default, the current ddx doesn't wait for swap completion before it notifies x / mesa of it. Instead it over-optimistically assumes that the swap completes at the very moment it is scheduled, which is almost never the case.

-> Mesa would start to render to what it thinks is the post-swap backbuffer, but in reality it is the current pre-swap frontbuffer. Unless they have some other unorthodox synchronization mechanism in place to prevent that. You'd see many of the symptoms described here if i'm correct. And a pause of 30 msecs after each bufferswap as described would fix the issue in many cases.
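
The hazard described in the two points above can be illustrated with a small toy model. This is a sketch under stated assumptions, not code from the nouveau ddx: the struct and function names (flip_model, schedule_flip, flip_done, renders_to_visible) are hypothetical, and the display is reduced to two buffers with at most one pending flip.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a double-buffered, page-flipped drawable.
 * All names are hypothetical; this is not nouveau ddx code. */
struct flip_model {
    int scanout;        /* buffer the display engine is showing (0 or 1) */
    int client_back;    /* buffer the client believes is its back buffer */
    bool flip_pending;  /* a flip was queued but has not happened yet */
    bool notify_early;  /* report swap completion at scheduling time */
};

/* Queue a flip. With notify_early set, the client is told right away that
 * the swap is complete, so it changes back buffers before the hardware
 * has actually flipped. */
static void schedule_flip(struct flip_model *m)
{
    assert(!m->flip_pending);
    m->flip_pending = true;
    if (m->notify_early)
        m->client_back = 1 - m->client_back;
}

/* The flip really happens (for example at the next vblank). Without
 * notify_early, this is the point where the client learns about it. */
static void flip_done(struct flip_model *m)
{
    assert(m->flip_pending);
    m->scanout = 1 - m->scanout;
    m->flip_pending = false;
    if (!m->notify_early)
        m->client_back = 1 - m->client_back;
}

/* True if the client would currently draw into the visible buffer. */
static bool renders_to_visible(const struct flip_model *m)
{
    return m->client_back == m->scanout;
}
```

With notify_early set, renders_to_visible() is true between schedule_flip() and flip_done(), which is the window in which the client could draw over what is on screen. With it cleared, the client only changes buffers once the flip has happened.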

Iow, the ddx implementation isn't yet ready to do this properly for page-flipped fullscreen swaps, as they happen with video games and quite often with desktop compositors.

You could check whether adding the option "Pageflip" "off" to xorg.conf fixes the problems for now, until a proper implementation is there.

As far as i can see, the current ddx code has a few more issues there -- it does do stuff differently in some cases than how it is meant to be done for the dri2 swapbuffers and timestamping implementation.

-mario
Comment 7 maximlevitsky 2011-05-18 13:56:32 UTC
Nope, that's not page flipping's fault.
I suspected that long ago and tested it.
To be sure, I tested it again:



maxim@maxim-laptop:~$ cat /etc/X11/xorg.conf | grep Flip
        Option "PageFlip" "false"

maxim@maxim-laptop:~$ cat /usr/local/var/log/Xorg.0.log | grep flip
[  2674.564] (**) NOUVEAU(0): Page flipping disabled

Still, exactly the same problem.
Comment 8 Bryan Cain 2011-05-18 13:59:07 UTC
This could still be an issue somewhere in the DDX driver, even if page flipping is not the problem.
Comment 9 maximlevitsky 2011-05-18 14:51:37 UTC
To test that further, I added 'return FALSE;' to nouveau_dri2.c:can_exchange().
Seeing that this didn't help, I added:

usleep(300 * 1000);

just before:
DRI2SwapComplete(s->client, draw, frame, tv_sec, tv_usec,
		 type, s->func, s->data);

in nouveau_dri2.c in the DDX.

And despite that huge delay (I first tried 30 * 1000 as suggested), neither did the rendering FPS drop nor was the problem fixed.
Comment 10 Christoph Bumiller 2011-05-19 02:37:41 UTC
I suppose that's because the DDX probably isn't the one copying out the data too early, but compiz (using mesa).

Now, compiz' operation should be synchronized with the other GL apps by the kernel the same way that the DDX is synchronized with them, provided that all apps sharing a nouveau_bo emit their validation relocs properly, which they seem to do (if the writing one didn't, you'd always have flickering even without compiz).

That leaves me with no more ideas (where it's the hw driver's fault) for the moment.
Comment 11 maximlevitsky 2011-05-20 17:59:11 UTC
I noticed something funny that might nail down that bug.
If I rotate the screen, the bug disappears.

I can even rotate the screen while the game is running and control the flicker this way (although sometimes the flicker doesn't disappear, but restarting compiz surely brings it back, and on the other hand the flicker always appears if I start the game in rotated mode and rotate back to normal).

I tried left and upside-down rotations.
Comment 12 maximlevitsky 2011-05-20 18:50:22 UTC
Well, I need some lessons in common sense. I need a bit more of it...



eb83c830c87bce345748edef3b50660246143db7 is the first bad commit
commit eb83c830c87bce345748edef3b50660246143db7
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Thu Oct 21 22:57:08 2010 +0200

    dri2: Add pageflip/exchange support.
    
    Signed-off-by: Francisco Jerez <currojerez@riseup.net>

:040000 040000 4ea816bc7475fd76531101fe7c620b2f50cf2fe9 f7f25d83a3ee0421392585fc8e5725cd1f466fda M      src
Comment 13 maximlevitsky 2011-05-20 20:00:42 UTC
Yes, it is indeed the pageflip support.
Putting 'return FALSE' in can_exchange fixes the problem; last time I must have
forgotten to 'make install' or something.

However, 'Option "PageFlip" "false"' doesn't help, because even when it is set, the
condition in can_exchange still allows flipping in some cases
(if nouveau_exa_pixmap_is_onscreen() == FALSE).
Not sure why that check is there.

Also note that if the game doesn't run full-screen, I can still see rare flickering
with pageflip disabled (with that return FALSE).
Probably, as was suggested before, it just exposes a problem that was there
before.

Also, alien-arena works just fine.
Although I likely won't play it: despite the nice graphics and theme, I hate the controls (you can't move left-right), and health there actually decreases rather than increases when idle, unlike in nexuiz.
Comment 14 maximlevitsky 2011-05-21 06:18:44 UTC
And to add to this, I have now installed the old 2.6.35 kernel from the Ubuntu repositories, and without the 'return FALSE' hack I still see the problem.
Sure as hell, that kernel doesn't have page flipping code.

Also, I think I know why I initially got the negative results.
I used KDE's logout feature, but I now suspect that, unlike GDM, it doesn't restart the X server. Now for testing I always kill it manually.
Comment 15 maximlevitsky 2011-05-21 15:19:42 UTC
Also, isn't page flipping supposed to be done between the front and back buffers of any DRI drawable, e.g. the one an app uses for direct rendering?

However, I see that according to the DDX and kernel code, the current page-flipping code only supports flipping between the display's front and back buffers, and yet the DDX uses it for all page flipping. Isn't that just wrong?

When compiz is running, apps render to a texture, and thus the DDX should, at least for now, fall back to blitting for such DRI drawables, no? (Although it is surely possible to implement page flipping for individual non-screen-sized DRI buffers.)
Comment 16 Christoph Bumiller 2011-05-22 02:04:11 UTC
Actual page flipping applies only to full screen applications. The page flip is done by the display engine, you flip what is being displayed on screen, the whole scanout buffer, look it up (multi buffering).

You cannot "flip" single drawables (individual pixels of a buffer) to (individual pixels of) the front buffer, that doesn't work (with existing hardware).
Comment 17 maximlevitsky 2011-05-22 02:49:18 UTC
I understand that, but nouveau_dri2_finish_swap in the DDX is called for both window DRI buffers and scanout buffers.

It seems that DRI2CanFlip does indeed filter for full-screen buffers only, but I don't understand why it returns TRUE for all pixmaps.
Aren't pixmaps not on the screen at all?
Or is it never called with pixmaps, only with windows?

As far as I understand it, from very old memories of tinkering with the X Window System, pixmaps and windows are supposed to be interchangeable in X draw calls; windows can be displayed, while pixmaps are strictly off-screen and can only be blitted to some window.
Comment 18 maximlevitsky 2011-05-22 03:19:17 UTC
Also note that the user with nick 'curro' noted that https://bugs.freedesktop.org/show_bug.cgi?id=35452 fixes that issue partially, and indeed, even though the flickering is still there, at least after exiting a full-screen game compiz continues to work fine and doesn't pick up the same flickering.

Also, I found out why alien-arena was such a good reproducer of this bug.
It now looks like compiz, due to a bug or a race, doesn't un-redirect games that switch resolution on start.
(First I noticed that if a game uses a non-native resolution, it always flickers; then I switched the screen via xrandr to the target resolution _before_ starting the game, and the flicker disappeared.)
And alien-arena wasn't using the native resolution.
For me the issue is pretty much fixed now, as I don't really play non-fullscreen games (FPS loss, and pointless anyway).
Nor do I want to play while rendering is done via compiz (because of the non-native resolution bug).
Nouveau is quite fast these days, fast enough to run all the games I play occasionally at native resolution (1280x800).
Comment 19 maximlevitsky 2011-05-22 04:35:21 UTC
So, now I get it. The problem is that the nouveau DDX just swaps between off-screen windows without any hardware assistance, but these off-screen windows could be in use by compiz (and even bound to textures using GLX_EXT_texture_from_pixmap), and thus it just yanks them from under compiz's nose.

May I suggest, then, removing the '!nouveau_exa_pixmap_is_onscreen(dst_pix)' check from can_exchange for now?
This fixes the issue for me and keeps full-screen page flipping on.
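
To make the effect of that check concrete, here is a minimal sketch of the decision reduced to booleans. It is a model built from the description in comment 13 and the suggestion above, not the DDX source: exchange_inputs, can_exchange_described, can_exchange_workaround and the same_layout field are hypothetical names, and the assumption that source and destination must also match in size and format is mine.

```c
#include <stdbool.h>

/* Inputs to the decision, reduced to booleans.
 * Hypothetical model, not the DDX source. */
struct exchange_inputs {
    bool dst_onscreen;      /* result of nouveau_exa_pixmap_is_onscreen(dst_pix) */
    bool can_flip;          /* drawable qualifies for a real page flip */
    bool pageflip_enabled;  /* Option "PageFlip" from xorg.conf */
    bool same_layout;       /* assumed: src and dst match in size and format */
};

/* Condition as described in comment 13: an off-screen destination is
 * exchanged regardless of what the PageFlip option says. */
static bool can_exchange_described(struct exchange_inputs in)
{
    return (!in.dst_onscreen || (in.can_flip && in.pageflip_enabled))
           && in.same_layout;
}

/* Suggested workaround: drop the off-screen clause, so only real page
 * flips exchange buffers and everything else falls back to a copy. */
static bool can_exchange_workaround(struct exchange_inputs in)
{
    return in.can_flip && in.pageflip_enabled && in.same_layout;
}
```

For a redirected window (dst_onscreen false) with PageFlip turned off, the first predicate still allows the exchange and the second does not, which matches the observation that the xorg.conf option alone did not help.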
Comment 20 Bryan Cain 2011-05-22 07:43:40 UTC
(In reply to comment #19)
> So, now I get it. The problem is that the nouveau DDX just swaps between
> off-screen windows without any hardware assistance, but these off-screen
> windows could be in use by compiz (and even bound to textures using
> GLX_EXT_texture_from_pixmap), and thus it just yanks them from under compiz's
> nose.
> 
> May I suggest, then, removing the
> '!nouveau_exa_pixmap_is_onscreen(dst_pix)' check from can_exchange for now?
> This fixes the issue for me and keeps full-screen page flipping on.

This fixes it for me, too!  Thanks so much for tracking the issue down!  Could you send it as a patch to the Nouveau mailing list?
Comment 21 maximlevitsky 2011-05-22 08:51:04 UTC
To be honest, I didn't track that down; Christoph Bumiller and the user with nick 'curro' on #nouveau did. I just followed their suggestions.

And my last suggestion is more a workaround than a fix.
If the nouveau developers think that this is acceptable for now, I sure don't mind sending a patch.
Comment 22 maximlevitsky 2011-05-30 17:38:41 UTC
*** Bug 37769 has been marked as a duplicate of this bug. ***
Comment 23 maximlevitsky 2011-08-13 17:34:49 UTC
Any update? Can we at least use my workaround for now, until this is fixed in the xserver?
Comment 24 Mario Kleiner 2011-08-22 17:54:18 UTC
Created attachment 50467
DDX: Implement pageflip completion event handling.
Comment 25 Mario Kleiner 2011-08-22 17:55:33 UTC
Created attachment 50468
DDX: Update front buffer pixmap and names before exchanging buffers
Comment 26 Mario Kleiner 2011-08-22 17:56:20 UTC
Created attachment 50469
DDX: Fixes to swap scheduling, especially for copy-swaps.
Comment 27 Mario Kleiner 2011-08-22 18:08:55 UTC
Hi, can you try the attached series of three patches?

They implement handling of pageflip completion events from the kernel. So far pageflip events from the kernel were ignored by the nouveau ddx. They also fix some serious screen corruption when switching between redirected and unredirected fullscreen windows under a compositor, and fix a few corner cases in dri2 swap scheduling, especially for copy-swaps for windows.

These are direct translations to nouveau ddx of the corresponding (well tested) implementations and fixes for the intel and ati ddx.

The series is so far only tested with Linux 3.0 on a single display configuration, but should work with earlier kernels as well. Should work with dual-display setups (fullscreen window spanning both displays, clone mode, or zaphod head with separate x-screens), but i probably won't have a chance to test dual-display before next weekend, so there's some chance of bugs there.

These patches fix all bugs i encountered so far with wrong oml_sync_control timestamps from bufferswaps, flicker and other synchronization issues, e.g., glxgears running with 1800 fps although vsync is on. Hopefully they also help to resolve this bug.

thanks,
-mario
Comment 28 maximlevitsky 2011-08-22 20:48:21 UTC
I tried that patch series. Sadly, it doesn't help with the swapping of off-screen EXA pixmaps, which causes flickering in non-fullscreen games running under compiz.
Comment 29 Michel Dänzer 2011-08-23 01:09:04 UTC
I think there's a fundamental problem with simply exchanging the buffers for a redirected window's backing pixmap: There's no synchronization between the app and the compositing manager (CM), so by the time the CM composites the window contents after a flip, the app might have already flipped again and started rendering to the buffer the CM is using for compositing. I think this is the cause of the flickering.

It might be possible to fix this (at least as long as there aren't several clients trying to get at the window contents via the Composite extension...) with triple buffering, but I wonder if it's really worth the complexity. (E.g. exchanging buffers isn't possible anyway with the majority of window managers, which reparent client windows)
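
The race described above, and the slack a third buffer would add, can be sketched with a round-robin toy model. Everything here is an assumption made for illustration: the names swap_chain, app_target, app_swap and cm_reads_buffer_being_drawn are hypothetical, and real buffer management in the server is more involved. The model only counts how many swaps the app can complete before it starts drawing into the buffer the CM latched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the app cycles through nbufs buffers and, after each
 * swap, starts drawing into the next one in round-robin order. */
struct swap_chain {
    int nbufs;  /* 2 = double buffering, 3 = triple buffering */
    int front;  /* buffer holding the latest complete frame */
};

/* Buffer the app is drawing into right now. */
static int app_target(const struct swap_chain *c)
{
    return (c->front + 1) % c->nbufs;
}

/* App finishes a frame and exchanges buffers. */
static void app_swap(struct swap_chain *c)
{
    c->front = app_target(c);
}

/* The CM latches the current front buffer, then the app completes
 * extra_swaps more swaps before the CM has finished reading it.
 * Returns true if, at any point during those swaps, the app starts
 * drawing into the buffer the CM latched. */
static bool cm_reads_buffer_being_drawn(int nbufs, int extra_swaps)
{
    struct swap_chain c = { nbufs, 0 };
    int latched;
    bool hit = false;

    assert(nbufs >= 2 && extra_swaps >= 0);
    latched = c.front;
    for (int i = 0; i < extra_swaps; i++) {
        app_swap(&c);
        if (app_target(&c) == latched)
            hit = true;
    }
    return hit;
}
```

In this model, two buffers conflict as soon as the app gets one swap ahead of the CM, while three buffers tolerate one swap of lag and conflict on the second. That is slack rather than synchronization, so it does not by itself show that triple buffering would be a complete fix.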
Comment 30 maximlevitsky 2011-08-23 02:40:58 UTC
Created attachment 50479
disable swaps of of-screen pixmaps

I sure do agree with you.
Maybe we should agree to disable that swap after all, as I proposed?

That surely fixes the problem for me.
Comment 31 Francisco Jerez 2011-08-23 04:25:20 UTC
(In reply to comment #29)
> I think there's a fundamental problem with simply exchanging the buffers for a
> redirected window's backing pixmap: There's no synchronization between the app
> and the compositing manager (CM), so by the time the CM composites the window
> contents after a flip, the app might have already flipped again and started
> rendering to the buffer the CM is using for compositing. I think this is the
> cause of the flickering.
> 
Exactly.

> It might be possible to fix this (at least as long as there aren't several
> clients trying to get at the window contents via the Composite extension...)
> with triple buffering, but I wonder if it's really worth the complexity. (E.g.
> exchanging buffers isn't possible anyway with the majority of window managers,
> which reparent client windows)

I was considering another solution, but I've been too busy in the last couple of months to put it into practice. Basically, the X server could make sure that at any given time all clients agree on the role of any buffers that are being shared by several clients, by blocking their GetBuffers requests at the right time, IOW no two clients would ever see the same buffer in two different slots at the same moment. AFAICT this can be made to work for an arbitrary number of clients getting at the same window at the same time.

Maxim, I've pushed your patch as a temporary solution until I (or somebody else with more time) get around to fixing this synchronization problem in the X server.
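
The invariant mentioned above ("no two clients would ever see the same buffer in two different slots at the same moment") can be written down as a small check. This is only a sketch of the invariant itself, under assumed names (slot, client_view, views_agree, all_views_agree are hypothetical); it says nothing about how or when the X server would block GetBuffers requests to enforce it.

```c
#include <stdbool.h>
#include <stddef.h>

enum slot { SLOT_FRONT, SLOT_BACK, SLOT_COUNT };

/* One client's idea of which buffer sits in which slot. Buffer ids are
 * arbitrary integers; hypothetical model only. */
struct client_view {
    int buffer[SLOT_COUNT];
};

/* True if no buffer appears in different slots in the two views. */
static bool views_agree(const struct client_view *a,
                        const struct client_view *b)
{
    for (int i = 0; i < SLOT_COUNT; i++)
        for (int j = 0; j < SLOT_COUNT; j++)
            if (a->buffer[i] == b->buffer[j] && i != j)
                return false;
    return true;
}

/* Check the invariant across all clients sharing one window. */
static bool all_views_agree(const struct client_view *views, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (!views_agree(&views[i], &views[j]))
                return false;
    return true;
}
```

A client holding a stale view after an exchange (front and back swapped relative to another client) makes all_views_agree() return false, which is the situation the proposal would rule out.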
Comment 32 Francisco Jerez 2011-08-23 04:32:53 UTC
(In reply to comment #27)
>[...]
> The series is so far only tested with Linux 3.0 on a single display
> configuration, but should work with earlier kernels as well. Should work with
> dual-display setups (fullscreen window spanning both displays, clone mode, or
> zaphod head with separate x-screens), but i probably won't have a chance to
> test dual-display before next weekend, so there's some chance of bugs there.
> 

When you consider them ready to go in, can you please send them to the mailing list? They're quite difficult to review as attachments in a bug report.

>[...]

