When using Compiz with Nouveau on an NVA5 (GeForce GT 330M), there is an effect which can best be described as "flickering". It does not happen with Metacity, and I'm not sure whether it happens with compositing window managers other than Compiz.

In the main menus of both SuperTux 0.3.3 and SuperTuxKart, it appears as flickering menu and sprite textures. In Alien Arena, when the game is paused, it looks like parts of the 3D scene are "bleeding through" the pause menu in some frames.

I've done a lot of investigation into the causes and symptoms of this issue in the past few days, so please bear with me as I explain what I've found.

My initial theory was that things were being drawn in the wrong order, causing, for example, the SuperTux background to be drawn over the menu. I later found this not to be the case, since inserting a manual glFlush() after drawing each sprite in SuperTux 0.3.3 made no difference. This theory also couldn't explain why the textures at the bottom of the screen flicker much more often in the SuperTux 0.3.3 credits, or why the main menu of SuperTux 0.1.3 (with OpenGL enabled) is unaffected.

I then remembered that all of the affected games were rendering on top of a previous buffer, and realized that the issue is that the window manager is displaying buffers before the rendering into them is finished. After some investigation, I've confirmed that the renderer is being properly flushed by glXSwapBuffers, and that calling glFinish() right before swapping the buffers makes no difference.
So my current theory is that one of the following is happening:

* SwapBuffers, using page flipping, is swapping to the wrong buffer - the one that is about to be rendered into (sounds unlikely, and I'm unsure of whether it's possible with DRI2 for such an issue to affect only one driver)
* glFlush and glFinish (nv50_flush/nouveau_fence_wait) are not doing their job properly (more likely)

I believe these adequately explain the following symptoms I'm experiencing:

* flickering textures in SuperTux 0.3.3 (the background is rendered before the sprites, menu, etc.)
* why the SuperTux 0.1.3 main menu is unaffected in OpenGL mode (it has a very slow framerate)
* why the issue went away in SuperTux 0.3.3 when I introduced a manual delay of 30 milliseconds after each buffer swap
* why parts of the 3D scene in Alien Arena sometimes "bleed through" the pause screen
* why the sprites at the bottom of the screen flicker more often in the SuperTux credits (they are rendered from top to bottom)
* why glFlush and glFinish do not make a difference
Kwin4 is affected as well. In fact, here SuperTuxKart doesn't show this problem in Compiz, but in Kwin4 it flickers badly.
*** Bug 35877 has been marked as a duplicate of this bug. ***
In fact, the exact same issue happens in kwin as well. In addition, the game itself often plays just fine, but once you exit it, the compositing manager starts to flicker badly in pretty much the same way.
How did you narrow the issue down to Mesa?
Well, it happens with Compiz, so it's 3D related. So, as a good citizen, I set the component as the Nouveau wiki suggests for 3D bugs. This did not happen in the distant past, so a bisect should be possible (except that it is painful to search the libdrm commit tree for a commit that works with a specific Mesa commit).
From my understanding of the code in the ddx and the way dri2 swap scheduling/completion is supposed to work, it could be that you're hitting a limitation of the current nouveau ddx when pageflipping is enabled for bufferswaps:

<http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/tree/src/nouveau_dri2.c#n272>

-> For page-flipped bufferswaps, which are on by default, the current ddx doesn't wait for swap completion before it notifies X / Mesa of it. Instead it over-optimistically assumes that the swap completes at the very moment it is scheduled, which is almost never the case.

-> Mesa would start to render to what it thinks is the post-swap backbuffer, but in reality it is the current pre-swap frontbuffer, unless there is some other unorthodox synchronization mechanism in place to prevent that.

You'd see many of the symptoms described here if I'm correct, and a pause of 30 msecs after each bufferswap, as described, would fix the issue in many cases.

In other words, the ddx implementation isn't yet ready to do this properly for page-flipped fullscreen swaps, as they happen with video games and quite often with a desktop compositor. You could try whether adding the option "Pageflip" "off" in xorg.conf fixes the problems for now, until a proper implementation is there.

As far as I can see, the current ddx code has a few more issues there -- in some cases it does things differently from how they are meant to be done for the dri2 swapbuffers and timestamping implementation.

-mario
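To make the suspected failure mode concrete, here is a toy model in C. It is only a sketch under stated assumptions: the struct and every function name are hypothetical and none of this is taken from the nouveau ddx. It models two buffers, a flip that only takes effect at the next vblank, and two policies for when the client is told which buffer to render into next.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not code from the nouveau ddx. */
struct swapchain {
    int scanout;       /* buffer the display engine is showing right now */
    int pending_flip;  /* buffer queued for the next vblank, or -1 if none */
};

/* Queue a flip; the display engine applies it only at the next vblank. */
void schedule_flip(struct swapchain *sc, int buf)
{
    assert(buf == 0 || buf == 1);
    sc->pending_flip = buf;
}

/* Vblank: the queued flip, if any, actually takes effect. */
void vblank(struct swapchain *sc)
{
    if (sc->pending_flip >= 0) {
        sc->scanout = sc->pending_flip;
        sc->pending_flip = -1;
    }
}

/* Optimistic policy: completion is reported as soon as the flip is
 * scheduled, so the client immediately gets the old front buffer as
 * its new render target. */
int next_target_optimistic(struct swapchain *sc, int flip_to)
{
    schedule_flip(sc, flip_to);
    return 1 - flip_to;
}

/* Event-driven policy: completion is reported only once the flip has
 * happened. The vblank() call stands in for waiting on a completion
 * event from the kernel. */
int next_target_after_event(struct swapchain *sc, int flip_to)
{
    schedule_flip(sc, flip_to);
    vblank(sc);
    return 1 - flip_to;
}

/* True if the client would be drawing into the buffer still on screen. */
bool renders_into_scanout(const struct swapchain *sc, int target)
{
    return target == sc->scanout;
}
```

Starting from `scanout = 0` and flipping to buffer 1, the optimistic policy hands the client buffer 0 while it is still being scanned out, whereas the event-driven policy hands it out only after the display has moved to buffer 1.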
Nope, that's not pageflipping's fault. I suspected that long ago and tested it. To be sure, I tested it again:

maxim@maxim-laptop:~$ cat /etc/X11/xorg.conf | grep Flip
Option "PageFlip" "false"
maxim@maxim-laptop:~$ cat /usr/local/var/log/Xorg.0.log | grep flip
[ 2674.564] (**) NOUVEAU(0): Page flipping disabled

Still, exactly the same problem.
This could still be an issue somewhere in the DDX driver, even if page flipping is not the problem.
To test that further, I added 'return FALSE;' in nouveau_dri2.c:can_exchange(). Seeing that this didn't help, I added

usleep(300 * 1000);

just before

DRI2SwapComplete(s->client, draw, frame, tv_sec, tv_usec, type, s->func, s->data);

in nouveau_dri2.c in the DDX. Despite that huge delay (I first tried 30 * 1000, as suggested), the rendering FPS did not drop and the problem was not fixed.
I suppose that's because the DDX probably isn't the one copying out the data too early, but Compiz (using Mesa). Now, Compiz's operation should be synchronized with the other GL apps by the kernel the same way that the DDX is synchronized with them, provided that all apps sharing a nouveau_bo emit their validation relocs properly, which they seem to do (if the writing one didn't, you'd always have flickering, even without Compiz). That leaves me with no more ideas (where it's the hw driver's fault) for the moment.
I noticed something funny that might help nail down this bug. If I rotate the screen, the bug disappears. I can even rotate the screen while the game is running and control the flicker this way (although sometimes the flicker doesn't disappear; restarting Compiz reliably brings it back, and the flicker always appears if I start the game in rotated mode and then rotate back to normal). I tried left and upside-down rotations.
Well, I need some lessons in common sense. I need a bit more of it...

eb83c830c87bce345748edef3b50660246143db7 is the first bad commit
commit eb83c830c87bce345748edef3b50660246143db7
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Thu Oct 21 22:57:08 2010 +0200

    dri2: Add pageflip/exchange support.

    Signed-off-by: Francisco Jerez <currojerez@riseup.net>

:040000 040000 4ea816bc7475fd76531101fe7c620b2f50cf2fe9 f7f25d83a3ee0421392585fc8e5725cd1f466fda M src
Yes, it is indeed the pageflip support. Putting 'return FALSE' in can_exchange fixes the problem; last time I must have forgotten to 'make install' or something. However, 'Option "PageFlip" "false"' doesn't help, because even when it is set, the condition in can_exchange still allows buffer exchanges in some cases (if nouveau_exa_pixmap_is_onscreen() == FALSE). I'm not sure why that check is there.

Also note that if the game doesn't run full-screen, I can still see rare flickering with pageflip disabled (with that 'return FALSE'). Probably, as was suggested before, the pageflip code just exposes a problem that was already there.

Also, alien-arena works just fine. I likely won't play it, though: despite the nice graphics and theme, I hate the controls (you can't move left-right), and health there actually decreases rather than increases when idle, unlike in nexuiz.
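The reason the xorg.conf option has no effect on composited windows can be illustrated with a small boolean model. This is a sketch only: the struct, its fields and both function names are hypothetical, and the real can_exchange() in nouveau_dri2.c checks more than this. The first predicate encodes the behaviour described above (an exchange is allowed for any pixmap that is not on screen, regardless of the PageFlip setting); the second encodes the 'return FALSE'-style restriction applied only to that off-screen case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs standing in for what the DDX would query. */
struct swap_ctx {
    bool pageflip_enabled; /* Option "PageFlip" is on and usable */
    bool can_flip;         /* what DRI2CanFlip() would report */
    bool dst_onscreen;     /* what nouveau_exa_pixmap_is_onscreen(dst_pix)
                              would report */
};

/* Behaviour as described in this thread: off-screen destination
 * pixmaps are exchanged even when page flipping is disabled. */
bool can_exchange_as_described(const struct swap_ctx *c)
{
    assert(c != NULL);
    return (c->can_flip && c->pageflip_enabled) || !c->dst_onscreen;
}

/* Restricted variant: only real full-screen page flips are allowed;
 * everything else falls back to a copy. */
bool can_exchange_restricted(const struct swap_ctx *c)
{
    assert(c != NULL);
    return c->can_flip && c->pageflip_enabled;
}
```

For a redirected window under a compositor (`dst_onscreen = false`) with page flipping switched off, the first predicate still returns true, which matches the observation that the option alone does not stop the flicker.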
And to add to this, I have now installed the old 2.6.35 kernel from the Ubuntu repositories, and without the 'return FALSE' hack I still see the problem. Sure as hell, that kernel doesn't have page flipping code.

Also, I think I know why I initially got the negative results. I used KDE's logout feature, but I now suspect that, unlike GDM, it doesn't restart the X server. Now for testing I always kill it manually.
Also, isn't page flipping supposed to be done between the front and back buffers of any DRI drawable, e.g. the one an app uses for direct rendering? However, according to the DDX and kernel code, the current page-flipping code only supports flipping between the display's front and back buffers, and yet the DDX uses it for all page flipping. Isn't that just wrong? When Compiz is running, apps render to a texture, and thus the DDX should, at least for now, fall back to blitting for such DRI drawables, no? (Although it is surely possible to implement page flipping for individual, non-screen-sized DRI buffers.)
Actual page flipping applies only to full screen applications. The page flip is done by the display engine, you flip what is being displayed on screen, the whole scanout buffer, look it up (multi buffering). You cannot "flip" single drawables (individual pixels of a buffer) to (individual pixels of) the front buffer, that doesn't work (with existing hardware).
I understand that, but nouveau_dri2_finish_swap in the DDX is called for both window DRI buffers and scanout buffers. It seems that DRI2CanFlip does indeed filter for full-screen buffers only, but I don't understand why it returns TRUE for all pixmaps. Aren't pixmaps not on the screen at all? Or is it never called with pixmaps, only with windows? As far as I understand it, from very old memories of tinkering with the X Window System, pixmaps and windows are supposed to be interchangeable in X draw calls; it's just that windows can be displayed, while pixmaps are strictly off-screen and can only be blitted to some window.
Also note that the user with nick 'curro' noted that https://bugs.freedesktop.org/show_bug.cgi?id=35452 fixes this issue partially. Indeed, even though the flickering is still there, at least upon exit from a full-screen game Compiz continues to work fine and doesn't pick up the same flickering.

Also, I found out why alien-arena was such a good reproducer of this bug. It now looks like Compiz, due to a bug or a race, doesn't un-redirect games that switch resolution upon start. (First I noticed that if a game uses a non-native resolution, it always flickers; then I switched the screen to the target resolution via xrandr _before_ starting the game, and the flicker disappeared.) And alien-arena wasn't using the native resolution.

For me the issue is pretty much fixed now, as I don't really play non-fullscreen games (FPS loss, and pointless anyway). Nor do I want to play while rendering is done via Compiz (because of the non-native resolution bug). Nouveau is quite fast these days, fast enough to run all the games I occasionally play at native resolution (1280x800).
So, now I get it. The problem is that the nouveau DDX just swaps between off-screen window buffers without any hardware assistance, but these off-screen windows could be in use by Compiz (and even bound to textures using GLX_EXT_texture_from_pixmap), and thus it just yanks them out from under Compiz's nose.

May I suggest, then, removing the '!nouveau_exa_pixmap_is_onscreen(dst_pix)' check from can_exchange for now? This fixes the issue for me and keeps full-screen page flipping on.
(In reply to comment #19)
> So, now I get it. The problem is that nouveau DDX would just swap between
> off-screen windows without any hardware assistance, but these off-screen windows
> could be used by compiz (and even bound to textures using
> GLX_EXT_texture_from_pixmap) and thus it just yanks them under compiz nose.
>
> May I suggest then to remove for now the
> '!nouveau_exa_pixmap_is_onscreen(dst_pix)' from can_exchange ?
> This fixes the issue for me and keeps full-screen page flipping on

This fixes it for me, too! Thanks so much for tracking the issue down! Could you send it as a patch to the Nouveau mailing list?
To be honest, I didn't track that down; Christoph Bumiller and the user with nick 'curro' on #nouveau did, and I just followed their suggestions. Also, my last suggestion is more a workaround than a fix. If the nouveau developers think that it is acceptable for now, I certainly don't mind sending a patch.
*** Bug 37769 has been marked as a duplicate of this bug. ***
Any update? Can we at least use my workaround for now, until this is fixed in the X server?
Created attachment 50467 DDX: Implement pageflip completion event handling.
Created attachment 50468 DDX: Update front buffer pixmap and names before exchanging buffers
Created attachment 50469 DDX: Fixes to swap scheduling, especially for copy-swaps.
Hi, can you try the attached series of three patches?

They implement handling of pageflip completion events from the kernel. So far, pageflip events from the kernel were ignored by the nouveau ddx. They also fix some serious screen corruption when switching between redirected and unredirected fullscreen windows under a compositor, and fix a few corner cases in dri2 swap scheduling, especially for copy-swaps for windows.

These are direct translations to the nouveau ddx of the corresponding (well tested) implementations and fixes for the intel and ati ddx.

The series is so far only tested with Linux 3.0 on a single display configuration, but should work with earlier kernels as well. It should work with dual-display setups (fullscreen window spanning both displays, clone mode, or zaphod head with separate x-screens), but I probably won't have a chance to test dual-display before next weekend, so there's some chance of bugs there.

These patches fix all bugs I have encountered so far with wrong oml_sync_control timestamps from bufferswaps, flicker and other synchronization issues, e.g., glxgears running at 1800 fps although vsync is on. Hopefully they also help to resolve this bug.

thanks,
-mario
I tried that patch series. Sadly, it doesn't help with the swapping of off-screen EXA pixmaps, which causes flickering in non-fullscreen games running under Compiz.
I think there's a fundamental problem with simply exchanging the buffers for a redirected window's backing pixmap: There's no synchronization between the app and the compositing manager (CM), so by the time the CM composites the window contents after a flip, the app might have already flipped again and started rendering to the buffer the CM is using for compositing. I think this is the cause of the flickering.

It might be possible to fix this (at least as long as there aren't several clients trying to get at the window contents via the Composite extension...) with triple buffering, but I wonder if it's really worth the complexity. (E.g. exchanging buffers isn't possible anyway with the majority of window managers, which reparent client windows)
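The race described above can be replayed deterministically in a small C model. This is a sketch under assumptions, not X server or DDX code: the struct and all function names are hypothetical, a buffer is just a flag saying whether it holds a finished frame, and the copy path is assumed to be atomic with respect to the CM's read. The scenario is fixed: the app presents a frame, the CM notes which buffer backs the window, and the app then presents another frame and starts a third before the CM gets to composite.

```c
#include <assert.h>
#include <stdbool.h>

enum { NBUF = 2 };

/* Hypothetical model of one redirected window. */
struct window {
    int  front;           /* buffer backing the window pixmap */
    int  back;            /* buffer the app renders into */
    bool complete[NBUF];  /* true if the buffer holds a finished frame */
};

void app_begin_frame(struct window *w) { w->complete[w->back] = false; }
void app_end_frame(struct window *w)   { w->complete[w->back] = true; }

/* Exchange-style swap: front and back simply trade places. */
void swap_exchange(struct window *w)
{
    int t = w->front;
    w->front = w->back;
    w->back = t;
}

/* Copy-style swap: the finished back buffer is copied into the front
 * buffer, which stays the same buffer for the window's lifetime. */
void swap_copy(struct window *w)
{
    assert(w->complete[w->back]);
    w->complete[w->front] = true;
}

/* Replays the scenario and reports whether the buffer the CM samples
 * from still holds a finished frame when it finally composites. */
bool cm_sees_complete_frame(bool use_exchange)
{
    struct window w = { .front = 0, .back = 1, .complete = { true, true } };

    /* Frame 1: render and present. */
    app_begin_frame(&w);
    app_end_frame(&w);
    if (use_exchange) swap_exchange(&w); else swap_copy(&w);

    /* The CM is notified and latches the buffer it will sample from. */
    int cm_source = w.front;

    /* Before the CM composites: frame 2 is presented, frame 3 begins. */
    app_begin_frame(&w);
    app_end_frame(&w);
    if (use_exchange) swap_exchange(&w); else swap_copy(&w);
    app_begin_frame(&w);

    return w.complete[cm_source];
}
```

With exchange swaps the CM ends up sampling a buffer the app is drawing into again; with copy swaps it always samples a buffer that only ever receives finished frames, at the cost of a blit per swap.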
Created attachment 50479 disable swaps of off-screen pixmaps

I do agree with you. Maybe we should agree to disable that swap after all, as I proposed? That surely fixes the problem for me.
(In reply to comment #29)
> I think there's a fundamental problem with simply exchanging the buffers for a
> redirected window's backing pixmap: There's no synchronization between the app
> and the compositing manager (CM), so by the time the CM composites the window
> contents after a flip, the app might have already flipped again and started
> rendering to the buffer the CM is using for compositing. I think this is the
> cause of the flickering.
>

Exactly.

> It might be possible to fix this (at least as long as there aren't several
> clients trying to get at the window contents via the Composite extension...)
> with triple buffering, but I wonder if it's really worth the complexity. (E.g.
> exchanging buffers isn't possible anyway with the majority of window managers,
> which reparent client windows)

I was considering another solution, but I've been too busy in the last couple of months to put it into practice. Basically, the X server could make sure that at any given time all clients agree on the role of any buffers that are being shared by several clients, by blocking their GetBuffers requests at the right time. In other words, no two clients would ever see the same buffer in two different slots at the same moment. AFAICT this can be made to work for an arbitrary number of clients getting at the same window at the same time.

Maxim, I've pushed your patch as a temporary solution until I (or somebody else with more time) get around to fixing this synchronization problem in the X server.
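One possible reading of the GetBuffers idea, written as a toy C model. Everything here is an assumption for illustration: the types, the function names, the two-buffer/two-client limits and the rule that a swap invalidates the swapping client's own view are all hypothetical, and this is not how any X server implements DRI2. The server tracks the authoritative role of each buffer and the role each client was last told; a GetBuffers request is granted only if no other client still holds a view that disagrees with the current roles.

```c
#include <assert.h>
#include <stdbool.h>

enum slot { SLOT_NONE, SLOT_FRONT, SLOT_BACK };
enum { NBUF = 2, NCLIENT = 2 };

/* Hypothetical server-side bookkeeping for one shared window. */
struct server {
    enum slot current[NBUF];        /* authoritative role of each buffer */
    enum slot seen[NCLIENT][NBUF];  /* role each client was last told */
};

void server_init(struct server *s)
{
    s->current[0] = SLOT_FRONT;
    s->current[1] = SLOT_BACK;
    for (int c = 0; c < NCLIENT; c++)
        for (int b = 0; b < NBUF; b++)
            s->seen[c][b] = SLOT_NONE;
}

/* A client declares that it is done with the buffers it was given. */
void release_buffers(struct server *s, int client)
{
    assert(client >= 0 && client < NCLIENT);
    for (int b = 0; b < NBUF; b++)
        s->seen[client][b] = SLOT_NONE;
}

/* Returns true if the request is granted, false if it would have to
 * block because another client still sees a buffer in a different slot. */
bool get_buffers(struct server *s, int client)
{
    assert(client >= 0 && client < NCLIENT);
    for (int c = 0; c < NCLIENT; c++) {
        if (c == client)
            continue;
        for (int b = 0; b < NBUF; b++)
            if (s->seen[c][b] != SLOT_NONE && s->seen[c][b] != s->current[b])
                return false;
    }
    for (int b = 0; b < NBUF; b++)
        s->seen[client][b] = s->current[b];
    return true;
}

/* Exchange the roles; the swapping client's own view becomes invalid. */
void swap_buffers(struct server *s, int client)
{
    enum slot t = s->current[0];
    s->current[0] = s->current[1];
    s->current[1] = t;
    release_buffers(s, client);
}
```

In this model, an app that swaps twice in a row cannot obtain its next back buffer while the CM still holds that same buffer as the front; the request succeeds once the CM releases its view.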
(In reply to comment #27)
>[...]
> The series is so far only tested with Linux 3.0 on a single display
> configuration, but should work with earlier kernels as well. Should work with
> dual-display setups (fullscreen window spanning both displays, clone mode, or
> zaphod head with separate x-screens), but i probably won't have a chance to
> test dual-display before next weekend, so there's some chance of bugs there.
>

When you consider them ready to go in, can you please send them to the mailing list? They're quite difficult to review as attachments in a bug report.

>[...]