Bug 28341 - Flickering screen in Neverball on drm-radeon-testing
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/R600
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Duplicates: 29098
Reported: 2010-06-01 02:29 UTC by Magnus Jensen
Modified: 2010-08-02 12:11 UTC

Attachments
dmesg/current (30.27 KB, text/plain)
2010-06-01 11:37 UTC, Magnus Jensen
dmesg-current (30.31 KB, text/plain)
2010-06-01 15:11 UTC, Magnus Jensen
Revert xf86-video-ati commit 30591320ec46e491ba20904cc64f3405b51c6505 (20.67 KB, patch)
2010-07-14 04:22 UTC, Michel Dänzer
Add flush,invalidate on swap with msc (1.67 KB, patch)
2010-07-22 10:00 UTC, Jerome Glisse
Add flush, invalidate on dri2 swap, fixed. (1.96 KB, patch)
2010-07-25 08:15 UTC, Mario Kleiner
Proof of concept "fix" for R600/R700. Act on dri2InvalidateDrawable(). (4.72 KB, patch)
2010-07-25 08:26 UTC, Mario Kleiner
Fix dri2 swap (9.38 KB, patch)
2010-07-26 12:26 UTC, Jerome Glisse
Fix frontbuffer rendering, avoid segfault in singlebuffer demo. (3.06 KB, patch)
2010-07-27 10:50 UTC, Mario Kleiner
radeon: Add DRI2 flush extension support so we synchronize properly to bufferswaps. (13.06 KB, patch)
2010-08-01 19:55 UTC, Mario Kleiner

Description Magnus Jensen 2010-06-01 02:29:01 UTC
Mesa and the DDX are patched with the tiling patches from the dri-devel mailing list, running the drm-radeon-testing kernel on an HD3650 (rv635) AGP GPU.

When starting Neverball, the title screen flickers black until a level starts; after that the graphics are (almost) fine.
Comment 1 Alex Deucher 2010-06-01 08:54:46 UTC
Are there any CS related messages in your dmesg?
Comment 2 Magnus Jensen 2010-06-01 11:37:20 UTC
Created attachment 35995
dmesg/current

This is the last dmesg output after running Neverball for a few seconds.
The notice about Mesa being outdated is not relevant in this case, since the screen still flickers whether I recompile Mesa or not. I have had the same error without that message.
Comment 3 Magnus Jensen 2010-06-01 12:25:42 UTC
Sorry, ignore my last attachment. These errors keep appearing and I don't think they are related to Neverball. I'm going to recompile Mesa and the DDX and post another dmesg soon.
Just waiting for a kernel compile to finish.
Comment 4 Magnus Jensen 2010-06-01 15:11:57 UTC
Created attachment 35997
dmesg-current

No radeon-related messages in the dmesg output while running Neverball (it still flickers a lot). DRM, Mesa and DDX are all the latest from git. This does not occur with the distro kernel (2.6.33.4).
Comment 5 Andrew Randrianasulu 2010-06-04 17:25:44 UTC
I was under the impression that I hit the same bug here on rv280 + wine + DeusEx, but after manually applying the patches from http://article.gmane.org/gmane.comp.video.dri.devel/46630 I still have a flickering intro. This is with git xserver ... Should I wait for a kernel-based solution to test my case, or open a different bug?
Comment 6 Alex Deucher 2010-06-09 13:45:19 UTC
possibly also related to bug 28383 and bug 28410.
Comment 7 Andy Furniss 2010-06-14 16:56:56 UTC
(In reply to comment #6)
> possibly also related to bug 28383 and bug 28410.

I am currently running with a patch that fixes 28383, but have just noticed that the Mesa demo ipers does not render properly (with or without the patch): it flashes until I reduce the LOD enough that the fps gets capped to the refresh rate, at which point it renders OK.

It renders OK with swrast and if I boot an older kernel that uses the old vsync.
Comment 8 Mario Kleiner 2010-06-15 04:32:49 UTC
It seems that some synchronisation is missing in the radeon kernel DRM driver, which wasn't needed for the old synchronous vsync code.

The old glXSwapBuffers code was synchronous: glXSwapBuffers blocked until swap completion.

The new code just schedules a vblank event, then returns control to the client. The client submits further rendering commands into the command stream before the swap has completed, so there is a race condition between the client submitting new commands for the post-swap backbuffer and the vblank event triggering submission of the "bufferswap blit" command buffer into the CS a couple of milliseconds after the glXSwapBuffers call. Depending on the relative timing, *new* rendering commands, e.g. glClear(), can get executed on the *old* backbuffer before it has been copied to the frontbuffer. This would result in random flickering, or in half-rendered frames drawn on top of the old rendered frame.

The solution would be to add some synchronisation to the kernel driver: if a swapbuffers is pending and a client tries to submit command buffers for that drawable, block it until swap completion. This is what the intel drivers apparently do and what seems to be missing from the radeon driver.
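
The race described above can be illustrated with a toy model. This is only a sketch: all `toy_*` names are hypothetical, integers stand in for buffer contents, and "blocking" is modelled by letting the pending swap complete before the clear is applied. It is not radeon, DRM or Mesa code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; an int stands in for the contents of a buffer. */
enum { TOY_CLEARED = 0 };

struct toy_drawable {
    int front;          /* what the screen shows */
    int back;           /* what the client renders into */
    bool swap_pending;  /* swap scheduled, waiting for vblank */
};

/* The client renders a finished frame into the backbuffer. */
static void toy_draw_frame(struct toy_drawable *d, int frame)
{
    d->back = frame;
}

/* New-style swap: only schedules the swap and returns immediately. */
static void toy_schedule_swap(struct toy_drawable *d)
{
    d->swap_pending = true;
}

/* The vblank event performs the pending back-to-front copy, if any. */
static void toy_vblank(struct toy_drawable *d)
{
    if (d->swap_pending) {
        d->front = d->back;
        d->swap_pending = false;
    }
}

/* Clear without synchronisation: hits the backbuffer right away. */
static void toy_clear_unsynchronised(struct toy_drawable *d)
{
    d->back = TOY_CLEARED;
}

/* Clear with synchronisation: a pending swap completes first.
 * Calling toy_vblank() here stands in for blocking the client. */
static void toy_clear_synchronised(struct toy_drawable *d)
{
    if (d->swap_pending)
        toy_vblank(d);
    d->back = TOY_CLEARED;
}

/* Runs draw, swap, clear, vblank and returns what ends up on screen. */
int toy_present(bool synchronised, int frame)
{
    struct toy_drawable d = { TOY_CLEARED, TOY_CLEARED, false };

    assert(frame != TOY_CLEARED);
    toy_draw_frame(&d, frame);
    toy_schedule_swap(&d);
    if (synchronised)
        toy_clear_synchronised(&d);
    else
        toy_clear_unsynchronised(&d);
    toy_vblank(&d);
    return d.front;
}
```

In this model `toy_present(false, 7)` leaves the cleared value on screen, which corresponds to the black flicker, while `toy_present(true, 7)` shows frame 7.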

Michel Daenzer confirmed my suspicion with a patch (conversation on dri-devel):

"The ideal solution would probably be to make the kernel block in the
command stream (CS) submission ioctl if the CS renders to (and from?) a
buffer object (BO) which is involved in a pending swap.

Meanwhile, the attached hacks for xf86-video-ati and Mesa seem to help
here, YMMV."

He added this patch to mesa inside the Dri2Swapbuffers submission code, after the swap has been scheduled:

+
+    /* Make sure we call to the server before rendering again, in case we need
+     * to block for the swap */
+    dri2InvalidateBuffers(dpyPriv->dpy, pdraw->drawable);

This is what I found (also posted on dri-devel):

"I saw a similar flickering with latest Xorg stack (mesa master, xserver, ddx etc. master) and the 2.6.34 tree from radeon testing with my own toolkit on a R600 gpu. This setup uses the new dri sync & swap bits and changes how glXSwapbuffers works.

This shows flickering, but load and timing dependent...

.... glXXX rendering commands to draw image.
glXSwapBuffers();
glBegin(GL_POINTS);
glVertex2i(10,10);
glEnd();
glFinish();
Take a swap completion timestamp here.
glClear();
...more rendering commands

-> I use this glVertex2i, .... glFinish() sequence to wait for swap completion and get a timestamp. This works on any os/gpu combo ever tested, but doesn't seem to work reliably anymore with the new radeon sync & swap bits in place. Display flickers, presumably because the glClear() executes almost immediately after the glXSwapBuffers is scheduled, and before the bufferswap has actually taken place -> Clear the backbuffer before swapping.

This however:

.... glXXX rendering commands to draw image.
glXSwapBuffers();
glXWaitForSbcOML(...);
glClear();
...

does work, because the new glXWaitForSbcOML() blocks the client until swap completion, so glClear() can only get submitted to the gpu after the swap completed."
Comment 9 Michel Dänzer 2010-07-09 08:24:20 UTC
Jerome, do you have a plan for fixing this, or should we just stop exposing DRI2 vsync again until there's a solution?
Comment 10 Michel Dänzer 2010-07-14 02:30:55 UTC
Does reverting xf86-video-ati Git commit 30591320ec46e491ba20904cc64f3405b51c6505 ('kms: add support for the MSC swap & sync API') fix this problem?
Comment 11 Andy Furniss 2010-07-14 04:03:15 UTC
(In reply to comment #10)
> Does reverting xf86-video-ati Git commit
> 30591320ec46e491ba20904cc64f3405b51c6505 ('kms: add support for the MSC swap &
> sync API') fix this problem?

It doesn't revert on master for me.
Comment 12 Michel Dänzer 2010-07-14 04:22:20 UTC
Created attachment 37035
Revert xf86-video-ati commit 30591320ec46e491ba20904cc64f3405b51c6505
Comment 13 Andy Furniss 2010-07-14 05:04:35 UTC
(In reply to comment #12)
> Created an attachment (id=37035)
> Revert xf86-video-ati commit 30591320ec46e491ba20904cc64f3405b51c6505

That fixes my two test cases - ipers and sauerbraten.
Comment 14 Andy Furniss 2010-07-16 06:02:09 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Created an attachment (id=37035)
> > Revert xf86-video-ati commit 30591320ec46e491ba20904cc64f3405b51c6505
> 
> That fixes my two test cases - ipers and sauerbraten.

I see that with today's Mesa master another workaround is possible.

Setting vblank_mode=0 as an environment variable (~/.drirc doesn't work for me) when running my test cases fixes the issue without the patch.
Comment 15 Shlomi Steinberg 2010-07-16 08:16:32 UTC
*** Bug 29098 has been marked as a duplicate of this bug. ***
Comment 16 Jerome Glisse 2010-07-22 10:00:00 UTC
Created attachment 37317
Add flush,invalidate on swap with msc

Please check whether the attached patch fixes the issue for you (apply against Mesa; no change is needed in the DDX).
Comment 17 Shlomi Steinberg 2010-07-22 11:09:51 UTC
Patched today's mesa git. Still need vblank_mode=0.
Comment 18 Mario Kleiner 2010-07-22 13:23:07 UTC
Jerome,

As far as I can see, your flush, invalidate patch goes in the right direction, but the dri2InvalidateDrawable() call just increments drawable->dri2.stamp, and the radeon DRI driver in current Mesa isn't checking drawable->dri2.stamp for changes.

The intel driver has checks like...

if (drawable->lastStamp != drawable->dri2.stamp)
     intel_update_renderbuffers(driContext, drawable);

... in various places. Similar checks and calls to radeon_update_renderbuffers() would probably do the trick, because that would call DRI2GetBuffersWithFormat() etc. which will throttle properly until a swap is completed.
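
The stamp check suggested above could be modelled roughly as follows. This is a self-contained sketch under stated assumptions: the `sketch_*` names and the struct layout are hypothetical stand-ins for the Mesa drawable, and the real update function would refetch buffers from the X server (and throttle there) rather than bump a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the drawable state discussed above. */
struct sketch_drawable {
    unsigned dri2_stamp;  /* incremented on every invalidate */
    unsigned last_stamp;  /* stamp the driver last acted on */
    int buffer_updates;   /* how often buffers were refetched */
};

/* Models dri2InvalidateDrawable(): it only bumps the stamp. */
void sketch_invalidate(struct sketch_drawable *d)
{
    assert(d != NULL);
    d->dri2_stamp++;
}

/* Models the renderbuffer update. In a real driver this is where
 * the client would be throttled until a pending swap completes. */
static void sketch_update_renderbuffers(struct sketch_drawable *d)
{
    d->buffer_updates++;
    d->last_stamp = d->dri2_stamp;
}

/* To be called before rendering: refetch buffers only if the stamp
 * changed. Returns true if an update was performed. */
bool sketch_prepare_render(struct sketch_drawable *d)
{
    assert(d != NULL);
    if (d->last_stamp != d->dri2_stamp) {
        sketch_update_renderbuffers(d);
        return true;
    }
    return false;
}
```

The point of the pattern is that an invalidate is cheap and does nothing by itself; the cost is paid once, at the next draw call that notices the changed stamp.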
Comment 19 Mario Kleiner 2010-07-25 08:15:20 UTC
Created attachment 37366
Add flush, invalidate on dri2 swap, fixed.

Fixed version of Jerome's patch. The flush, invalidate extension was added at the wrong place and was therefore never called by the driver at dri2InvalidateBuffer() time.
Comment 20 Mario Kleiner 2010-07-25 08:26:41 UTC
Created attachment 37368
Proof of concept "fix" for R600/R700. Act on dri2InvalidateDrawable(). 

This patch together with the previous patch applied to current mesa git master eliminates the flicker problem on my tested apps with a R600 card.

A new function radeon_prepare_render() checks the timestamps that get updated by dri2InvalidateDrawable() to find out if a swap is in progress / buffers are invalidated.
If so, it uses radeon_update_buffers() to get "new" buffers. That function will also throttle
the client if a swap is still in progress. We'd need to add a call to radeon_prepare_render() to
various places in the driver. This is what the intel driver does with intel_prepare_render() to
avoid artifacts.

I've only added a check to r700DrawPrims() to see if it works at all. I don't know at which other
locations such calls would be needed (and I'm a bloody beginner), so this is a pretty sketchy start.

-mario
Comment 21 Shlomi Steinberg 2010-07-25 13:07:54 UTC
Confirmed. Both patches fix etracer for me. r600, HD 3850.
Comment 22 Jerome Glisse 2010-07-26 12:26:20 UTC
Created attachment 37401
Fix dri2 swap

Can you confirm that the attached patch fixes the issue for you? (I have tested it on r200, r300 and r600, and it seems to work on UMS/KMS while also fixing what I believe is the issue you reported on KMS+DRI2.)
Comment 23 Andrew Randrianasulu 2010-07-26 14:30:28 UTC
(In reply to comment #22)
> Created an attachment (id=37401)
> Fix dri2 swap
> 
> Can you confirm that the attach patch fix the issue for you (i have tested it
> on r200,r300,r600 and it seems to work UMS/KMS while also fixing what i believe
> is the issue you reported on KMS+DRI2)

Works for me on RV280 (DRI2 + xserver git, 1.9.0 RC 5, kernel 2.6.35-rc5+). Tested with wine 1.2 and 3DMark2001, DeusEx_demo
Comment 24 Mario Kleiner 2010-07-26 17:31:53 UTC
Jerome,

Your improved patch works on R600 for me as well. Thanks for improving it and
teaching me about the proper draw functions for non-R600 GPUs :-). It took me
half a day to initially find r700DrawPrims() as the proper location for R600.

Looking at the code and comparing with the intel driver, I think we should also
add a radeon_prepare_render() call to radeonReadPixels() in radeon_pixel_read.c, and to
the "copy from colorbuffer" path inside do_copy_texsubimage() in
radeon_tex_copy.c? Reading/copying from the backbuffer/frontbuffer must also
make sure it operates on the post-swap buffers, and therefore wait for swap
completion.
Comment 25 Andy Furniss 2010-07-27 03:26:58 UTC
(In reply to comment #22)
> Created an attachment (id=37401)
> Fix dri2 swap
> 
> Can you confirm that the attach patch fix the issue for you (i have tested it
> on r200,r300,r600 and it seems to work UMS/KMS while also fixing what i believe
> is the issue you reported on KMS+DRI2)

It fixes my test cases (ipers and sauerbraten) testing with rv770.
Comment 26 Andy Furniss 2010-07-27 06:47:43 UTC
(In reply to comment #25)
> (In reply to comment #22)
> > Created an attachment (id=37401)
> > Fix dri2 swap
> > 
> > Can you confirm that the attach patch fix the issue for you (i have tested it
> > on r200,r300,r600 and it seems to work UMS/KMS while also fixing what i believe
> > is the issue you reported on KMS+DRI2)
> 
> It fixes my test cases (ipers and sauerbraten) testing with rv770.

I've found a regression caused by the patch: the Mesa demo singlebuffer segfaults.

singlebuffer[11119]: segfault at bf634ff0 ip b77e2b97 sp bf634fd0 error 6 in libGL.so.1.2[b77a7000+4e000]
Comment 27 Mario Kleiner 2010-07-27 10:50:26 UTC
Created attachment 37414
Fix frontbuffer rendering, avoid segfault in singlebuffer demo.

This patch on top of the other patch works for me with Mesa's singlebuffer demo. It transplants a little more logic from the intel driver to handle frontbuffer flushing and avoid the infinite recursion which caused the segfault.

I had to set radeon->front_buffer_dirty = GL_FALSE before calling flushFrontBuffer to avoid infinite recursion, whereas intel does it the other way round. Either I'm missing something (very likely), or intel does something subtly different (likely), or the intel driver should have the same infinite recursion bug and segfault in the singlebuffer demo.
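
The ordering issue can be shown with a small model. This is a sketch only: the `sketch_*` names are hypothetical, and the assumption that the loader's flush callback re-enters the driver's own flush path is made purely for illustration, since the exact re-entry path is not spelled out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver context. */
struct sketch_context {
    bool front_buffer_dirty;
    int loader_flushes;  /* how often the loader callback ran */
    int depth;           /* current nesting depth of the flush path */
    int max_depth;       /* deepest nesting observed */
};

void sketch_flush_front(struct sketch_context *ctx);

/* Stand-in for the loader's flushFrontBuffer callback. In this model
 * it re-enters the driver's flush path (an assumption, see above). */
static void sketch_loader_flush_front_buffer(struct sketch_context *ctx)
{
    ctx->loader_flushes++;
    sketch_flush_front(ctx);
}

/* Driver-side flush. The dirty flag is cleared BEFORE calling out, so
 * a re-entrant call sees a clean frontbuffer and returns at once. */
void sketch_flush_front(struct sketch_context *ctx)
{
    assert(ctx->depth < 8);  /* would trip on runaway recursion */
    ctx->depth++;
    if (ctx->depth > ctx->max_depth)
        ctx->max_depth = ctx->depth;

    if (ctx->front_buffer_dirty) {
        ctx->front_buffer_dirty = false;
        sketch_loader_flush_front_buffer(ctx);
    }
    ctx->depth--;
}
```

If the flag were cleared only after the callback returned, every nested call in this model would still see a dirty frontbuffer and call out again, which is the unbounded recursion pattern described above.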

Could anybody with an intel GPU test the singlebuffer demo?
Comment 28 Andy Furniss 2010-07-27 11:30:10 UTC
(In reply to comment #27)
> Created an attachment (id=37414)
> Fix frontbuffer rendering, avoid segfault in singlebuffer demo.
> 
> This patch on top of the other patch works for me 

Also fixes for me.
Comment 29 Mario Kleiner 2010-08-01 19:55:22 UTC
Created attachment 37513
radeon: Add DRI2 flush extension support so we synchronize properly to bufferswaps.

OK,

I've merged all previous working patches into this patch and retested on current Mesa master.

I'll send this one to mesa-devel for inclusion.
Comment 30 Mario Kleiner 2010-08-02 12:11:45 UTC
Patch with fix now in mesa master. Closing as resolved.

