Bug 37068

Summary: [XVideo] tearing artifacts - 2.15 regression
Product: xorg Reporter: Edward Sheldrake <ejsheldrake>
Component: Driver/intelAssignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xorg log running today's git intel driver
none
/var/log/dmesg none

Description Edward Sheldrake 2011-05-10 05:12:21 UTC
With xf86-video-intel 2.15.0, I see tearing artifacts at the bottom of XVideo windows - usually in bottom right or left corner, there is a horizontal strip a few pixels high (sometimes 2 smaller strips) and about a quarter or less of the width of the video, which is probably a frame behind the rest of the video. Unfortunately the problem is not visible in a screenshot, and disappears when the video is paused.

I saw these with some previous versions, but never filed a bug, and then they got fixed sometime for 2.14.0. But they're back in 2.15.0.

System environment:
-- chipset: GM45
-- system architecture: 64-bit
-- xf86-video-intel: 2.15.0 and today's git
-- xserver: 1.10.1
-- mesa: 7.10.2
-- libdrm: 2.4.25
-- kernel: 2.6.38.5
-- Linux distribution: Fedora 14
-- Machine or mobo model: Toshiba Satellite A300
-- Display connector: LVDS

I did a "git bisect" yesterday and ended up with this:
6f104189bb9439ab0e05f04d4be020813eb04bf9 is the first bad commit
commit 6f104189bb9439ab0e05f04d4be020813eb04bf9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 4 19:21:10 2011 +0100

    Take advantage of the kernel flush for dirty bo in the busy ioctl
    
    Rather than just creating and submitting a batch that simply contains a
    flush in order to periodically ensure that rendering reaches the
    scanout, we can simply ask the kernel whether the scanout is busy. The
    kernel will then submit a flush on our behalf if it is dirty, which
    takes advantage of the kernel's dirty state tracking.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

2.14.902 is the last tarball without the bug.
Comment 1 Edward Sheldrake 2011-05-10 05:13:29 UTC
Created attachment 46542 [details]
Xorg log running today's git intel driver
Comment 2 Edward Sheldrake 2011-05-10 05:14:16 UTC
Created attachment 46543 [details]
/var/log/dmesg
Comment 3 Chris Wilson 2011-05-10 06:11:48 UTC
Great! Thanks for the bisection, it means that I've already fixed it with:

commit 3145530feed879082bcfab11ffc8e7fd0911c920
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat May 7 19:51:04 2011 +0100

    Ensure that the partial batch is flushed upon the blockhandler
    
    Currently, we require that a batch containing a dirty bo be submitted
    before we mark the device as requiring a flush. So if we never submit a
    batch between block handlers, we can end up sleeping without ever
    flushing either the partial batch or the rendering to the scanout.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36776
    Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 4 Edward Sheldrake 2011-05-10 06:23:39 UTC
I'm afraid it's not fixed in current git, the last commit in the git version I'm running is:

commit fd1ebd44fb72e7bdf57d00f8941cd6110a529cac
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue May 10 07:30:58 2011 +0100

    module: Adopt IVB's more detailed naming convention for SNB
Comment 5 Chris Wilson 2011-05-10 06:45:25 UTC
*g* That means that your bisection only pointed to a commit that made it more prevalent and not the actual bug. :-p

Can you do a reverse bisection instead and find the commit that appeared to fix it?
Comment 6 Edward Sheldrake 2011-05-10 09:51:30 UTC
With "good" meaning I can see the problem, starting with 2.13.0 as good and 2.14.0 as bad, I ended up with:

4a186a612376bdd6f86c026e8b8b442108868a0a is the first bad commit
commit 4a186a612376bdd6f86c026e8b8b442108868a0a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 7 16:56:57 2010 +0000

    Always flush the batch before blocking for new X requests
    
    This should prevent any lag when waiting upon user input, for example
    whilst logging in with gdm.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 7 Chris Wilson 2011-05-10 10:00:47 UTC
Hmm. Ok, that really does confirm that the bug is the batch is simply not being flushed... Oh...

diff --git a/src/intel_dri.c b/src/intel_dri.c
index cd72f45..48d0f56 100644
--- a/src/intel_dri.c
+++ b/src/intel_dri.c
@@ -182,6 +182,8 @@ static PixmapPtr fixup_shadow(DrawablePtr drawable, PixmapPt
        /* And redirect the pixmap to the new bo (for 3D). */
        intel_set_pixmap_private(old, priv);
        old->refcnt++;
+
+       intel_get_screen_private(xf86Screens[screen->myNum])->needs_flush = TRUE
        return old;
 }

It's conspiracy, I tell you!
Comment 8 Edward Sheldrake 2011-05-10 10:54:49 UTC
(In reply to comment #7)
> Hmm. Ok, that really does confirm that the bug is the batch is simply not being
> flushed... Oh...
> 
> diff --git a/src/intel_dri.c b/src/intel_dri.c
> index cd72f45..48d0f56 100644
> --- a/src/intel_dri.c
> +++ b/src/intel_dri.c
> @@ -182,6 +182,8 @@ static PixmapPtr fixup_shadow(DrawablePtr drawable,
> PixmapPt
>         /* And redirect the pixmap to the new bo (for 3D). */
>         intel_set_pixmap_private(old, priv);
>         old->refcnt++;
> +
> +       intel_get_screen_private(xf86Screens[screen->myNum])->needs_flush =
> TRUE
>         return old;
>  }
> 
> It's conspiracy, I tell you!

This patch doesn't fix anything.

In case it matters, I'm currently running openbox at the moment - running xcompmgr hides the problem, it's probably not visible under gnome-shell either.
Comment 9 Chris Wilson 2011-05-10 11:10:47 UTC
And for good measure:

diff --git a/src/intel_video.c b/src/intel_video.c
index 499614f..021ca5f 100644
--- a/src/intel_video.c
+++ b/src/intel_video.c
@@ -1599,6 +1599,7 @@ I830PutImageTextured(ScrnInfoPtr scrn,
                                         pixmap);
        }
 
+       intel_get_screen_private(scrn)->needs_flush = TRUE;
        DamageDamageRegion(drawable, clipBoxes);
 
        return Success;
Comment 10 Edward Sheldrake 2011-05-10 11:26:00 UTC
(In reply to comment #9)
> And for good measure:
> 
> diff --git a/src/intel_video.c b/src/intel_video.c
> index 499614f..021ca5f 100644
> --- a/src/intel_video.c
> +++ b/src/intel_video.c
> @@ -1599,6 +1599,7 @@ I830PutImageTextured(ScrnInfoPtr scrn,
>                                          pixmap);
>         }
> 
> +       intel_get_screen_private(scrn)->needs_flush = TRUE;
>         DamageDamageRegion(drawable, clipBoxes);
> 
>         return Success;

Yes, this one fixes it. Thanks!
Comment 11 Chris Wilson 2011-05-10 12:41:45 UTC
Thanks, you very much for your report, persistence and thorough testing!


commit 0b4ca9313cc7eb4845cf7f4e87c869c0c6d6ff0d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue May 10 20:36:10 2011 +0100

    video: Flush the batch on the next blockhandler after queuing
    
    In order to avoid video lag and jerky playback we need to ensure that
    any queued video is flushed before we go to sleep.
    
    Fixes regression from 6f104189bb.
    
    Reported-and-tested-by: Edward Sheldrake <ejsheldrake@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37068
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

commit 895a46e8ff70195c1a4bdccbeb652e330376f64a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue May 10 20:38:25 2011 +0100

    dri: Flush the batch after a DRI swap/copy event
    
    To minimise lag in those every so critical games, we want to ensure that
    the copy happens as soon as it is received, so we need to flush the
    batch after processing a swap event and before we go to sleep.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=37068
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.