Bug 79992

Summary: Transient corruption in some applications (regression)
Product: xorg Reporter: Nick Bowler <nbowler>
Component: Driver/intelAssignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Screenshot of corruption in GMPC
none
Xorg log
none
Only discard the CPU bo if we have a GPU bo to transfer the damage to
none
Full debug log (xz compressed)
none
Full debug log of assertion failure (xz compressed) none

Description Nick Bowler 2014-06-13 16:53:41 UTC
Created attachment 100987 [details]
Screenshot of corruption in GMPC

With recent-ish intel driver bits I've been seeing some corruption in some
applications.  Specifically, some parts of some applications appear to
occasionally not redraw correctly (i.e., at all).  The applications I
see this in are GMPC and Pan (both use GTK+-2).

The issue is reproduced by opening an affected application, then either
dragging it off screen and back on, or moving another window over top of it,
and it will end up looking something like the attached screenshot.  Moving
the mouse cursor over affected areas causes it to unbug.

Bisection implicates the following, and reverting it on master (with some minor conflicts) appears to correct the issue:

ad0390068832ad4727371902fe41a85a53de1894 is the first bad commit
commit ad0390068832ad4727371902fe41a85a53de1894
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 24 10:00:03 2013 +0100

    sna: Separate out copy preferrence from operating in place decision
    
    The two decision trees are no longer identical.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 0d76a212fa72c5246b54208f79cb63b35c9ae6b4 4b63a221756b8891e5cf3317e0c6be545af5785d M	src

Note: for some tests I had to cherry-pick

  524a45da56e2 ("compat-api: Map changes of DamageUnregister API in 1.14.99.2")

for the driver to build against current server, and also

  6914b3af7f26 ("sna: Protect against fake CRTCs during initial probe")

to avoid some crashes (bug 73981).

Obligatory version information:

  Linux 3.14.3
  libdrm 2.4.54
  Xorg server 1.15.99.903 (also happens on 1.15.1)
  Git xf86-video-intel
Comment 1 Nick Bowler 2014-06-13 16:54:58 UTC
Created attachment 100988 [details]
Xorg log
Comment 2 Chris Wilson 2014-06-13 17:44:55 UTC
diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 72c16d3..abfa00d 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -6005,7 +6005,8 @@ static void discard_cpu_damage(struct sna *sna, struct sna_pixmap *priv)
        sna_damage_destroy(&priv->cpu_damage);
        list_del(&priv->flush_list);
 
-       sna_pixmap_free_cpu(sna, priv, priv->cpu);
+       if (sna_pixmap_free_cpu(sna, priv, priv->cpu))
+               sna_damage_all(&priv->gpu_damage, priv->pixmap);
        priv->cpu = false;
 }
 

?
Comment 3 Chris Wilson 2014-06-13 17:59:37 UTC
Created attachment 100995 [details] [review]
Only discard the CPU bo if we have a GPU bo to transfer the damage to

Slightly better patch.
Comment 4 Nick Bowler 2014-06-13 18:54:26 UTC
Created attachment 101000 [details]
Full debug log (xz compressed)

Applied on top of latest master.  The behaviour is not changed in any noticeable way (Tried both patches).

I tried building the driver with --enable-debug=full to get more verbose logs.
Don't know if it's related, but I hit this assertion a couple times (not
consistently):

  X: sna_accel.c:2093: _sna_pixmap_move_to_cpu: Assertion `priv->gpu_damage == ((void *)0)' failed.

In case it helps, I'm attaching the (absurdly big) log from a non-crashing
test run: I started the server (with patch applied), dragged the GMPC window
off screen and back onscreen, saw corruption like in the screenshot, and
closed the server.
Comment 5 Chris Wilson 2014-06-13 19:07:25 UTC
Ah. I see something wrong in that patch indeed, try:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 13fe6ed..ce04ca0 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -14382,16 +14382,15 @@ sna_poly_fill_rect(DrawablePtr draw, GCPtr gc, int n, xRectangle *rect)
                        RegionTranslate(&region, dx, dy);
                }
 
-               if (region_subsumes_drawable(&region, &pixmap->drawable)) {
-                       discard_cpu_damage(sna, priv);
-                       hint |= IGNORE_CPU | REPLACES;
-               } else {
-                       if ((flags & 2) == 0)
-                               hint |= IGNORE_CPU;
-                       if (priv->cpu_damage &&
-                           region_subsumes_damage(&region, priv->cpu_damage)) {
+               if ((flags & 2) == 0) {
+                       hint |= IGNORE_CPU;
+                       if (region_subsumes_drawable(&region, &pixmap->drawable)) {
                                discard_cpu_damage(sna, priv);
-                               hint |= IGNORE_CPU;
+                               hint |= REPLACES;
+                       } else {
+                               if (priv->cpu_damage &&
+                                   region_subsumes_damage(&region, priv->cpu_damage))
+                                       discard_cpu_damage(sna, priv);
                        }
                }
                if (priv->cpu_damage == NULL) {
Comment 6 Chris Wilson 2014-06-13 19:29:31 UTC
(In reply to comment #4)
> Don't know if it's related, but I hit this assertion a couple times (not
> consistently):
> 
>   X: sna_accel.c:2093: _sna_pixmap_move_to_cpu: Assertion `priv->gpu_damage
> == ((void *)0)' failed.

That would be useful to have a full debug log for.
Comment 7 Nick Bowler 2014-06-13 20:32:19 UTC
(In reply to comment #5)
> Ah. I see something wrong in that patch indeed, try:
> 
> diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
> index 13fe6ed..ce04ca0 100644
> --- a/src/sna/sna_accel.c
> +++ b/src/sna/sna_accel.c
> @@ -14382,16 +14382,15 @@ sna_poly_fill_rect(DrawablePtr draw, GCPtr gc, int

I removed the previous patch and applied this one.  Seems to fix the rendering problem, thanks!
Comment 8 Nick Bowler 2014-06-13 20:41:47 UTC
Created attachment 101012 [details]
Full debug log of assertion failure (xz compressed)

(In reply to comment #6)
> (In reply to comment #4)
> > Don't know if it's related, but I hit this assertion a couple times (not
> > consistently):
> > 
> >   X: sna_accel.c:2093: _sna_pixmap_move_to_cpu: Assertion `priv->gpu_damage
> > == ((void *)0)' failed.
> 
> That would be useful to have a full debug log for.

Here it is.  The assertion failure still happens with the last patch applied
(and debugging enabled), so I guess it's a separate issue.

Seems a bit random; sometimes it crashes instantly as soon as I start the
server, other times it crashes when I start interacting with things,
sometimes it doesn't seem to want to crash at all.

This log came from a run where it crashed as soon as I tried to interact
with the gmpc window.
Comment 9 Chris Wilson 2014-06-13 20:46:31 UTC
commit 15485602d8451bcf6c2e979ccbd9cdc11111bcce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 13 20:30:25 2014 +0100

    sna: Use the right is-clipped hint
    
    The region here has yet to be clipped, and so the only valid is-clipped
    hint is from the flags computed from the PolyRect extents. Make sure we
    use those when determining whether it is valid to discard damage.
    
    Fixes regression from
    commit ad0390068832ad4727371902fe41a85a53de1894 [2.99.903]
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Tue Sep 24 10:00:03 2013 +0100
    
        sna: Separate out copy preferrence from operating in place decision
    
    Reported-by: Nick Bowler <nbowler@draconx.ca>
    Tested-by: Nick Bowler <nbowler@draconx.ca>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79992
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 10 Chris Wilson 2014-06-14 06:02:16 UTC
commit b27837d5372facde0f9f69eb8df664d2798f0911
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 14 06:51:34 2014 +0100

    sna: Fix assertions for discarding upload caches
    
    The upload caches are special, along with having a bo->proxy, they also
    claim to be completely damaged on both the GPU and CPU. Allow that to
    pass through when discarding the proxy.
    
    Reported-by: Nick Bowler <nbowler@draconx.ca>
    Bugzilla; https://bugs.freedesktop.org/show_bug.cgi?id=79992
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.