Bug 66990

Summary: [sna snb/ivb] corruption with chromium
Product: xorg    Reporter: Vedran Rodic <vrodic>
Component: Driver/intel    Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: anarsoul, bay, czajernia, devtty5, joe, mail, steven, xlinuxro
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments (flags: none for all):
- SNA-issue-4
- SNA-issue-4
- SNA-issue-3
- sna-issue-3-new
- bug-chrome.png
- Xorg.log
- Xorg log file
- Screenshot showing problem on my Lenovo T430s
- xorg log where graphics corruption was observed (SNA)
- xorg log after switching back to UXA

Description Vedran Rodic 2013-07-17 08:48:21 UTC
Created attachment 82530 [details]
SNA-issue-4

I'm seeing various screen corruption issues with the latest Intel SNA DDX on my Ivy Bridge. On the kernel side I have 3.10.1, and on the X server side I have the Ubuntu PPA build of xserver-xorg-core 1.13.4~git20130508.

I'm not sure when exactly it started (it could be months ago). Switching to UXA makes the issue in bug-intel-ddx-4.png go away, and probably the one in bug-intel-ddx-3.png, but that one is a bit harder to reproduce.

I'm using mostly GTK2 clients on LXDE environment, without compositing.
Comment 1 Vedran Rodic 2013-07-17 08:49:21 UTC
Created attachment 82531 [details]
SNA-issue-4
Comment 2 Vedran Rodic 2013-07-17 08:49:47 UTC
Created attachment 82532 [details]
SNA-issue-3
Comment 3 Vedran Rodic 2013-07-17 08:53:00 UTC
SNA-Issue-4 shows a problem in the bottom left (the status bar of PHPStorm is corrupted by the text above it).

SNA-Issue-3 shows text corruption in the bottom-centre part of the screen when entering text in a text box.
Comment 4 Chris Wilson 2013-07-17 09:02:43 UTC
(In reply to comment #2)
> Created attachment 82532 [details]
> SNA-issue-3

This looks like the kernel bug fixed by

commit daa13e1ca587bc773c1aae415ed1af6554117bd4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 28 16:54:08 2013 +0100

    drm/i915: Only clear write-domains after a successful wait-seqno

Can you describe the first issue more clearly? Is it only with that application?
Comment 5 Vedran Rodic 2013-07-17 09:27:01 UTC
The first issue is the only one I can reproduce easily. I'm not sure whether it is PHPStorm-specific, but it might be.

It only appears when I scroll the application's tree view; the initial rendering of the status bar is fine.
Comment 6 Chris Wilson 2013-07-17 09:33:53 UTC
Can you check which commit of the DDX you have? There was a recent bug with scrolling, fixed by

commit 34c9b759fbab8d548108e954d55de38c6f5bec31
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 19:39:37 2013 +0100

    sna: Note that borderClip region may be more than a singular box
Comment 7 Chris Wilson 2013-07-17 09:56:01 UTC
Hmm, I think I just fixed a further bug from


commit 34c9b759fbab8d548108e954d55de38c6f5bec31
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 19:39:37 2013 +0100

    sna: Note that borderClip region may be more than a singular box

with:

commit a764a6e69b23f644957cf3e4e98868464f458758
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 17 10:51:56 2013 +0100

    sna: Fix typo in computing box intersection

Do you mind attaching your Xorg.0.log so that I can check which version you are running?
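The two commits above concern clipping against a borderClip region that may contain more than one box, and computing a box intersection. As a hedged illustration only (not the SNA code), here is a minimal C sketch: the `Box` type is a hypothetical stand-in following the X server BoxRec convention (x2/y2 exclusive), and `box_intersect` and `clip_to_region` are hypothetical helper names. It shows why each box of the region has to be intersected on its own: clipping only against the region's extents would draw into the gaps between boxes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical box type, BoxRec-style: covers [x1, x2) x [y1, y2). */
typedef struct {
    int x1, y1, x2, y2;
} Box;

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Intersect a and b into *out; returns false if the result is empty.
 * Top-left edges take max(), bottom-right edges take min(); swapping an
 * axis or an edge here is the kind of typo that clips to the wrong area. */
static bool box_intersect(const Box *a, const Box *b, Box *out)
{
    out->x1 = max_int(a->x1, b->x1);
    out->y1 = max_int(a->y1, b->y1);
    out->x2 = min_int(a->x2, b->x2);
    out->y2 = min_int(a->y2, b->y2);
    return out->x1 < out->x2 && out->y1 < out->y2;
}

/* Clip `draw` against every box of a multi-box clip region, writing the
 * non-empty pieces to `out` (capacity >= nclip). Returns the piece count. */
static int clip_to_region(const Box *draw, const Box *clip, int nclip, Box *out)
{
    int n = 0;
    for (int i = 0; i < nclip; i++) {
        assert(clip[i].x1 <= clip[i].x2 && clip[i].y1 <= clip[i].y2);
        if (box_intersect(draw, &clip[i], &out[n]))
            n++;
    }
    return n;
}
```

With a region made of two separated boxes, a rectangle spanning the gap yields two pieces, and a rectangle lying entirely in the gap yields none.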
Comment 8 Vedran Rodic 2013-07-17 10:25:50 UTC
I tested with a version compiled at 8:50 CET, so it had 34c9b759fbab8d548108e954d55de38c6f5bec31

But I have now tested with the current a764a6e69b23f644957cf3e4e98868464f458758, and the problem in PHPStorm is gone.
Comment 9 Vedran Rodic 2013-07-20 06:41:37 UTC
It looks like the first issue in this bug is still present.

I have kernel version a0ab62339af858b63eba0205a583a5a503536da6 (from the drm-intel-nightly Ubuntu mainline builds), and I still saw a problem very similar to the one in the earlier sna-issue-3 screenshot.

My DDX is e386ba86ea487a2db62d80a0e60f176e052d6406, so I don't have the single commit made since then. I'll attach an image of the new issue; it looks slightly different, but that is probably just a difference in the random garbage.
Comment 10 Vedran Rodic 2013-07-20 06:42:19 UTC
Created attachment 82715 [details]
sna-issue-3-new
Comment 11 Chris Wilson 2013-07-20 21:42:12 UTC
Could be an unwanted side-effect of 

commit 6921abd81017c9ed7f3b2413784068fbc609a0ea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 18 16:21:27 2013 +0100

    sna: Add a fast path for the most common fallback for CPU-CPU blits

in which case please test with current master. Thanks.
Comment 12 Vedran Rodic 2013-07-21 05:59:13 UTC
Retested.

Sorry, still an issue.
Comment 13 Chris Wilson 2013-07-21 14:07:27 UTC
* scratches head

Not sure then. Please can you attach Xorg.0.log to confirm the configuration details, and if you can identify any pattern (i.e. reproduction steps) leading up to the corruption that will be very useful. Thanks.
Comment 14 Vedran Rodic 2013-07-22 15:20:40 UTC
This is what I use to reproduce a problem from a fresh reboot (drm-intel-nightly 68c6cd3f1312965698b2af5bb08e15807ce9ae2d, DDX 7b1a5024df96070bab70744ad7e7d78a41fb0f72 - current):

- Open Google Chrome (Version 28.0.1500.71)
- Go to http://support.humblebundle.com/customer/portal/articles/754604-torchlight-changelog
- Try selecting text in last three bullet points
- If the right edge of the white bounding box that surrounds the main content of the page is hidden by making the window narrower, the problem when selecting text goes away
- Attached image for reference (bug-chrome.png)
- Attached Xorg.log
Comment 15 Vedran Rodic 2013-07-22 15:21:01 UTC
Created attachment 82823 [details]
bug-chrome.png
Comment 16 Vedran Rodic 2013-07-22 15:21:26 UTC
Created attachment 82824 [details]
Xorg.log
Comment 17 Chris Wilson 2013-07-28 09:12:24 UTC
Ben also confirmed seeing something similar after updating his kernels to 3.10.3, and only on IVB (not ILK, though that was not exhaustively tested). I've switched to browsing with Chromium (rather than just light testing) and have also seen the occasional glitch. I have not yet found a pattern, so it remains elusively unreproducible.
Comment 18 Chris Wilson 2013-07-29 11:17:37 UTC
Spotted something that looks like it would be hit by Chromium from time to time:

commit c9d89499806779cd6c62d5d6d34df76279cc5abd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 29 11:51:39 2013 +0100

    sna: Composite region is already in dst drawable space
    
    So do not offset it again when processing the fallback composite
    operation.
    
    Regression from commit 6921abd81017c9ed7f3b2413784068fbc609a0ea
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Thu Jul 18 16:21:27 2013 +0100
    
        sna: Add a fast path for the most common fallback for CPU-CPU blits
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=66990
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
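The commit message above states an invariant: once the composite region is in destination drawable space, it must not be offset again. A minimal C sketch of that invariant, assuming a hypothetical `Box` type and hypothetical helpers (`composite_region`, `region_to_pixmap`); the real sna_composite.c code is structured differently. The request origin (dst_x, dst_y) is applied exactly once when the region is built, and the later conversion to pixmap coordinates adds only the drawable's own offset.

```c
#include <assert.h>

/* Hypothetical box type: covers [x1, x2) x [y1, y2). */
typedef struct {
    int x1, y1, x2, y2;
} Box;

/* Translate a box by (dx, dy). */
static Box box_translate(Box b, int dx, int dy)
{
    b.x1 += dx;
    b.x2 += dx;
    b.y1 += dy;
    b.y2 += dy;
    return b;
}

/* Step 1: build the composite region in dst drawable space, applying
 * the request's destination origin exactly once. */
static Box composite_region(int dst_x, int dst_y, int width, int height)
{
    assert(width >= 0 && height >= 0);
    Box b = { dst_x, dst_y, dst_x + width, dst_y + height };
    return b;
}

/* Step 2: the region is already in dst drawable space, so reaching the
 * backing pixmap needs only the drawable-to-pixmap offset. Adding
 * dst_x/dst_y here again would shift the drawing to the wrong place. */
static Box region_to_pixmap(Box region, int pix_dx, int pix_dy)
{
    return box_translate(region, pix_dx, pix_dy);
}
```

For example, a 5x5 composite at (10, 20) in a window whose pixmap offset is (100, 200) lands at (110, 220) in the pixmap, not at (120, 240).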
Comment 19 Chris Wilson 2013-07-30 08:41:38 UTC
I'm hoping that we can still find a pattern behind the corruption, otherwise it will remain nigh on impossible to test. :|
Comment 20 Vedran Rodic 2013-07-30 20:20:00 UTC
I'm away on vacation. I'll be able to test in a week with the scenario above, which was reproducible every time on my machine.
Comment 21 Joe Peterson 2013-08-02 20:49:31 UTC
I'm not sure this helps, but I've been seeing what looks like exactly this bug on my Lenovo T430s as well. I first saw it when entering text: a long, horizontal, garbled rectangle changed as I typed (my typed text was part of it and/or near it). As in the original report, the rectangles often look as if their width does not match their contents, causing the pixel rows to be offset or slanted (perhaps just one effect).

I only remember noticing it within roughly the past few weeks. I am on xf86-video-intel 2.21.13-1 (Arch Linux) and kernel 3.10.2, but I believe I saw the issue with kernel 3.9.9 as well (and I know I saw it with xf86-video-intel 2.21.12-1, too).

It happens at seemingly random times, and often I just see a random rectangle (mostly in Chromium) with the garbled contents, but it goes away quickly when something changes.  The rectangles sometimes look different than described above, but I suspect it is the same cause.  I'll try to get a screen capture next time I see it.

I'll attach my Xorg.0.log file, in case that is of help.
Comment 22 Joe Peterson 2013-08-02 20:50:27 UTC
Created attachment 83554 [details]
Xorg log file

Here's my Xorg log file, in case it helps.
Comment 23 Joe Peterson 2013-08-05 17:35:05 UTC
Created attachment 83675 [details]
Screenshot showing problem on my Lenovo T430s

OK, I captured a screenshot, finally, of the bug as I've often seen it.  I think it looks strikingly similar to the reporter's visual effect.  Hope this is of help!  Note that this is from my Lenovo T430s (not the older Lenovo that I was using when I reported the unrelated bug 55500 a while back).
Comment 24 Vedran Rodic 2013-08-08 06:13:14 UTC
Retested with latest DDX (c01c66b), drm-intel-nightly kernel 3224cf6c3ee5ab9c280052c9fbed4f660310c411

Still able to reproduce with the instructions above.
Comment 25 Chris Wilson 2013-08-08 08:22:17 UTC
If you are keen, you can try:

the userptr branch from http://cgit.freedesktop.org/~ickle/linux-2.6

and compiling the ddx with ./configure --enable-userptr <usual configure options>

The difference will be subtle: only a path where we need to operate on a busy target will use the userptr directly. At the moment, we allocate a staging buffer to perform the copy instead. My feeling is that we are missing some barrier around that staging buffer, so the GPU reads garbage instead of the updated content from chromium. So if switching to userptr does fix the corruption, I think that points towards the staging buffer.
Comment 26 Chris Wilson 2013-08-08 10:41:48 UTC
*** Bug 67894 has been marked as a duplicate of this bug. ***
Comment 27 Mike C 2013-08-11 14:54:40 UTC
I am seeing rendering corruption in the Chrome browser very similar to that reported by Vedran Rodic. My system is Arch Linux, with the following key packages:

linux 3.10.5-1
xf86-video-intel 2.21.14-2

After reverting to UXA, the rendering problems don't seem to appear, so this seems to be due to SNA in the current version of xf86-video-intel.
Comment 28 Mike C 2013-08-11 14:59:15 UTC
Created attachment 83939 [details]
xorg log where graphics corruption was observed (SNA)
Comment 29 Mike C 2013-08-11 16:37:13 UTC
Created attachment 83945 [details]
xorg log after switching back to UXA
Comment 30 Mike C 2013-08-11 16:39:36 UTC
(In reply to comment #27)
> I am seeing very similar rendering corruption (in the chrome browser) to
> those reported by Vedran Rodic. My system is archlinux, with the following
> key packages
> 
> linux 3.10.5-1
> xf86-video-intel 2.21.14-2
> 
> By reverting to UXA the rendering problems don't seem to appear so this
> seems to be due to SNA in the current version of xf86-video-intel.

If it is of any additional help, the system has an i3-3220T processor with HD 2500 graphics.

$ lspci  | egrep -i vga
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
Comment 31 Chris Wilson 2013-08-12 09:02:08 UTC
Can you please all test whether:

diff --git a/src/sna/sna_composite.c b/src/sna/sna_composite.c
index 58dd356..6f24eeb 100644
--- a/src/sna/sna_composite.c
+++ b/src/sna/sna_composite.c
@@ -520,7 +520,7 @@ sna_composite_fb(CARD8 op,
        if (mask)
                validate_source(mask);
 
-       if (mask == NULL &&
+       if (mask == NULL && 0 &&
            src->pDrawable &&
            dst->pDrawable->bitsPerPixel >= 8 &&
            src->filter != PictFilterConvolution &&

stops the corruption?
Comment 32 Chris Wilson 2013-08-12 09:36:43 UTC
Another step in the saga,

commit e8dfc5b3f4ffeec93e52a5319b5a3118edf0e94e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 12 10:33:41 2013 +0100

    sna: Fix destination offset along memcpy composite fallback fastback
    
    The application of dst_x|y was incorrect, and so the drawing could end
    up in the wrong location for a window.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=66990
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Pretty sure this is it! Haven't been able to reproduce my irregular chromium corruption since, but then it was fairly irregular...
Comment 33 Vedran Rodic 2013-08-12 11:11:06 UTC
No, that's not it, sorry. I can still reproduce it with "SNA compiled from 2.21.14-27-g9645e71".

I'll retest with latest drm-nightly-intel
Comment 34 Chris Wilson 2013-08-12 11:53:38 UTC
(In reply to comment #33)
> No, that's not it, sorry. I can still reproduce it with "SNA compiled from
> 2.21.14-27-g9645e71".
> 
> I'll retest with latest drm-nightly-intel

Can you also try the quick little hack from comment 31

diff --git a/src/sna/sna_composite.c b/src/sna/sna_composite.c
index 58dd356..6f24eeb 100644
--- a/src/sna/sna_composite.c
+++ b/src/sna/sna_composite.c
@@ -520,7 +520,7 @@ sna_composite_fb(CARD8 op,
        if (mask)
                validate_source(mask);
 
-       if (mask == NULL &&
+       if (mask == NULL && 0 &&
            src->pDrawable &&
            dst->pDrawable->bitsPerPixel >= 8 &&
            src->filter != PictFilterConvolution &&

to sanity check that I am barking up the right tree.
Comment 35 Vedran Rodic 2013-08-12 12:47:19 UTC
Latest git plus this patch applied still has the same corruption.
Comment 36 Chris Wilson 2013-08-12 13:01:27 UTC
Thanks, that suggests you have something I haven't seen yet. Ben, what happens with your test case?
Comment 37 Ben Widawsky 2013-08-12 21:00:33 UTC
No luck for me either.
Comment 38 Chris Wilson 2013-08-26 14:22:37 UTC
Elsewhere it has been reported that strange artefacts occur when ring switching on IVB. It is definitely worth testing with current -intel to see if the corruption has changed.
Comment 39 Vedran Rodic 2013-08-26 15:55:16 UTC
You are right! :) This is fixed on my IVB with current version (intel(0): SNA compiled from 2.21.15-13-ge98cc0b).

Thanks Chris. I'll be optimistic and close this bug now.
Comment 40 Pawel Drewniak 2013-09-07 15:13:11 UTC
I am still experiencing this issue, both on

[   792.429] (II) intel(0): SNA compiled from 2.21.15-13-ge98cc0b

and on 2.99.901.
I get exactly the same results as in comment #14.

Platform:
Lenovo Thinkpad X1 (Sandy Bridge/Intel(R) HD Graphics 3000)
Kernel 3.10.5
Google Chrome 28.0.1500.95
Comment 41 Chris Wilson 2013-09-08 10:51:51 UTC
*** Bug 68964 has been marked as a duplicate of this bug. ***
Comment 42 Joe Peterson 2013-09-12 15:00:46 UTC
Not to be redundant, but I am also still seeing this issue on Arch Linux (on Lenovo T430s) with xf86-video-intel 2.21.15-1.
Comment 43 Pawel Drewniak 2013-09-15 22:13:30 UTC
Looks like it is fixed as of:

(II) intel(0): SNA compiled from 2.99.901-45-g76790db

Can anyone else confirm this?
Comment 44 bay 2013-09-16 06:00:47 UTC
Confirming.

I've updated the video driver from git and haven't seen any corruption since. But it will take several days to say this with confidence.
Comment 45 bay 2013-09-16 18:54:18 UTC
I've just seen screen corruption with the latest version of the intel driver. But now it is really rare: only once in two days (previously 10-20 times).
Comment 46 Pawel Drewniak 2013-09-18 17:47:33 UTC
Just experienced that weird corruption again, but, as bay said, it only happens once in a while; it is very intermittent and I cannot reproduce it at will.
Comment 47 Steven Noonan 2013-09-18 18:25:48 UTC
I am still experiencing corruption as well using a git build (from the day before yesterday) of xf86-video-intel... It's very difficult to reliably reproduce the issue but it's definitely still there.
Comment 48 Chris Wilson 2013-09-30 08:22:42 UTC
I've made quite a few minor fixes, none of which looks like it should fix this issue, but double-checking with the current rc kernel (3.12-rc2) and the latest xf86-video-intel.git is a must.

Afterwards, you can try enabling (independently, and/or in combination)

#define DBG_NO_UPLOAD_CACHE
#define DBG_NO_UPLOAD_ACTIVE
#define DBG_NO_MAP_UPLOAD

in src/sna/kgem.c and see if any of those prevent the corruption.
Comment 49 Pawel Drewniak 2013-10-02 12:19:28 UTC
I can reproduce it most of the time on http://www.pastebin.com/ now, when clicking on the search box.

Tested on gentoo-sources-3.10.5 and vanilla 3.12-rc3 with xf86-video-intel c724098

It looks like enabling just

#define DBG_NO_UPLOAD_CACHE

is enough to prevent the corruption on 3.12-rc3 (I enabled all 3 options first, then disabled them one by one to pinpoint the right combination).
Comment 50 Chris Wilson 2013-10-02 14:05:03 UTC
Finally!

commit a048f436a0210d076fc844404bf56b8b7fcb4b7b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 2 14:59:11 2013 +0100

    sna: Only delete unused io buffers
    
    Before deleting the io buffer, we need to check that it is not active.
    Currently we check that it is not pending use in the current batch, but
    we also need to double check that it does not have outstanding use by
    the GPU. Failing to do so could mean overwriting the data prior to it
    being read by the GPU, a very small race but often hit!
    
    Reported-by: Vedran Rodic <vrodic@gmail.com> # and many others
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66990
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
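The race fixed by this commit can be modelled in a few lines of C. The sketch is hypothetical: `IoBuffer`, its two flags, and `io_buffer_try_release` are illustrative names, not kgem's actual structures. In a real driver the "still in use by the GPU" state would be queried from the kernel (for i915, via the GEM busy ioctl) rather than kept as a plain flag. The point is that a buffer may only be recycled when neither the batch under construction nor an already-submitted batch can still read it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an upload ("io") buffer; not kgem's struct. */
typedef struct IoBuffer {
    bool in_current_batch; /* referenced by the batch still being built */
    bool gpu_busy;         /* submitted earlier; the GPU may not have read it yet */
    struct IoBuffer *next; /* free-list link */
} IoBuffer;

/* Checking only in_current_batch leaves the race from the commit:
 * the CPU can overwrite the data before the GPU has read it. */
static bool io_buffer_is_idle(const IoBuffer *bo)
{
    return !bo->in_current_batch && !bo->gpu_busy;
}

/* Move a buffer onto the free list only if it is idle.
 * Returns true if the buffer was released for reuse. */
static bool io_buffer_try_release(IoBuffer *bo, IoBuffer **free_list)
{
    assert(bo != NULL && free_list != NULL);
    if (!io_buffer_is_idle(bo))
        return false; /* keep it alive until the GPU has retired it */
    bo->next = *free_list;
    *free_list = bo;
    return true;
}
```

A buffer that is still busy on the GPU, or still referenced by the pending batch, stays out of the free list; only a fully idle buffer is handed back for new uploads.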
Comment 51 Pawel Drewniak 2013-10-02 14:09:26 UTC
Yep, seems to work fine on 3.12.0-rc3 and a048f436. Good work, thanks!
Comment 52 bay 2013-10-07 08:00:38 UTC
Confirming. I haven't seen any corruptions for a week since I applied the patch.
