Bug 21442

Summary: [945GM KMS] Spontaneous cursor corruption with UXA enabled
Product: DRI Reporter: Clemens Eisserer <linuxhippy>
Component: DRM/IntelAssignee: Daniel Vetter <daniel>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: ben, chris, daniel, jbarnes
Version: unspecifiedKeywords: NEEDINFO
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
corrupted cursor
none
dmest output
none
xorg log
none
mad window mapper ;)
none
double buffer cursor updates
none
properly clflush phys cursor writes
none
use wmb to flush the wc cache
none
wmb() before calling backend->gtt_flush() none

Description Clemens Eisserer 2009-04-27 13:39:49 UTC
Created attachment 25201 [details]
corrupted cursor

From time to time I see cursor corruptions with UXA+KMS.
This happens spontaneous, and just seems to be a cosmetic problem.

The corruption is present until another cursor is choosen.

My system is:
- Fedora Rawhide
- intel-2.7
- kernel-2.6.29.1
- xorg-1.6.1
- intel-945gm
Comment 1 Clemens Eisserer 2009-04-28 02:06:11 UTC
Created attachment 25220 [details]
dmest output
Comment 2 Clemens Eisserer 2009-04-28 02:06:43 UTC
Created attachment 25221 [details]
xorg log
Comment 3 Jesse Barnes 2009-05-04 16:02:32 UTC
Ouch, this one might indicate a severe bug in our memory update code... which might explain the crashes you're seeing too... Investigating.
Comment 4 Jesse Barnes 2009-05-04 17:53:35 UTC
One thing that would really help here is a way of reliably reproducing this.  It could be a caching mode failure of some kind, or it could be another command overwriting cursor memory.  Does it happen when you run a particular program or after resizing a certain window or anything else?  If we can catch it happening we should be able to find the culprit.
Comment 5 Clemens Eisserer 2009-05-05 00:36:13 UTC
To me it seems as the corruption happens in a completly random manner.
One time it happens when running Dolphin, the other time when browing with FireFox.
Comment 6 Clemens Eisserer 2009-05-14 13:49:26 UTC
Hi Jesse, I was able to reproduce the issue by frequently unmapping/mapping a window.
When the cursor is above that window, I often see corruptions like that: http://www.youtube.com/watch?v=ToZ_RPRrnlY

I'll attach a simple test-case, but kill your window-manager before - at least kwin doesn't like the stress and maybe a running window manager would hide it.

After all I don't know if those corruptions are the same I saw and see, but at least its another artifact ;)
Comment 7 Clemens Eisserer 2009-05-14 13:50:37 UTC
Created attachment 25870 [details]
mad window mapper ;)
Comment 8 Clemens Eisserer 2009-05-20 16:11:40 UTC
Update: I don't think the cursor corruptions caused by the "mad window mapper" are the same as reported earlier.

I can sometimes see those green corruptions when moving the cursor from one window to another or for animated cursors, but those artifacts only occur for a short time.
The corruption reported initially stays, until another cursor is uploaded.

Should I open a new bug-report to report the corruption shown with the "mad window mapper"?
Comment 9 Clemens Eisserer 2009-07-06 06:56:35 UTC
the cursor corruption probably caused by a memory corruption seems to be fixed in 2.3.30 + intel-2.8.0pre.

However the small corruption caused by the window-mapper program is still present.
Comment 10 Clemens Eisserer 2009-08-06 03:56:08 UTC
please ignore the previous report.
I've just seen a persistent cursor with 2.6.31.rc5 + intel-2.8.

Comment 11 Jesse Barnes 2009-08-31 11:48:34 UTC
Created attachment 29049 [details] [review]
double buffer cursor updates

Random test request...  This patch double buffers cursor updates and might make things behave better.
Comment 12 Jesse Barnes 2009-10-05 10:45:22 UTC
Any update here Clemens?  Does this still occur on recent bits?  Hopefully one of the many corruption fixes took care of this problem too.
Comment 13 Clemens Eisserer 2009-12-17 15:00:47 UTC
I've unfourtunatly just experienced it again on 2.6.31.6 + intel-2.9.1.
However it seems to happen _very_ seldom these days, so things are much better than they were.
Comment 14 Jesse Barnes 2010-02-11 09:57:39 UTC
Looking at the video again, what you're seeing could be cursor plane underruns.  I hope those are fixed in recent (2.6.32+ and 2.6.33-rc) kernels since we've improved the watermark code a lot, so the cursor FIFOs should be in better shape.
Comment 15 Clemens Eisserer 2010-06-26 16:22:33 UTC
just saw this again with intel-2.12 + kernel-2.6.34
Comment 16 Chris Wilson 2010-06-27 01:29:01 UTC
In .35-rc1, I managed to undercover a kernel bug where we weren't flushing the cursors prior to use. It may happen that it also fixes these less frequent corrupt cursors.

commit e7b526bb852cdd67b24e174da6850222f8da41b1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 2 08:30:48 2010 +0100

    drm/i915: Move non-phys cursors into the GTT
    
    Cursors need to be in the GTT domain when being accessed by the GPU.
    Previously this was a fortuitous byproduct of userspace using pwrite()
    to upload the image data into the cursor. The redundant clflush was
    removed in commit 9b8c4a and so the image was no longer being flushed
    out of the caches into main memory. One could also devise a scenario
    where the cursor was rendered by the GPU, prior to being attached as the
    cursor, resulting in similar corruption due to the missing MI_FLUSH.
    
    Fixes:
    
      Bug 28335 - Cursor corruption caused by commit 9b8c4a0b21
      https://bugs.freedesktop.org/show_bug.cgi?id=28335
Comment 17 Clemens Eisserer 2011-02-01 17:48:38 UTC
I've just experienced the problem with:

linux-2.6.35.10 (-74.fc14)
intel-2.12

basically a stock Fedora-14 installation with updates.
Comment 18 Chris Wilson 2011-02-02 00:41:43 UTC
Exactly the same style of corruption; a single row of 8 pixels shifted?
Comment 19 Clemens Eisserer 2011-02-02 01:14:36 UTC
Yah, at least looks identifal.
The next time it happens I'll try to get a photo again.

Thanks, Clemens
Comment 20 Jesse Barnes 2011-02-10 12:18:49 UTC
Chris gets this bizarro bug.  If the cursor pixmap is actually corrupted, then it's not a cursor FIFO underrun, more likely some more general memory corruption.
Comment 21 Chris Wilson 2011-06-26 01:53:22 UTC
The persistent corruption of the cursor does suggest that the contents of the memory become corrupted. The write is either through the GTT or is clflushed when the cursor is pinned into the display plane, so it is unlikely to be that the data is not reaching the GPU. Does the corruption persist across cursor changes? If so, that implies the corruption is in the original cursor image data (i.e. in X and/or the application resending corrupt data for X for its cursor).
Comment 22 Chris Wilson 2011-11-28 16:39:34 UTC
Daniel, guess what? 945gm also uses physical pages for its cursors and so also needs your phys_pwrite clflush patch? Can you attach that for Clemens to test as well and send it on to Keith.
Comment 23 Daniel Vetter 2011-11-29 07:10:59 UTC
Created attachment 53952 [details] [review]
properly clflush phys cursor writes

Please test the attached patch, it should fix your issue.
Comment 24 Daniel Vetter 2011-11-29 09:31:44 UTC
Created attachment 53955 [details] [review]
use wmb to flush the wc cache

This looks like the technically more sound approach. Testing feedback highly welcome.
Comment 25 Chris Wilson 2012-05-09 02:44:15 UTC
Actually what I believe is required here (and the other gen3 corruption) is:

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 1237e75..67d7de1 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -1131,8 +1131,10 @@ static void i9xx_cleanup(void)
 
 static void i9xx_chipset_flush(void)
 {
-       if (intel_private.i9xx_flush_page)
+       if (intel_private.i9xx_flush_page) {
+               readl(intel_private.i9xx_flush_page); /* write barrier */
                writel(1, intel_private.i9xx_flush_page);
+       }
 }
 
 static void i965_write_entry(dma_addr_t addr,
Comment 26 Chris Wilson 2012-05-10 04:23:52 UTC
Created attachment 61351 [details] [review]
wmb() before calling backend->gtt_flush()

So thinking about this, we are definitely missing a wmb() prior to flushing and declaring *memory* coherent. Even with gen6, where the cache itself is coherent, I believe we want a serialising point for the semantics of issuing the execbuffer (i.e. at the point of dispatch, the state of memory is fixed).
Comment 27 Daniel Vetter 2012-11-13 13:36:10 UTC
Clemens, can you please give us an update on how well this works on latest kernel/userspace?
Comment 28 Chris Wilson 2013-02-15 00:25:23 UTC
Timeout. Fingers crossed that the recent mb() review helps.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.