Created attachment 25201 [details] corrupted cursor

From time to time I see cursor corruption with UXA+KMS. It happens spontaneously and seems to be only a cosmetic problem. The corruption persists until another cursor is chosen.

My system is:
- Fedora Rawhide
- intel-2.7
- kernel-2.6.29.1
- xorg-1.6.1
- intel-945gm
Created attachment 25220 [details] dmesg output
Created attachment 25221 [details] xorg log
Ouch, this one might indicate a severe bug in our memory update code... which might explain the crashes you're seeing too... Investigating.
One thing that would really help here is a way of reliably reproducing this. It could be a caching mode failure of some kind, or it could be another command overwriting cursor memory. Does it happen when you run a particular program or after resizing a certain window or anything else? If we can catch it happening we should be able to find the culprit.
To me it seems that the corruption happens in a completely random manner. One time it happens when running Dolphin, another time when browsing with Firefox.
Hi Jesse, I was able to reproduce the issue by frequently unmapping/mapping a window. When the cursor is above that window, I often see corruption like this: http://www.youtube.com/watch?v=ToZ_RPRrnlY I'll attach a simple test case, but kill your window manager first: at least kwin doesn't like the stress, and a running window manager might hide the problem. I don't know whether this corruption is the same as what I saw before, but at least it's another artifact ;)
Created attachment 25870 [details] mad window mapper ;)
Update: I don't think the cursor corruptions caused by the "mad window mapper" are the same as reported earlier. I can sometimes see those green corruptions when moving the cursor from one window to another or for animated cursors, but those artifacts only occur for a short time. The corruption reported initially stays, until another cursor is uploaded. Should I open a new bug-report to report the corruption shown with the "mad window mapper"?
The cursor corruption, probably caused by memory corruption, seems to be fixed in 2.3.30 + intel-2.8.0pre. However, the small corruption caused by the window-mapper program is still present.
Please ignore the previous report. I've just seen a persistent cursor corruption with 2.6.31.rc5 + intel-2.8.
Created attachment 29049 [details] [review] double buffer cursor updates Random test request... This patch double buffers cursor updates and might make things behave better.
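For readers unfamiliar with the idea, double buffering a cursor update can be sketched roughly as follows. This is a simplified software model, not the attached patch: `struct cursor_plane`, `cursor_update()`, and `cursor_scanout()` are hypothetical names, and the "display engine" is just an index into two arrays. The point is only that the new image is written in full to the buffer that is not being scanned out, and the visible buffer is switched afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CURSOR_W 64
#define CURSOR_H 64
#define CURSOR_PIXELS (CURSOR_W * CURSOR_H)

/* Hypothetical model of a cursor plane backed by two image buffers. */
struct cursor_plane {
    uint32_t buf[2][CURSOR_PIXELS]; /* ARGB cursor images */
    int front;                      /* buffer the simulated hardware scans out */
};

/* Image currently visible to the simulated display engine. */
static const uint32_t *cursor_scanout(const struct cursor_plane *p)
{
    return p->buf[p->front];
}

/* Write the new image into the back buffer, then flip to it. */
static void cursor_update(struct cursor_plane *p, const uint32_t *image)
{
    int back = p->front ^ 1;

    assert(image != NULL);
    /* The front buffer is never touched while it is being displayed. */
    memcpy(p->buf[back], image, sizeof(p->buf[back]));
    /* A real driver would need a cache flush or write barrier here
     * before pointing the hardware at the new buffer (assumption). */
    p->front = back;
}
```

With this scheme a partially written image is never on screen, so if corruption still appears it points at the image data or the flush, not at a race with scanout.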
Any update here Clemens? Does this still occur on recent bits? Hopefully one of the many corruption fixes took care of this problem too.
Unfortunately, I've just experienced it again on 2.6.31.6 + intel-2.9.1. However, it seems to happen _very_ seldom these days, so things are much better than they were.
Looking at the video again, what you're seeing could be cursor plane underruns. I hope those are fixed in recent (2.6.32+ and 2.6.33-rc) kernels since we've improved the watermark code a lot, so the cursor FIFOs should be in better shape.
just saw this again with intel-2.12 + kernel-2.6.34
In .35-rc1, I managed to uncover a kernel bug where we weren't flushing the cursors prior to use. It may be that this also fixes these less frequent corrupt cursors.

commit e7b526bb852cdd67b24e174da6850222f8da41b1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 2 08:30:48 2010 +0100

    drm/i915: Move non-phys cursors into the GTT

    Cursors need to be in the GTT domain when being accessed by the GPU.
    Previously this was a fortuitous byproduct of userspace using pwrite()
    to upload the image data into the cursor. The redundant clflush was
    removed in commit 9b8c4a and so the image was no longer being flushed
    out of the caches into main memory.

    One could also devise a scenario where the cursor was rendered by the
    GPU, prior to being attached as the cursor, resulting in similar
    corruption due to the missing MI_FLUSH.

    Fixes:

      Bug 28335 - Cursor corruption caused by commit 9b8c4a0b21
      https://bugs.freedesktop.org/show_bug.cgi?id=28335
I've just experienced the problem with linux-2.6.35.10 (-74.fc14) and intel-2.12, basically a stock Fedora-14 installation with updates.
Exactly the same style of corruption; a single row of 8 pixels shifted?
Yeah, at least it looks identical. The next time it happens I'll try to get a photo again. Thanks, Clemens
Chris gets this bizarro bug. If the cursor pixmap is actually corrupted, then it's not a cursor FIFO underrun, more likely some more general memory corruption.
The persistent corruption of the cursor does suggest that the contents of the memory become corrupted. The write either goes through the GTT or is clflushed when the cursor is pinned into the display plane, so it is unlikely that the data is not reaching the GPU. Does the corruption persist across cursor changes? If so, that implies the corruption is in the original cursor image data (i.e. in X and/or the application resending corrupt data to X for its cursor).
Daniel, guess what? 945gm also uses physical pages for its cursors and so also needs your phys_pwrite clflush patch? Can you attach that for Clemens to test as well and send it on to Keith.
Created attachment 53952 [details] [review] properly clflush phys cursor writes Please test the attached patch, it should fix your issue.
Created attachment 53955 [details] [review] use wmb to flush the wc cache This looks like the technically more sound approach. Testing feedback highly welcome.
Actually what I believe is required here (and the other gen3 corruption) is:

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 1237e75..67d7de1 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -1131,8 +1131,10 @@ static void i9xx_cleanup(void)
 
 static void i9xx_chipset_flush(void)
 {
-	if (intel_private.i9xx_flush_page)
+	if (intel_private.i9xx_flush_page) {
+		readl(intel_private.i9xx_flush_page); /* write barrier */
 		writel(1, intel_private.i9xx_flush_page);
+	}
 }
 
 static void i965_write_entry(dma_addr_t addr,
Created attachment 61351 [details] [review] wmb() before calling backend->gtt_flush() So thinking about this, we are definitely missing a wmb() prior to flushing and declaring *memory* coherent. Even with gen6, where the cache itself is coherent, I believe we want a serialising point for the semantics of issuing the execbuffer (i.e. at the point of dispatch, the state of memory is fixed).
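The ordering argument can be illustrated with a userspace analogy. The sketch below is not kernel code and not the attached patch: `struct mailbox`, `publish()`, and `consume()` are hypothetical names, and a C11 release fence stands in for wmb(). A C11 fence only orders accesses between CPU threads; it does not flush write-combining buffers or make memory coherent for a device, so treat this purely as a picture of "finish all the writes, then declare the data ready".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_WORDS 16

/* Hypothetical shared state: a payload plus a "ready" flag. */
struct mailbox {
    uint32_t image[IMAGE_WORDS];
    atomic_int ready;
};

/* Producer: write the payload, fence, then publish the flag. */
static void publish(struct mailbox *m, const uint32_t *src)
{
    assert(src != NULL);
    memcpy(m->image, src, sizeof(m->image));
    /* Stand-in for wmb(): the payload stores above may not be
     * reordered after the flag store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and copies the payload once it is published. */
static int consume(struct mailbox *m, uint32_t *dst)
{
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return 0;
    /* Pairs with the release fence in publish(). */
    atomic_thread_fence(memory_order_acquire);
    memcpy(dst, m->image, sizeof(m->image));
    return 1;
}
```

Without the release fence, a consumer could in principle observe `ready == 1` while still reading stale payload words, which is the same shape of failure as a GPU being dispatched before the cursor image writes have landed.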
Clemens, can you please give us an update on how well this works on latest kernel/userspace?
Timeout. Fingers crossed that the recent mb() review helps.