Bug 35460

Summary: [855GM] Corruptions with linux-2.6.38 & xf86-video-intel-2.14.901
Product: xorg Reporter: Bruno <bonbons>
Component: Driver/intel    Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED WORKSFORME QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: daniel
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description | Flags
Corruption in FF when horizontally scrolling | none
Enlightenment shelf corruption 1/2 (see firefox screenshot for correct shelf) | none
Xchat log pane corruption but also has black bottom window border decoration | none
Corruption of clock on enlightenment shelf | none
clflush phys objects to fix cursor corruptions | none
clflush phys objects to fix cursor corruptions | none
use wmb to flush the wc cache | none

Description Bruno 2011-03-20 02:58:37 UTC
Created attachment 44632
Corruption in FF when horizontally scrolling

I'm seeing rendering corruption on my i855 (Acer TM66x):
00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
00:02.1 Display controller [0380]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)

Software (Gentoo): xf86-video-intel-2.14.901, libdrm-2.4.24, xorg-server-1.9.4,
Enlightenment 17, 2.6.38 with attachment (id=44500) from bug #34980 (drm/i915: Fix tiling corruption from pipelined fencing)

Software affected by the corruption so far:
- Enlightenment (seen on shelves)
- Firefox
- XChat (seen in chat log widget)
Comment 1 Bruno 2011-03-20 03:00:23 UTC
Created attachment 44633
Enlightenment shelf corruption 1/2 (see firefox screenshot for correct shelf)
Comment 2 Bruno 2011-03-20 03:04:15 UTC
Created attachment 44634
Xchat log pane corruption but also has black bottom window border decoration

This screenshot shows corruption in XChat, and also a bottom window decoration that is not painted (the big black border at the bottom of the screenshot should look the same as the bottom window border in the Firefox screenshot).
Comment 3 Bruno 2011-03-20 03:27:46 UTC
Created attachment 44635
Corruption of clock on enlightenment shelf

This corruption happened while scrolling a lot in Firefox (the clock should have looked like the one in the Firefox screenshot, though showing a time around 11:20).

The other Enlightenment shelf corruption I remember (but have not yet had a chance to screenshot) is similar to this one but covers the whole shelf.
Comment 4 Bruno 2011-04-04 11:40:06 UTC
Still happens on 2.6.38 with xf86-video-intel-2.14.902 + "Cleanup gen2 tiling confusion" by Daniel and either drm-intel-next or drm-intel-staging (as of today) applied to the kernel.

Note: the corruption starts showing up after a few minutes of X uptime with activity.
Comment 5 Chris Wilson 2011-07-10 05:49:26 UTC
Looks like the many-faceted pipelined-fencing + bad-alignment bugs. There's another kernel patch, hopefully heading upstream, that should fix this; then I'd like you to retest.
Comment 6 Eugeni Dodonov 2011-09-08 15:56:11 UTC
This issue is affecting a hardware component which is not being actively worked on anymore.

Moving the assignee to the dri-devel list as contact, to give this issue a better coverage.
Comment 7 Chris Wilson 2011-10-30 05:44:20 UTC
I've just tried to reproduce this running firefox+chromium+others under e17 using SNA. I'm pretty certain this was the tiling bugs we fixed early.
Comment 8 Bruno 2011-10-30 05:55:23 UTC
(In reply to comment #7)
> I've just tried to reproduce this running firefox+chromium+others under e17
> using SNA. I'm pretty certain this was the tiling bugs we fixed early.

I haven't seen them recently either, except for an occasional line of white/gray pixels in the e17 cursor (always at the same place in the cursor, roughly halfway down).
There might be one remaining tiling bug in the cursor handling.
Comment 9 Daniel Vetter 2011-10-30 06:44:19 UTC
> --- Comment #8 from Bruno <bonbons67@internet.lu> 2011-10-30 05:55:23 PDT ---
> I've not seen them recently either, except for sporadically a pixel line with
> with white/gray pixels in the e17 cursor. (always at same place of the cursor,
> more or less middle in height).
> Might be there is one tiling bug remaining around cursor handling.

Cursors on gen2 are stored in physical memory. On a quick look, we're indeed
missing the clflush that should be there. Let me whip up a patch.
-Daniel
Comment 10 Daniel Vetter 2011-10-30 06:46:37 UTC
Created attachment 52913
clflush phys objects to fix cursor corruptions

Test-feedback highly appreciated.
Comment 11 Daniel Vetter 2011-10-30 08:42:04 UTC
Created attachment 52920
clflush phys objects to fix cursor corruptions

Chris Wilson pointed out that I'm missing memory barriers.
Comment 12 Bruno 2011-11-02 11:24:34 UTC
(In reply to comment #10 with patch from comment #11)
> Created attachment 52920
> 
> Test-feedback highly appreciated.

I've been running with the patch for two days now and have seen no negative effects. More time will be needed for a solid answer on whether it fixes the cursors, as the corruption doesn't occur very regularly (at least I haven't noticed any cursor corruption since applying the patch).

I'll report back in a week or so, or earlier if I see corruption.
Comment 13 Bruno 2011-11-12 02:39:23 UTC
A good week later, I have seen neither the cursor corruption nor any other issues.
Please apply the patch in attachment 52920 with
  Tested-and-reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Comment 14 Daniel Vetter 2011-11-29 09:30:35 UTC
Created attachment 53954
use wmb to flush the wc cache

Hi Bruno, can you please test whether this patch works too? I think this is the better solution; on reconsideration, the last patch is a bit confusing and might fix the issue only as a side effect.
Comment 15 Bruno 2011-12-08 01:50:04 UTC
(In reply to comment #14)
> Created attachment 53954
> use wmb to flush the wc cache
> 
> Hi Bruno, can you please test whether this patch works, too. I think this is
> the better solution, on reconsideration the last patch is a bit confusing and
> might fix the issue just as a side-effect.

Hi Daniel,

I've been running this patch instead of the other one for a week now and have not seen any cursor corruption.

Thanks for your efforts!
Comment 16 Bruno 2011-12-09 06:46:26 UTC
(In reply to comment #15 and comment #14 with attachment 53954)
> > use wmb to flush the wc cache
>
> Running for a week now with this patch instead of the other one and have not
> seen cursor corruptions.

Well, today I saw the 8-pixel-line corruption on a cursor. Interestingly, it survived cursor switches between Enlightenment (colorful arrow, corrupted) and gnumeric (cell '+' cursor), but passing over claws-mail made the Enlightenment cursor lose the corruption.

Does the hardware store multiple cursors or somehow have multiple cursor buffers around which can get corrupted and remain corrupted (double-buffering on cursors)?

Bruno
Comment 17 Chris Wilson 2011-12-09 07:25:13 UTC
No, we don't buffer cursors. A single cursor bo is updated every time its image changes. (Whilst possible, I don't think it is an interesting optimisation to cache the bo for a variety of cursors.)

This could just be a bad bit of RAM, bad cache, or signs of incoherency... Can you retry Daniel's original overkill patch and see if this corruption ever turns up? Though I think we can safely say that the current mb is an improvement.
Comment 18 Bruno 2012-01-05 13:41:16 UTC
(In reply to comment #17)
> No, we don't buffer cursors. A single cursor bo is updated every time its image
> changes. (Whilst possible, I don't it is an interesting optimisation to cache
> the bo for a variety of cursors.)
> 
> This could just be a bad bit of RAM, bad cache, or signs of incoherency... Can
> you retry Daniel's original overkill patch and see if this corruption ever
> turns up? Though I think we can safely say that the current mb is an
> improvement.

The overkill patch also has the corrupted cursor happen sporadically.

As the corrupted cursor survives cursor shape switches (e.g. when moving across windows), it must be some intermediate buffer that gets filled with corrupted data and stays that way until it is recycled.
I don't know whether Enlightenment or xorg-server does this caching (as you say, the intel driver doesn't do it explicitly).

It seems to affect only the high-color cursors (at least I haven't seen corruption on black & white ones), so it could be some recoding step that is missing a barrier.
