Bug 41284 - [snb] artifacts, urxvt cursor disappearance, hung GPU with garbage in batch
Summary: [snb] artifacts, urxvt cursor disappearance, hung GPU with garbage in batch
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.6 (2010.12)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Chris Wilson
QA Contact: Xorg Project Team
Reported: 2011-09-28 00:20 UTC by Michael Ivko
Modified: 2012-03-26 01:44 UTC

Attachments
screenshot 1 (2.82 KB, image/png)
2011-09-28 00:20 UTC, Michael Ivko
screenshot 2 (5.84 KB, image/png)
2011-09-28 00:22 UTC, Michael Ivko
screenshot 3 (22.45 KB, image/png)
2011-09-28 00:22 UTC, Michael Ivko
Xorg.0.log, dmesg, i915_error_state (243.33 KB, application/x-bzip2)
2011-09-28 00:28 UTC, Michael Ivko
Xresources (878 bytes, text/plain)
2011-09-28 01:27 UTC, Michael Ivko

Description Michael Ivko 2011-09-28 00:20:50 UTC
Created attachment 51701
screenshot 1

Not sure if all of these issues are related, but here is what I see.

On the integrated graphics of an Asus K53SJ, artifacts start to appear in windows after some time in Xorg, as shown in the attached screenshots. Some of them disappear after a window refresh, some don't; some move with the scrolling area; some even survive being scrolled out of view.

At this point, neither Xorg.0.log, dmesg, nor i915_error_state contains anything suspicious.

Later, severe artifacts appear in all urxvt windows, making them unusable. Once this happens, newly opened urxvt terminals show no artifacts, but the text cursor is invisible unless there is a non-whitespace character under it.

The attached log files are from the time this later stage occurs.
Comment 1 Michael Ivko 2011-09-28 00:22:15 UTC
Created attachment 51702
screenshot 2
Comment 2 Michael Ivko 2011-09-28 00:22:55 UTC
Created attachment 51703
screenshot 3
Comment 3 Michael Ivko 2011-09-28 00:28:29 UTC
Created attachment 51704
Xorg.0.log, dmesg, i915_error_state
Comment 4 Chris Wilson 2011-09-28 00:51:52 UTC
What's your urxvt config? Especially with regards to single/double buffering and glyph rendering (xft versus core).

It would also be interesting to see if SNA suffers from the same issue. That would help identify at which level of the stack things are going wrong.
Comment 5 Michael Ivko 2011-09-28 01:26:19 UTC
Software: Arch Linux, linux 3.0.4-1, xorg-server 1.10.4-1, xf86-video-intel 2.15.0-2

(In reply to comment #4)
> What's your urxvt config? Especially with regards to single/double buffering
> and glyph rendering (xft versus core).
> 
> It would also be interesting to see if SNA suffers from the same issue. That
> would help identify at which level of the stack things are going wrong.

I use xft (Terminus ttf).
Comment 6 Michael Ivko 2011-09-28 01:27:45 UTC
Created attachment 51707
Xresources
Comment 7 Michael Ivko 2011-09-28 01:42:38 UTC
It may be related to xft, because sometimes all instances of the same glyph are "substituted" by similar or identical images.
Comment 8 Chris Wilson 2011-09-28 02:15:28 UTC
The hang ends up with random rendering overwriting the batch buffer, which actually looks consistent with the garbage on the screen: something is scribbling over random memory.

(And if that memory happens to be the glyph cache, then any fresh instance of that glyph on the screen will be corrupt. ;-)
Comment 9 Chris Wilson 2011-09-28 02:57:10 UTC
The erroneous batch buffer consists entirely of

batchbuffer at 0x0ecbf000:
0x0ecbf000:      0xe976aa55: UNKNOWN
0x0ecbf004:      0x3030e704:    UNKNOWN
0x0ecbf008:      0x30303030:    UNKNOWN
0x0ecbf00c:      0x30303030:    UNKNOWN
0x0ecbf010:      0x22a03030:    UNKNOWN
0x0ecbf014:      0x6121b9e9: 3D UNKNOWN: 3d_965 opcode = 0x6121
0x0ecbf018:      0x0ab00040: MI UNKNOWN
0x0ecbf01c:      0x42493030: 2D UNKNOWN
repeated ad nauseam.

That's not strictly premultiplied rgba data, nor does it closely match other forms of data sent to the GPU. The best guess, then, is non-premultiplied rgba.
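
One way to test the premultiplied hypothesis is to treat each dword as packed ARGB with alpha in the top byte (an assumption about the layout, not something the dump states) and check whether any colour channel exceeds alpha, which premultiplied data cannot do. A minimal Python sketch over the eight dwords above; the helper name `is_premultiplied` is made up for illustration:

```python
# The eight dwords from the dump above.
DWORDS = [
    0xE976AA55, 0x3030E704, 0x30303030, 0x30303030,
    0x22A03030, 0x6121B9E9, 0x0AB00040, 0x42493030,
]

def is_premultiplied(dword):
    """True if no colour channel exceeds alpha (assumes ARGB, alpha in bits 31:24)."""
    a = (dword >> 24) & 0xFF
    r = (dword >> 16) & 0xFF
    g = (dword >> 8) & 0xFF
    b = dword & 0xFF
    return max(r, g, b) <= a

# Collect the dwords that could not be premultiplied under this layout.
violations = [f"{d:#010x}" for d in DWORDS if not is_premultiplied(d)]
print(violations)
```

Under that assumed layout, five of the eight dwords have a channel larger than alpha, which is consistent with the data not being strictly premultiplied.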

Do you have any GL applications running (e.g. a compositing WM)?
Comment 10 Michael Ivko 2011-09-28 03:07:27 UTC
No, and compositing is switched off in xorg.conf.
Comment 11 Michael Ivko 2011-09-28 03:12:44 UTC
(In reply to comment #10)
> No, and compositing is switched off in xorg.conf.

Correction: there was no xorg.conf the last time it happened.
Comment 12 Michael Ivko 2011-10-01 11:25:46 UTC
i915.semaphores=1 was in kernel options when this happened.
Comment 13 mus.svz 2011-12-27 06:02:24 UTC
> It would also be interesting to see if SNA suffers from the same issue. That
> would help identify at which level of the stack things are going wrong.

Since the artifacts in screenshot 2 look exactly like the ones from bug #42506 (subtitle corruption), I compiled xf86-video-intel from git today with SNA enabled.

grep -i sna /var/log/Xorg.0.log
[ 29768.193] (II) intel(0): SNA compiled from 2.17.0-234-g655a96c
[ 29768.741] (II) intel(0): SNA initialized with SandyBridge backend
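
The same check can be scripted. A small Python sketch that looks for the SNA initialisation line in log text; the regular expression is inferred from the two lines above and is an assumption about the format, since the driver's log output is not a stable interface:

```python
import re

# Pattern inferred from the log excerpt above (assumption, not a documented format).
SNA_INIT = re.compile(r"\(II\) intel\(\d+\): SNA initialized with (\S+) backend")

def sna_backend(log_text):
    """Return the SNA backend name found in log_text, or None if SNA is not reported."""
    match = SNA_INIT.search(log_text)
    return match.group(1) if match else None

sample = (
    "[ 29768.193] (II) intel(0): SNA compiled from 2.17.0-234-g655a96c\n"
    "[ 29768.741] (II) intel(0): SNA initialized with SandyBridge backend\n"
)
print(sna_backend(sample))
```

In practice `log_text` would be the contents of /var/log/Xorg.0.log.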

Seems like SNA made it even worse for me. While these red artifacts only appeared rarely in subtitles before, I now have them randomly flashing on the screen about every third time I start a video.

Looks kind of cool together with the corruptions caused by VAAPI (which are supposedly a Mesa problem, see bug #42506):

http://www.youtube.com/watch?v=2dTKhCl6o2c

Semaphores and rc6 are both disabled by the way.

Please tell me if there is any more information I can provide that could narrow this down.
Comment 14 mus.svz 2012-01-23 04:39:37 UTC
btw, the patch

drm/i915: Only clear the GPU domains upon a successful finish

did not fix this.
This patch is included in the Arch Linux kernel 3.2 and I've seen the artifacts again.
http://projects.archlinux.org/svntogit/packages.git/tree/trunk/i915-gpu-finish.patch?h=packages/linux
Comment 15 Chris Wilson 2012-01-23 05:05:58 UTC
The patch should address the GPU hang from the original reporter.
Comment 16 mus.svz 2012-01-29 14:06:03 UTC
Sorry, I didn't realize this patch is only for the GPU hang.
Nevertheless, the artifacts from screenshot 2 have been fixed for me in kernel 3.3-rc1, as have the other problems from bug #42506.
Comment 17 Michael Ivko 2012-03-16 23:32:11 UTC
As of linux 3.2.9, the artifacts still appear, but instead of the urxvt stage, they are followed by a complete Xorg freeze (except for the mouse pointer, which still moves). After several reboots the behavior disappears, but it reappears when the kernel is updated.
Comment 18 Chris Wilson 2012-03-26 01:44:48 UTC
I believe these are all related to the underlying bug:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish
    
    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.
    
    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.
    
    Such as:
    
      https://bugs.freedesktop.org/show_bug.cgi?id=29046
      https://bugs.freedesktop.org/show_bug.cgi?id=35863
      https://bugs.freedesktop.org/show_bug.cgi?id=38952
      https://bugs.freedesktop.org/show_bug.cgi?id=40282
      https://bugs.freedesktop.org/show_bug.cgi?id=41098
      https://bugs.freedesktop.org/show_bug.cgi?id=41102
      https://bugs.freedesktop.org/show_bug.cgi?id=41284
      https://bugs.freedesktop.org/show_bug.cgi?id=42141
    
    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
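
The ordering problem described in the commit message can be illustrated with a toy model. This is a Python sketch, not kernel code: the `Buffer` class and the functions `wait`, `finish_gpu_buggy`, `finish_gpu_fixed` and `retry` are all hypothetical names, and the real driver tracks far more state. It only shows why clearing the read domains before an interruptible wait lets a retry skip the wait:

```python
class Buffer:
    """Toy stand-in for a GEM object; not kernel code."""
    def __init__(self):
        self.gpu_read_domains = True   # driver's belief: GPU may still read this
        self.gpu_busy = True           # what the hardware is really doing

def wait(buf, interrupted):
    """Pretend to wait for the GPU; return False if a signal interrupts the wait."""
    if interrupted:
        return False
    buf.gpu_busy = False
    return True

def finish_gpu_buggy(buf, interrupted):
    buf.gpu_read_domains = False       # cleared before the wait has completed
    return wait(buf, interrupted)

def finish_gpu_fixed(buf, interrupted):
    if not wait(buf, interrupted):
        return False                   # domains untouched, so the caller retries
    buf.gpu_read_domains = False       # cleared only after a successful wait
    return True

def retry(buf, finish):
    """Interrupt the first attempt, then retry only if the domains are still set."""
    finish(buf, interrupted=True)
    if buf.gpu_read_domains:
        finish(buf, interrupted=False)
    return buf.gpu_busy                # True: buffer would be reused while active

print(retry(Buffer(), finish_gpu_buggy))
print(retry(Buffer(), finish_gpu_fixed))
```

In the buggy ordering the retry sees cleared domains, skips the wait, and the buffer is treated as idle while the model's GPU is still busy; in the fixed ordering the retry waits as expected.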

