|Summary:||[snb] artifacts, urxvt cursor disappearance, hung GPU with garbage in batch|
|Product:||xorg||Reporter:||Michael Ivko <mihai>|
|Component:||Driver/intel||Assignee:||Chris Wilson <chris>|
|Status:||RESOLVED FIXED||QA Contact:||Xorg Project Team <xorg-team>|
|i915 platform:||i915 features:|
Description Michael Ivko 2011-09-28 00:20:50 UTC
Created attachment 51701 screenshot 1
Not sure if all these issues are related, but this is what I see. On an Asus K53SJ with the integrated card, after some time in Xorg, artifacts start to appear on windows, as shown in the attached screenshots. Some of them disappear after a window refresh, some don't; some move with the scrolling area; some even survive being scrolled out of view. At this point, neither Xorg.0.log nor dmesg nor i915_error_state contains anything suspicious. Then severe artifacts appear in all urxvt windows, making them unusable. When this happens, newly opened urxvt terminals show no artifacts, but the text cursor is not visible unless there is a non-whitespace character under it. The attached log files are from the time this latter problem occurs.
Comment 3 Michael Ivko 2011-09-28 00:28:29 UTC
Created attachment 51704 Xorg.0.log, dmesg, i915_error_state
Comment 4 Chris Wilson 2011-09-28 00:51:52 UTC
What's your urxvt config? Especially with regards to single/double buffering and glyph rendering (xft versus core). It would also be interesting to see if SNA suffers from the same issue. That would help identify at which level of the stack things are going wrong.
Comment 5 Michael Ivko 2011-09-28 01:26:19 UTC
Software: Arch linux, linux 3.0.4-1, xorg-server 1.10.4-1, xf86-video-intel 2.15.0-2
(In reply to comment #4)
> What's your urxvt config? Especially with regards to single/double buffering
> and glyph rendering (xft versus core).
>
> It would also be interesting to see if SNA suffers from the same issue. That
> would help identify at which level of the stack things are going wrong.
I use xft (Terminus ttf).
Comment 7 Michael Ivko 2011-09-28 01:42:38 UTC
It may be related to xft, because sometimes all instances of the same glyph are "substituted" by similar or identical images.
Comment 8 Chris Wilson 2011-09-28 02:15:28 UTC
The hang ends up with random rendering overwriting the batch buffer, which actually looks consistent with the garbage on the screen: something is scribbling over random memory. (And if that memory happens to be the glyph cache, then any fresh instance of that glyph on the screen will be corrupt. ;-)
Comment 9 Chris Wilson 2011-09-28 02:57:10 UTC
The erroneous batch buffer consists entirely of
batchbuffer at 0x0ecbf000:
0x0ecbf000: 0xe976aa55: UNKNOWN
0x0ecbf004: 0x3030e704: UNKNOWN
0x0ecbf008: 0x30303030: UNKNOWN
0x0ecbf00c: 0x30303030: UNKNOWN
0x0ecbf010: 0x22a03030: UNKNOWN
0x0ecbf014: 0x6121b9e9: 3D UNKNOWN: 3d_965 opcode = 0x6121
0x0ecbf018: 0x0ab00040: MI UNKNOWN
0x0ecbf01c: 0x42493030: 2D UNKNOWN
repeated ad nauseam. That's not strictly premultiplied rgba data, nor does it closely match other forms of data sent to the GPU. Best guess is then non-premultiplied rgba. Do you have any GL applications running (e.g. a compositing WM)?
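The "not strictly premultiplied" observation can be checked by hand. The following is only a rough sketch, assuming each dword is an ARGB32 pixel with alpha in the top byte (the thread does not state the pixel layout, and the helper names are made up for illustration). In premultiplied data no colour channel can exceed alpha, so any dword that breaks that rule cannot be a premultiplied pixel.

```python
# Dwords copied from the batch buffer dump in comment 9.
DWORDS = [
    0xE976AA55, 0x3030E704, 0x30303030, 0x30303030,
    0x22A03030, 0x6121B9E9, 0x0AB00040, 0x42493030,
]


def channels(dword):
    """Split a 32-bit value into (a, r, g, b).

    ASSUMPTION: ARGB32 layout with alpha in the most significant byte.
    """
    return (
        (dword >> 24) & 0xFF,
        (dword >> 16) & 0xFF,
        (dword >> 8) & 0xFF,
        dword & 0xFF,
    )


def is_premultiplied(dword):
    """Necessary condition only: every colour channel must be <= alpha."""
    a, r, g, b = channels(dword)
    return max(r, g, b) <= a


# Dwords that cannot be premultiplied pixels under the assumed layout.
violations = [d for d in DWORDS if not is_premultiplied(d)]

for d in DWORDS:
    verdict = "ok" if is_premultiplied(d) else "colour channel exceeds alpha"
    print(f"0x{d:08x}: {verdict}")
```

Passing the check does not prove that a value is pixel data; it only rules out the dwords that fail it.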
Comment 10 Michael Ivko 2011-09-28 03:07:27 UTC
No, and compositing is switched off in xorg.conf.
Comment 11 Michael Ivko 2011-09-28 03:12:44 UTC
(In reply to comment #10)
> No, and compositing is switched off in xorg.conf.
Correction: there was no xorg.conf when it happened last time.
Comment 12 Michael Ivko 2011-10-01 11:25:46 UTC
i915.semaphores=1 was in kernel options when this happened.
Comment 13 mus.svz 2011-12-27 06:02:24 UTC
> It would also be interesting to see if SNA suffers from the same issue. That
> would help identify at which level of the stack things are going wrong.
Since the artifacts in screenshot 2 look exactly like the ones from bug #42506 (subtitle corruption), I compiled xf86-video-intel from git today with SNA enabled.
grep -i sna /var/log/Xorg.0.log
[ 29768.193] (II) intel(0): SNA compiled from 2.17.0-234-g655a96c
[ 29768.741] (II) intel(0): SNA initialized with SandyBridge backend
Seems like SNA made it even worse for me. While these red artifacts only appeared rarely in subtitles before, I now have them randomly flashing on the screen about every third time I start a video. Looks kind of cool together with the corruptions caused by VAAPI (which are supposedly a Mesa problem, see bug #42506): http://www.youtube.com/watch?v=2dTKhCl6o2c
Semaphores and rc6 are both disabled by the way. Please tell me if there is any more information I can provide that could narrow this down.
Comment 14 mus.svz 2012-01-23 04:39:37 UTC
btw, the patch "drm/i915: Only clear the GPU domains upon a successful finish" did not fix this. This patch is included in the Arch Linux kernel 3.2 and I've seen the artifacts again.
http://projects.archlinux.org/svntogit/packages.git/tree/trunk/i915-gpu-finish.patch?h=packages/linux
Comment 15 Chris Wilson 2012-01-23 05:05:58 UTC
The patch should address the GPU hang from the original reporter.
Comment 16 mus.svz 2012-01-29 14:06:03 UTC
Sorry, I didn't realize this patch is only for the GPU hang. Nevertheless, the artifacts from screenshot 2 have been fixed for me in kernel 3.3-rc1, as have the other problems from bug #42506.
Comment 17 Michael Ivko 2012-03-16 23:32:11 UTC
As of linux 3.2.9, the artifacts still appear, but instead of the urxvt corruption they are now followed by a complete Xorg freeze (only the mouse pointer still moves). After several reboots the behavior disappears, but it reappears when the kernel is updated.
Comment 18 Chris Wilson 2012-03-26 01:44:48 UTC
I believe these are all related to the underlying bug:
commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <firstname.lastname@example.org>
Date: Wed Dec 14 13:57:23 2011 +0100
    drm/i915: Only clear the GPU domains upon a successful finish
    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.
    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.
    Such as:
    https://bugs.freedesktop.org/show_bug.cgi?id=29046
    https://bugs.freedesktop.org/show_bug.cgi?id=35863
    https://bugs.freedesktop.org/show_bug.cgi?id=38952
    https://bugs.freedesktop.org/show_bug.cgi?id=40282
    https://bugs.freedesktop.org/show_bug.cgi?id=41098
    https://bugs.freedesktop.org/show_bug.cgi?id=41102
    https://bugs.freedesktop.org/show_bug.cgi?id=41284
    https://bugs.freedesktop.org/show_bug.cgi?id=42141
    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer),
    but this does look like a convincing cause for most of those bugs.
    Signed-off-by: Chris Wilson <email@example.com>
    Cc: firstname.lastname@example.org
    Reviewed-by: Daniel Vetter <email@example.com>
    Reviewed-by: Eugeni Dodonov <firstname.lastname@example.org>
    Signed-off-by: Daniel Vetter <email@example.com>
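The ordering problem described in that commit message can be illustrated with a toy model. This is not the kernel code: the `Buffer` class, the function names, and the `interrupted` flag are hypothetical stand-ins, and the sketch only assumes what the message says, namely that the read domains were cleared before the wait and that an interrupted wait is retried later.

```python
class Buffer:
    """Toy stand-in for a GPU buffer object (not the real kernel struct)."""

    def __init__(self):
        self.gpu_busy = True        # the GPU is still using the buffer
        self.gpu_read_domains = 1   # nonzero: the GPU may still be reading


def wait_for_gpu(buf, interrupted):
    """Pretend wait: fails if a signal arrives, otherwise the GPU finishes."""
    if interrupted:
        return False
    buf.gpu_busy = False
    return True


def finish_buggy(buf, interrupted):
    """Old ordering: domains are cleared BEFORE the wait."""
    if buf.gpu_read_domains == 0:
        return True                 # believed idle, so the wait is skipped
    buf.gpu_read_domains = 0
    return wait_for_gpu(buf, interrupted)


def finish_fixed(buf, interrupted):
    """New ordering: domains are cleared only after a successful wait."""
    if buf.gpu_read_domains == 0:
        return True
    if not wait_for_gpu(buf, interrupted):
        return False                # domains untouched, a retry still waits
    buf.gpu_read_domains = 0
    return True


def reused_while_busy(finish):
    """Interrupt the first attempt, retry, and report premature reuse."""
    buf = Buffer()
    finish(buf, interrupted=True)        # a signal interrupts the wait
    ok = finish(buf, interrupted=False)  # userspace retries afterwards
    return ok and buf.gpu_busy


print("buggy ordering reuses a busy buffer:", reused_while_busy(finish_buggy))
print("fixed ordering reuses a busy buffer:", reused_while_busy(finish_fixed))
```

In the buggy ordering the retry reports success while the buffer is still busy, which matches the premature reuse of active buffers that the commit message describes.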