Bug 68410 - [bisected ivb] Small black box corruption in firefox
Status: RESOLVED WORKSFORME
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: Chris Wilson
QA Contact: Intel GFX Bugs mailing list

Reported: 2013-08-22 01:48 UTC by Joseph Yasi
Modified: 2014-10-06 13:28 UTC (History)

Attachments
screenshot of corruption black boxes (175.19 KB, image/png), 2013-08-22 01:48 UTC, Joseph Yasi
Xorg.log of the driver with corruption (22.38 KB, text/plain), 2013-08-22 02:33 UTC, Joseph Yasi
Xorg.log of the driver with corruption on Haswell (27.09 KB, text/plain), 2013-08-22 04:13 UTC, Joseph Yasi
Xorg.0.log (18.48 KB, text/plain), 2013-08-24 05:18 UTC, Kazuo Teramoto
New behaviour (15.69 KB, image/png), 2013-10-07 23:31 UTC, dflogeras2
Flush render cache when changing CC state (3.77 KB, patch), 2013-10-31 16:00 UTC, Chris Wilson
Apply depth stall w/a (1.55 KB, patch), 2013-11-13 11:28 UTC, Chris Wilson
screenshot (182.20 KB, image/jpeg), 2014-10-06 12:57 UTC, behrooz
Xorg.0.log (26.87 KB, text/plain), 2014-10-06 13:07 UTC, behrooz

Description Joseph Yasi 2013-08-22 01:48:43 UTC
Created attachment 84420 [details]
screenshot of corruption black boxes

I bisected the corruption down to commit ed40a7c3de3bbb178278c05907e59239712b98b6: sna/gen6+: Tweak semaphore avoidance for composite operations. As of this commit, I get small black boxes when scrolling in Firefox on a page with a lot of text. This page reproduces it pretty readily: http://www.nola.com/politics/index.ssf/2013/08/polls_show_louisianians_disapp.html.
Comment 1 Joseph Yasi 2013-08-22 01:50:55 UTC
I cannot reproduce it on Haswell, only on a Lenovo T530 with Ivy Bridge.
Comment 2 Chris Wilson 2013-08-22 02:17:24 UTC
Please also attach your Xorg.log.
Comment 3 Chris Wilson 2013-08-22 02:18:12 UTC
Hmm, that doesn't look like a driver artifact.
Comment 4 Joseph Yasi 2013-08-22 02:33:23 UTC
Created attachment 84421 [details]
Xorg.log of the driver with corruption

Yeah, it looks weird for driver corruption, but it happens with commit ed40a7c3de3bbb178278c05907e59239712b98b6 and not with commit 4486ae2d829781e32652bce84c08e63ee1960bf0.
Comment 5 Joseph Yasi 2013-08-22 04:13:08 UTC
Created attachment 84422 [details]
Xorg.log of the driver with corruption on Haswell

It just happened to me on Haswell with the 2.21.15 driver while scrolling in a gmail text box with the arrow keys. I reverted to 4486ae2d829781e32652bce84c08e63ee1960bf0 and it stopped happening on the Haswell machine, too.
Comment 6 Chris Wilson 2013-08-22 09:47:45 UTC
And is it the same random set of identical black boxes? What desktop environment are you using?
Comment 7 Chris Wilson 2013-08-22 09:48:17 UTC
Also once the boxes appear do they persist? Relative to the page or to the screen?
Comment 8 Joseph Yasi 2013-08-22 14:23:54 UTC
I am running KDE 4.11.

Yes, I am seeing a bunch of random identical black boxes on Haswell as well.

The boxes persist relative to the page and scroll with it. They go away when focus switches to another window, and new ones appear when scrolling the text again.
Comment 9 Kazuo Teramoto 2013-08-24 05:18:12 UTC
Created attachment 84545 [details]
Xorg.0.log

I have the same problem as the OP in firefox.

I'm using Awesome WM and an Ivy Bridge laptop with Arch Linux (testing).

This started to happen after the following packages were updated:

* mesa (9.1.6-1 -> 9.2.0rc2-1)
* mesa-libgl (9.1.6-1 -> 9.2.0rc2-1)
* intel-dri (9.1.6-1 -> 9.2.0rc2-1)
* linux (3.10.7-1 -> 3.10.9-1)
* xf86-video-intel (2.21.14-2 -> 2.21.15-1)

Arch Linux now sets sna as the default and I'm using it. I will try to test with the commit reverted.

IIRC I had this type of corruption at some point in the (distant) past; I think I switched back to uxa, and when I came back to sna after some time it was gone.
Comment 10 Kazuo Teramoto 2013-08-25 08:24:00 UTC
Just a follow-up. Confirming that reverting just commit ed40a7c3de in the xf86-video-intel 2.21.15 release doesn't give me any corruption.
Comment 11 Chris Wilson 2013-08-25 18:07:23 UTC
My suspicion is that this is a missing render flush, since that is all the effect the patch should have had (use existing render operations more often).

I've just pushed a patch to expose a debug variable in src/sna/gen7_render.c, if you can please recompile the DDX with this little change:

diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c
index ff2ddb7..244bf17 100644
--- a/src/sna/gen7_render.c
+++ b/src/sna/gen7_render.c
@@ -45,7 +45,7 @@
 #include "gen4_source.h"
 #include "gen4_vertex.h"
 
-#define ALWAYS_FLUSH 0
+#define ALWAYS_FLUSH 1
 
 #define NO_COMPOSITE 0
 #define NO_COMPOSITE_SPANS 0

that should test the simple flushing theory. (There are more complicated flushing theories! ;-)
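
For anyone following along without the driver source, here is a minimal standalone sketch of how a compile-time debug switch of this kind usually works. It is not the code from gen7_render.c: `should_flush` and its two parameters are hypothetical names, and the condition is only modelled on the `ALWAYS_FLUSH || (emit_flush && GEN7_READS_DST(...))` check visible in the gen7_emit_state diff context later in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Compile-time debug switch: build with ALWAYS_FLUSH set to 1 to force
 * a flush before every operation, trading speed for a quick test. */
#ifndef ALWAYS_FLUSH
#define ALWAYS_FLUSH 0
#endif

/* Hypothetical helper (not a driver function): decide whether a render
 * flush is emitted before the next operation. With the switch off, a
 * flush happens only if earlier rendering is still pending and the next
 * operation reads the destination. */
bool should_flush(bool flush_pending, bool reads_dst)
{
	return ALWAYS_FLUSH || (flush_pending && reads_dst);
}
```

With the default of 0, `should_flush(true, false)` is false; rebuilding with the define flipped to 1 makes every call return true, which is why the test separates "a flush is missing somewhere" from other explanations.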
Comment 12 Joseph Yasi 2013-08-25 20:12:24 UTC
I can't reproduce this anymore even with #define ALWAYS_FLUSH 0 . I bisected to see which commit fixed it, and it looks like it was the one right after the 2.21.15 release. This doesn't happen anymore as of:
509e7aaf8446f568e133e1b450ea13f73e9b366b sna/gen7: Prefer the render ring for more operations
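
For reference, the bisection itself is just a binary search over the commit history. The sketch below is an illustration only, assuming a linear history indexed by position and a property that stays true from the first bad commit onwards; `first_bad_commit`, `is_bad_fn` and `bad_from_five` are made-up names, and in practice each "call" of the predicate is a manual rebuild and scroll test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: true if the build at history index `commit`
 * shows the corruption. */
typedef bool (*is_bad_fn)(size_t commit);

/* Find the first bad commit, given that `good` is known good, `bad` is
 * known bad, and good < bad. Assumes every commit from the first bad
 * one onwards is also bad. */
size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;
		if (is_bad(mid))
			bad = mid;   /* culprit is at mid or earlier */
		else
			good = mid;  /* culprit is after mid */
	}
	return bad;
}

/* Example predicate for illustration: pretend commit 5 broke things. */
bool bad_from_five(size_t commit)
{
	return commit >= 5;
}
```

To look for the commit that fixed something instead, the same search applies with the meaning of the predicate inverted. An intermittent bug breaks the assumption that the predicate is reliable, so a result like this one needs to be treated with some caution.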
Comment 13 Chris Wilson 2013-08-26 14:06:06 UTC
Hmm, we mask the problem once again. I'm not sure if this caused the troublesome operation to move back to the blt ring, or if it made more operations move over to the render ring. At any rate, it sounds like an inter-ring flushing issue - or a malformed command.
Comment 14 Sean V Kelley 2013-09-03 03:03:04 UTC
(In reply to comment #1)
> I cannot reproduce it on Haswell, only on a Lenovo T530 with Ivy Bridge.

Confirm on Lenovo X230 with IVB
Comment 15 Jon Gjengset 2013-09-10 18:30:16 UTC
Confirmed on ThinkPad X1 Carbon with Ivy Bridge on Arch Linux with xf86-video-intel: 2.21.15
Comment 16 Joseph Yasi 2013-09-19 17:00:46 UTC
I'm still seeing this with commit afad7dd43d935a4666bff6c2964714209e851221 on Haswell, but I don't have a way to reliably reproduce it, unfortunately. It just happens occasionally.
Comment 17 Frank McLaughlin 2013-09-20 04:55:09 UTC
I am encountering this issue on a Lenovo X230 under Arch Linux with all the latest updates.  It was consistent on some pages (Facebook, for example) but didn't seem to occur on some others.  Disabling hardware acceleration in Firefox did nothing.  However, switching from SNA to UXA seems to have made the issue go away completely.  I hope that information is helpful to someone.
Comment 18 Michael Wallner 2013-09-20 06:54:52 UTC
Suffering as well on 3.11.1-1-ARCH with i7-4770T/HD4600

$ pacman -Qs intel\|mesa | awk -F"[/ ]" '/^[^ ]/ {printf "%s %s\n", $2, $3}'
glu 9.0.0-2
intel-dri 9.2.0-2
libva-intel-driver 1.2.0-1
mesa 9.2.0-2
mesa-libgl 9.2.0-2
xf86-video-intel 2.21.15-1

$ sudo lspci -nn -vv -x -s
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device [8086:204c]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
	Kernel modules: i915
00: 86 80 12 04 07 04 90 00 06 00 00 03 00 00 00 00
10: 04 00 80 f7 00 00 00 00 0c 00 00 e0 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 86 80 4c 20
30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
Comment 19 dflogeras2 2013-09-22 13:06:05 UTC
Confirmed running amd64 gentoo with

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)

3.11.1 and 3.12-rc1 kernels
mesa-3.2.0
xf86-video-intel-2.99.301 (and 2.21.15)
Comment 20 dflogeras2 2013-09-22 13:08:35 UTC
pardon me, that is of course xf86-video-intel-2.99.901
Comment 21 Xavier 2013-09-23 05:11:54 UTC
Confirmed on ArchLinux, firefox 24, i7-4750HQ, intel HD5200 and xf86-video-intel 2.21.15-1
Comment 22 Félix Saparelli 2013-09-28 10:55:50 UTC
Here's a bug I filed with Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=913419 which describes the same situation.

Hence confirming with Firefox versions 23.0.1 (stable), 26 and 27 (many nightlies); Arch Linux with several weeks worth of kernels, 3.10.8 through to 3.11.2; Intel 3000 Graphics (Macbook Pro mid-2012); xf86-video-intel 2.21.15.
Comment 23 Chris Wilson 2013-09-28 18:03:37 UTC
If you are not tracking the kernel rcs and the xf86-video-intel snapshots, you are not testing the related fixes to this bug. Confirmation of this bug on 3.12-rc2 and 2.99.903 would be much appreciated.
Comment 24 Joseph Yasi 2013-09-28 18:42:39 UTC
I still see it with 2.99.903 on Haswell with kernel 3.11.2. I'm building 3.12-rc2 now to check.
Comment 25 Joseph Yasi 2013-09-28 20:10:43 UTC
Confirmed with 2.99.903 on Haswell with kernel 3.12-rc2. Is there anything I can do to debug it?
Comment 26 Chris Wilson 2013-09-28 20:16:27 UTC
The current theory is that it is associated with the reuse of the staging upload buffers. I would try testing with different values for

#define DBG_NO_UPLOAD_CACHE
#define DBG_NO_UPLOAD_ACTIVE
#define DBG_NO_MAP_UPLOAD

in src/sna/kgem.c and see if any of those prevent the corruption.
Comment 27 Chris Wilson 2013-09-28 20:20:25 UTC
Oh, I was responding to the wrong ivb bug. Sorry, this is not the staged upload buffers, but a missing hardware workaround. Not sure which hw w/a is required yet...
Comment 28 dflogeras2 2013-09-30 12:58:42 UTC
Still present with 2.99.903 and git-sources-3.12-rc3
Comment 29 Greg Silverstein 2013-10-03 13:52:11 UTC
I can confirm this also occurs on the Iris Pro 5200 GPU.

Kernel 3.11.2-1 Arch Linux
Intel 2.21.15-1
Mesa 9.2.0-2

It also appears to affect some games running OpenGL. The symptoms are similar black or other colored artifacts on screen.
Comment 30 Nick Hu 2013-10-06 09:03:32 UTC
Occurs with Intel HD 4000

Arch Linux x86_64
linux: 3.11.3-1
mesa: 9.2.0-2
xf86-video-intel: 2.21.15-1
Comment 31 Chris Wilson 2013-10-06 09:19:33 UTC
If you can still reproduce this easily, please try the test from comment 11.
Comment 32 Joseph Yasi 2013-10-06 16:23:10 UTC
I just used the test from comment 11 with git 7284e7f with kernel 3.11.4 on a Haswell i7-4770S. I can easily reproduce it with ALWAYS_FLUSH 0, but I cannot reproduce it with ALWAYS_FLUSH 1. It must be a missing render flush.
Comment 33 Chris Wilson 2013-10-06 16:29:32 UTC
I would not say it was a missing flush - just some operation is not as pipelined as intended and requires a top-of-pipe synchronisation.

To test that possibility, can you grab

commit c3fe60c15763c02b3b6238c77e6350d478cd8982
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Oct 6 17:27:22 2013 +0100

    sna/gen7: Add a always-stall debug option
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=68410
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

and then set

diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c
index 9fac7b0f..b678974 100644
--- a/src/sna/gen7_render.c
+++ b/src/sna/gen7_render.c
@@ -46,7 +46,7 @@
 #include "gen4_vertex.h"
 
 #define ALWAYS_FLUSH 0
-#define ALWAYS_STALL 0
+#define ALWAYS_STALL 1
 
 #define NO_COMPOSITE 0
 #define NO_COMPOSITE_SPANS 0
Comment 34 Joseph Yasi 2013-10-06 16:49:01 UTC
It still happens with ALWAYS_STALL 1.
Comment 35 Joseph Yasi 2013-10-06 17:00:41 UTC
There was a nasty crash bug introduced somewhere between git 8980870 and 7284e7f that I'm seeing with Firefox now. It takes down the X server. I'm bisecting to see which commit did it, and I'll file another bug.
Comment 36 Chris Wilson 2013-10-07 10:58:22 UTC
Another step.

commit 55e0f4502657078a666761277bbac56a98b3780c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Oct 7 11:56:23 2013 +0100

    sna/gen7: Rename debug option ALWAYS_FLUSH to ALWAYS_INVALIDATE

gives ALWAYS_FLUSH, ALWAYS_INVALIDATE, ALWAYS_STALL to play with. Can you please play around with those options and see which masks the bug?
Comment 37 dflogeras2 2013-10-07 14:04:44 UTC
Chris

Testing on i5-4200U, kernel 3.12-rc4, mesa 9.2.0 and git xf86-intel-drivers mentioned in comment #36

First compiling exactly from git for a baseline.  Black rectangles persist, however it seems that they are varying in height (where before I think they were all uniform height).

Compiling with only ALWAYS_INVALIDATE set the problem is masked

Compiling with only ALWAYS_FLUSH set the problem is also masked

Compiling with only ALWAYS_STALL the problem is present.

HTH and thanks
Comment 38 dflogeras2 2013-10-07 14:14:03 UTC
Ignore my comment about varying heights, they vary in 2.99.903 as well, perhaps qualitatively different but probably a red herring anyway.
Comment 39 Joseph Yasi 2013-10-07 15:19:02 UTC
I can confirm that setting only ALWAYS_INVALIDATE 1 or only ALWAYS_FLUSH 1 masks the problem, and ALWAYS_STALL 1 does not.
Comment 40 Chris Wilson 2013-10-07 15:23:01 UTC
How about


diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c
index 4b60f53..5764244 100644
--- a/src/sna/gen7_render.c
+++ b/src/sna/gen7_render.c
@@ -801,7 +801,7 @@ gen7_emit_cc(struct sna *sna, uint32_t blend_offset)
 	struct gen7_render_state *render = &sna->render_state.gen7;
 
 	if (render->blend == blend_offset)
-		return;
+		return false;
 
 	DBG(("%s: blend = %x\n", __FUNCTION__, blend_offset));
 
@@ -814,6 +814,7 @@ gen7_emit_cc(struct sna *sna, uint32_t blend_offset)
 	OUT_BATCH((render->cc_blend + blend_offset) | 1);
 
 	render->blend = blend_offset;
+	return blend_offset != NO_BLEND;
 }
 
 static void
@@ -1098,10 +1099,12 @@ gen7_emit_state(struct sna *sna,
 		uint16_t wm_binding_table)
 {
 	bool need_stall;
+	bool need_flush;
 
 	assert(op->dst.bo->exec);
 
-	gen7_emit_cc(sna, GEN7_BLEND(op->u.gen7.flags));
+	need_flush = sna->render_state.gen7.emit_flush;
+	need_flush |= gen7_emit_cc(sna, GEN7_BLEND(op->u.gen7.flags));
 	gen7_emit_sampler(sna, GEN7_SAMPLER(op->u.gen7.flags));
 	gen7_emit_sf(sna, GEN7_VERTEX(op->u.gen7.flags) >> 2);
 	gen7_emit_wm(sna, GEN7_KERNEL(op->u.gen7.flags));
@@ -1120,7 +1123,7 @@ gen7_emit_state(struct sna *sna,
 		sna->render_state.gen7.emit_flush = false;
 		need_stall = false;
 	}
-	if (ALWAYS_FLUSH || (sna->render_state.gen7.emit_flush && GEN7_READS_DST(op->u.gen7.flags))) {
+	if (ALWAYS_FLUSH || (need_flush && GEN7_READS_DST(op->u.gen7.flags))) {
 		gen7_emit_pipe_flush(sna, need_stall);
 		need_stall = false;
 	}
Comment 41 Joseph Yasi 2013-10-07 15:37:56 UTC
It still happens with the patch from comment 40 (plus changing the return type on gen7_emit_cc to bool to match the patch).
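
To spell out the return-type change: the patch turns the CC emitter into a function that both skips redundant state and tells the caller whether a flush should follow. A minimal standalone sketch of that pattern is below. It is not the driver code; `struct render_state`, `emit_cc`, the `emitted` counter and the value of `NO_BLEND` are stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_BLEND 0u   /* placeholder value; the driver defines its own */

/* Stand-in for the driver's cached render state. */
struct render_state {
	uint32_t blend;    /* last blend offset emitted */
	unsigned emitted;  /* number of state commands emitted so far */
};

/* Emit CC (colour calculator) state only when it changes.
 * Returns true when the caller should follow up with a flush, which in
 * the comment 40 patch means the state changed to a blending mode. */
bool emit_cc(struct render_state *render, uint32_t blend_offset)
{
	assert(render != NULL);

	if (render->blend == blend_offset)
		return false;  /* cached: nothing emitted, no flush needed */

	render->emitted++;     /* stands in for the OUT_BATCH() calls */
	render->blend = blend_offset;
	return blend_offset != NO_BLEND;
}
```

The caller then ORs the result into its own `need_flush`, as the `need_flush |= gen7_emit_cc(...)` line in the patch does.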
Comment 42 Chris Wilson 2013-10-07 20:28:50 UTC
If not CC, the other likely candidate is the change in surfaces, so maybe:

diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c
index 4b60f53..1adb9c2 100644
--- a/src/sna/gen7_render.c
+++ b/src/sna/gen7_render.c
@@ -1098,6 +1098,7 @@ gen7_emit_state(struct sna *sna,
 		uint16_t wm_binding_table)
 {
 	bool need_stall;
+	bool need_flush;
 
 	assert(op->dst.bo->exec);
 
@@ -1107,8 +1108,12 @@ gen7_emit_state(struct sna *sna,
 	gen7_emit_wm(sna, GEN7_KERNEL(op->u.gen7.flags));
 	gen7_emit_vertex_elements(sna, op);
 
-	need_stall = gen7_emit_binding_table(sna, wm_binding_table);
-	need_stall &= gen7_emit_drawing_rectangle(sna, op);
+	need_flush = gen7_emit_binding_table(sna, wm_binding_table);
+	need_stall = gen7_emit_drawing_rectangle(sna, op) && need_flush;
+
+	need_flush |= sna->render_state.gen7.emit_flush;
+	if (ALWAYS_FLUSH)
+		need_flush = true;
 	if (ALWAYS_STALL)
 		need_stall = true;
 
@@ -1117,10 +1122,9 @@ gen7_emit_state(struct sna *sna,
 		kgem_clear_dirty(&sna->kgem);
 		assert(op->dst.bo->exec);
 		kgem_bo_mark_dirty(op->dst.bo);
-		sna->render_state.gen7.emit_flush = false;
-		need_stall = false;
+		need_flush = need_stall = false;
 	}
-	if (ALWAYS_FLUSH || (sna->render_state.gen7.emit_flush && GEN7_READS_DST(op->u.gen7.flags))) {
+	if (need_flush) {
 		gen7_emit_pipe_flush(sna, need_stall);
 		need_stall = false;
 	}
Comment 43 dflogeras2 2013-10-07 23:20:29 UTC
Yay, the patch in comment #42 works for me (on setup listed in comment #37).  Thanks for all the perseverance Chris!

To be clear I did not set any #define's to 1 as it was stated before that it just hides the problem.
Comment 44 dflogeras2 2013-10-07 23:31:12 UTC
Spoke too soon, it seems to be manifesting in a new way.  I'll attach a screenshot. Now the text in my URL bar and tab titles is missing small sections.  If I hover/click the regions to force a refresh, the text comes back sporadically, but also goes away as redraws occur.
Comment 45 dflogeras2 2013-10-07 23:31:48 UTC
Created attachment 87260 [details]
New behaviour
Comment 46 Joseph Yasi 2013-10-07 23:37:08 UTC
The patch in comment 42 fixes the black box problem for me. I haven't seen the corruption from comment 45 yet, but I'll report back if I do.
Comment 47 Chris Wilson 2013-10-08 10:07:10 UTC
Let me know what happens. I'm not convinced by that patch yet, it will force a flush in between most operations (as we switch surfaces almost constantly). So it may be fixing things accidentally, and just reducing the frequency of the corruption (a blank square in the glyph string pixmap would explain the "new behaviour").
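
To give a feel for "almost constantly", here is a small standalone sketch that counts how often a flush would fire if one is forced on every binding-table change. It assumes the emitter reports a change whenever the table differs from the previous one (the `surface_table != wm_binding_table` line in the comment 48 diff suggests this); the function name and the sample offsets are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many operations would trigger a flush if a flush is forced
 * whenever the binding table differs from the previous operation's.
 * `tables` holds one (made-up) binding-table offset per operation. */
size_t count_flushes_on_surface_change(const uint16_t *tables, size_t n)
{
	size_t flushes = 0;

	assert(tables != NULL || n == 0);
	/* Start at 1: the first operation has no predecessor to differ from. */
	for (size_t i = 1; i < n; i++) {
		if (tables[i] != tables[i - 1])
			flushes++;
	}
	return flushes;
}
```

For a sequence of offsets such as 1, 2, 1, 2, 2, 3 this reports a flush on four of the five transitions, so the patch comes close to flushing before every operation, much as ALWAYS_FLUSH does.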
Comment 48 Chris Wilson 2013-10-08 17:58:17 UTC
Different arrangement to put the flush/stall/invalidate first before changing the pipeline:

diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c
index 4b60f53..9c52e38 100644
--- a/src/sna/gen7_render.c
+++ b/src/sna/gen7_render.c
@@ -1097,36 +1097,48 @@ gen7_emit_state(struct sna *sna,
 		const struct sna_composite_op *op,
 		uint16_t wm_binding_table)
 {
+	bool need_invalidate;
+	bool need_flush;
 	bool need_stall;
 
 	assert(op->dst.bo->exec);
 
-	gen7_emit_cc(sna, GEN7_BLEND(op->u.gen7.flags));
-	gen7_emit_sampler(sna, GEN7_SAMPLER(op->u.gen7.flags));
-	gen7_emit_sf(sna, GEN7_VERTEX(op->u.gen7.flags) >> 2);
-	gen7_emit_wm(sna, GEN7_KERNEL(op->u.gen7.flags));
-	gen7_emit_vertex_elements(sna, op);
+	need_invalidate = kgem_bo_is_dirty(op->src.bo) || kgem_bo_is_dirty(op->mask.bo);
+	if (ALWAYS_INVALIDATE)
+		need_invalidate = true;
+
+	need_flush = sna->render_state.gen7.emit_flush;
+	if (ALWAYS_FLUSH)
+		need_flush = true;
 
-	need_stall = gen7_emit_binding_table(sna, wm_binding_table);
+	need_stall = sna->render_state.gen7.surface_table != wm_binding_table;
 	need_stall &= gen7_emit_drawing_rectangle(sna, op);
 	if (ALWAYS_STALL)
 		need_stall = true;
 
-	if (ALWAYS_INVALIDATE || kgem_bo_is_dirty(op->src.bo) || kgem_bo_is_dirty(op->mask.bo)) {
+	if (need_invalidate) {
 		gen7_emit_pipe_invalidate(sna);
 		kgem_clear_dirty(&sna->kgem);
 		assert(op->dst.bo->exec);
 		kgem_bo_mark_dirty(op->dst.bo);
-		sna->render_state.gen7.emit_flush = false;
+
+		need_flush = false;
 		need_stall = false;
 	}
-	if (ALWAYS_FLUSH || (sna->render_state.gen7.emit_flush && GEN7_READS_DST(op->u.gen7.flags))) {
+	if (need_flush) {
 		gen7_emit_pipe_flush(sna, need_stall);
 		need_stall = false;
 	}
 	if (need_stall)
 		gen7_emit_pipe_stall(sna);
 
+	gen7_emit_cc(sna, GEN7_BLEND(op->u.gen7.flags));
+	gen7_emit_sampler(sna, GEN7_SAMPLER(op->u.gen7.flags));
+	gen7_emit_sf(sna, GEN7_VERTEX(op->u.gen7.flags) >> 2);
+	gen7_emit_wm(sna, GEN7_KERNEL(op->u.gen7.flags));
+	gen7_emit_vertex_elements(sna, op);
+	gen7_emit_binding_table(sna, wm_binding_table);
+
 	sna->render_state.gen7.emit_flush = GEN7_READS_DST(op->u.gen7.flags);
 }
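
Read as plain logic, the if-chain in this patch emits at most one synchronisation command ahead of the state changes, with a fixed priority. The sketch below restates that as a standalone function; `enum pipe_sync` and `choose_sync` are hypothetical names, and this only mirrors the control flow of the diff, not the commands actually written to the batch.

```c
#include <assert.h>
#include <stdbool.h>

enum pipe_sync { SYNC_NONE, SYNC_STALL, SYNC_FLUSH, SYNC_INVALIDATE };

/* Pick the single synchronisation command emitted before the pipeline
 * state is changed: an invalidate cancels a pending flush and stall,
 * and a flush absorbs a pending stall. */
enum pipe_sync choose_sync(bool need_invalidate, bool need_flush,
			   bool need_stall)
{
	if (need_invalidate)
		return SYNC_INVALIDATE;
	if (need_flush)
		return SYNC_FLUSH;  /* the stall request rides along */
	if (need_stall)
		return SYNC_STALL;
	return SYNC_NONE;
}
```

So with all three flags set only the invalidate is emitted, and a bare stall appears only when neither of the other two is wanted.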
Comment 49 Joseph Yasi 2013-10-08 18:19:31 UTC
It still happens with the patch from comment 48, but not with the one from comment 42. I still haven't seen the corruption from comment 45. That comment 42 patch probably is an accidental fix. Have you been able to reproduce it?
Comment 50 Joseph Yasi 2013-10-09 03:23:24 UTC
In case it helps figure out where the problem is, I haven't been able to reproduce this on my ivb laptop (i7-3820QM) since the commit after 2.21.15 (509e7aa). It only happens on hsw (i7-4770S) for me now.
Comment 51 Chris Wilson 2013-10-09 10:19:18 UTC
I have seen the black squares in firefox, but only very rarely. I haven't found a way to reproduce it reliably enough to tackle the issue by myself.
Comment 52 dflogeras2 2013-10-21 01:18:22 UTC
I've updated to xf86-video-intel 33be42bf509b6d57face6b8d99d72dd5f6c90900 with 3.12-rc6 and the problem persists, and I can no longer apply the patch from comment #42.  What do you recommend for short/long term options?
Comment 53 dave 2013-10-23 15:44:10 UTC
I'm not sure if this is helpful or not, but some pages hit this problem more readily than others. This bug page doesn't seem to display it at all. "Prettier" pages like http://iothackboulder-sr.eventbrite.com/ (only example I have at the moment) seem to show the problem pretty consistently.

$ uname -a
Linux javelin 3.11.5-1-ARCH #1 SMP PREEMPT Mon Oct 14 08:31:43 CEST 2013 x86_64 GNU/Linux
$ pacman -Q intel-dri xf86-video-intel
intel-dri 9.2.1-1
xf86-video-intel 2.21.15-1
$
Comment 54 Chris Wilson 2013-10-23 15:51:14 UTC
Same here, I never see it until someone else uses my computer... And then it stops as soon as they leave.
Comment 55 dflogeras2 2013-10-24 00:04:32 UTC
I find that scrolling via tapping the up/down arrow keys on any wikipedia page aggravates it.  In case it is relevant, in firefox I have "smooth scrolling" and "hardware acceleration" checked, and "autoscrolling" unchecked (preferences->advanced tab).
Comment 56 Chris Wilson 2013-10-31 14:01:01 UTC
Hmm, just hit an oddity on gen6 that seems relevant to this bug:

commit 82e6d41c2f4f343bd1854d3d8ee4b624b5d68971
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Oct 31 13:35:59 2013 +0000

    sna/gen6: Tweak flush around CC state changes
    
    In order to fix some font corruption, it appears that we need an extra
    flush in the Sandybridge pipeline when we change the CC stage and the
    render cache is dirty. We previously triggered a full pipeline stall
    for this case.
Comment 57 Chris Wilson 2013-10-31 16:00:19 UTC
Created attachment 88412 [details] [review]
Flush render cache when changing CC state

Can someone give this patch a whirl, please?
Comment 58 Joseph Yasi 2013-10-31 16:28:54 UTC
I'm trying it now. I haven't seen the bug yet with the patch, but I can't reproduce it reliably without the patch with 82e6d41 git either (but I have seen it with 82e6d41). I will report back if I see it happen again with the patch.
Comment 59 dflogeras2 2013-10-31 22:55:38 UTC
No change (still behaves badly) for me.

kernel 3.12-rc7, driver @ 82e6d41c2f4f343bd1854d3d8ee4b624b5d68971 + patch from comment 57, haswell i5-4200U
Comment 60 dflogeras2 2013-11-01 22:12:50 UTC
It seems that when you turn KDE's desktop effects off (alt-shift-f12 to toggle) the problem becomes far far more rare.  I always have the default effects turned on, but on a whim I just switched them off and it was harder to make the boxes appear.
Comment 61 Luke-Jr 2013-11-12 07:44:48 UTC
Very easy to reproduce here on Haswell i7-4771.
KDE (32-bit): 4.10.5 (no compositing)
xf86 driver (32-bit): 2.21.15
Linux (64-bit): 3.12.0
Comment 62 Chris Wilson 2013-11-13 11:28:22 UTC
Created attachment 89133 [details] [review]
Apply depth stall w/a

Here's something to test...
Comment 63 Brett Campbell 2013-11-13 19:26:08 UTC
Thank you for your persistence on this.  Very impressive.

Black boxes plague my Internets! :-)
Comment 64 dflogeras2 2013-11-13 20:28:17 UTC
(In reply to comment #62)
> Created attachment 89133 [details] [review] [review]
> Apply depth stall w/a
> 
> Here's something to test...

No love here
Comment 65 Cyryl Plotnicki-Chudyk 2013-11-13 20:38:53 UTC
Hi
Still seeing this bug on
mesa-9.1.6 
kernel 3.12 [gentoo patches a.k.a. 'gentoo-sources']
xf86-video-intel-2.99.905

thanks for your efforts in trying to fix this bug !

keep on hackin'
Comment 66 Andreas Klauer 2013-11-14 15:19:31 UTC
Seeing this myself on Haswell / Gentoo. Most often happens on StackExchange sites with code elements. The corruption is not just black boxes, those are just the things you see first. There are also squashed/blank characters, and their missing bits and pieces reappear in other places on the screen. You can see this (if only a little) in the existing screenshot; I can provide another if it helps any.

I'll test the patches next... thanks for your efforts.
Comment 67 dflogeras2 2013-11-14 17:37:49 UTC
Hmm, I'm also a Gentoo user.  Is there anything that might be specific to Gentoo and its xorg stack that might be exacerbating the issue?
Comment 68 Cyryl Plotnicki-Chudyk 2013-11-14 18:18:14 UTC
I've seen it on one computer under different distros.
So IMHO it's more driver-related than something to do with OS.
Comment 69 Luke-Jr 2013-11-14 18:32:42 UTC
Gentoo here as well, though using a vanilla kernel. It's possible we see this first due to other distros sticking to older kernels longer...
Comment 70 Chris Wilson 2013-11-18 12:47:35 UTC
I think I can detect this using xf86-video-intel/test/render-copyarea
Comment 71 Chris Wilson 2013-11-18 15:04:23 UTC
With #define ALWAYS_INVALIDATE 1, xf86-video-intel/test/render-copyarea passes.
Comment 72 Chris Wilson 2013-11-18 15:41:14 UTC
For the record, either ALWAYS_INVALIDATE or ALWAYS_FLUSH is sufficient, but not ALWAYS_STALL. That would imply the render cache is unhappy.
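
The inference can be written down as a tiny consistency check. The sketch below encodes the runs reported in comments 37, 39 and 72 and tests them against the hypothesis that the bug is masked exactly when a cache-affecting option is enabled. Treating flush and invalidate (but not stall) as the options that touch the render cache is the assumption being tested, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum debug_knob {
	KNOB_FLUSH      = 1 << 0,  /* ALWAYS_FLUSH */
	KNOB_INVALIDATE = 1 << 1,  /* ALWAYS_INVALIDATE */
	KNOB_STALL      = 1 << 2,  /* ALWAYS_STALL */
};

/* Hypothesis: the corruption is masked exactly when the enabled knobs
 * include one that touches the render cache. */
bool predicted_masked(unsigned knobs)
{
	return (knobs & (KNOB_FLUSH | KNOB_INVALIDATE)) != 0;
}

/* One reported test run: which knobs were set, and whether the black
 * boxes went away. */
struct observation {
	unsigned knobs;
	bool masked;
};

/* Results as reported in comments 37, 39 and 72. */
const struct observation reported[] = {
	{ 0,               false },
	{ KNOB_FLUSH,      true  },
	{ KNOB_INVALIDATE, true  },
	{ KNOB_STALL,      false },
};

/* Returns true if the hypothesis agrees with every reported run. */
bool hypothesis_matches_reports(void)
{
	for (unsigned i = 0; i < sizeof(reported) / sizeof(reported[0]); i++) {
		if (predicted_masked(reported[i].knobs) != reported[i].masked)
			return false;
	}
	return true;
}
```

A pure ordering problem would be expected to respond to the stall as well, which is what points the finger at the cache rather than at pipelining.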
Comment 73 Chris Wilson 2013-11-18 18:58:14 UTC
Please try

commit 12f4c48d39b939a3a26a1504910b5f16ee6c86b8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 18 16:59:08 2013 +0000

    sna/gen7: Rework random GPU flushing
    
    There seems to be no clear rationale for these flushes. They have been
    empirically derived to pass test/render-copyarea and rendercheck,
    without impacting upon a few simple benchmarks.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=68410
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

(i.e. tip of xf86-video-intel.git)
Comment 74 dflogeras2 2013-11-18 21:08:02 UTC
(In reply to comment #73)
> Please try
> 
> commit 12f4c48d39b939a3a26a1504910b5f16ee6c86b8
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Nov 18 16:59:08 2013 +0000

That made the problem disappear for me at least :)  Thank you very very much for hunting this down!
Comment 75 Chris Wilson 2013-11-19 09:53:30 UTC
(In reply to comment #74)
> That made the problem disappear for me at least :)  Thank you very very much
> for hunting this down!

Thanks for the feedback, is anyone still seeing the black boxes? A couple more positive responses and I'll mark this closed for the time being.
Comment 76 Andreas Klauer 2013-11-19 14:07:59 UTC
Thank you so much, it seems to work fine with the current git version.
Comment 77 Andreas Klauer 2013-11-19 15:31:08 UTC
I might have spoken too early.

If it's not a bug of my window manager (fluxbox) that somehow surfaced the moment I updated my xf86-video-intel, what I am seeing now instead is entire windows appearing where they shouldn't be.

For example I have urxvt open, then I close it. The next second it reappears, then disappears, then reappears, so effectively it "blinks" on and off. Does that even make any sense?

In fluxbox I have a clock in the lower right corner that includes seconds, so it updates every second. The window disappearing and reappearing happens exactly at the same time, so it seems to be related somehow.

Apart from urxvt I also had this happen with Anki, and gmpc. One time it started when I clicked a button in Anki, then gmpc started "blinking", but the button I clicked stayed where it was and appeared in the middle of the "gmpc" window where it doesn't belong.

I can't reliably reproduce this as of yet, it happened only two or three times since the time I wrote my last comment...
Comment 78 Chris Wilson 2013-11-19 16:22:34 UTC
(In reply to comment #77)
> I might have spoken too early.

The only thing I can say for sure is that that is an entirely different phenomenon.
Comment 79 Michael 2013-11-19 18:02:16 UTC
Hi Chris, I applied your patch 12f4c48d39b939a3a26a1504910b5f16ee6c86b8 to xf86-video-intel-2.99.906. The problem with the little black boxes has disappeared. The patch seems to work... Thanks!

CPU: Intel Core i5-3570K, Gentoo Linux x86_64, Kernel 3.10, Xorg-Server 1.14.3
Comment 80 Cyryl Plotnicki-Chudyk 2013-11-19 18:31:31 UTC
works for me now, compiled from scratch on gentoo from the git tip, haswell i7, kernel 3.12

thanks a lot !
Comment 81 Chris Wilson 2013-11-19 21:27:27 UTC
Let's close this and hope it is gone for good.
Comment 82 Andreas Klauer 2013-11-21 12:35:55 UTC
(In reply to comment #78)
> (In reply to comment #77)
> > I might have spoken too early.
> 
> The only thing I can say for sure is that that is an entirely different
> phenomenon.

Sorry for spamming :)

It seems indeed to be a different issue. I'm seeing the same thing in 2.99.906 - along with the black boxes. It might've been in 2.99.905, just not as extreme.

So your black box fix is good and I'll have to hunt whatever issue I'm having separately.
Comment 83 Andrei Cristian Petcu 2013-12-09 08:32:36 UTC
Hi guys, do you have any idea when this will be on Arch and what package/version should I expect the fix in?
Comment 84 Keith Curtis 2014-01-03 01:40:25 UTC
Thank you for fixing this bug. It was easy to repro on Haswell with Firefox and sometimes I'd have up to 50 of these 8x4 rectangles on the screen at once! It just hit Arch testing today and I'm happy it is gone.
Comment 85 Andrei Cristian Petcu 2014-01-21 08:01:25 UTC
It is fixed now by xf86-video-intel 2.99.907-2. This is in the ArchLinux "extra" repository.
Comment 86 Alexa 2014-02-03 01:29:33 UTC
*** Bug 260998 has been marked as a duplicate of this bug. ***
Comment 87 behrooz 2014-10-06 12:57:59 UTC
Created attachment 107421 [details]
screenshot
Comment 88 behrooz 2014-10-06 13:05:59 UTC
I am running KDE 4.13.3 on Kubuntu 14.04.1 on a Lenovo G-500 with Intel HD 4000.
When I use Firefox or another browser, or even other parts of Kubuntu such as Dolphin, I see black box corruption on screen. I have attached a screenshot and the Xorg log. Andrei Cristian Petcu says this bug is fixed, but I still have it.
Comment 89 behrooz 2014-10-06 13:07:06 UTC
Created attachment 107423 [details]
Xorg.0.log