Bug 48019 - [Bisected SNB] Piglit glean/occluquery regressed on -next kernel
Summary: [Bisected SNB] Piglit glean/occluquery regressed on -next kernel
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: medium major
Assignee: Daniel Vetter
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-28 22:11 UTC by lu hua
Modified: 2017-09-04 10:10 UTC (History)
9 users (show)

See Also:
i915 platform:
i915 features:


Attachments
occluquery output info (16.00 KB, text/plain)
2012-03-29 22:10 UTC, lu hua
no flags Details
manual revert (821 bytes, patch)
2012-04-04 07:03 UTC, Daniel Vetter
no flags Details | Splinter Review
Workaround occQuery fail (2.97 KB, patch)
2012-04-16 03:56 UTC, Chris Wilson
no flags Details | Splinter Review
Workaround occQuery fail (3.12 KB, patch)
2012-04-16 10:40 UTC, Chris Wilson
no flags Details | Splinter Review
add CS_STALL to render cache flush (2.50 KB, patch)
2012-06-26 03:41 UTC, Daniel Vetter
no flags Details | Splinter Review

Description lu hua 2012-03-28 22:11:37 UTC
System Environment:
--------------------------
Arch:             i386
Platform:         Sandybridge
Libdrm:		(master)2.4.32-2-g617213357e94299a5e9a3cb1342de55de949d156
Mesa:		(master)9cb777eb71dde895ca0ad3454a9b44252e9b402e
Xserver:	(master)xorg-server-1.12.0-47-g7f3997b01a8813c1d7055317eb06111aed8572c9
Xf86_video_intel:	(master)2.18.0-151-g2e7b5f7eafbf452c781e50eba7dc8323260af59e
Libva:		(vaapi-ext)f8be4a5477e58c4a27170f0c87eeedac8de60aef
Libva_intel_driver:(vaapi-ext)82fa52510a37ab645daaa3bb7091ff5096a20d0b
Kernel:	(drm-intel-next-queued) e7e58eb5c0d1d7d1a42fcb2b5a247d28ec08b47e

Bug detailed description:
-----------------------------
It fails on sandybridge with queued kernel. It doesn't happen on fixes kernel.

Bisect shows:3ae5378330f581466b05bcea2dd3bc8731a598cb is the first bad commit.
commit 3ae5378330f581466b05bcea2dd3bc8731a598cb
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun Mar 25 19:47:33 2012 +0200

    drm/i915: don't use gtt_pwrite on LLC cached objects

    ~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.

    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Reproduce steps:
----------------------------
1. start X
2. ./bin/glean -r /tmp/occluQry -o -v -v -v -t +occluQry
Comment 1 lu hua 2012-03-29 22:10:47 UTC
Created attachment 59253 [details]
occluquery output info
Comment 2 Daniel Vetter 2012-04-04 07:03:41 UTC
Created attachment 59474 [details] [review]
manual revert

Can you please test whether this manual revert of the bisected bad commit fixes the bug? git revert doesn't work due to conflicts.

This is just to check that the bisect result is stable.
Comment 3 lu hua 2012-04-04 19:45:06 UTC
Revert commit 3ae5378330, Issue doesn't happen.
Comment 4 Chris Wilson 2012-04-11 06:35:42 UTC
This just appears to be a latent elephant. The pipecontrol write of the depth count does not appear to cache coherent and fails to replace the value in the CPU cache for the queryobj:

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i96
index 72b83f4..b792642 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -307,12 +307,11 @@ brw_prepare_query_begin(struct brw_context *brw)
    if (brw->query.bo == NULL ||
        brw->query.index * 2 + 1 >= 4096 / sizeof(uint64_t)) {
       drm_intel_bo_unreference(brw->query.bo);
-      brw->query.bo = NULL;
-
       brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 1);
 
       /* clear target buffer */
-      drm_intel_bo_map(brw->query.bo, true);
+      //drm_intel_gem_bo_map_gtt(brw->query.bo, true);
+      drm_intel_gem_bo_map_gtt(brw->query.bo);
       memset((char *)brw->query.bo->virtual, 0, 4096);
       drm_intel_bo_unmap(brw->query.bo);
Comment 5 Chris Wilson 2012-04-11 06:53:55 UTC
Also:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4171666..086a505 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -399,10 +399,17 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	 * pipe_control writes because the gpu doesn't properly redirect them
 	 * through the ppgtt for non_secure batchbuffers. */
 	if (unlikely(IS_GEN6(dev) &&
-	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
-	    !target_i915_obj->has_global_gtt_mapping)) {
-		i915_gem_gtt_bind_object(target_i915_obj,
-					 target_i915_obj->cache_level);
+		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
+		/* First we have an issue with cache-coherency of pc writes */
+		ret = i915_gem_object_set_cache_level(target_i915_obj,
+						      I915_CACHE_NONE);
+		if (ret)
+			return ret;
+
+		/* And then fix up the GTT mapping */
+		if (!target_i915_obj->has_global_gtt_mapping)
+			i915_gem_gtt_bind_object(target_i915_obj,
+						 target_i915_obj->cache_level);
 	}
 
 	/* and update the user's relocation entry */
Comment 6 Daniel Vetter 2012-04-11 07:03:29 UTC
Please test both patches from Chris (i.e. both the kernel patch and the mesa patch) separately. Due to the no regression policy of the kernel we need both.
Comment 7 Chris Wilson 2012-04-11 07:07:51 UTC
Can we do a check on whether this bug is also present on IvyBridge?
Comment 8 lu hua 2012-04-12 00:21:19 UTC
It doesn't happen on IvyBridge.

Issue goes away if only update mesa patch brw_queryobj.c.
Issue still exists if only update kernel patch i915_gem_execbuffer.c.
Update both kernel and mesa patch, Issue goes away.
Comment 9 Chris Wilson 2012-04-12 08:54:57 UTC
Hmm, that kernel patch worksforme. :|
Comment 10 Chris Wilson 2012-04-12 09:46:56 UTC
Having just rechecked, the kernel patch I posted doesn't work for me either. Darn.

So either that is not sufficient to push it out of the cpu cache, or the hypothesis is wrong.
Comment 11 Chris Wilson 2012-04-12 10:41:10 UTC
This bug is much older than the bisect indicates, it just depends critically upon state and buffer reuse.

Removing regression as it is clearly not.
Comment 12 Chris Wilson 2012-04-12 11:46:19 UTC
The reason I thought the kernel patch worked was that I used the wrong libGL after a reboot (the older system library). Following this to its logical conclusion,

cbd464a117317f276b65cbca69d6339166581bf7 is the first bad commit
commit cbd464a117317f276b65cbca69d6339166581bf7
Author: Eric Anholt <eric@anholt.net>
Date:   Thu Jan 12 13:01:21 2012 -0800

    i965: Fix leak of the program cache BO on context destroy.
    
    NOTE: This is a candidate for the 8.0 branch.

(Manually verified by using the previous commit, a revert and by setting disable reuse on the cache bo.)

How bizarre is that. The interesting thing about the program cache is its use of the INSTRUCTION domain.
Comment 13 Chris Wilson 2012-04-12 11:54:25 UTC
Still in the realms of the inexplicable:

diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/
index 4ae8e12..fc8b3ea 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -386,6 +386,7 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache 
 
    DBG("%s\n", __FUNCTION__);
 
+   drm_intel_bo_disable_reuse(cache->bo);
    drm_intel_bo_unreference(cache->bo);
    cache->bo = NULL;
    brw_clear_cache(brw, cache);
Comment 14 Chris Wilson 2012-04-16 03:00:30 UTC
Yet another workaround:

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 563fec4..bc250ef 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -45,6 +45,8 @@
 #include "intel_batchbuffer.h"
 #include "intel_reg.h"
 
+#include <xf86drm.h>
+
 /** Waits on the query object's BO and totals the results for this query */
 static void
 brw_queryobj_get_results(struct gl_context *ctx,
@@ -307,8 +309,49 @@ brw_prepare_query_begin(struct brw_context *brw)
    if (brw->query.bo == NULL ||
        brw->query.index * 2 + 1 >= 4096 / sizeof(uint64_t)) {
       drm_intel_bo_unreference(brw->query.bo);
-      brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 1);
-
+      brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 0);
+
+      {
+	      struct drm_i915_gem_execbuffer2 execbuf;
+	      struct drm_i915_gem_exec_object2 exec[2];
+	      struct drm_i915_gem_relocation_entry reloc[1];
+	      uint32_t batch[6];
+	      drm_intel_bo *bo = drm_intel_bo_alloc(intel->bufmgr,
+						    "clear", 4096, 0);
+
+	      batch[0] = 2<<29 | 0x40<<22 | 0x3<<20 | 0x3;
+	      batch[1] = 0x3<<24 | 0x0 << 16 | 4096;
+	      batch[2] = 1 << 16 | 4096;
+	      batch[3] = 0;
+	      batch[4] = 0;
+	      batch[5] = MI_BATCH_BUFFER_END;
+
+	      drm_intel_bo_subdata(bo, 0, sizeof(batch), batch);
+
+	      memset(reloc, 0, sizeof(reloc));
+	      reloc->offset = 3 * sizeof(uint32_t);
+	      reloc->target_handle = brw->query.bo->handle;
+	      reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	      reloc->write_domain = I915_GEM_DOMAIN_RENDER;
+
+	      memset(exec, 0, sizeof(exec));
+	      exec[0].handle = brw->query.bo->handle;
+	      exec[1].handle = bo->handle;
+	      exec[1].relocation_count = 1;
+	      exec[1].relocs_ptr = (uintptr_t)reloc;
+
+	      memset(&execbuf, 0, sizeof(execbuf));
+	      execbuf.buffers_ptr = (uintptr_t)exec;
+	      execbuf.buffer_count = 2;
+	      execbuf.batch_len = sizeof(batch);
+	      execbuf.flags = I915_EXEC_BLT;
+
+	      drmIoctl(intel->driFd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+
+	      drm_intel_bo_unreference(bo);
+      }
Comment 15 Chris Wilson 2012-04-16 03:08:16 UTC
Note that last w/a is on top of commenting out the memset(drm_intel_bo_map(query.bo), 0, 4096);
Comment 16 Chris Wilson 2012-04-16 03:15:57 UTC
And to forestall any potential questions, the write (+ implicit FLUSH_DW) is necessary. Just executing a MI_BATCH_BUFFER_END (with appropriate relocations to get incur the same implicit flush and semaphores) has no effect.

What color do you wish this elephant?
Comment 17 Chris Wilson 2012-04-16 03:31:14 UTC
However, performing the clear on the instruction cache->bo is not sufficient. So that particular aspect I think is a red herring.
Comment 18 Chris Wilson 2012-04-16 03:56:28 UTC
Created attachment 60054 [details] [review]
Workaround occQuery fail

I'm out of ideas.
Comment 19 Chris Wilson 2012-04-16 08:45:48 UTC
The irony is that the attached workaround causes './bin/occlusion_query -auto' to fail. (GTT clear is fine for example as also disable_reuse().)
Comment 20 Chris Wilson 2012-04-16 10:40:16 UTC
Created attachment 60077 [details] [review]
Workaround occQuery fail

Updated workaround. The hardware doesn't always write the full 64-bits(!!!) so we have to explicitly clear the bo as well.
Comment 21 Daniel Vetter 2012-05-21 09:46:39 UTC
Re-prod the gl team about this funky pipe_control depth write issue on snb ...
Comment 22 Daniel Vetter 2012-06-26 03:41:05 UTC
Created attachment 63482 [details] [review]
add CS_STALL to render cache flush

Please test this kernel patch on latest mesa - maybe we're lucky and this fixes the occlusion queries, too.
Comment 23 Chris Wilson 2012-06-26 05:00:42 UTC
Still fails with CS_STALL and no w/a.
Comment 24 lu hua 2012-06-26 23:16:24 UTC
Add CS_STALL patch, this issue still exists.
Comment 25 Chris Wilson 2012-07-28 21:23:49 UTC
It turns out that this is documented behaviour, in the ISA reference manual it is mentioned in passing that the instruction cache is read-only in order to prevent self-modifying programs.

So the issue is really that we marked it as being in the instruction cache for kernel, and then in the instruction cache again for the oq write. Due to the behaviour of the old domain tracking this should have prevented the instruction cache from being invalidated, whereas forcibly moving to the GTT or RENDER write domain would have provoked a flush. If I am right, this should now be fixed with the flushing list removal and unconditional invalidate/flushing.
Comment 26 lu hua 2012-07-31 07:09:37 UTC
Retest on the following environment, Issue still exists.
Mesa:	(master)dcf8754cce1af09547a5976a74ba807bc6f2657c
Kernel:	(drm-intel-next-queued) ab3951eb74e7c33a2f5b7b64d72e82f1eea61571
Comment 28 Daniel Vetter 2012-08-05 19:47:51 UTC
Should be fixed on -fixes with:

commit e844b990b1df9242bb91b7d490552f3198946838
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Jul 31 15:35:01 2012 -0700


    drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround.
Comment 29 lu hua 2012-08-08 02:19:53 UTC
Can't find commit e844b990b1df9242bb91b7d490552f3198946838 on -fixes kernel  5ab3633d6907018b0b.
Can't find "drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT
workaround." on  -queued kernel commit:ab3951eb74e7.
Comment 30 Daniel Vetter 2012-08-08 08:16:54 UTC
It's there, really ;-)
Comment 31 lu hua 2012-08-10 03:20:40 UTC
Commit e844b990b1 is in -fixes kernel.
But can't find "drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround" on -queued kernel.
This issue still happens on the latest -queued kernel( 0d8957c8a90bbb).
Comment 32 Daniel Vetter 2012-08-10 08:53:28 UTC
See my explanation by mail that -queued is not always a subset of -fixes. If I say a fix has been merged to -fixes, you need to check -fixes to confirm. The -queued branch can still be affected (until I do the next backmerge/rebase, which happens every few weeks). I'll close this as confirmed.
Comment 33 Jari Tahvanainen 2017-09-04 10:10:25 UTC
Closing old verified+fixed.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.