System Environment: -------------------------- Arch: i386 Platform: Sandybridge Libdrm: (master)2.4.32-2-g617213357e94299a5e9a3cb1342de55de949d156 Mesa: (master)9cb777eb71dde895ca0ad3454a9b44252e9b402e Xserver: (master)xorg-server-1.12.0-47-g7f3997b01a8813c1d7055317eb06111aed8572c9 Xf86_video_intel: (master)2.18.0-151-g2e7b5f7eafbf452c781e50eba7dc8323260af59e Libva: (vaapi-ext)f8be4a5477e58c4a27170f0c87eeedac8de60aef Libva_intel_driver:(vaapi-ext)82fa52510a37ab645daaa3bb7091ff5096a20d0b Kernel: (drm-intel-next-queued) e7e58eb5c0d1d7d1a42fcb2b5a247d28ec08b47e Bug detailed description: ----------------------------- It fails on sandybridge with queued kernel. It doesn't happen on fixes kernel. Bisect shows:3ae5378330f581466b05bcea2dd3bc8731a598cb is the first bad commit. commit 3ae5378330f581466b05bcea2dd3bc8731a598cb Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sun Mar 25 19:47:33 2012 +0200 drm/i915: don't use gtt_pwrite on LLC cached objects ~120 µs instead fo ~210 µs to write 1mb on my snb. I like this. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reproduce steps: ---------------------------- 1. start X 2. ./bin/glean -r /tmp/occluQry -o -v -v -v -t +occluQry
Created attachment 59253 [details] occluquery output info
Created attachment 59474 [details] [review] manual revert Can you please test whether this manual revert of the bisected bad commit fixes the bug? git revert doesn't work due to conflicts. This is just to check that the bisect result is stable.
Revert commit 3ae5378330, Issue doesn't happen.
This just appears to be a latent elephant. The pipecontrol write of the depth count does not appear to cache coherent and fails to replace the value in the CPU cache for the queryobj: diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i96 index 72b83f4..b792642 100644 --- a/src/mesa/drivers/dri/i965/brw_queryobj.c +++ b/src/mesa/drivers/dri/i965/brw_queryobj.c @@ -307,12 +307,11 @@ brw_prepare_query_begin(struct brw_context *brw) if (brw->query.bo == NULL || brw->query.index * 2 + 1 >= 4096 / sizeof(uint64_t)) { drm_intel_bo_unreference(brw->query.bo); - brw->query.bo = NULL; - brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 1); /* clear target buffer */ - drm_intel_bo_map(brw->query.bo, true); + //drm_intel_gem_bo_map_gtt(brw->query.bo, true); + drm_intel_gem_bo_map_gtt(brw->query.bo); memset((char *)brw->query.bo->virtual, 0, 4096); drm_intel_bo_unmap(brw->query.bo);
Also: diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 4171666..086a505 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -399,10 +399,17 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, * pipe_control writes because the gpu doesn't properly redirect them * through the ppgtt for non_secure batchbuffers. */ if (unlikely(IS_GEN6(dev) && - reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION && - !target_i915_obj->has_global_gtt_mapping)) { - i915_gem_gtt_bind_object(target_i915_obj, - target_i915_obj->cache_level); + reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) { + /* First we have an issue with cache-coherency of pc writes */ + ret = i915_gem_object_set_cache_level(target_i915_obj, + I915_CACHE_NONE); + if (ret) + return ret; + + /* And then fix up the GTT mapping */ + if (!target_i915_obj->has_global_gtt_mapping) + i915_gem_gtt_bind_object(target_i915_obj, + target_i915_obj->cache_level); } /* and update the user's relocation entry */
Please test both patches from Chris (i.e. both the kernel patch and the mesa patch) separately. Due to the no regression policy of the kernel we need both.
Can we do a check on whether this bug is also present on IvyBridge?
It doesn't happen on IvyBridge. Issue goes away if only update mesa patch brw_queryobj.c. Issue still exists if only update kernel patch i915_gem_execbuffer.c. Update both kernel and mesa patch, Issue goes away.
Hmm, that kernel patch worksforme. :|
Having just rechecked, the kernel patch I posted doesn't work for me either. Darn. So either that is not sufficient to push it out of the cpu cache, or the hypothesis is wrong.
This bug is much older than the bisect indicates, it just depends critically upon state and buffer reuse. Removing regression as it is clearly not.
The reason I thought the kernel patch worked was that I used the wrong libGL after a reboot (the older system library). Following this to its logical conclusion, cbd464a117317f276b65cbca69d6339166581bf7 is the first bad commit commit cbd464a117317f276b65cbca69d6339166581bf7 Author: Eric Anholt <eric@anholt.net> Date: Thu Jan 12 13:01:21 2012 -0800 i965: Fix leak of the program cache BO on context destroy. NOTE: This is a candidate for the 8.0 branch. (Manually verified by using the previous commit, a revert and by setting disable reuse on the cache bo.) How bizarre is that. The interesting thing about the program cache is its use of the INSTRUCTION domain.
Still in the realms of the inexplicable: diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/ index 4ae8e12..fc8b3ea 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -386,6 +386,7 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache DBG("%s\n", __FUNCTION__); + drm_intel_bo_disable_reuse(cache->bo); drm_intel_bo_unreference(cache->bo); cache->bo = NULL; brw_clear_cache(brw, cache);
Yet another workaround: diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c index 563fec4..bc250ef 100644 --- a/src/mesa/drivers/dri/i965/brw_queryobj.c +++ b/src/mesa/drivers/dri/i965/brw_queryobj.c @@ -45,6 +45,8 @@ #include "intel_batchbuffer.h" #include "intel_reg.h" +#include <xf86drm.h> + /** Waits on the query object's BO and totals the results for this query */ static void brw_queryobj_get_results(struct gl_context *ctx, @@ -307,8 +309,49 @@ brw_prepare_query_begin(struct brw_context *brw) if (brw->query.bo == NULL || brw->query.index * 2 + 1 >= 4096 / sizeof(uint64_t)) { drm_intel_bo_unreference(brw->query.bo); - brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 1); - + brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 0); + + { + struct drm_i915_gem_execbuffer2 execbuf; + struct drm_i915_gem_exec_object2 exec[2]; + struct drm_i915_gem_relocation_entry reloc[1]; + uint32_t batch[6]; + drm_intel_bo *bo = drm_intel_bo_alloc(intel->bufmgr, + "clear", 4096, 0); + + batch[0] = 2<<29 | 0x40<<22 | 0x3<<20 | 0x3; + batch[1] = 0x3<<24 | 0x0 << 16 | 4096; + batch[2] = 1 << 16 | 4096; + batch[3] = 0; + batch[4] = 0; + batch[5] = MI_BATCH_BUFFER_END; + + drm_intel_bo_subdata(bo, 0, sizeof(batch), batch); + + memset(reloc, 0, sizeof(reloc)); + reloc->offset = 3 * sizeof(uint32_t); + reloc->target_handle = brw->query.bo->handle; + reloc->read_domains = I915_GEM_DOMAIN_RENDER; + reloc->write_domain = I915_GEM_DOMAIN_RENDER; + + memset(exec, 0, sizeof(exec)); + exec[0].handle = brw->query.bo->handle; + exec[1].handle = bo->handle; + exec[1].relocation_count = 1; + exec[1].relocs_ptr = (uintptr_t)reloc; + + memset(&execbuf, 0, sizeof(execbuf)); + execbuf.buffers_ptr = (uintptr_t)exec; + execbuf.buffer_count = 2; + execbuf.batch_len = sizeof(batch); + execbuf.flags = I915_EXEC_BLT; + + drmIoctl(intel->driFd, + DRM_IOCTL_I915_GEM_EXECBUFFER2, + &execbuf); + + drm_intel_bo_unreference(bo); + }
Note that last w/a is on top of commenting out the memset(drm_intel_bo_map(query.bo), 0, 4096);
And to forestall any potential questions, the write (+ implicit FLUSH_DW) is necessary. Just executing a MI_BATCH_BUFFER_END (with appropriate relocations to get incur the same implicit flush and semaphores) has no effect. What color do you wish this elephant?
However, performing the clear on the instruction cache->bo is not sufficient. So that particular aspect I think is a red herring.
Created attachment 60054 [details] [review] Workaround occQuery fail I'm out of ideas.
The irony is that the attached workaround causes './bin/occlusion_query -auto' to fail. (GTT clear is fine for example as also disable_reuse().)
Created attachment 60077 [details] [review] Workaround occQuery fail Updated workaround. The hardware doesn't always write the full 64-bits(!!!) so we have to explicitly clear the bo as well.
Re-prod the gl team about this funky pipe_control depth write issue on snb ...
Created attachment 63482 [details] [review] add CS_STALL to render cache flush Please test this kernel patch on latest mesa - maybe we're lucky and this fixes the occlusion queries, too.
Still fails with CS_STALL and no w/a.
Add CS_STALL patch, this issue still exists.
It turns out that this is documented behaviour, in the ISA reference manual it is mentioned in passing that the instruction cache is read-only in order to prevent self-modifying programs. So the issue is really that we marked it as being in the instruction cache for kernel, and then in the instruction cache again for the oq write. Due to the behaviour of the old domain tracking this should have prevented the instruction cache from being invalidated, whereas forcibly moving to the GTT or RENDER write domain would have provoked a flush. If I am right, this should now be fixed with the flushing list removal and unconditional invalidate/flushing.
Retest on the following environment, Issue still exists. Mesa: (master)dcf8754cce1af09547a5976a74ba807bc6f2657c Kernel: (drm-intel-next-queued) ab3951eb74e7c33a2f5b7b64d72e82f1eea61571
Likely fix: http://lists.freedesktop.org/archives/intel-gfx/2012-July/019344.html
Should be fixed on -fixes with: commit e844b990b1df9242bb91b7d490552f3198946838 Author: Eric Anholt <eric@anholt.net> Date: Tue Jul 31 15:35:01 2012 -0700 drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround.
Can't find commit e844b990b1df9242bb91b7d490552f3198946838 on -fixes kernel 5ab3633d6907018b0b. Can't find "drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround." on -queued kernel commit:ab3951eb74e7.
It's there, really ;-)
Commit e844b990b1 is in -fixes kernel. But can't find "drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround" on -queued kernel. This issue still happens on the latest -queued kernel( 0d8957c8a90bbb).
See my explanation by mail that -queued is not always a subset of -fixes. If I say a fix has been merged to -fixes, you need to check -fixes to confirm. The -queued branch can still be affected (until I do the next backmerge/rebase, which happens every few weeks). I'll close this as confirmed.
Closing old verified+fixed.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.