This test intermittently fails on broadwell, sometimes the output makes sense, sometimes it doesn't. For example, this failure (This is a fail, not a pass): /tmp/build_root/m64/lib/piglit/bin/arb_shader_atomic_counters-unused-result -auto -fbo Probe value at (0) Expected: 0x00000001 Observed: 0x00000001
It's bad synchronisation of the read buffer: diff --git a/tests/spec/arb_shader_atomic_counters/common.c b/tests/spec/arb_shader_atomic_counters/common.c index 95d809e..bf8309d 100644 --- a/tests/spec/arb_shader_atomic_counters/common.c +++ b/tests/spec/arb_shader_atomic_counters/common.c @@ -35,6 +35,7 @@ atomic_counters_probe_buffer(unsigned base, unsigned count, uint32_t *p = glMapBufferRange( GL_ATOMIC_COUNTER_BUFFER, base * sizeof(uint32_t), count * sizeof(uint32_t), GL_MAP_READ_BIT); + bool pass = true; unsigned i; if (!p) { @@ -43,17 +44,18 @@ atomic_counters_probe_buffer(unsigned base, unsigned count, } for (i = 0; i < count; ++i) { - if (p[i] != expected[i]) { + uint32_t *found = p[i]; + if (found != expected[i]) { printf("Probe value at (%i)\n", i); printf(" Expected: 0x%08x\n", expected[i]); - printf(" Observed: 0x%08x\n", p[i]); - glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER); - return false; + printf(" Observed: 0x%08x\n", found); + pass = false; + break; } } glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER); - return true; + return pass; } bool should fix up the error message.
s/*found/found/ ofc
I sent ickle's patch to the mailing list, and fixed the same bug in shader_runner and a compute shader test. However, this just fixes the test failure message to make sense. The driver still has a race condition that is causing these intermittent failures, and we need to fix that.
It definitely seems to be some kind of race - adding usleep(1000) in the test inbetween glMapBufferRange and the p[i] access effectively hides the problem. It also works fine on Haswell - I've run it for a few hours with no failures. INTEL_DEBUG=sync doesn't help. always_flush_batch=true seems to make it fail a little more often...it at minimum doesn't help...
Chris, by chance do you have any ideas about this bug? I'm fairly stumped...
piglit.spec.arb_fragment_layer_viewport.layer-gs-writes-in-range depends on atomic counters and suffers from this bug.
I've sent a test to the mailing list that might be related to this same issue. Adding syncs or sleeps does not seem to help there though. http://lists.freedesktop.org/archives/piglit/2015-November/018227.html
As another maddening data point... I've never been able to reproduce this on my ThinkPad X250. I had thought it could be a kernel regression (I was on a quite old kernel), but I just ran the test 100 times in a loop without a single failure on 4.2.6-301.fc23.x86_64. Perhaps it's only a problem with GBM runs? I don't use GBM.
I've been trying to reproduce this failure on BDW today without success. It doesn't seem to make a difference for me whether I use X or GBM to run the test case. I noticed though that the DRM render ring code is missing a DC flush on batchbuffer submission, what could lead to a non-deterministic failure in this test case. Can anyone try out the attached kernel patch and see if it helps? If it does this should be considered a DRM bug.
Created attachment 120435 [details] [review] gen7_flush_batch_dc_writes.patch
Curro's patch fixes the tests that are intermittent. However, upgrading to Linux 4.3 (to apply the patch) produces an intermitten failures on piglit.spec.arb_shader_image_load_store.qualifiers. I'll write that bug up.
(In reply to Mark Janes from comment #11) > Curro's patch fixes the tests that are intermittent. > > However, upgrading to Linux 4.3 (to apply the patch) produces an intermitten > failures on piglit.spec.arb_shader_image_load_store.qualifiers. I'll write > that bug up. Thanks, I've added your tested-by and sent the patch for review.
Fix pushed to drm-intel-next-queued as commit 965fd602a6436f689f4f2fe40a6789582778ccd5 Author: Francisco Jerez <currojerez@riseup.net> Date: Wed Jan 13 18:59:39 2016 -0800 drm/i915: Make sure DC writes are coherent on flush.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.