arb_shader_image_load_store.execution.basic-imagestore-from-uniform intermittently fails on Broadwell GT2 and Braswell:

Standard Output:

  /tmp/build_root/m64/lib/piglit/bin/shader_runner /tmp/build_root/m64/lib/piglit/tests/spec/arb_shader_image_load_store/execution/basic-imageStore-from-uniform.shader_test -auto
  piglit: debug: Requested an OpenGL 3.3 Core Context, and received a matching 3.3 context
  Probe color at (0,0)
    Expected: 0.000000 1.000000 0.000000 1.000000
    Observed: 1.000000 0.000000 0.000000 1.000000

This behavior started when image_load_store was enabled in d03c657. The test has been disabled on BDW and BSW, because intermittent tests make it more difficult to track regressions.
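For reference, a rough sketch of the kind of exclusion used while it stays disabled (this assumes the standard piglit -x/--exclude-tests flag; the exact mechanism our CI uses may differ):

  # Hypothetical example: skip the intermittent test in a quick piglit run
  piglit run quick -x basic-imageStore-from-uniform results/bdw-gt2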
I've been looking into this today, and I don't think this test could actually be causing the failure you've seen; it's pretty much trivial. Most likely it was run concurrently with some other test that hung the GPU, causing it to misrender (can you confirm whether there was a GPU hang message in your kernel logs when the failure occurred?). I've managed to reproduce an intermittent hang on BDW by running piglit in a loop, but it comes from the spec@arb_shader_image_load_store@host-mem-barrier Indirect/RaW subtest. It seems rather difficult to reproduce; it often takes dozens of runs until the first failure occurs, which is why it didn't catch my eye until now. It looks like the hang is caused by some interaction between context switches and the hardware UAV coherency function on the VS stage, which leads the VS to stall indefinitely for some reason I don't fully understand yet. The attached hack should sidestep the issue.
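In case it helps reproduce this on your side, the loop was essentially something like the following (a rough sketch only; it reuses the build_root paths from the original report, and the hang may still need other tests running concurrently to trigger):

  # Run the suspect subtest repeatedly until the first failure
  TEST=/tmp/build_root/m64/lib/piglit/tests/spec/arb_shader_image_load_store/execution/host-mem-barrier.shader_test
  i=0
  while /tmp/build_root/m64/lib/piglit/bin/shader_runner "$TEST" -auto; do
      i=$((i + 1))
  done
  echo "first failure after $i successful runs"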
Created attachment 117661: gen8_no_vs_uav_coherency.patch
No GPU hang was detected during this run. After every test run, automation checks dmesg for GPU hangs and generates a test failure if one exists. For example, the following build has "failure-gpu-hang-otc-gfxtest-bsw-03-bswm64.compile.error" in the test results because of a BSW GPU hang:

http://otc-mesa-ci.jf.intel.com/job/mesa_master_daily/1240/#showFailuresLink

We need to do this check so systems can be rebooted after a GPU hang. Automation greps for "GPU HANG". Is that string shown in the case you identified? Let me know if there is anything else I need to check for.
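To illustrate what the automation does, the check is roughly equivalent to this (a simplified sketch, not the actual automation script):

  # After each test run: fail the run and schedule a reboot if the kernel logged a hang
  if dmesg | grep -q "GPU HANG"; then
      echo "GPU hang detected; marking the run as failed and rebooting the system"
  fi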
How often did you see this fail? I let it run in a loop for nearly 2000 iterations earlier on BDW and couldn't spot a single failure. Can you reproduce the failure when you run the test manually?
I think it failed twice out of 12 runs. I turned the test off as soon as I could, because intermittent results aren't handled well by the CI system. I wouldn't be surprised if it had to do with concurrent tests. If you can't reproduce it at all, I'll perform a series of test runs and see whether running serially vs. concurrently makes a difference.
During my tests I did see one instance of a GPU hang on host-mem-barrier.
I tested the UAV memory coherency patch and was able to reproduce a GPU hang on BSW: http://lists.freedesktop.org/archives/mesa-dev/2015-August/091705.html