arb_shader_image_load_store.execution.basic-imagestore-from-uniform intermittently fails on Broadwell GT2 and Braswell:

Standard Output:

  /tmp/build_root/m64/lib/piglit/bin/shader_runner /tmp/build_root/m64/lib/piglit/tests/spec/arb_shader_image_load_store/execution/basic-imageStore-from-uniform.shader_test -auto
  piglit: debug: Requested an OpenGL 3.3 Core Context, and received a matching 3.3 context
  Probe color at (0,0)
    Expected: 0.000000 1.000000 0.000000 1.000000
    Observed: 1.000000 0.000000 0.000000 1.000000

This behavior started when image_load_store was enabled in d03c657. The test has been disabled on BDW and BSW, because intermittent tests make it more difficult to track regressions.
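For reference, a rough sketch of the kind of exclusion used while it stays disabled (this assumes the standard piglit -x/--exclude-tests flag; the exact mechanism our CI uses may differ):

  # Hypothetical example: skip the intermittent test in a quick piglit run
  piglit run quick -x basic-imageStore-from-uniform results/bdw-gt2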
I've been looking into this today, and I don't think this test could actually be causing the failure you've seen; it's pretty much trivial. Most likely it was run concurrently with some other test that hung the GPU, causing it to misrender (can you confirm whether there was a GPU hang message in your kernel logs when the failure occurred?). I've managed to reproduce an intermittent hang on BDW by running piglit in a loop, but it comes from the spec@arb_shader_image_load_store@host-mem-barrier Indirect/RaW subtest. It seems rather difficult to reproduce; it often takes dozens of runs until the first failure occurs, which is why it didn't catch my eye until now. It looks like the hang is caused by some interaction between context switches and the hardware UAV coherency function on the VS stage, which leads the VS to stall indefinitely for some reason I don't fully understand yet. The attached hack should sidestep the issue.
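In case it helps reproduce this on your side, the loop was essentially something like the following (a rough sketch only; it reuses the build_root paths from the original report, and the hang may still need other tests running concurrently to trigger):

  # Run the suspect subtest repeatedly until the first failure
  TEST=/tmp/build_root/m64/lib/piglit/tests/spec/arb_shader_image_load_store/execution/host-mem-barrier.shader_test
  i=0
  while /tmp/build_root/m64/lib/piglit/bin/shader_runner "$TEST" -auto; do
      i=$((i + 1))
  done
  echo "first failure after $i successful runs"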
Created attachment 117661: gen8_no_vs_uav_coherency.patch
No GPU hang was detected during this run. After every test run, automation checks dmesg for GPU hangs and generates a test failure if one exists. For example, the following build has "failure-gpu-hang-otc-gfxtest-bsw-03-bswm64.compile.error" in the test results because of a BSW GPU hang:

http://otc-mesa-ci.jf.intel.com/job/mesa_master_daily/1240/#showFailuresLink

We need to do this check so systems can be rebooted after a GPU hang. Automation greps for "GPU HANG". Is that string shown in the case you identified? Let me know if there is anything else I need to check for.
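To illustrate what the automation does, the check is roughly equivalent to this (a simplified sketch, not the actual automation script):

  # After each test run: fail the run and schedule a reboot if the kernel logged a hang
  if dmesg | grep -q "GPU HANG"; then
      echo "GPU hang detected; marking the run as failed and rebooting the system"
  fi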
How often did you see this fail? I let it run in a loop for nearly 2000 iterations earlier on BDW and couldn't spot a single failure. Can you reproduce the failure when you run the test manually?
I think it failed twice out of 12 runs. I turned the test off as soon as I could, because intermittent results aren't handled well by the CI system. I wouldn't be surprised if it had to do with concurrent tests. If you can't reproduce it at all, I'll perform a series of test runs and see whether running serially vs. concurrently makes a difference.
During my tests I did see one instance of a GPU hang on host-mem-barrier.
I tested the UAV memory coherency patch and was able to reproduce a GPU hang on BSW: http://lists.freedesktop.org/archives/mesa-dev/2015-August/091705.html