Summary: | [BDW] ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.9 Fails | ||
---|---|---|---|
Product: | Mesa | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED MOVED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | ||
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Marta Löfstedt
2015-09-30 14:56:36 UTC
Please note that on newer version of the Khronos CTS tests this test has been renamed: ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.8 The Fail is reproduced on Mesa: git@102f6 I did a little digging on this this afternoon. I'm going to write down what I found here before I head home. I don't know when I'll have time to poke at this again, so I want to leave notes for the next person.
I believe Connor's patch is a complete red herring; it only changes things because it causes register allocation to be subtly different on SIMD8. Running the test, I noticed that if you run it with INTEL_DEBUG=no16, it fails and, before the window closes, it flashes yellow. Looking at the shader it generates we see
> dEQP_FragColor = vec4(1.0, v_vtxResult, result, 1.0);
So a color of yellow means that v_vtxResult is fine, but result is 0. This means that the failure is in one of the comparisons in the fragment shader. In particular, the fault is not the vertex shader or nor is it in the interpolation of the vertex result. I see two potential ways forward:
1) Play with the dEQP test case and see if we can get a smaller failing case
2) Step through the fragment shader in the simulator and see if we can see where the false comparison is coming from.
My first attempt would probably be number 2 but, as I said above, I don't know when I'll have time to look at it again so if someone else wants to give it a try, feel free.
So, I did a little more investigation last night and Matt and I looked at it a bit this morning. Here's what we found: 1) If you step through it in the simulator, the simulator produces "correct" (would pass the test) results. I double and tripple-checked this and I can't get the simulator to give me yellow. 2) If you hack up could_coissue() in brw_fs_combine_constants.cpp, it generates the exact same code for HSW as it does for BDW but HSW passes. 3) We generate the exact same code for BSW as we do for BDW and BSW passes. The above seems to rule out any sort of register allocation or scheduling errors. Instead, it's almost certain that we're hitting some undocumented and unsimulated hardware bug. Unfortunately, the shader (and hitting the bug) is very sensitive to register allocation and anything you do that changes register allocation is liable to make the shader not hit the bug. This means that there are a pile of different things that one could do to make the shader go from pass to fail but none of them actually fix anything! We need to be very careful when we claim that a patch fixes this bug to make sure that the patch actually does fix something and doesn't just hide it again. This bug has been fixed by: Author: Connor Abbott <cwabbott0@gmail.com> AuthorDate: Sat Jun 6 13:32:21 2015 -0400 Commit: Connor Abbott <cwabbott0@gmail.com> CommitDate: Fri Oct 30 02:19:00 2015 -0400 i965: always run the post-RA scheduler Before, we would only do scheduling after register allocation if we spilled, despite the fact that the pre-RA scheduler was only supposed to be for register pressure and set the latencies of every instruction to 1. This meant that unless we spilled, which we rarely do, then we never considered instruction latencies at all, and we usually never bothered to try and hide texture fetch latency. Although a later commit removes the setting the latency to 1 part, we still want to always run the post-RA scheduler since it's able to take the false dependencies that the register allocator creates into account, and it can be more aggressive than the pre-RA scheduler since it doesn't have to worry about register pressure at all. Test master post-ra-sched diff %diff bench_OglPSBump2 396.730 402.386 5.656 +1.400% bench_OglPSBump8 244.370 247.591 3.221 +1.300% bench_OglPSPhong 241.117 242.002 0.885 +0.300% bench_OglPSPom 59.555 59.725 0.170 +0.200% bench_OglShMapPcf 86.149 102.346 16.197 +18.800% bench_OglVSTangent 388.849 395.489 6.640 +1.700% bench_trex 65.471 65.862 0.390 +0.500% bench_trexoff 69.562 70.150 0.588 +0.800% bench_heaven 25.179 25.254 0.074 +0.200% Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com> (In reply to Mark Janes from comment #4) > This bug has been fixed by: > > Author: Connor Abbott <cwabbott0@gmail.com> > AuthorDate: Sat Jun 6 13:32:21 2015 -0400 > Commit: Connor Abbott <cwabbott0@gmail.com> > CommitDate: Fri Oct 30 02:19:00 2015 -0400 > > i965: always run the post-RA scheduler Not fixed, just hidden. It's a very obscure hardware bug and any sort of shuffling around of instructions is liable to hide it. However, it's interesting that running post-RA scheduling changes things because that means it has nothing to do with register allocation. I think this is more evidence of it being some sort of dependency-tracking bug in the hardware. I'm re-opening this so we don't forget it's here. -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1496. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.