Summary: | [HSW] compute shader shared var + atomic op = fail | ||
---|---|---|---|
Product: | Mesa | Reporter: | Ilia Mirkin <imirkin> |
Component: | Drivers/DRI/i965 | Assignee: | Jordan Justen <jljusten> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | mark.a.janes |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | INTEL_DEBUG=cs shared atomic add; simple shader test that exposes the issue | ||
One additional observation: the (wrong) count for group 0 (after which the test stops comparing) is different every time and tends to cycle between a few values. My suspicion is that something execmask-related is going on. Right now we always use 0xffff as the execmask argument for all untyped surface reads/writes/atomics, as supplied by fs_builder::sample_mask_reg(), but the HSW PRM has a very difficult-to-understand explanation of how the exec mask should be computed (page 832, Execution Masks). I wonder if data is being picked up from threads that are logically "off". The shader in question does a 4x4x4 grid of 3x2x1 blocks.

Created attachment 121849 [details]
simple shader test that exposes the issue
New theory: the shared memory isn't actually per-workgroup (even though it should be). Play around with the attached shader test, varying the local size as well as the grid dimensions. (The resulting counter value should equal the product of the grid dimensions.)
Patch "i965/hsw: Initialize SLM index in state register" sent: https://patchwork.freedesktop.org/patch/74671/

*** Bug 94255 has been marked as a duplicate of this bug. ***

Fixed on master:
commit a100a57e30010da49c96f84a661cec9c57f9eebe
i965/hsw: Initialize SLM index in state register
Created attachment 121653 [details]
INTEL_DEBUG=cs shared atomic add

All 48 dEQP tests matching '*shared_var*atomic*' currently fail (with kernel 4.4.1, mesa git + Ken's compute state fixing series). This is how I run them:

    MESA_GLES_VERSION_OVERRIDE=3.1 LD_LIBRARY_PATH=/home/ilia/install/lib ./deqp-gles31 --deqp-visibility=hidden --deqp-case='*shared_var*atomic*'

This accounts for almost all of the shared_var failures. There are also these 2, which are probably unrelated, but I figured I'd mention them just in case:

Test case 'dEQP-GLES31.functional.compute.basic.shared_var_multiple_invocations'..
  Compute shader compile time = 0.448000 ms
  Link time = 2.081000 ms
  Test case duration in microseconds = 4210 us
  Fail (Comparison failed for Output.values[1])

Test case 'dEQP-GLES31.functional.compute.basic.shared_var_multiple_groups'..
  Compute shader compile time = 0.446000 ms
  Link time = 2.528000 ms
  Test case duration in microseconds = 4595 us
  Fail (Comparison failed for Output.values[0])

Included is the disassembly of one of the atomic failures, in case it's useful.