Bug 94081 - [HSW] compute shader shared var + atomic op = fail
Summary: [HSW] compute shader shared var + atomic op = fail
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Jordan Justen
QA Contact: Intel 3D Bugs Mailing List
Duplicates: 94255
Depends on:
Reported: 2016-02-10 18:32 UTC by Ilia Mirkin
Modified: 2016-03-10 17:29 UTC

See Also:
i915 platform:
i915 features:

INTEL_DEBUG=cs shared atomic add (15.91 KB, text/plain)
2016-02-10 18:32 UTC, Ilia Mirkin
simple shader test that exposes the issue (1.75 KB, text/plain)
2016-02-19 22:35 UTC, Ilia Mirkin

Description Ilia Mirkin 2016-02-10 18:32:09 UTC
Created attachment 121653
INTEL_DEBUG=cs shared atomic add

All 48 dEQP tests matching '*shared_var*atomic*' currently fail (with kernel 4.4.1, Mesa git + Ken's compute state fixing series). This is how I run them:

MESA_GLES_VERSION_OVERRIDE=3.1 LD_LIBRARY_PATH=/home/ilia/install/lib ./deqp-gles31 --deqp-visibility=hidden --deqp-case='*shared_var*atomic*'

This accounts for almost all of the shared_var failures. There are also these 2, which are probably unrelated, but I figured I'd mention them just in case:

Test case 'dEQP-GLES31.functional.compute.basic.shared_var_multiple_invocations'..
Compute shader compile time = 0.448000 ms
Link time = 2.081000 ms
Test case duration in microseconds = 4210 us
  Fail (Comparison failed for Output.values[1])

Test case 'dEQP-GLES31.functional.compute.basic.shared_var_multiple_groups'..
Compute shader compile time = 0.446000 ms
Link time = 2.528000 ms
Test case duration in microseconds = 4595 us
  Fail (Comparison failed for Output.values[0])

Attached is the disassembly of one of the failing atomic tests, in case it's useful.
Comment 1 Ilia Mirkin 2016-02-13 01:15:15 UTC
One additional observation: the (wrong) count for group 0 (after which the test stops comparing) is different every time; it tends to cycle between a few different values.

My suspicion is that there's something execmask-related going on. Right now we always use 0xffff as the execmask arg for all the untyped surface reads/writes/atomics, as supplied by fs_builder::sample_mask_reg(), but the HSW PRM has a very hard-to-understand explanation of how the exec mask should be computed (page 832, Execution Masks). I wonder if data is being picked up from channels that are logically "off".

The shader in question does a 4x4x4 grid of 3x2x1 blocks.
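
To make the suspicion concrete, here is a rough sketch of the arithmetic only. It assumes invocations of one workgroup are packed in order into SIMD16 threads, and it computes which channels would correspond to real invocations. The function name and the packing model are hypothetical; this is not what the driver or the hardware actually does.

```python
from math import prod

def live_channel_masks(local_size, simd_width=16):
    """Return one bitmask per thread, with a bit set for each live invocation.

    Assumption: invocations fill threads in order, so only the last thread
    of a workgroup can be partially populated.
    """
    total = prod(local_size)
    masks = []
    for base in range(0, total, simd_width):
        live = min(simd_width, total - base)
        masks.append((1 << live) - 1)
    return masks

# A 3x2x1 block has 6 invocations, so under this model only the low
# 6 channels of a single SIMD16 thread are live, not all 16 (0xffff).
print([hex(m) for m in live_channel_masks((3, 2, 1))])
```

Under this model, a constant 0xffff would cover 10 channels that have no matching invocation for a 3x2x1 block.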
Comment 2 Ilia Mirkin 2016-02-19 22:35:06 UTC
Created attachment 121849
simple shader test that exposes the issue

New theory: the shared memory isn't actually per-workgroup (even though it should be). Play around with the attached shader test, varying the local size, as well as the grid dimensions. [The result of the counter should be == the product of the grid dimensions.]
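
A small model of the expected behaviour, as a sketch only: it is not the attached shader test, and it assumes each invocation does one atomic add on a shared variable and that one invocation per workgroup then bumps a global counter. The names run_dispatch and per_group_slm are made up for illustration. The model runs groups one after another, so it shows what goes wrong when shared memory is aliased across groups, not the run-to-run variation.

```python
from itertools import product
from math import prod

def run_dispatch(grid, local_size, per_group_slm=True):
    """Model a compute dispatch; return (global_counter, shared count per group)."""
    invocations = prod(local_size)
    aliased = [0]            # one shared cell seen by every group (the suspected bug)
    global_counter = 0
    shared_counts = {}
    for group in product(*(range(n) for n in grid)):
        # Correct behaviour: every workgroup gets its own zeroed shared variable.
        shared = [0] if per_group_slm else aliased
        for _ in range(invocations):
            shared[0] += 1   # stands in for an atomic add on the shared variable
        shared_counts[group] = shared[0]
        global_counter += 1  # assumed: one invocation per group bumps a buffer counter
    return global_counter, shared_counts

counter, shared = run_dispatch((4, 4, 4), (3, 2, 1))
print(counter, set(shared.values()))
```

With per-workgroup shared memory, every group sees a shared count equal to its local size (6 for 3x2x1) and the global counter equals the product of the grid dimensions (64 for 4x4x4). Passing per_group_slm=False makes the shared counts keep growing across groups, which is the kind of cross-group leakage the theory describes.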
Comment 3 Jordan Justen 2016-02-23 01:03:05 UTC
Patch "i965/hsw: Initialize SLM index in state register" sent.

Comment 4 Jordan Justen 2016-02-23 01:05:53 UTC
*** Bug 94255 has been marked as a duplicate of this bug. ***
Comment 5 Jordan Justen 2016-03-10 17:29:13 UTC
Fixed on master:

commit a100a57e30010da49c96f84a661cec9c57f9eebe

    i965/hsw: Initialize SLM index in state register
