Bug 94081 - [HSW] compute shader shared var + atomic op = fail
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Jordan Justen
QA Contact: Intel 3D Bugs Mailing List
Duplicates: 94255
Depends on:
Reported: 2016-02-10 18:32 UTC by Ilia Mirkin
Modified: 2016-03-10 17:29 UTC
CC List: 1 user


INTEL_DEBUG=cs shared atomic add (15.91 KB, text/plain)
2016-02-10 18:32 UTC, Ilia Mirkin
simple shader test that exposes the issue (1.75 KB, text/plain)
2016-02-19 22:35 UTC, Ilia Mirkin

Description Ilia Mirkin 2016-02-10 18:32:09 UTC
Created attachment 121653
INTEL_DEBUG=cs shared atomic add

All 48 dEQP tests matching '*shared_var*atomic*' currently fail (with kernel 4.4.1, mesa git + Ken's compute state fixing series). This is how I run it:

MESA_GLES_VERSION_OVERRIDE=3.1 LD_LIBRARY_PATH=/home/ilia/install/lib ./deqp-gles31 --deqp-visibility=hidden --deqp-case='*shared_var*atomic*'

This accounts for almost all of the shared_var failures. There are also these 2, which are probably unrelated, but figured I'd mention just in case:

Test case 'dEQP-GLES31.functional.compute.basic.shared_var_multiple_invocations'..
Compute shader compile time = 0.448000 ms
Link time = 2.081000 ms
Test case duration in microseconds = 4210 us
  Fail (Comparison failed for Output.values[1])

Test case 'dEQP-GLES31.functional.compute.basic.shared_var_multiple_groups'..
Compute shader compile time = 0.446000 ms
Link time = 2.528000 ms
Test case duration in microseconds = 4595 us
  Fail (Comparison failed for Output.values[0])

Attached is the disassembly of one of the failing atomic tests, in case it's useful.
Comment 1 Ilia Mirkin 2016-02-13 01:15:15 UTC
One additional observation: the (wrong) count of group 0 (after which it stops comparing) is different every time - tends to cycle between a few different values.

My suspicion is that something execmask-related is going on. Right now we always use 0xffff as the execmask argument for all the untyped surface reads/writes/atomics, as supplied by fs_builder::sample_mask_reg(), but the HSW PRM, for example, has a very hard-to-follow explanation of how the exec mask should be computed (page 832, Execution Masks). I wonder if data is being picked up from threads that are logically "off".

The shader in question does a 4x4x4 grid of 3x2x1 blocks.
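
To make the execmask suspicion concrete, here is a rough Python sketch (not driver code). It assumes a 16-wide thread, since 0xffff has 16 bits, and it assumes that every enabled channel contributes one increment to an atomic add. The helper names are made up for illustration. With a 3x2x1 local size only 6 channels hold real invocations, so an all-ones mask would over-count if the hardware honoured it literally.

```python
SIMD_WIDTH = 16  # assumption: 16 channels per thread, matching the 0xffff mask


def active_channel_mask(local_size, simd_width=SIMD_WIDTH):
    """Mask with one bit per channel that holds a real invocation.

    Hypothetical helper: assumes invocations are packed into the low
    channels of a single thread.
    """
    x, y, z = local_size
    live = min(x * y * z, simd_width)
    return (1 << live) - 1


def atomic_add_total(exec_mask, value=1):
    """Total added by one atomic-add message if each enabled channel adds `value`."""
    return bin(exec_mask).count("1") * value


mask = active_channel_mask((3, 2, 1))
print(hex(mask))                 # 0x3f
print(atomic_add_total(mask))    # 6 increments from the live channels
print(atomic_add_total(0xffff))  # 16 increments if "off" channels also count
```

This only illustrates the hypothesis in this comment; it says nothing about what the hardware actually does with the mask.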
Comment 2 Ilia Mirkin 2016-02-19 22:35:06 UTC
Created attachment 121849
simple shader test that exposes the issue

New theory: the shared memory isn't actually per-workgroup (even though it should be). Play around with the attached shader test, varying the local size, as well as the grid dimensions. [The result of the counter should be == the product of the grid dimensions.]
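
A toy Python model of this theory, for illustration only. The attached shader test is not reproduced here, so the workgroup body is an assumption: each group zeroes a shared counter, every invocation atomically adds 1 to it, and one invocation then bumps a global counter if the shared count equals the local size. The "two resident groups running in lockstep" scheduling and the function name are also made up; real hardware is not this deterministic.

```python
from itertools import product


def run_groups(grid, local_size, slm_is_per_group):
    """Return the final global counter for a toy compute dispatch."""
    invocations = local_size[0] * local_size[1] * local_size[2]
    group_ids = list(product(range(grid[0]), range(grid[1]), range(grid[2])))
    global_counter = 0

    # Assumption: two workgroups are resident at a time and run in lockstep.
    for i in range(0, len(group_ids), 2):
        resident = group_ids[i:i + 2]
        # Each group gets its own SLM slot, or every group aliases slot 0.
        slot_of = {g: (n if slm_is_per_group else 0)
                   for n, g in enumerate(resident)}
        slm = [0] * len(resident)  # shared counters, zeroed before the barrier

        # Every invocation of every resident group adds 1 to "its" counter.
        for g in resident:
            slm[slot_of[g]] += invocations

        # After the barrier, one invocation per group checks the shared count.
        for g in resident:
            if slm[slot_of[g]] == invocations:
                global_counter += 1

    return global_counter


print(run_groups((4, 4, 4), (3, 2, 1), slm_is_per_group=True))   # 64
print(run_groups((4, 4, 4), (3, 2, 1), slm_is_per_group=False))  # 0
```

With per-workgroup shared memory the counter equals the product of the grid dimensions (4 * 4 * 4 = 64). When groups alias the same shared memory, the shared count is polluted by the other group and the final counter comes out wrong.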
Comment 3 Jordan Justen 2016-02-23 01:03:05 UTC
Patch "i965/hsw: Initialize SLM index in state register" sent.

Comment 4 Jordan Justen 2016-02-23 01:05:53 UTC
*** Bug 94255 has been marked as a duplicate of this bug. ***
Comment 5 Jordan Justen 2016-03-10 17:29:13 UTC
Fixed on master:

commit a100a57e30010da49c96f84a661cec9c57f9eebe

    i965/hsw: Initialize SLM index in state register
