Bug 103393

Summary: glDispatchComputeGroupSizeARB : gl_GlobalInvocationID.x != gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x
Product: Mesa Reporter: Stephane Chevigny <stephane.chevigny>
Component: Drivers/Gallium/radeonsiAssignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: 17.2   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Difference between the result of gl_GlobalInvocationID and gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x;

Description Stephane Chevigny 2017-10-21 17:32:14 UTC
Created attachment 134977 [details]
Difference between the result of  gl_GlobalInvocationID and gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x;

The spec for ARB_compute_variable_group_size in this file https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_compute_variable_group_size.txt state that gl_GlobalInvocationID = gl_WorkGroupID * gl_LocalGroupSizeARB + gl_LocalInvocationID; but on my computer, arch linux with  Radeon RX 580 Series (AMD POLARIS10 / DRM 3.18.0 / 4.13.
7-1-ARCH, LLVM 5.0.0) : mesa version 17.2.2,  gl_GlobalInvocationID = gl_LocalInvocationID ; only. If I replace uint i = gl_GlobalInvocation.x with  uint i = gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x  the output of the compute shader work as expected.
Comment 1 Ilia Mirkin 2017-10-21 17:54:18 UTC
I wanted to test this out on nouveau -- I ran the program, and it created output. What's the expected output?
Comment 2 Stephane Chevigny 2017-10-21 18:19:22 UTC
The expected output:
value[i].x (= gl_GlobalInvocationID.x ;)  should be equal to value[i].y  and value[i].w (= gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x;)  for all indices i according to spec. With the parameter I gave to glDispatchComputeGroupSizeARB, the divergence of the result happen after the indice i = 63 on my computer. So value[i].x = value[i].z (=gl_LocalInvocationID.x;) instead.

So on my computer, gl_GlobalInvocationID.x appear to be only equal to gl_LocalInvocationID.x notgl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x or I misread the spec.
Comment 3 Ilia Mirkin 2017-10-21 18:23:17 UTC
OK, I see. The relevant shader is fill_cs_450.glsl. It puts stuff into the array with values

(gl_GlobalInvocationID.x, t, gl_LocalInvocationID.x, t)

where t = gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x

Ideally the values printed should be (x, x, l, x) -- i.e. t == gl_GlobalInvocationID.x.

With nouveau this outputs a bunch of BS... sometimes gl_GlobalInvocationID.x == gl_LocalInvocationID.x, other times gl_GlobalInvocationID.x == randomish value. I suspect this is a combination of a nouveau bug and a core bug [the latter of which is affecting the radeonsi output]. Going to try to take a look.
Comment 4 Ilia Mirkin 2017-10-21 18:56:30 UTC
OK yeah, this is totally broken... for a simplified shader (which I will submit to piglit), this is generating code like

COMP
PROPERTY CS_FIXED_BLOCK_WIDTH 0
PROPERTY CS_FIXED_BLOCK_HEIGHT 0
PROPERTY CS_FIXED_BLOCK_DEPTH 0
DCL SV[0], THREAD_ID
DCL SV[1], BLOCK_ID
DCL SV[2], BLOCK_SIZE
DCL BUFFER[16]
DCL TEMP[0..3], LOCAL
IMM[0] UINT32 {8, 0, 0, 0}
  0: UMAD TEMP[0].x, SV[1].xyzz, TEMP[1].xyzz, SV[0].xyzz
  1: MOV TEMP[1].x, TEMP[0].xxxx
  2: MOV TEMP[1].y, SV[0].xxxx
  3: UMAD TEMP[2].x, SV[1].xxxx, SV[2].xxxx, SV[0].xxxx
  4: UMUL TEMP[3].x, TEMP[2].xxxx, IMM[0].xxxx
  5: STORE BUFFER[16].xy, TEMP[3].xxxx, TEMP[1].xyyy
  6: END

Note the first op which tries to read the uninitialized TEMP[1].xyzz. Oops. Pretty sure this used to work, so probably got broken at some point with the various refactors.
Comment 5 Stephane Chevigny 2017-10-21 19:10:21 UTC
Thank you very much for checking that.

S.C.
Comment 6 Ilia Mirkin 2017-10-21 19:26:18 UTC
https://patchwork.freedesktop.org/patch/184119/

This fixes both your application as well as a small piglit test I wrote on nouveau. The issue was in the core logic. If radeonsi still doesn't work with this, then it has additional issues.
Comment 7 Ilia Mirkin 2017-10-23 12:42:11 UTC
An updated patch fixes the issue in a more complete way (with multiple shaders being linked together, even non-variable local sizes were broken). This is now upstream in

commit 4d24a7cb97641cacecd371d1968f6964785822e4
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Sat Oct 21 15:15:41 2017 -0400

    glsl: fix derived cs variables

Thanks for reporting and providing a repro! As the original issue was filed against radeonsi, I'm going to leave this open until someone can confirm that this does indeed fix the problem there.
Comment 8 Samuel Pitoiset 2017-10-24 09:25:02 UTC
values[i].x == values[i].y == values[i].w for all indices i with radeonsi.

The issue appears to be fixed, thanks guys! Closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.