Created attachment 134977 [details] Difference between the result of gl_GlobalInvocationID and gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x; The spec for ARB_compute_variable_group_size in this file https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_compute_variable_group_size.txt state that gl_GlobalInvocationID = gl_WorkGroupID * gl_LocalGroupSizeARB + gl_LocalInvocationID; but on my computer, arch linux with Radeon RX 580 Series (AMD POLARIS10 / DRM 3.18.0 / 4.13. 7-1-ARCH, LLVM 5.0.0) : mesa version 17.2.2, gl_GlobalInvocationID = gl_LocalInvocationID ; only. If I replace uint i = gl_GlobalInvocation.x with uint i = gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x the output of the compute shader work as expected.
I wanted to test this out on nouveau -- I ran the program, and it created output. What's the expected output?
The expected output: value[i].x (= gl_GlobalInvocationID.x ;) should be equal to value[i].y and value[i].w (= gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x;) for all indices i according to spec. With the parameter I gave to glDispatchComputeGroupSizeARB, the divergence of the result happen after the indice i = 63 on my computer. So value[i].x = value[i].z (=gl_LocalInvocationID.x;) instead. So on my computer, gl_GlobalInvocationID.x appear to be only equal to gl_LocalInvocationID.x notgl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x or I misread the spec.
OK, I see. The relevant shader is fill_cs_450.glsl. It puts stuff into the array with values (gl_GlobalInvocationID.x, t, gl_LocalInvocationID.x, t) where t = gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x Ideally the values printed should be (x, x, l, x) -- i.e. t == gl_GlobalInvocationID.x. With nouveau this outputs a bunch of BS... sometimes gl_GlobalInvocationID.x == gl_LocalInvocationID.x, other times gl_GlobalInvocationID.x == randomish value. I suspect this is a combination of a nouveau bug and a core bug [the latter of which is affecting the radeonsi output]. Going to try to take a look.
OK yeah, this is totally broken... for a simplified shader (which I will submit to piglit), this is generating code like COMP PROPERTY CS_FIXED_BLOCK_WIDTH 0 PROPERTY CS_FIXED_BLOCK_HEIGHT 0 PROPERTY CS_FIXED_BLOCK_DEPTH 0 DCL SV[0], THREAD_ID DCL SV[1], BLOCK_ID DCL SV[2], BLOCK_SIZE DCL BUFFER[16] DCL TEMP[0..3], LOCAL IMM[0] UINT32 {8, 0, 0, 0} 0: UMAD TEMP[0].x, SV[1].xyzz, TEMP[1].xyzz, SV[0].xyzz 1: MOV TEMP[1].x, TEMP[0].xxxx 2: MOV TEMP[1].y, SV[0].xxxx 3: UMAD TEMP[2].x, SV[1].xxxx, SV[2].xxxx, SV[0].xxxx 4: UMUL TEMP[3].x, TEMP[2].xxxx, IMM[0].xxxx 5: STORE BUFFER[16].xy, TEMP[3].xxxx, TEMP[1].xyyy 6: END Note the first op which tries to read the uninitialized TEMP[1].xyzz. Oops. Pretty sure this used to work, so probably got broken at some point with the various refactors.
Thank you very much for checking that. S.C.
https://patchwork.freedesktop.org/patch/184119/ This fixes both your application as well as a small piglit test I wrote on nouveau. The issue was in the core logic. If radeonsi still doesn't work with this, then it has additional issues.
An updated patch fixes the issue in a more complete way (with multiple shaders being linked together, even non-variable local sizes were broken). This is now upstream in commit 4d24a7cb97641cacecd371d1968f6964785822e4 Author: Ilia Mirkin <imirkin@alum.mit.edu> Date: Sat Oct 21 15:15:41 2017 -0400 glsl: fix derived cs variables Thanks for reporting and providing a repro! As the original issue was filed against radeonsi, I'm going to leave this open until someone can confirm that this does indeed fix the problem there.
values[i].x == values[i].y == values[i].w for all indices i with radeonsi. The issue appears to be fixed, thanks guys! Closing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.