VkPhysicalDeviceSubgroupProperties reports a subgroupSize of 32, but the actual subgroup size is 16. Mesa 18.2.2, DRI Intel(R) Iris Pro Graphics 580 (Skylake GT4e).
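For reference, this is how I read the value (a minimal sketch, assuming Vulkan 1.1 headers and an already-selected VkPhysicalDevice, here called phys_dev):

#include <stdio.h>
#include <vulkan/vulkan.h>

/* Print the subgroup properties the driver advertises (Vulkan 1.1+). */
static void print_subgroup_properties(VkPhysicalDevice phys_dev)
{
    VkPhysicalDeviceSubgroupProperties subgroup_props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES,
    };
    VkPhysicalDeviceProperties2 props2 = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &subgroup_props,
    };

    vkGetPhysicalDeviceProperties2(phys_dev, &props2);

    /* On the device above this prints 32, even though the shader below
       behaves as if the subgroup size were 16. */
    printf("advertised subgroupSize: %u\n", subgroup_props.subgroupSize);
}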
I could be wrong, but I thought subgroupSize was a maximum. We can indeed do 32, but not on all stages of the pipeline (32 for compute).
I tried to run a simple compute shader to test subgroup instructions:

#version 450
#extension GL_KHR_shader_subgroup_arithmetic : enable

// subgroupSize stands for the value reported by the driver
layout(local_size_x = subgroupSize) in;

layout(std430, binding = 0) writeonly buffer output_buffer
{
    uint results[];
};

void main()
{
    uint id = gl_GlobalInvocationID.x;
    results[id] = subgroupAdd(id);
}

The output on Intel is:

120 120 120 120 120 120 120 120 120 120 120 120 120 120 120 120
376 376 376 376 376 376 376 376 376 376 376 376 376 376 376 376
632 632 632 632 632 632 632 632 632 632 632 632 632 632 632 632
888 888 888 888 888 888 888 888 888 888 888 888 888 888 888 888

The real subgroup size is 16 on Intel (32 on NVIDIA and 64 on AMD hardware).

PS: It's interesting that the subgroup size under Direct3D12 (DXIL shaders) is 8.
(In reply to Alexander from comment #2)
> uint id = gl_GlobalInvocationID.x;

Don't you want gl_SubgroupInvocationID?
The gl_SubgroupInvocationID output is:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I wonder if we might have a problem in brw_compile_cs(). It seems like we won't compile the SIMD32 shader unless the local group size exceeds 896, yet we'll generate a SIMD16 shader regardless of any minimum... Jason might be able to shed more light on that; I don't really understand the logic here.
Just because the advertised subgroupSize is 32 doesn't mean we have to run with "full" subgroups. Intel hardware has dispatch widths of 8, 16, and 32. In the Vulkan subgroup model, the 8- and 16-wide dispatch modes are advertised as a subgroup size of 32 where only the first 8 or 16 invocations are enabled. This is entirely in line with the spec; there is nothing that guarantees that local_size_x = subgroupSize will get you a single subgroup, or that gl_NumSubgroups = DIV_ROUND_UP(gl_WorkgroupSize.x * gl_WorkgroupSize.y * gl_WorkgroupSize.z, gl_SubgroupSize). Besides that, you really don't want to run 32-wide on Intel; the performance trade-offs are almost never worth it. We default to 16-wide because that tends to be a nice sweet spot, but sometimes 8-wide is even better.
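To make that last point concrete, a small hypothetical arithmetic sketch (the numbers are purely illustrative): with a 16x1x1 local group on a device advertising a subgroup size of 32, the naive formula predicts one subgroup, but an 8-wide dispatch would legally use two subgroups with only the first 8 invocations enabled in each.

#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Illustrative numbers only: a 16x1x1 workgroup on a device that
   advertises subgroupSize = 32. */
static const uint32_t workgroup_invocations    = 16 * 1 * 1;
static const uint32_t advertised_subgroup_size = 32;

/* Naive prediction: DIV_ROUND_UP(16, 32) == 1 subgroup. An 8-wide
   dispatch would instead use 2 subgroups of "size 32" with 8 active
   lanes each, so gl_NumSubgroups need not equal this value. */
static uint32_t predicted_num_subgroups(void)
{
    return DIV_ROUND_UP(workgroup_invocations, advertised_subgroup_size);
}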
Yep, you are right. The subgroup size is 32 only for shaders with a local group size greater than 896; otherwise it is 16. Unfortunately, there is no way to determine the actual (or minimum) subgroup size through the Vulkan API. A guaranteed minimum subgroup size would make it possible to optimize shaders better. Under Direct3D12 the subgroup size is always 8 on Intel hardware. Maybe there could be a way to declare the subgroup size at shader compilation time?
> The subgroup size is 32 only for shaders with a local group size greater than 896.

Correct. We have to run 32-wide for very large local group sizes in order to fit the entire group on the GPU at once. Otherwise, we usually run 16-wide or 8-wide depending on register usage. Don't worry; you are not the first person to run into this. I can't speak for D3D12, but for Vulkan we do have something in the works that provides more explicit control over subgroup sizes so you can really know what you're getting. Unfortunately, I can't provide a timeline for when it will be available.
I assume the VK_EXT_subgroup_size_control extension in the Vulkan 1.1.116 specification allows for this sort of control?
That is correct. That extension should give you all the buttons and knobs you need to control our sometimes unruly subgroup sizes. Be careful with it, though: SIMD32 is not always your friend. Sometimes it's fine and even provides a bit more parallelism, but sometimes it causes massive amounts of spilling and tanks performance.
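A rough sketch of how you might use it (assuming VK_EXT_subgroup_size_control has been enabled on the device; the shader module, pipeline layout, and the choice of 16 are placeholders for the example):

#include <vulkan/vulkan.h>

static VkPipeline create_fixed_subgroup_pipeline(VkDevice device,
                                                 VkPhysicalDevice phys_dev,
                                                 VkShaderModule shader_module,
                                                 VkPipelineLayout layout)
{
    /* Query the range of subgroup sizes the driver supports, and check
       that requiredSubgroupSizeStages includes the compute stage. */
    VkPhysicalDeviceSubgroupSizeControlPropertiesEXT size_props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_SIZE_CONTROL_PROPERTIES_EXT,
    };
    VkPhysicalDeviceProperties2 props2 = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &size_props,
    };
    vkGetPhysicalDeviceProperties2(phys_dev, &props2);

    /* Ask for 16-wide, clamped to the advertised range
       (minSubgroupSize..maxSubgroupSize, i.e. the 8..32 dispatch widths
       discussed above). */
    uint32_t wanted = 16;
    if (wanted < size_props.minSubgroupSize)
        wanted = size_props.minSubgroupSize;
    if (wanted > size_props.maxSubgroupSize)
        wanted = size_props.maxSubgroupSize;

    VkPipelineShaderStageRequiredSubgroupSizeCreateInfoEXT required_size = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO_EXT,
        .requiredSubgroupSize = wanted,
    };

    VkComputePipelineCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
        .stage = {
            .sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
            .pNext  = &required_size,
            .stage  = VK_SHADER_STAGE_COMPUTE_BIT,
            .module = shader_module,
            .pName  = "main",
        },
        .layout = layout,
    };

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &info, NULL, &pipeline);
    return pipeline;
}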