Bug 108875 - Invalid subgroupSize for Intel GPU
Summary: Invalid subgroupSize for Intel GPU
Status: RESOLVED NOTABUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel
Version: 18.2
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Reported: 2018-11-27 05:44 UTC by Alexander
Modified: 2019-07-22 13:54 UTC

Description Alexander 2018-11-27 05:44:06 UTC
VkPhysicalDeviceSubgroupProperties reports a subgroupSize of 32, but the actual subgroup size is 16.


Mesa 18.2.2 DRI Intel(R) Iris Pro Graphics 580 (Skylake GT4e)
Comment 1 Lionel Landwerlin 2018-11-27 10:39:03 UTC
I could be wrong, but I thought subgroupSize was a maximum number.
We can indeed do 32 but not on all stages of the pipeline (32 for compute).
Comment 2 Alexander 2018-11-27 11:11:01 UTC
I tried to run a simple compute shader to test subgroup instructions:

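#version 450
#extension GL_KHR_shader_subgroup_arithmetic : require
// local_size_x = reported subgroupSize (assumed here to be supplied via specialization constant 0)
layout(local_size_x_id = 0) in;
// "output" is a reserved word in GLSL, so the array is named "result"
layout(std430, binding = 0) writeonly buffer output_buffer { uint result[]; };
void main() {
  uint id = gl_GlobalInvocationID.x;
  // every invocation receives the sum of id over the active lanes of its subgroup
  result[id] = subgroupAdd(id);
}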

The output on Intel is:
120  120  120  120  120  120  120  120  120  120  120  120  120  120  120  120
376  376  376  376  376  376  376  376  376  376  376  376  376  376  376  376
632  632  632  632  632  632  632  632  632  632  632  632  632  632  632  632
888  888  888  888  888  888  888  888  888  888  888  888  888  888  888  888

The real subgroup size is 16 on Intel (32 on NVidia and 64 on AMD hardware).

PS: It's interesting that the subgroup size under Direct3D12 (DXIL shader) is 8.
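
The four output rows can be checked by hand: subgroupAdd over gl_GlobalInvocationID.x gives every invocation the sum of the IDs that are active in its subgroup. The sketch below is a CPU-side simulation in Python, not driver or shader code. The function name simulate_subgroup_add is hypothetical, and the total of 64 invocations is an assumption inferred from the 64 values printed above.

```python
def simulate_subgroup_add(total_invocations, active_width):
    """Emulate subgroupAdd(gl_GlobalInvocationID.x) on the CPU.

    Invocations are split into consecutive runs of `active_width`
    active lanes; each lane receives the sum of the IDs in its run.
    """
    results = []
    for start in range(0, total_invocations, active_width):
        lanes = range(start, min(start + active_width, total_invocations))
        results.extend([sum(lanes)] * len(lanes))
    return results


# 16 active lanes per subgroup reproduces the sums seen on Intel.
print(sorted(set(simulate_subgroup_add(64, 16))))  # [120, 376, 632, 888]
# 32 active lanes per subgroup would have produced different sums.
print(sorted(set(simulate_subgroup_add(64, 32))))  # [496, 1520]
```

Only the 16-lane case matches the reported values, which is why the output points to 16 active invocations per subgroup.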
Comment 3 Lionel Landwerlin 2018-11-27 11:51:48 UTC
(In reply to Alexander from comment #2)
>   uint id = gl_GlobalInvocationID.x;

Don't you want gl_SubgroupInvocationID?
Comment 4 Alexander 2018-11-27 11:59:00 UTC
gl_SubgroupInvocationID output is:

0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Comment 5 Lionel Landwerlin 2018-11-27 13:14:05 UTC
I wonder if we might have a problem in brw_compile_cs().
It seems like we won't compile the SIMD32 shader unless the local group size exceeds 896, yet we'll generate a SIMD16 shader regardless of any minimum...

Jason might be able to shed more light on that; I really don't understand the logic here.
Comment 6 Jason Ekstrand 2018-11-27 14:23:29 UTC
Just because the advertised subgroupSize is 32 doesn't mean we have to run with "full" subgroups.  Intel hardware has dispatch widths of 8, 16, and 32.  In the Vulkan subgroup model, dispatch modes of 8 and 16 are advertised as a subgroup size of 32 where only the first 8 or 16 invocations are enabled.  This is entirely in line with the spec; there is nothing that guarantees that local_size_x = subgroupSize will get you a single subgroup or that gl_NumSubgroups = DIV_ROUND_UP(gl_WorkgroupSize.x * gl_WorkgroupSize.y * gl_WorkgroupSize.z, gl_SubgroupSize).

Besides that, you really don't want to run 32-wide on Intel.  The performance trade-offs almost always aren't worth it.  We default to 16-wide because that tends to be a nice sweet spot, but sometimes 8-wide is even better.
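
To make the gl_NumSubgroups point concrete, here is a minimal Python sketch comparing the number of subgroups a workgroup is actually split into with the value the DIV_ROUND_UP formula would predict from the advertised size. It assumes one subgroup per hardware thread with dispatch_width active lanes each; the helper names div_round_up and num_subgroups are illustrative, not driver functions.

```python
def div_round_up(n, d):
    """Integer division rounding up, like the DIV_ROUND_UP macro."""
    return (n + d - 1) // d


def num_subgroups(workgroup_size, dispatch_width):
    """Subgroups needed when each one has `dispatch_width` active lanes."""
    return div_round_up(workgroup_size, dispatch_width)


ADVERTISED_SUBGROUP_SIZE = 32  # value reported in this bug
workgroup_size = 32            # local_size_x = subgroupSize, as in comment 2

for width in (8, 16, 32):
    actual = num_subgroups(workgroup_size, width)
    predicted = div_round_up(workgroup_size, ADVERTISED_SUBGROUP_SIZE)
    print(f"dispatch width {width}: actual={actual}, predicted={predicted}")
```

At a dispatch width of 16 the workgroup of 32 is split into two subgroups, while the formula based on the advertised size predicts one.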
Comment 7 Alexander 2018-11-27 14:33:48 UTC
Yep, you are right. The subgroupSize is 32 only for shaders with a local group size of more than 896. Otherwise, the subgroupSize equals 16. Unfortunately, there is no way to determine the actual size (or minimum size) of the subgroupSize variable in the Vulkan API. It would be possible to optimize shaders better with a guaranteed minimum subgroupSize value. Under Direct3D12 the subgroupSize is always 8 on Intel hardware. Maybe there could be a way to declare the subgroupSize value during shader compilation?
Comment 8 Jason Ekstrand 2018-11-27 14:53:00 UTC
> The subgroupSize is 32 only for shaders with local group size more than 896.

Correct.  We have to run 32-wide when you get very large local group sizes in order to fit the entire group on the GPU at once.  Otherwise, we usually run 16- or 8-wide depending on register usage.
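
The 896 threshold is consistent with a limit of 56 hardware threads per workgroup at 16 lanes each (56 x 16 = 896). That thread limit is an assumption inferred from the number quoted in this thread, and the function below is a hypothetical sketch of the "must fit the entire group at once" constraint, not the actual logic in brw_compile_cs(). It only computes the narrowest width that fits; the driver may still pick a wider one, for example the 16-wide default.

```python
MAX_THREADS_PER_GROUP = 56  # assumption: inferred from 896 / 16


def min_dispatch_width(local_size, max_threads=MAX_THREADS_PER_GROUP):
    """Narrowest dispatch width (8, 16 or 32) that fits the whole workgroup."""
    for width in (8, 16, 32):
        threads_needed = (local_size + width - 1) // width
        if threads_needed <= max_threads:
            return width
    raise ValueError("workgroup does not fit at any dispatch width")


for size in (64, 448, 449, 896, 897, 1024):
    print(size, min_dispatch_width(size))
```

Under that assumption, 896 invocations still fit at 16-wide (56 threads), while 897 need 57 threads at 16-wide and therefore force 32-wide.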

Don't worry; you are not the first person to have this problem.  I can't speak for D3D12 but for Vulkan we do have something in the works which provides more explicit control over subgroup sizes so that you can really know what you're getting.  Unfortunately, I can't really provide a timeline as to when it will be available.
Comment 9 Duncan Hopkins 2019-07-22 08:22:26 UTC
I assume the VK_EXT_subgroup_size_control extension in the Vulkan 1.1.116 specification allows for this sort of control?
Comment 10 Jason Ekstrand 2019-07-22 13:54:41 UTC
That is correct.  That extension should give you all the buttons and knobs you need to control our sometimes unruly subgroup sizes.  Be careful with it though.  SIMD32 is not always your friend.  Sometimes it's ok and even provides a bit more parallelism but sometimes it causes massive amounts of spilling and tanks performance.
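
With VK_EXT_subgroup_size_control, a requested subgroup size has to be a power of two and lie between the minSubgroupSize and maxSubgroupSize the device reports. The Python sketch below only mirrors that rule on the CPU; it makes no Vulkan calls. The function name is hypothetical, and the 8 and 32 limits are an assumption taken from the dispatch widths mentioned in comment 6, not values queried from a device.

```python
def is_valid_required_subgroup_size(required, min_size, max_size):
    """Check a requested size: power of two within [min_size, max_size]."""
    is_power_of_two = required > 0 and (required & (required - 1)) == 0
    return is_power_of_two and min_size <= required <= max_size


# Assumed limits for the Intel hardware discussed here.
MIN_SUBGROUP_SIZE, MAX_SUBGROUP_SIZE = 8, 32

for requested in (8, 12, 16, 32, 64):
    ok = is_valid_required_subgroup_size(
        requested, MIN_SUBGROUP_SIZE, MAX_SUBGROUP_SIZE)
    print(requested, "valid" if ok else "invalid")
```

A size that passes this check is only permitted, not necessarily fast; as noted above, requesting 32 can cause heavy spilling.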

