Created attachment 136547 [details]
Code to reproduce this issue. Requires SDL2.
I'm trying to get the Banshee 3D engine <https://github.com/BearishSun/BansheeEngine> working on my hardware (HD 7870) using Mesa radeonsi (amdgpu kernel module). I use 17.2.4-0ubuntu1~17.10.1 from the Ubuntu artful-proposed archive.
It uses a compute shader that reads a texture cube and writes 6 sets of coefficients into a shader storage buffer object. Details are not important as I have isolated the (probably) incorrect driver behaviour and created a much simpler program that triggers the same issue.
In short: Let's say we have a GLSL struct and access the SSBO through a dynamically-sized array of that struct type using the std430 layout, like this:
layout(std430) buffer gOutput
Then a simple assignment of a local variable of the same struct type to ssb[i] writes the floats into incorrect buffer offsets, because b, c and "weight" are placed as if the array elements in a, b and c were vec4-aligned instead of float-aligned, tripling the size of a, b and c.
Code that triggers it is like this, which should hopefully be valid GLSL (and makes more sense the way it is used in Banshee 3D):
// ... populate result ...
ssb[gl_LocalInvocationIndex] = result;
The test program is available here and contains a (maybe too) detailed explanation and three program variations that fix the issue: https://gist.github.com/w-flo/b1a5791749eea5f36cb54628037ba2bf
But I'll also attach it to this bug report.
Looking at the RADEON_DUMP_SHADERS=1 output, I think that the bug is already visible in the TGSI dump, as explained in the comment at the top of my reproducer program. I'll attach the output (it's possibly based on an older version of the reproducer).
Created attachment 136548 [details]
Can you see if this is fixed on master? I fixed a seemingly related issue with
unfortunately updating to 17.4~git1801031930.ad2187~oibaf~a (apparently a Jan 3 commit) did not fix this test case. I'll downgrade again because that package breaks renderdoc and the gnome login screen (wayland-based) on my system, probably packaging-related.
Maybe the bug is in compiler/glsl/lower_buffer_access.cpp?
I'm not sure about the control flow for radeonsi, but if that pass is used in my setup, and this emit_access() code is used to break down struct derefs into multiple scalar/array/vec derefs, then it seems like the check for packing == GLSL_INTERFACE_STD430 is missing in lines 77/78 and 85, and the std140 layout is assumed instead. I think that might explain the observed incorrect behaviour.
In that same file, the check for std430 was added in a few places, like in line 338.
I'd love to give this a try and add checks to use std430_base_alignment() and std430_size() if appropriate, but I'm not really prepared to compile Mesa myself right now.
So, if someone who knows this code feels like my suggested change is correct and required, I'd be more than happy if they could take care of this. Otherwise I might give it a try myself in some time.
I'm preparing a patch right now with the changes described in my last comment. The patch fixes my test case and I see no piglit regressions (only some unstable tests unrelated to my patch). If I get git send-email to work, I'll send it to the list.
Author: Florian Will <email@example.com>
Date: Fri Jan 5 15:33:31 2018 +0100
glsl: Respect std430 layout in lower_buffer_access
Respect the std430 rules for determining offset and size of struct
members when using a std430 buffer. std140 rules lead to wrong buffer
offsets in that case.
Fixes my test case attached in Bugzilla. No piglit changes.
Reviewed-by: Timothy Arceri <firstname.lastname@example.org>