Bug 104492 - Compute Shader: Wrong alignment when assigning struct value to structured SSBO
Summary: Compute Shader: Wrong alignment when assigning struct value to structured SSBO
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-04 14:45 UTC by florian.will
Modified: 2018-01-08 02:45 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Code to reproduce this issue. Requires SDL2. (7.83 KB, text/x-c++src)
2018-01-04 14:45 UTC, florian.will
Details
RADEON_DUMP_SHADERS=1 output. (53.36 KB, text/plain)
2018-01-04 14:46 UTC, florian.will
Details

Description florian.will 2018-01-04 14:45:22 UTC
Created attachment 136547 [details]
Code to reproduce this issue. Requires SDL2.

Hi,

I'm trying to get the Banshee 3D engine <https://github.com/BearishSun/BansheeEngine> working on my hardware (HD 7870) using Mesa radeonsi (amdgpu kernel module). I use 17.2.4-0ubuntu1~17.10.1 from the Ubuntu artful-proposed archive.

It uses a compute shader that reads a texture cube and writes 6 sets of coefficients into a shader storage buffer object. Details are not important as I have isolated the (probably) incorrect driver behaviour and created a much simpler program that triggers the same issue.

In short: Let's say we have a GLSL struct and access the SSBO through a dynamically-sized array of that struct type using the std430 layout, like this:

struct ResultRecord
{
	float a[10];
	float b[10];
	float c[10];
	float weight;
};

layout(std430) buffer gOutput
{
	ResultRecord ssb[];
};

Then a simple assignment of a local variable of the same struct type to ssb[i] writes the floats into incorrect buffer offsets, because b, c and "weight" are placed as if the array elements in a, b and c were vec4-aligned instead of float-aligned, tripling the size of a, b and c.

Code that triggers it is like this, which should hopefully be valid GLSL (and makes more sense the way it is used in Banshee 3D):
ResultRecord result;
// ... populate result ...
ssb[gl_LocalInvocationIndex] = result;

The test program is available here and contains a (maybe too) detailed explanation and three program variations that fix the issue: https://gist.github.com/w-flo/b1a5791749eea5f36cb54628037ba2bf
But I'll also attach it to this bug report.


Looking at the RADEON_DUMP_SHADERS=1 output, I think that the bug is already visible in the TGSI dump, as explained in the comment at the top of my reproducer program. I'll attach the output (it's possibly based on an older version of the reproducer).
Comment 1 florian.will 2018-01-04 14:46:10 UTC
Created attachment 136548 [details]
RADEON_DUMP_SHADERS=1 output.
Comment 2 Ilia Mirkin 2018-01-04 16:49:03 UTC
Can you see if this is fixed on master? I fixed a seemingly related issue with

https://cgit.freedesktop.org/mesa/mesa/commit/?id=0332c7484b712e56ce1a6648c5fa04c90e286c37
Comment 3 florian.will 2018-01-04 18:48:20 UTC
Hi Ilia,

unfortunately updating to 17.4~git1801031930.ad2187~oibaf~a  (apparently a Jan 3 commit) did not fix this test case. I'll downgrade again because that package breaks renderdoc and the gnome login screen (wayland-based) on my system, probably packaging-related.
Comment 4 florian.will 2018-01-04 21:44:24 UTC
Maybe the bug is in compiler/glsl/lower_buffer_access.cpp?
https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/glsl/lower_buffer_access.cpp?id=396c006d907b023f9b187db618ee2a6e4e1b8a85#n51

I'm not sure about the control flow for radeonsi, but if that pass is used in my setup, and this emit_access() code is used to break down struct derefs into multiple scalar/array/vec derefs, then it seems like the check for packing == GLSL_INTERFACE_STD430 is missing in lines 77/78 and 85, and the std140 layout is assumed instead. I think that might explain the observed incorrect behaviour.

In that same file, the check for std430 was added in a few places, like in line 338.

I'd love to give this a try and add checks to use std430_base_alignment() and std430_size() if appropriate, but I'm not really prepared to compile Mesa myself right now.

So, if someone who knows this code feels like my suggested change is correct and required, I'd be more than happy if they could take care of this. Otherwise I might give it a try myself in some time.
Comment 5 florian.will 2018-01-05 13:07:46 UTC
I'm preparing a patch right now with the changes described in my last comment. The patch fixes my test case and I see no piglit regressions (only some unstable tests unrelated to my patch). If I get git send-email to work, I'll send it to the list.
Comment 6 Timothy Arceri 2018-01-08 02:45:37 UTC
Fixed by:

commit 7e025def6d7d3d6bf94facd6ec6d956f40cbb31e
Author: Florian Will <florian.will@gmail.com>
Date:   Fri Jan 5 15:33:31 2018 +0100

    glsl: Respect std430 layout in lower_buffer_access
    
    Respect the std430 rules for determining offset and size of struct
    members when using a std430 buffer. std140 rules lead to wrong buffer
    offsets in that case.
    
    Fixes my test case attached in Bugzilla. No piglit changes.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104492
    Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.