I'm the main developer of Ogre 2.1
We're trying our demos in latest Mesa:
I've built Mesa from scratch. Tried commits git-edfc17a (current head of branch 12.0) and commit 17f1c49b9ad05af4f6482f6fa950e5dcc1a779d1 (current head of master branch)
Almost all of them work except for the compute shader based ones (Googling around it appears for Southern Island radeon it could be buggy so it was turned off. Not 100% sure but I think that's what's happening).
Anyway, not a big deal on that. I understand the Mesa team considers that WIP. Move on.
However our Forward3D demo was glitching and I didn't know why. I've investigated the problem and located it:
* glGetIntegerv w/ GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT is returning 4, should return 256 for my HW (GCN 1.0 AMD Radeon HD 7770 Southern Islands).
* glGetIntegerv w/ GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT is returning 4, should return 256 for my HW.
In both cases "256" is what Windows drivers return btw, including fglrx on Linux.
When aligned to 4 bytes; the following worked correctly:
glTexBufferRange( GL_TEXTURE_BUFFER, GL_R32UI, boName, offsetAlignedTo4ButNotTo256, sizeBytes );
uniform usamplerBuffer f3dGrid;
uint numLightsInGrid = texelFetch( f3dGrid, int(sampleOffset) ).x;
But the following did not:
glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, boName, offsetAlignedTo4ButNotTo256, sizeBytes );
uniform samplerBuffer f3dLightList;
vec4 values = texelFetch( f3dLightList, int(sampleOffset) ).x;
The output of the latter was flickering garbage. When I forced 256 byte alignment, it worked as expected.
I consider this a critical issue because I suspect it's causing subtle problems with a lot of titles whose buffer offsets don't start at 0.
I believe NVIDIA GPUs also require buffer offsets of 256. I don't know about Intel cards.
(In reply to Matias N. Goldberg from comment #0)
> But the following did not:
> glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, boName,
> offsetAlignedTo4ButNotTo256, sizeBytes );
> uniform samplerBuffer f3dLightList;
> vec4 values = texelFetch( f3dLightList, int(sampleOffset) ).x;
> The output of the latter was flickering garbage. When I forced 256 byte
> alignment, it worked as expected.
My guess is that 16 would have worked as well. It just has to be aligned to the texel. 4 works for GL_RGBA8 but not GL_RGBA32*. Unfortunately there's no provision for a per-format limit.
I agree with Ilia. There should be no requirement for an alignment greater than the "texel" size, that really wouldn't make sense given how the hardware works. Can you please try with 16-but-not-256 alignment?
>> BEGIN PATCH
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 88f4f20..6a2d5bc 100644
@@ -363,10 +363,11 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
+ return 256;
- return HAVE_LLVM >= 0x0309 ? 4 : 0;
+ return HAVE_LLVM >= 0x0309 ? 256 : 0;
if (pscreen->get_shader_param(pscreen, PIPE_SHADER_COMPUTE,
<< END PATCH
I'm playing safe by going with 256 bytes with everyone, based on the overwhelming amount of reports (on the left top there's a dropdown, select "All"); unless someone wants to go through the trouble of identifying each card. Even in DX12 256 alignment is by spec:
I believe this patch could potentially increase radeonsi's compatibility greatly and should be tested against popular game titles.
Sure I will try 16 but not 256
Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN cards only require 4 byte alignment there.
All other buffer binding points require 1, 2, or 4 bytes depending on the texel size. It's really min(texel_size, 4).
Whatever the closed OpenGL driver reports doesn't correspond to the reality.
The limits reported by our Vulkan driver are very similar to Mesa.
I only tested texture and uniform buffers, not SSBOs.
* Setting UBOs to 4 byte alignments didn't seem to matter in any way. Performance didn't seem to be affected either. Nonetheless given that AMD always sets it to 256; this is what I'd recommend (i.e. there could be undocumented hw errors for edge cases?)
* Setting TBOs to non-16 byte alignments breaks in a specific pattern (i.e. 4 byte alignments produce certain glitches, 8 byte alignments produce a different set of glitches, 12 byte alignments a different set, etc). Setting TBOs to 16 byte alignments fixes everything. I benchmarked 256-byte alignments vs 16-byte alignments-non-256-byte and couldn't see any performance difference (the sample kept around 46.50-47.00ms all the time).
Our code is complex so if you keep insisting 4 byte alignment is correct, I won't deny the possibility it could be a bug in our codebase. I would have to create a simple as possible case to double check it.
> Nonetheless given that AMD always sets it to 256 ...
Not true. The Mesa driver is also developed by us. Nicolai and me work for AMD. There is also the Vulkan driver which returns "1 byte" for the texture buffer alignment.
(In reply to Marek Olšák from comment #5)
> Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN
> cards only require 4 byte alignment there.
> All other buffer binding points require 1, 2, or 4 bytes depending on the
> texel size. It's really min(texel_size, 4).
That means that you have to return max(texel_size) for the GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture buffer can be any offset.
(In reply to Ilia Mirkin from comment #9)
> (In reply to Marek Olšák from comment #5)
> > Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN
> > cards only require 4 byte alignment there.
> > All other buffer binding points require 1, 2, or 4 bytes depending on the
> > texel size. It's really min(texel_size, 4).
> That means that you have to return max(texel_size) for the
> GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture
> buffer can be any offset.
That's what we already do.
(In reply to Marek Olšák from comment #10)
> (In reply to Ilia Mirkin from comment #9)
> > (In reply to Marek Olšák from comment #5)
> > > Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN
> > > cards only require 4 byte alignment there.
> > >
> > > All other buffer binding points require 1, 2, or 4 bytes depending on the
> > > texel size. It's really min(texel_size, 4).
> > That means that you have to return max(texel_size) for the
> > GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture
> > buffer can be any offset.
> That's what we already do.
You return 4. However that's not enough if I set the format to GL_RGBA32I. There's no provision in the spec to disallow someone feeding you a aligned-to-4 buffer with GL_RGBA32I format if you return "4" as the value.
Oh, unless I totally misunderstood what you were saying, and that situation should work. In which case, apologies for the noise.
Created attachment 125723 [details]
Proof of the bug
I've added a "Proof of the bug".
There IS a bug. I don't know if it's software or in the hardware. The attached sample does the following:
1. Create a 1MB buffer.
2. Fill the first 4kb with the pattern:
data[i+0] = 1.0f;
data[i+1] = 0.5f;
data[i+2] = 0.15f;
data[i+3] = 0.0f;
3. Attach it via:
glActiveTexture( GL_TEXTURE0 );
glBindTexture( GL_TEXTURE_BUFFER, texName );
glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, vboName,
4. Render a triangle showing the colours.
If the alignment is not a problem, I should see Orange-Cream, then Carmine, then blue-violetish, then green; repeat.
Instead I **ALWAYS** see Orange-Cream; which is clearly incorrect.
After running the sample long enough, elementStart will be > 1024 (thus 1024*4 exceeding 4kb) and begins to display black. (and if you leave it long enough running it could crash or throw an error, that's not the point)
Thanks. It's not a hardware issue. It's a driver design issue.
I'm working on it.
(In reply to Marek Olšák from comment #15)
> I'm working on it.
It should be fixed in master already.
I can confirm it works on both the sample repro I provided and my actual real case.
Small note: We discovered there was a potential for a crash because at certain points we assume SIMD alignment; so we're now enforcing 16-byte alignment as minimum in our code. I'm doing this remark because Mesa is the only (or among the very few) implementations that return alignments <16; which can reveal hidden bugs in program code that runs fine everywhere else. If you ever see a bug report of these sorts (e.g. crashing in movaps, movntdqa), this is something to keep in mind.
I appreciate the 4 byte alignment as 256 reported by others is excessively wasteful. Thanks.
For the record it was fixed in
on Feb 26, 2017 at 07:45:10.
(provided by the Example extension).