Summary: | Gallium: TBOs and images set the offset in elements, not bytes | ||
---|---|---|---|
Product: | Mesa | Reporter: | Matias N. Goldberg <dark_sylinc> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | critical | ||
Priority: | medium | CC: | vedran |
Version: | git | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Proof of the bug |
Description
Matias N. Goldberg
2016-08-11 19:53:09 UTC
(In reply to Matias N. Goldberg from comment #0) > But the following did not: > glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, boName, > offsetAlignedTo4ButNotTo256, sizeBytes ); > uniform samplerBuffer f3dLightList; > vec4 values = texelFetch( f3dLightList, int(sampleOffset) ).x; > > The output of the latter was flickering garbage. When I forced 256 byte > alignment, it worked as expected. My guess is that 16 would have worked as well. It just has to be aligned to the texel. 4 works for GL_RGBA8 but not GL_RGBA32*. Unfortunately there's no provision for a per-format limit. I agree with Ilia. There should be no requirement for an alignment greater than the "texel" size, that really wouldn't make sense given how the hardware works. Can you please try with 16-but-not-256 alignment? >> BEGIN PATCH diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 88f4f20..6a2d5bc 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -363,10 +363,11 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT: case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT: + return 256; case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return 4; case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: - return HAVE_LLVM >= 0x0309 ? 4 : 0; + return HAVE_LLVM >= 0x0309 ? 256 : 0; case PIPE_CAP_GLSL_FEATURE_LEVEL: if (pscreen->get_shader_param(pscreen, PIPE_SHADER_COMPUTE, << END PATCH I'm playing safe by going with 256 bytes with everyone, based on the overwhelming amount of reports (on the left top there's a dropdown, select "All"); unless someone wants to go through the trouble of identifying each card. Even in DX12 256 alignment is by spec: http://delphigl.de/glcapsviewer/gl_stats_caps_single.php?listreportsbycap=GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT http://delphigl.de/glcapsviewer/gl_stats_caps_single.php?listreportsbycap=GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT http://delphigl.de/glcapsviewer/gl_stats_caps_single.php?listreportsbycap=GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT I believe this patch could potentially increase radeonsi's compatibility greatly and should be tested against popular game titles. Sure I will try 16 but not 256 Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN cards only require 4 byte alignment there. All other buffer binding points require 1, 2, or 4 bytes depending on the texel size. It's really min(texel_size, 4). Whatever the closed OpenGL driver reports doesn't correspond to the reality. The limits reported by our Vulkan driver are very similar to Mesa. I'm back. I only tested texture and uniform buffers, not SSBOs. Results: * Setting UBOs to 4 byte alignments didn't seem to matter in any way. Performance didn't seem to be affected either. Nonetheless given that AMD always sets it to 256; this is what I'd recommend (i.e. there could be undocumented hw errors for edge cases?) * Setting TBOs to non-16 byte alignments breaks in a specific pattern (i.e. 4 byte alignments produce certain glitches, 8 byte alignments produce a different set of glitches, 12 byte alignments a different set, etc). Setting TBOs to 16 byte alignments fixes everything. I benchmarked 256-byte alignments vs 16-byte alignments-non-256-byte and couldn't see any performance difference (the sample kept around 46.50-47.00ms all the time). Our code is complex so if you keep insisting 4 byte alignment is correct, I won't deny the possibility it could be a bug in our codebase. I would have to create a simple as possible case to double check it. > Nonetheless given that AMD always sets it to 256 ...
Not true. The Mesa driver is also developed by us. Nicolai and me work for AMD. There is also the Vulkan driver which returns "1 byte" for the texture buffer alignment.
(In reply to Marek Olšák from comment #5) > Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN > cards only require 4 byte alignment there. > > All other buffer binding points require 1, 2, or 4 bytes depending on the > texel size. It's really min(texel_size, 4). That means that you have to return max(texel_size) for the GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture buffer can be any offset. (In reply to Ilia Mirkin from comment #9) > (In reply to Marek Olšák from comment #5) > > Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN > > cards only require 4 byte alignment there. > > > > All other buffer binding points require 1, 2, or 4 bytes depending on the > > texel size. It's really min(texel_size, 4). > > That means that you have to return max(texel_size) for the > GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture > buffer can be any offset. That's what we already do. (In reply to Marek Olšák from comment #10) > (In reply to Ilia Mirkin from comment #9) > > (In reply to Marek Olšák from comment #5) > > > Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN > > > cards only require 4 byte alignment there. > > > > > > All other buffer binding points require 1, 2, or 4 bytes depending on the > > > texel size. It's really min(texel_size, 4). > > > > That means that you have to return max(texel_size) for the > > GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture > > buffer can be any offset. > > That's what we already do. You return 4. However that's not enough if I set the format to GL_RGBA32I. There's no provision in the spec to disallow someone feeding you a aligned-to-4 buffer with GL_RGBA32I format if you return "4" as the value. Oh, unless I totally misunderstood what you were saying, and that situation should work. In which case, apologies for the noise. Created attachment 125723 [details]
Proof of the bug
I've added a "Proof of the bug". There IS a bug. I don't know if it's software or in the hardware. The attached sample does the following: 1. Create a 1MB buffer. 2. Fill the first 4kb with the pattern: data[i+0] = 1.0f; data[i+1] = 0.5f; data[i+2] = 0.15f; data[i+3] = 0.0f; 3. Attach it via: glActiveTexture( GL_TEXTURE0 ); glBindTexture( GL_TEXTURE_BUFFER, texName ); glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, vboName, elementStart*sizeof(float), 1024*4*sizeof(float) ); ++elementStart; 4. Render a triangle showing the colours. If the alignment is not a problem, I should see Orange-Cream, then Carmine, then blue-violetish, then green; repeat. Instead I **ALWAYS** see Orange-Cream; which is clearly incorrect. After running the sample long enough, elementStart will be > 1024 (thus 1024*4 exceeding 4kb) and begins to display black. (and if you leave it long enough running it could crash or throw an error, that's not the point) Thanks. It's not a hardware issue. It's a driver design issue. I'm working on it. (In reply to Marek Olšák from comment #15) > I'm working on it. Any news? It should be fixed in master already. You're awesome! I can confirm it works on both the sample repro I provided and my actual real case. Small note: We discovered there was a potential for a crash because at certain points we assume SIMD alignment; so we're now enforcing 16-byte alignment as minimum in our code. I'm doing this remark because Mesa is the only (or among the very few) implementations that return alignments <16; which can reveal hidden bugs in program code that runs fine everywhere else. If you ever see a bug report of these sorts (e.g. crashing in movaps, movntdqa), this is something to keep in mind. I appreciate the 4 byte alignment as 256 reported by others is excessively wasteful. Thanks. For the record it was fixed in https://cgit.freedesktop.org/mesa/mesa/commit/?id=7cd256ce7e4bad680bb77d033cf5dd662abab2dd and https://cgit.freedesktop.org/mesa/mesa/commit/?id=325379096f54dde39171d1b8804e29a8003bb3c7 |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.