Created attachment 139324 [details] Screenshots and shader imageAtomicExchange() performs writing with invalid data layout for rgba8 texture. imageStore() works fine. The same behavior for OpenGL and Vulkan. glxinfo: OpenGL vendor string: Intel Open Source Technology Center OpenGL renderer string: Mesa DRI Intel(R) Iris Pro Graphics 580 (Skylake GT4e) OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.0-rc4 vulkaninfo: Vulkan Instance Version: 1.1.73 apiVersion = 0x400039 (1.0.57) driverVersion = 75497472 (0x4800000) vendorID = 0x8086 deviceID = 0x193b
Image atomics are only allowed on R32_UINT or R32_SINT images.
So is the atomicImage* behavior different to imageStore? It's not absolutely clear from GL_ARB_shader_image_load_store specification: "For example, it is legal to access an image whose internal format is RGBA8 with an image unit format of R32UI." Nvidia and AMD doesn't have a difference between: layout(r32ui) uniform uimage2D surface; imageAtomicExchange(surface, ..., packUnorm4x8(color)) layout(rgba8) uniform writeonly image2D surface; imageStore(surface, ..., color)
(In reply to Alexander from comment #2) > So is the atomicImage* behavior different to imageStore? It's not absolutely > clear from GL_ARB_shader_image_load_store specification: "For example, it is > legal to access an image whose internal format is RGBA8 with an image unit > format of R32UI." > > Nvidia and AMD doesn't have a difference between: > > layout(r32ui) uniform uimage2D surface; > imageAtomicExchange(surface, ..., packUnorm4x8(color)) > > layout(rgba8) uniform writeonly image2D surface; > imageStore(surface, ..., color) Ok, yeah, that should work fine and I'm a bit confused as to why it's not. Sorry for the overly quick response. :/ Are you sure the image unit is set up with the right format? Unfortunately, from the pictures, it's hard to tell what's wrong. Do you expect them to be exactly the same? I suppose the one with atomics has a bit of a blue tint. What happens if you bind as R32_UINT and then use imageStore(surface, ..., packUnorm4x8(color))?
(In reply to Jason Ekstrand from comment #3) > (In reply to Alexander from comment #2) > > So is the atomicImage* behavior different to imageStore? It's not absolutely > > clear from GL_ARB_shader_image_load_store specification: "For example, it is > > legal to access an image whose internal format is RGBA8 with an image unit > > format of R32UI." > > > > Nvidia and AMD doesn't have a difference between: > > > > layout(r32ui) uniform uimage2D surface; > > imageAtomicExchange(surface, ..., packUnorm4x8(color)) > > > > layout(rgba8) uniform writeonly image2D surface; > > imageStore(surface, ..., color) > > Ok, yeah, that should work fine and I'm a bit confused as to why it's not. > Sorry for the overly quick response. :/ Are you sure the image unit is set > up with the right format? > > Unfortunately, from the pictures, it's hard to tell what's wrong. Do you > expect them to be exactly the same? I suppose the one with atomics has a > bit of a blue tint. > > What happens if you bind as R32_UINT and then use imageStore(surface, ..., > packUnorm4x8(color))? It's impossible to substitute imageAtomicExchange() with imageStore(). imageStore(uimage2D, ivec2, uvec4) is the only suitable signature for r32ui format. The result of such operation is weird. I have some update on this issue. I tried to store a gradient based on shader invocation: vec4(ivec4(gl_GlobalInvocationID.xy, 0, 0) % 256) / 255.0f And there is no difference between imageAtomicExchange() and imageStore(). It looks like imageAtomicExchange() somehow changes another texture sampling behavior. Please check new attachment.
Created attachment 139328 [details] gradient store
(In reply to Alexander from comment #4) > It's impossible to substitute imageAtomicExchange() with imageStore(). > imageStore(uimage2D, ivec2, uvec4) is the only suitable signature > for r32ui format. The result of such operation is weird. You would have to do imageStore(surface, ..., uvec4(packUnorm4x8(color))).
(In reply to Jason Ekstrand from comment #6) > (In reply to Alexander from comment #4) > > It's impossible to substitute imageAtomicExchange() with imageStore(). > > imageStore(uimage2D, ivec2, uvec4) is the only suitable signature > > for r32ui format. The result of such operation is weird. > > You would have to do imageStore(surface, ..., uvec4(packUnorm4x8(color))). The result on Intel is a binary red-black image. The result on Nvidia is also weird and absolutely incorrect.
Can you provide me with some mechanism to run your shader? It will be much easier to figure out what's going wrong if I can experience with it. Even telling me the global works side and required input values should be enough.
Created attachment 139362 [details] app The block size depends on local group size. Application output pattern is: store 8x8 | atomic 4x4 atomic 1x1 | atomic 8x8
Created attachment 142211 [details] denys_expected hi guys. I tested last attachment from Alexander on radeon and intel. Found out, that on radeon test looks without visible squads (on the right part of the picture). See attachment "denys_expected" with my "expected" output. Then I checked old mesa versions and found out, that the same picture I can see on 13.0.6 mesa. Below there is a result of bisection (tested on SKL and KBL): 9919542f1cfff70524bc6117d19bf88e59159caa is the first bad commit commit 9919542f1cfff70524bc6117d19bf88e59159caa Author: Kenneth Graunke <kenneth@whitecape.org> Date: Sat Jan 14 23:32:12 2017 -0800 i965: Make DCE set null destinations on messages with side effects. (Co-authored by Matt Turner.) Image atomics, for example, return a value - but the shader may not want to use it. We assigned a useless VGRF destination. This seemed harmless, but it can actually be quite harmful. The register allocator has to assign that VGRF to a real register. It may assign the same actual GRF to the destination of an instruction that follows soon after. This results in a write-after-write (WAW) dependency, and stall. A number of "Deus Ex: Mankind Divided" shaders use image atomics, but don't use the return value. Several of these were hitting WAW stalls for nearly 14,000 (poorly estimated) cycles a pop. Making dead code elimination null out the destination avoids this issue. This patch cuts one shader's estimated cycles by -98.39%! Removing the message response should also help with data cluster bandwidth. On Skylake: (instruction counts remain identical) total cycles in shared programs: 255413890 -> 248081010 (-2.87%) cycles in affected programs: 12019948 -> 4687068 (-61.01%) helped: 24 HURT: 10 v2: Make can_omit_write independent of can_eliminate (Curro). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> :040000 040000 d3ac50f716bb55e2919ea2a8098656be67dffdc4 9763545331f08faf330b432997836bbcdce7e8c1 M src
Hi, Without this commit gradients are the same for all patterns: store 8x8 | atomic 4x4 atomic 1x1 | atomic 8x8 I have looked at a native code of the shader with and without the commit and the only difference is in the send command: Incorrect gradient (with commit) send(8) null<1>UW g1<8,8,1>UD dp data 1 ( DC typed atomic, Surface = 1, SIMD16, mov) mlen 3 rlen 0 { align1 1Q }; Correct gradient (without commit) send(8) g20<1>UW g1<8,8,1>UD dp data 1 ( DC typed atomic, Surface = 1, SIMD16, mov) mlen 3 rlen 1 { align1 1Q }; Here we can see that optimization from that commit is working as expected since return value of imageAtomicExchange is unused; <dest> is being null in such case and response length is 0. In "Send Message" section of Volume 7: 3D-Media-GPGPU it is explicitly said that it's correct to do this. However something going wrong here. One interesting observation is that writing to the red channel yields correct gradient, while writing to the green or blue channel does not. I cannot find anything related to such issue in docs, so someone who has more knowledge of inner workings of GPU is needed here. You can find slightly modified app to reproduce the issue in attachments. Change the writing from the green channel to the red to see the difference.
Created attachment 142214 [details] simplified reproducer
Created attachment 142215 [details] diff of shaders with wrong and correct gradients
Returning to this there are several observations I can add: 1) imageAtomicExchange() works with constant data: imageAtomicExchange(out_surface, ivec2(gl_GlobalInvocationID.xy), 0xFF00FF00); works while with variable color it doesn't. 2) Vertical gradiesnt are fine, the issue appears only when having different colors in a row. 3) In a row every pixel for which gl_LocalInvocationID.x == 0 has correct color while other (local_size_x - 1) pixels inherit its gba channels' values so vec4 color; if (gl_LocalInvocationID.x == 0) { color = vec4(1.0, 0.0, 0.0, 1.0); } else { color = vec4(0.0, gl_LocalInvocationID.x / float(GROUP_SIZE), 0.0, 1.0); } imageAtomicExchange(out_surface, ivec2(gl_GlobalInvocationID.xy), packUnorm4x8(color)); produces red vertical lines with black color between them while it should produce red lines with green gradient in between. I don't have any ideas what's wrong here.
Rechecked and the bug is still present. Any atomic operation which returns value is affected by it.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/835.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.