Bug 106390 - Invalid imageAtomicExchange() writes.
Summary: Invalid imageAtomicExchange() writes.
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel (show other bugs)
Version: 18.0
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-03 19:59 UTC by Alexander
Modified: 2019-09-18 19:48 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Screenshots and shader (1.10 MB, application/zip)
2018-05-03 19:59 UTC, Alexander
Details
gradient store (1.33 MB, application/zip)
2018-05-04 05:50 UTC, Alexander
Details
app (811.47 KB, application/zip)
2018-05-05 07:46 UTC, Alexander
Details
denys_expected (761.05 KB, image/png)
2018-10-26 08:52 UTC, Denis
Details
simplified reproducer (59.72 KB, application/gzip)
2018-10-26 09:17 UTC, Danylo
Details
diff of shaders with wrong and correct gradients (161.52 KB, image/png)
2018-10-26 09:23 UTC, Danylo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander 2018-05-03 19:59:44 UTC
Created attachment 139324 [details]
Screenshots and shader

imageAtomicExchange() performs writing with invalid data layout for rgba8 texture. imageStore() works fine. The same behavior for OpenGL and Vulkan.

glxinfo:
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Iris Pro Graphics 580 (Skylake GT4e)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.0-rc4

vulkaninfo:
Vulkan Instance Version: 1.1.73
apiVersion     = 0x400039  (1.0.57)
driverVersion  = 75497472 (0x4800000)
vendorID       = 0x8086
deviceID       = 0x193b
Comment 1 Jason Ekstrand 2018-05-03 20:35:56 UTC
Image atomics are only allowed on R32_UINT or R32_SINT images.
Comment 2 Alexander 2018-05-04 03:38:43 UTC
So is the atomicImage* behavior different to imageStore? It's not absolutely clear from GL_ARB_shader_image_load_store specification: "For example, it is legal to access an image whose internal format is RGBA8 with an image unit format of R32UI."

Nvidia and AMD doesn't have a difference between:

layout(r32ui) uniform uimage2D surface;
imageAtomicExchange(surface, ..., packUnorm4x8(color))

layout(rgba8) uniform writeonly image2D surface;
imageStore(surface, ..., color)
Comment 3 Jason Ekstrand 2018-05-04 04:50:53 UTC
(In reply to Alexander from comment #2)
> So is the atomicImage* behavior different to imageStore? It's not absolutely
> clear from GL_ARB_shader_image_load_store specification: "For example, it is
> legal to access an image whose internal format is RGBA8 with an image unit
> format of R32UI."
> 
> Nvidia and AMD doesn't have a difference between:
> 
> layout(r32ui) uniform uimage2D surface;
> imageAtomicExchange(surface, ..., packUnorm4x8(color))
> 
> layout(rgba8) uniform writeonly image2D surface;
> imageStore(surface, ..., color)

Ok, yeah, that should work fine and I'm a bit confused as to why it's not.  Sorry for the overly quick response. :/  Are you sure the image unit is set up with the right format?

Unfortunately, from the pictures, it's hard to tell what's wrong.  Do you expect them to be exactly the same?  I suppose the one with atomics has a bit of a blue tint.

What happens if you bind as R32_UINT and then use imageStore(surface, ..., packUnorm4x8(color))?
Comment 4 Alexander 2018-05-04 05:50:14 UTC
(In reply to Jason Ekstrand from comment #3)
> (In reply to Alexander from comment #2)
> > So is the atomicImage* behavior different to imageStore? It's not absolutely
> > clear from GL_ARB_shader_image_load_store specification: "For example, it is
> > legal to access an image whose internal format is RGBA8 with an image unit
> > format of R32UI."
> > 
> > Nvidia and AMD doesn't have a difference between:
> > 
> > layout(r32ui) uniform uimage2D surface;
> > imageAtomicExchange(surface, ..., packUnorm4x8(color))
> > 
> > layout(rgba8) uniform writeonly image2D surface;
> > imageStore(surface, ..., color)
> 
> Ok, yeah, that should work fine and I'm a bit confused as to why it's not. 
> Sorry for the overly quick response. :/  Are you sure the image unit is set
> up with the right format?
> 
> Unfortunately, from the pictures, it's hard to tell what's wrong.  Do you
> expect them to be exactly the same?  I suppose the one with atomics has a
> bit of a blue tint.
> 
> What happens if you bind as R32_UINT and then use imageStore(surface, ...,
> packUnorm4x8(color))?

It's impossible to substitute imageAtomicExchange() with imageStore().
imageStore(uimage2D, ivec2, uvec4) is the only suitable signature
for r32ui format. The result of such operation is weird.

I have some update on this issue. I tried to store a gradient based on
shader invocation: vec4(ivec4(gl_GlobalInvocationID.xy, 0, 0) % 256) / 255.0f
And there is no difference between imageAtomicExchange() and imageStore().

It looks like imageAtomicExchange() somehow changes another texture sampling behavior.

Please check new attachment.
Comment 5 Alexander 2018-05-04 05:50:38 UTC
Created attachment 139328 [details]
gradient store
Comment 6 Jason Ekstrand 2018-05-04 07:00:37 UTC
(In reply to Alexander from comment #4)
> It's impossible to substitute imageAtomicExchange() with imageStore().
> imageStore(uimage2D, ivec2, uvec4) is the only suitable signature
> for r32ui format. The result of such operation is weird.

You would have to do imageStore(surface, ..., uvec4(packUnorm4x8(color))).
Comment 7 Alexander 2018-05-04 10:58:38 UTC
(In reply to Jason Ekstrand from comment #6)
> (In reply to Alexander from comment #4)
> > It's impossible to substitute imageAtomicExchange() with imageStore().
> > imageStore(uimage2D, ivec2, uvec4) is the only suitable signature
> > for r32ui format. The result of such operation is weird.
> 
> You would have to do imageStore(surface, ..., uvec4(packUnorm4x8(color))).

The result on Intel is a binary red-black image. The result on Nvidia is also weird and absolutely incorrect.
Comment 8 Jason Ekstrand 2018-05-04 15:23:22 UTC
Can you provide me with some mechanism to run your shader? It will be much easier to figure out what's going wrong if I can experience with it.  Even telling me the global works side and required input values should be enough.
Comment 9 Alexander 2018-05-05 07:46:32 UTC
Created attachment 139362 [details]
app

The block size depends on local group size.

Application output pattern is:

store  8x8 | atomic 4x4
atomic 1x1 | atomic 8x8
Comment 10 Denis 2018-10-26 08:52:15 UTC
Created attachment 142211 [details]
denys_expected

hi guys. I tested last attachment from Alexander on radeon and intel. Found out, that on radeon test looks without visible squads (on the right part of the picture). See attachment "denys_expected" with my "expected" output. 

Then I checked old mesa versions and found out, that the same picture I can see on 13.0.6 mesa. Below there is a result of bisection (tested on SKL and KBL):


9919542f1cfff70524bc6117d19bf88e59159caa is the first bad commit
commit 9919542f1cfff70524bc6117d19bf88e59159caa
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Jan 14 23:32:12 2017 -0800

    i965: Make DCE set null destinations on messages with side effects.
    
    (Co-authored by Matt Turner.)
    
    Image atomics, for example, return a value - but the shader may not
    want to use it.  We assigned a useless VGRF destination.  This seemed
    harmless, but it can actually be quite harmful.  The register allocator
    has to assign that VGRF to a real register.  It may assign the same
    actual GRF to the destination of an instruction that follows soon after.
    
    This results in a write-after-write (WAW) dependency, and stall.
    
    A number of "Deus Ex: Mankind Divided" shaders use image atomics, but
    don't use the return value.  Several of these were hitting WAW stalls
    for nearly 14,000 (poorly estimated) cycles a pop.  Making dead code
    elimination null out the destination avoids this issue.
    
    This patch cuts one shader's estimated cycles by -98.39%!  Removing the
    message response should also help with data cluster bandwidth.
    
    On Skylake:
    
    (instruction counts remain identical)
    
    total cycles in shared programs: 255413890 -> 248081010 (-2.87%)
    cycles in affected programs: 12019948 -> 4687068 (-61.01%)
    helped: 24
    HURT: 10
    
    v2: Make can_omit_write independent of can_eliminate (Curro).
    
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Francisco Jerez <currojerez@riseup.net>
    Reviewed-by: Matt Turner <mattst88@gmail.com>

:040000 040000 d3ac50f716bb55e2919ea2a8098656be67dffdc4 9763545331f08faf330b432997836bbcdce7e8c1 M    src
Comment 11 Danylo 2018-10-26 09:15:41 UTC
Hi,

Without this commit gradients are the same for all patterns:

    store  8x8 | atomic 4x4
    atomic 1x1 | atomic 8x8

I have looked at a native code of the shader with and without the commit
and the only difference is in the send command:


        Incorrect gradient (with commit)
send(8)         null<1>UW       g1<8,8,1>UD
    dp data 1 ( DC typed atomic, Surface = 1, SIMD16, mov) mlen 3 rlen 0 { align1 1Q };

        Correct gradient (without commit)
send(8)         g20<1>UW        g1<8,8,1>UD
    dp data 1 ( DC typed atomic, Surface = 1, SIMD16, mov) mlen 3 rlen 1 { align1 1Q };


Here we can see that optimization from that commit is working as expected since
return value of imageAtomicExchange is unused; <dest> is being null in such case
and response length is 0.
In "Send Message" section of Volume 7: 3D-Media-GPGPU it is explicitly said
that it's correct to do this.

However something going wrong here.

One interesting observation is that writing to the red channel yields correct gradient,
while writing to the green or blue channel does not.

I cannot find anything related to such issue in docs, so someone who has more knowledge
of inner workings of GPU is needed here.

You can find slightly modified app to reproduce the issue in attachments.
Change the writing from the green channel to the red to see the difference.
Comment 12 Danylo 2018-10-26 09:17:15 UTC
Created attachment 142214 [details]
simplified reproducer
Comment 13 Danylo 2018-10-26 09:23:11 UTC
Created attachment 142215 [details]
diff of shaders with wrong and correct gradients
Comment 14 Danylo 2019-03-05 12:47:43 UTC
Returning to this there are several observations I can add:

1) imageAtomicExchange() works with constant data:

   imageAtomicExchange(out_surface, ivec2(gl_GlobalInvocationID.xy), 0xFF00FF00); 

   works while with variable color it doesn't.

2) Vertical gradiesnt are fine, the issue appears only when having different colors in a row.

3) In a row every pixel for which gl_LocalInvocationID.x == 0 has correct color while other (local_size_x - 1) pixels inherit its gba channels' values so

	vec4 color;
	if (gl_LocalInvocationID.x == 0) {
		color = vec4(1.0, 0.0, 0.0, 1.0);
	} else {
		color = vec4(0.0, gl_LocalInvocationID.x / float(GROUP_SIZE), 0.0, 1.0);
	}

        imageAtomicExchange(out_surface, ivec2(gl_GlobalInvocationID.xy), packUnorm4x8(color));

   produces red vertical lines with black color between them while it should produce red lines with green gradient in between.

I don't have any ideas what's wrong here.
Comment 15 Danylo 2019-05-28 12:59:16 UTC
Rechecked and the bug is still present.
Any atomic operation which returns value is affected by it.
Comment 16 GitLab Migration User 2019-09-18 19:48:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/835.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.