Bug 101572 - glMemoryBarrier is backwards
Summary: glMemoryBarrier is backwards
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: All
OS: All
Priority: medium
Severity: major
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-24 04:54 UTC by Matias N. Goldberg
Modified: 2017-06-27 17:12 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:



Description Matias N. Goldberg 2017-06-24 04:54:33 UTC
This bug may not just be in radeonsi.

I noticed the error after seeing my compute shaders produce incorrect output: the input FBO (used as a texture) was missing some of the draws that had been rendered to it.

According to the spec, the bits passed to glMemoryBarrier describe how the data will be *used afterwards*.

So, for example, if I have a compute shader that writes to an SSBO and this buffer is later used as an index buffer, I should call:
glMemoryBarrier( GL_ELEMENT_ARRAY_BARRIER_BIT );
because I will be reading from this buffer later on as an index buffer.

However, it appears Mesa instead expects me to call:
glMemoryBarrier( GL_SHADER_STORAGE_BARRIER_BIT );
because I am writing to this buffer as an SSBO before the barrier.
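
To make the two readings concrete, here is a rough sketch of the call sequence I mean (computeProg, ssbo, numGroups and indexCount are placeholder names; a GL 4.3 context and the objects are assumed to already exist):

glUseProgram( computeProg );
glBindBufferBase( GL_SHADER_STORAGE_BUFFER, 0, ssbo );
glDispatchCompute( numGroups, 1, 1 );            // writes the indices into ssbo
// My reading of the spec: the bit describes the *next* use of the data
glMemoryBarrier( GL_ELEMENT_ARRAY_BARRIER_BIT );
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, ssbo );   // now consumed as an index buffer
glDrawElements( GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0 );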

The specific problem I encountered is that I was drawing to an FBO, and later this FBO is used as a regular texture (sampler2D) in a compute shader. According to the spec, I should call:
glMemoryBarrier( GL_TEXTURE_FETCH_BARRIER_BIT );

However Mesa does not produce correct output unless I do:
glMemoryBarrier( GL_FRAMEBUFFER_BARRIER_BIT );
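
Roughly, the sequence looks like this (fbo, colourTex, computeProg and the group counts are placeholder names; the objects are assumed to be set up beforehand):

glBindFramebuffer( GL_FRAMEBUFFER, fbo );
// ... draw calls that render into colourTex, the FBO's colour attachment ...
// Spec reading: the next use of the data is a texture fetch from the compute shader
glMemoryBarrier( GL_TEXTURE_FETCH_BARRIER_BIT );
glUseProgram( computeProg );
glBindTexture( GL_TEXTURE_2D, colourTex );       // sampled via sampler2D, not imageLoad
glDispatchCompute( groupsX, groupsY, 1 );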

I had to re-read the spec several times and was left wondering if I was the one who had it backwards. After all, the language it is written in is very confusing and there is not a single example; however, I then found:
http://malideveloper.arm.com/sample-code/introduction-compute-shaders-2/

which says:
"It’s important to remember the semantics of glMemoryBarrier(). As argument it takes a bitfield of various buffer types. We specify how we will read data after the memory barrier. In this case, we are writing the buffer via an SSBO, but we’re reading it when we’re using it as a vertex buffer, hence GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT."

I then consulted the OpenGL SuperBible and the OpenGL Programming Guide, and they both agree:
"glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT);
will ensure that any access to an atomic counter in a buffer object
will reflect updates to that buffer by a shader. You should call
glMemoryBarrier() with the GL_ATOMIC_COUNTER_BARRIER_BIT set when
something has written to a buffer that you want to see reflected in the
values of your atomic counters. If you update the values in a buffer using
an atomic counter and then use that buffer for something else, the bit you
include in the barriers parameter to glMemoryBarrier() should
correspond to what you want that buffer to be used for, which will not
necessarily include GL_ATOMIC_COUNTER_BARRIER_BIT." (from OpenGL Super Bible)

"GL_TEXTURE_FETCH_BARRIER_BIT specifies that any fetch from a
texture issued after the barrier should reflect data written to the texture
by commands issued before the barrier.
(...)
GL_FRAMEBUFFER_BARRIER_BIT specifies that reads or writes through
framebuffer attachments issued after the barrier will reflect data written
to those attachments by shaders executed before the barrier. Further,
writes to framebuffers issued after the barrier will be ordered with
respect to writes performed by shaders before the barrier." (from OpenGL Programming Manual).

It appears state_tracker/st_cb_texturebarrier.c also contains this bug, because it ignores GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT: it assumes writes via texture/buffer updates will be done through Mesa functions, which are always synchronized, instead of synchronizing the reads and writes issued after the barrier.

This doesn't sound like it has a trivial fix. TBH I would have no problem supporting a glMemoryBarrierMESA( writesToFlushBeforeBarrier, readsToFlushAfterBarrier ) that behaves more sanely and the way you'd want (and I would get the fastest path); the standard glMemoryBarrier could then just be worked around by issuing a barrier for all bits.
Comment 1 Nicolai Hähnle 2017-06-27 09:12:10 UTC
You're misinterpreting the spec.

glMemoryBarrier ensures that **writes from shaders** are visible to whatever consumer you indicate with the given flag bits.

What you seem to be trying to do is ensure that **writes via the framebuffer** are visible in subsequent compute shader invocations. For that, you need to either:

(1) Bind a different framebuffer, or (probably more appropriate to what you're trying to do)
(2) use glTextureBarrier.
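
For example, roughly (a sketch, not tested; fbo, colourTex and computeProg stand in for your objects, and glTextureBarrier needs GL 4.5 or ARB_texture_barrier):

glBindFramebuffer( GL_FRAMEBUFFER, fbo );
// ... draw calls rendering into colourTex, the FBO's colour attachment ...
glTextureBarrier();                        // make the framebuffer writes visible to later texture fetches
glUseProgram( computeProg );
glBindTexture( GL_TEXTURE_2D, colourTex );
glDispatchCompute( groupsX, groupsY, 1 );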
Comment 2 Matias N. Goldberg 2017-06-27 16:46:18 UTC
That can't be right.

You're suggesting that, in order to synchronize writes to an FBO with a compute shader I am about to dispatch (which, btw, accesses this FBO as a regular texture fetch, not via imageLoad/imageStore), I either need to:

1. Switch to a dummy FBO, something that is mentioned nowhere: not in manuals, online documentation, wikis, or tutorials; nor is it stated as a guarantee in the spec.

2. Use a function that was only added in OpenGL 4.5, even though compute shaders were added in 4.3.

I may be misinterpreting the spec, but these solutions don't make any sense. In the best case, Mesa should detect that I am trying to read from an FBO in a compute shader being dispatched and issue a barrier for me; in the worst case, some functionality already present in 4.3 (like glMemoryBarrier) that doesn't look esoteric (unlike switching FBOs) should be enough to synchronize.
Comment 3 Matias N. Goldberg 2017-06-27 16:49:06 UTC
I would rather be told that I am wrong about how to interpret glMemoryBarrier, and that I should be calling glMemoryBarrier( GL_FRAMEBUFFER_BARRIER_BIT ) because of this and that.
Comment 4 Matias N. Goldberg 2017-06-27 17:11:44 UTC
After careful thought, I realized I don't need to switch to a dummy FBO; just unset the current one.

That DOES make sense to me: while the FBO is bound and I'm executing a compute shader that reads from it, I could just as well be telling the driver to do both things at the same time.

I can see why unsetting the FBO would flush the necessary caches (per OpenGL's guarantees, I'm saying "I'm done writing to this FBO", so subsequent reads should be correct), so I'm going to take this as the answer.

I don't want to spend much more time on this matter either.

Thanks.
Comment 5 Matias N. Goldberg 2017-06-27 17:12:01 UTC
(btw unsetting the FBO indeed fixes the issue)
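
For completeness, the sequence that works for me now looks roughly like this (placeholder names again):

glBindFramebuffer( GL_FRAMEBUFFER, fbo );
// ... draw calls rendering into colourTex ...
glBindFramebuffer( GL_FRAMEBUFFER, 0 );    // unsetting the FBO is what made the compute reads correct
glUseProgram( computeProg );
glBindTexture( GL_TEXTURE_2D, colourTex );
glDispatchCompute( groupsX, groupsY, 1 );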

