Bug 110229

Summary: glMemoryBarrier doesn't work.
Product: Mesa
Reporter: Laurent <laurentduroisin>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED INVALID
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: laurentduroisin
Version: 19.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: The class which renders stuff with a per-pixel linked list
The main function.

Description Laurent 2019-03-23 22:55:52 UTC
Hi!
I'm trying to use per-pixel linked lists with source code that I found on the internet, but it doesn't work: the driver doesn't wait for the shader executions to finish before updating the framebuffer, even if I call glFinish.

Hmm, I think it's a driver bug rather than the code, because the code isn't mine: it's code I picked up on the Internet, and it works on Windows.

Or maybe it's updating the double buffer with an empty framebuffer.

I added an sf::sleep to see what happens: it displays the sprites for 5 seconds and then nothing.

The source code of my project can be found here:

https://github.com/LaurentDuroisin/ODFAEG

The bugs:

- In the LightRenderComponent class, sometimes the normalMap is not updated before the lights are drawn, so a light is drawn on top of a wall even if the light is behind the wall.

- In the PerPixelLinkedListRenderComponent: same problem, the framebuffer is updated before the shaders have finished executing, so nothing is drawn (a simplified sketch of the call order I expect is shown after this list).

- In the OITRenderComponent class (weighted blended OIT), same problem.
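
For reference, this is roughly the call order I expect to work. It is only a simplified sketch, not the exact ODFAEG code; the function and variable names are illustrative, and it assumes a GL loader such as GLEW:

    #include <GL/glew.h>   // or any other GL function loader (assumption, not the ODFAEG setup)

    // Pass 1 builds the per-pixel linked list, pass 2 reads it back and blends,
    // then I wait for everything to finish before presenting.
    void drawFrame(GLuint buildProgram, GLuint resolveProgram,
                   GLuint sceneVao, GLsizei sceneVertexCount, GLuint quadVao)
    {
        // Pass 1: the fragment shader appends nodes to the SSBO and the head-pointer image.
        glUseProgram(buildProgram);
        glBindVertexArray(sceneVao);
        glDrawArrays(GL_TRIANGLES, 0, sceneVertexCount);

        // Make the SSBO/image writes of pass 1 visible to the reads of pass 2.
        glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT | GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

        // Pass 2: a full-screen quad walks the list, sorts it and blends the colors.
        glUseProgram(resolveProgram);
        glBindVertexArray(quadVao);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

        // I expect every command above to be complete when glFinish returns.
        glFinish();
    }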
Comment 1 Andre Klapper 2019-03-23 23:22:40 UTC
Does this still happen with mesa 19.0?
Comment 2 Laurent 2019-03-24 08:34:28 UTC
I've installed Mesa 19:

glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: AMD CEDAR (DRM 2.50.0 / 4.15.0-46-generic, LLVM 8.0.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.1.0-devel (git-1501207 2019-03-23 bionic-oibaf-ppa)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 19.1.0-devel (git-1501207 2019-03-23 bionic-oibaf-ppa)
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 19.1.0-devel (git-1501207 2019-03-23 bionic-oibaf-ppa)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:

But I still have the same problem: the shader output data are processed after the framebuffer is updated, so it doesn't display anything (or only part of the scene, in some rare cases).
Comment 3 Laurent 2019-03-25 21:40:31 UTC
I have more information about the bug and an idea of how to fix it.

It seems that the window content is updated each time a single (x, y) pixel is drawn, but several pixels can be drawn at the same window position at the same time.

The solution would be to keep a list of the pixels to draw, remove a pixel from the list once it's drawn, and update the window content only once the list is empty.


The picture here shows the problem: only the first color of my per-pixel linked list is drawn (red). But the color of the second sprite is green and the color of the third sprite is blue, so it should blend red pixels with green pixels and then green pixels with blue pixels; instead it only blends the first pixel color of the linked list. (Even with a call to the memory barrier function, which is supposed to make sure earlier writes are finished before later accesses read them.)

https://image.noelshack.com/fichiers/2019/13/1/1553549295-capture-d-ecran-de-2019-03-25-01-44-31.png

Now I have to check where in the source files to put this code, which isn't easy because there are a lot of files, and then recompile the driver.
Comment 4 Laurent 2019-03-26 00:27:23 UTC
Er, no, not a list, but a counter, sorry.
Comment 5 Laurent 2019-03-26 17:22:47 UTC
The code base is too large: there are a lot of folders and a lot of function pointers, so it's very difficult to find where a specific function is called. And anyway, when I tried to compile the driver to set breakpoints and find the call sites, some folders/files were missing, like the pipe folder inside the util folder.

I think I'll have to clean up the source code first.
Comment 6 Laurent 2019-03-26 22:09:10 UTC
Hmm... OK, the bug only appears when I'm using a shader (without shaders, it works). I'll try to run Gallium in debug mode to see if the shader outputs are correct; if they are, it's probably the attached framebuffer texture or image that is not being updated correctly. (The bug is present whether I use an image or an FBO as render target.)

I tried using sync objects and a memory barrier to see if it changes anything, but no; the problem seems to be in the draw function.

I can't guarantee I'll be able to do anything about this: I have no knowledge of driver internals, my game project already takes a lot of my time, and I don't think I'll have time to rewrite a driver if I can't fix this bug, so I have two choices:

- Hope this bug gets fixed.
- If it doesn't, stop using shaders.
Comment 7 Laurent 2019-03-27 00:39:53 UTC
Ha! I think I've found why the shader output is wrong, if my guess about how the CPU and the GPU work together is right.

I'll try to change it and recompile it to see if it works.
Comment 8 Laurent 2019-03-27 17:33:34 UTC
OK, I'm trying to recompile Mesa to see if the bug is fixed, but I get compilation errors:

In file included from ../src/gallium/state_trackers/dri/dri_helpers.h:26:0,
                 from ../src/gallium/state_trackers/dri/dri2.c:49:
../src/gallium/state_trackers/dri/dri_context.h:85:1: error: unknown type name ‘GLXContext’; did you mean ‘EGLContext’?
 GLXContext
 ^~~~~~~~~~
 EGLContext
../src/gallium/state_trackers/dri/dri2.c: In function ‘dri2_init_screen’:
../src/gallium/state_trackers/dri/dri2.c:1955:53: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
             dri2ImageExtension.queryDmaBufModifiers =
                                                     ^
../src/gallium/state_trackers/dri/dri2.c: In function ‘dri_kms_init_screen’:
../src/gallium/state_trackers/dri/dri2.c:2032:50: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
          dri2ImageExtension.queryDmaBufModifiers = dri2_query_dma_buf_modifiers;
                                                  ^
../src/gallium/state_trackers/dri/dri2.c: At top level:
../src/gallium/state_trackers/dri/dri2.c:2089:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .CreateContext = dri_create_context,
                     ^~~~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2089:21: note: (near initialization for ‘galliumdrm_driver_api.CreateContext’)
../src/gallium/state_trackers/dri/dri2.c:2091:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .CreateBuffer = dri2_create_buffer,
                    ^~~~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2091:20: note: (near initialization for ‘galliumdrm_driver_api.CreateBuffer’)
../src/gallium/state_trackers/dri/dri2.c:2093:19: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .MakeCurrent = dri_make_current,
                   ^~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2093:19: note: (near initialization for ‘galliumdrm_driver_api.MakeCurrent’)
../src/gallium/state_trackers/dri/dri2.c:2094:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .UnbindContext = dri_unbind_context,
                     ^~~~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2094:21: note: (near initialization for ‘galliumdrm_driver_api.UnbindContext’)
../src/gallium/state_trackers/dri/dri2.c:2110:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .CreateContext = dri_create_context,
                     ^~~~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2110:21: note: (near initialization for ‘dri_kms_driver_api.CreateContext’)
../src/gallium/state_trackers/dri/dri2.c:2112:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .CreateBuffer = dri2_create_buffer,
                    ^~~~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2112:20: note: (near initialization for ‘dri_kms_driver_api.CreateBuffer’)
../src/gallium/state_trackers/dri/dri2.c:2114:19: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .MakeCurrent = dri_make_current,
                   ^~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2114:19: note: (near initialization for ‘dri_kms_driver_api.MakeCurrent’)
../src/gallium/state_trackers/dri/dri2.c:2115:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
    .UnbindContext = dri_unbind_context,
                     ^~~~~~~~~~~~~~~~~~
../src/gallium/state_trackers/dri/dri2.c:2115:21: note: (near initialization for ‘dri_kms_driver_api.UnbindContext’)
cc1: some warnings being treated as errors
ninja: build stopped: subcommand failed.

But no one is responding here. Maybe I should use the mailing list, but I can't find how to send messages to it.
Comment 9 Laurent 2019-03-27 18:28:15 UTC
OK, I'll stop posting here, no one is responding. I've fixed the compilation errors.
Comment 10 Laurent 2019-03-29 14:23:34 UTC
Hey, is anyone here?

I tested, and for the shader bug it seems to work now. I think you made a mistake: looking at the source code, I didn't see any command to start the shader execution, so I guess the shader starts executing as soon as the first instruction is loaded. So I guess the problem was this:

- You load the first instruction (the boot) into GPU memory; it initializes the offset register to point to the next instruction, but there is no guarantee that the CPU has loaded the next instruction before the GPU points to it.

The solution is to load the last instruction first and the first instruction (the boot) last, so you're guaranteed that all the shader instructions are loaded before the GPU starts executing them.

So the shader output is now correct: the light and the shadows are displayed correctly. But it still doesn't work, because there is another problem, and this time it isn't the shaders, since the problem remains even if I don't use shaders.

Sometimes the framebuffer is not updated so it doesn't draw anything.

Hmm... I think there are some critical bugs in this driver. I don't know what to do: find another driver, or try to fix it; but it's too difficult to find where the rasterizer writes its output data to the render buffer, because there is so much source code.

So, it's hard to follow, badly coded, and no one is responding, so... bye.
Comment 11 Daniel Stone 2019-03-29 14:31:09 UTC
(In reply to Laurent from comment #10)
> -You load the first instruction (the boot) tho the GPU memory, it initialize
> the offset register to go to the next instruction but, there is no guarantee
> that the CPU has loaded the next instruction before the GPU is pointing to
> it.
> 
> The solution is to load the last instruction first, and the first
> instruction (the boot) last so, you're guarantee that all the shader
> instructions are loaded before the GPU start to execute them.

I'm afraid that's not the problem. The drivers compile the whole shader, then copy it to memory, and the entire content is present before the shader stops executing. This has been extensively battle-tested in many games, conformance suites, computer-vision analysis, industrial-scale movie renders, etc.

The real answer is that maintaining these drivers is a lot of work, and the people developing it cannot always drop everything to examine your source code and debug a relatively complex application for you in the space of a few days.

GPU execution is very, very, different from CPU execution: in particular, fragment shading is massively parallel (frequently within tiles). It may be that the execution order is not what you expect, or the access to shared variables is also not what you expect. It's hard to tell from the code, since the per-fragment linked-list approach you have is definitely ... novel.

I would try to begin by examining clearly what is going on at each step of your fragment shader execution, so you can build a more precise theory of what is going wrong. Good luck.
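
For illustration only (this is not the code from your attachments, and the uniform names are made up), the usual concurrency-safe way to build such a list is to allocate each node with an atomic counter and swap the per-pixel head pointer atomically, along these lines:

    const std::string buildShaderSketch =
           R"(
           #version 140
           #extension GL_ARB_shader_atomic_counters : require
           #extension GL_ARB_shading_language_420pack : require
           #extension GL_ARB_shader_image_load_store : require
           #extension GL_ARB_shader_storage_buffer_object : require
           struct NodeType {
              vec4 color;
              float depth;
              uint next;
           };
           layout(binding = 0, offset = 0) uniform atomic_uint nextNodeCounter;
           layout(binding = 0, r32ui) coherent uniform uimage2D headPointers;
           layout(binding = 0, std430) coherent buffer linkedLists {
               NodeType nodes[];
           };
           uniform uint maxNodes;
           uniform vec4 fragmentColor; // illustrative; a real shader computes its own color
           void main() {
              // Allocate a node atomically so concurrent fragments get distinct slots.
              uint nodeIdx = atomicCounterIncrement(nextNodeCounter);
              if (nodeIdx < maxNodes) {
                  // Swap the head pointer atomically so no fragment's node is lost.
                  uint prevHead = imageAtomicExchange(headPointers,
                                                      ivec2(gl_FragCoord.xy), nodeIdx);
                  nodes[nodeIdx].color = fragmentColor;
                  nodes[nodeIdx].depth = gl_FragCoord.z;
                  nodes[nodeIdx].next  = prevHead;
              }
           })";

Without both atomics, two fragments covering the same pixel can interleave their reads and writes and corrupt the list, regardless of what the driver does.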
Comment 12 Laurent 2019-03-29 15:43:32 UTC
OK, so the problem wasn't there.

It's when I bind a texture to an image or to a framebuffer object that the texture is sometimes not updated correctly (even if I don't use shaders).

I'll try to find where data are copied from the bound framebuffer object or image, but it'll be difficult in all this source code.

I really need to set breakpoints, but I don't know how to proceed with a driver. With a library I can install it, compile it in debug mode, and set breakpoints with my IDE to find where it goes wrong, but it's more difficult with a driver...
Comment 13 Laurent 2019-03-29 21:36:57 UTC
"I'm afraid that's not the problem. The drivers compile the whole shader, then copy it to memory, and the entire content is present before the shader stops executing. This has been extensively battle-tested in many games, conformance suites, computer-vision analysis, industrial-scale movie renders, etc."

Yeah, but the shader won't stop if it hasn't been launched, and you have to use some synchronization mechanism to copy the content before the shader stops.


"The real answer is that maintaining these drivers is a lot of work, and the people developing it cannot always drop everything to examine your source code and debug a relatively complex application for you in the space of a few days."

Yes of course.

"GPU execution is very, very, different from CPU execution: in particular, fragment shading is massively parallel (frequently within tiles). It may be that the execution order is not what you expect, or the access to shared variables is also not what you expect. It's hard to tell from the code, since the per-fragment linked-list approach you have is definitely ... novel."

Yeah, so there is one thread per fragment and not per instruction.
With the CPU you need mutexes to avoid concurrent access, and I think two CPU threads cannot write to/read from memory at the same time. The GPU can, because it uses one micro-controller per thread (if I'm not wrong), so you don't need to protect against concurrent access, but you need an OIT mechanism in the driver (I think), because you don't know which thread will write to memory first...
Comment 14 Laurent 2019-03-29 21:39:31 UTC
(In reply to Laurent from comment #13)
> "I'm afraid that's not the problem. The drivers compile the whole shader,
> then copy it to memory, and the entire content is present before the shader
> stops executing. This has been extensively battle-tested in many games,
> conformance suites, computer-vision analysis, industrial-scale movie
> renders, etc."
> 
> Yeah but the shader'll not stop if he's not launched, and you have to use
> some synchronisation mechanism to copy the content before the shader stops.
> 
> 
> "The real answer is that maintaining these drivers is a lot of work, and the
> people developing it cannot always drop everything to examine your source
> code and debug a relatively complex application for you in the space of a
> few days."
> 
> Yes of course.
> 
> "GPU execution is very, very, different from CPU execution: in particular,
> fragment shading is massively parallel (frequently within tiles). It may be
> that the execution order is not what you expect, or the access to shared
> variables is also not what you expect. It's hard to tell from the code,
> since the per-fragment linked-list approach you have is definitely ...
> novel."
> 
> Yeah so there are one thread per fragment and not per instruction.
> With the CPU you need mutex to avoid concurrent access, and I think two CPU
> thread cannot write/read to/from the memory at the same time, the GPU can
> because he use one micro controller per thread (if I'm not wrong) so you
> don't need to protect from concurrent access but you need to use an OIT
> mechanism in the driver (I think) because you don't know which thread'll
> write to the memory first...

And (I forgot), using mutexes is slow, so normally we don't use them on the GPU.
Comment 15 Laurent 2019-03-30 08:11:56 UTC
OK, I think I'll try to write my own driver; all the drivers that I have tested (open-source and proprietary) contain too many bugs. (If the bug is fixed I'll stop, but I really need a working driver for my project; I can't take the risk of waiting if I'm not sure this bug will be fixed...)
Comment 16 Laurent 2019-03-30 18:41:08 UTC
Argh, this is really bad for me: I can't find a tutorial on how to use the DRM library to communicate with the graphics chip, and the Mesa source code is far too complicated for a beginner.
All I see are C function calls like memset, and I also don't know how launching the GPU threads works, but I guess Linux loads the instructions into GPU memory itself, via DRM I think.
It also seems that GNOME uses the graphics driver, because without it GNOME doesn't start, and I don't even know how GNOME communicates with the driver.
I also don't know whether OpenGL functions are loaded at a specific memory location so their function pointer addresses can be retrieved later, or whether it's simply a matter of calling a function with a specific name in the OpenGL library file.

I'm afraid I can't do anything for now, other than simply not using FBOs or shader images, because they don't work.
Comment 17 Laurent 2019-03-31 03:08:05 UTC
It seems to be simpler than I thought: Mesa uses POSIX threads for the GPU, and the only new thing here is the use of libdrm for DMA.

So I only have to code this like a simple C++ program and then put it in the right Linux folder to test it. (I'm not sure I can test it directly, because normally Linux needs to load the driver.)
I'm afraid that if I test it directly with my IDE and a main function, the threads will be executed by the CPU and not by the GPU.
Comment 18 Laurent 2019-04-01 08:57:29 UTC
I saw you wrote a GLSL compiler to translate GLSL source code to binary code, but I think it's a waste of time, because GLSL source code is very similar to C source code. So I think I'll use the gcc compiler to compile GLSL source code before sending it to VRAM and executing it. (The only thing I'll have to do is translate the GLSL code to C source code.)

That will save me a lot of time, and this way I think I'll be able to debug it more quickly.
Comment 19 Daniel Stone 2019-04-01 09:11:51 UTC
(In reply to Laurent from comment #18)
> Yeah so there are one thread per fragment and not per instruction. With the CPU you need mutex to avoid concurrent access, and I think two CPU thread cannot write/read to/from the memory at the same time, the GPU can because he use one micro controller per thread (if I'm not wrong) so you don't need to protect from concurrent access but you need to use an OIT mechanism in the driver (I think) because you don't know which thread'll write to the memory first...

Yes, you're right that the GPU executes code for multiple fragments concurrently. The shader code has to know this and implement its own concurrency protection. This is one of the (many) reasons why pretty much no-one else uses a linked-list implementation in a fragment shader.

You also have the issue of wasted work: by the time a fragment shader executes, you might as well just render the fragment, since so much work has already been done to arrive at the per-fragment execution stage.

OpenGL and Vulkan already provide you many tools you can use to control how you render, such as compute shaders with indirect dispatch, or even just using stencil/depth tests (which you can also implement with a compute shader). Using compute shaders might also help you with your concurrency problems. I would recommend looking into these common approaches first.
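
For example (illustrative only, and it needs GL 4.3 / ARB_compute_shader, so check what your hardware and driver actually expose), a compute pass that post-processes the SSBO would be dispatched and made visible like this:

    // Run a compute pass over a framebuffer-sized grid of 16x16 work groups,
    // then make its SSBO/image writes visible to later draw calls.
    glUseProgram(resolveComputeProgram);            // hypothetical compute program
    glDispatchCompute((width + 15) / 16, (height + 15) / 16, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT | GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

    // With indirect dispatch, the group counts come from a buffer filled on the GPU:
    // glBindBuffer(GL_DISPATCH_INDIRECT_BUFFER, indirectBuffer);
    // glDispatchComputeIndirect(0);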

> I saw you wrote a GLSL compiler to translate GLSL source code to binary code
> but I  think it's a waste of time because GLSL source code is very similar
> to C source code so I think I'll use gcc compiler to compile GLSL source
> code before sending it to the VRAM and executing it. (The only thing I'll
> have to do is translate GLSL code to C source code.
> 
> I'll saving me a lot of time and in this way I think I'll be able to debug
> it more quickly.

When you run gcc, it produces code targeted for your CPU (x86-64 or Arm). gcc cannot produce code which will be executed by a GPU. It also cannot parse GLSL: even though GLSL visually looks a bit like C, it obviously executes very differently.

I think you will very quickly discover that the amount of code in Mesa is not a 'waste of time', and is in fact all required to be able to run things on a GPU.

I'm going to close this report as 'NOTOURBUG', since it seems like the problems you have in your code are caused by incorrectly using OpenGL, and could be better handled through an OpenGL user support forum, or by following tutorials (e.g. on depth/stencil or compute shaders).
Comment 20 Laurent 2019-04-01 11:12:13 UTC
(In reply to Daniel Stone from comment #19)
> (In reply to Laurent from comment #18)
> > Yeah so there are one thread per fragment and not per instruction. With the CPU you need mutex to avoid concurrent access, and I think two CPU thread cannot write/read to/from the memory at the same time, the GPU can because he use one micro controller per thread (if I'm not wrong) so you don't need to protect from concurrent access but you need to use an OIT mechanism in the driver (I think) because you don't know which thread'll write to the memory first...
> 
> Yes, you're right that the GPU executes code for multiple fragments
> concurrently. The shader code has to know this and implement its own
> concurrency protection. This is one of the (many) reasons why pretty much
> no-one else uses a linked-list implementation in a fragment shader.
> 
> You also have the issue of wasted work: by the time a fragment shader
> executes, you might as well just render the fragment, since so much work has
> already been done to arrive at the per-fragment execution stage.
> 
> OpenGL and Vulkan already provide you many tools you can use to control how
> you render, such as compute shaders with indirect dispatch, or even just
> using stencil/depth tests (which you can also implement with a compute
> shader). Using compute shaders might also help you with your concurrency
> problems. I would recommend looking into these common approaches first.
> 
> > I saw you wrote a GLSL compiler to translate GLSL source code to binary code
> > but I  think it's a waste of time because GLSL source code is very similar
> > to C source code so I think I'll use gcc compiler to compile GLSL source
> > code before sending it to the VRAM and executing it. (The only thing I'll
> > have to do is translate GLSL code to C source code.
> > 
> > I'll saving me a lot of time and in this way I think I'll be able to debug
> > it more quickly.
> 
> When you run gcc, it produces code targeted for your CPU (x86-64 or Arm).
> gcc cannot produce code which will be executed by a GPU. It also cannot
> parse GLSL: even though GLSL visually looks a bit like C, it obviously
> executes very differently.
> 
> I think you will very quickly discover that the amount of code in Mesa is
> not a 'waste of time', and is in fact all required to be able to run things
> on a GPU.
> 
> I'm going to close this report as 'NOTOURBUG', since it seems like the
> problems you have in your code are caused by incorrectly using OpenGL, and
> could be better handled through an OpenGL user support forum, or by
> following tutorials (e.g. on depth/stencil or compute shaders).

OK, if you think it's not your bug, do as you wish; but why are you writing the driver source code in C if gcc cannot produce code that will be executed on a GPU?
Comment 21 Daniel Stone 2019-04-01 11:19:54 UTC
> but why are you writing the driver source code in C if gcc cannot produce code which'll be executed on a GPU ?

The driver executes code on the CPU.

One of the pieces of code executed on the CPU is the GLSL compiler, which parses GLSL source code and then produces machine code for the GPU to execute, then instructs the GPU to execute it. GCC does not parse GLSL code, it does not integrate GLSL code with OpenGL (e.g. load uniform values), it does not produce GPU machine code, and it does not instruct the GPU to execute that code.
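
From the application's point of view, that CPU-side flow is just the standard GL calls (shown only for illustration; a vertex shader is attached the same way):

    // The application hands GLSL source text to the driver; Mesa's GLSL compiler
    // runs on the CPU and produces GPU machine code, which later draws execute.
    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &fragmentSourcePtr, nullptr); // fragmentSourcePtr: your GLSL text
    glCompileShader(fs);                                // parsed and compiled on the CPU

    GLuint prog = glCreateProgram();
    glAttachShader(prog, fs);
    glLinkProgram(prog);                                // GPU machine code is generated here
    glUseProgram(prog);                                 // subsequent draw calls tell the GPU to run it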
Comment 22 Laurent 2019-04-01 12:05:54 UTC
OK, but I won't use compute shaders to make my own pipeline, because my SSBO with the per-pixel linked list doesn't work.

I think it will be more interesting to write a compiler that turns my C++ driver code into GPU code, without going through an API like OpenCL or through a graphics driver that doesn't work, especially if my bug reports are not taken into consideration.
Comment 23 Daniel Stone 2019-04-01 12:20:44 UTC
> I think it'll be more interesting to write a compiler to parse my c++ driver code into GPU code without passing by an api like openCL or a graphic driver with doesn't work and mostly if my bugs reports are not taken in consideration.

We have considered your bug report, and the conclusion is that the driver works correctly, but your code does not use OpenGL correctly. It does not take into account that GPU execution is very, very, very, different from CPU execution.

Anyway, good luck writing a driver.
Comment 24 Laurent 2019-04-01 12:28:58 UTC
(In reply to Daniel Stone from comment #23)
> > I think it'll be more interesting to write a compiler to parse my c++ driver code into GPU code without passing by an api like openCL or a graphic driver with doesn't work and mostly if my bugs reports are not taken in consideration.
> 
> We have considered your bug report, and the conclusion is that the driver
> works correctly, but your code does not use OpenGL correctly. It does not
> take into account that GPU execution is very, very, very, different from CPU
> execution.
> 
> Anyway, good luck writing a driver.

Thanks. I think this way I'll better understand how OpenGL works, because even on the OpenGL forum I really don't understand why this doesn't work, and it doesn't seem there's anyone who can help me, so...
Comment 25 Laurent 2019-04-08 13:19:49 UTC
Lol, and you use C11 (CPU) threads to launch GPU threads even though gcc can't generate byte code for a GPU? I really don't understand what you're doing.

And for VBOs you need to execute code on the GPU, not on the CPU, so you can't compile only the shader source code for the GPU; you'd have to compile all the driver source code.

And I don't see any protection mechanism on the shader output data to protect against concurrent access.
Comment 26 Laurent 2019-04-11 10:46:41 UTC
Hi! Is anyone still here?

I've found the function that doesn't work: it's glMemoryBarrier.

glCheck(glMemoryBarrier( GL_SHADER_STORAGE_BARRIER_BIT ));

It doesn't wait until the first shader has written all the nodes to the SSBO before the second shader reads them!

This is why the per-pixel linked list doesn't work.
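
For reference, the sync-object test I mentioned earlier was roughly this (an illustrative sketch, not the exact code), placed between the first and the second pass:

    // Block on the CPU until the GPU has finished the first pass. Note that this
    // only waits for command completion; visibility of the SSBO/image writes is
    // still supposed to come from glMemoryBarrier.
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000); // 1-second timeout
    glDeleteSync(fence);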
Comment 27 Andre Klapper 2019-04-11 11:31:03 UTC
Feel free to attach a minimal, self-contained test case that allows reproducing the issue.

Plus always include filenames and paths when posting random code lines.
Comment 28 Laurent 2019-04-11 13:23:41 UTC
Created attachment 143938 [details]
The class which renders stuff with a per-pixel linked list

The problem is at line 201: glMemoryBarrier doesn't work, and n is always equal to zero in the while loop in the second shader.
Comment 29 Laurent 2019-04-11 13:28:25 UTC
Created attachment 143939 [details]
The main function.
Comment 30 Laurent 2019-04-11 13:34:16 UTC
What do you mean by a test case?

To make the source code easier to read, I put it into C++ classes.
But if you want source code that produces an executable file, just ask. ;)
But it'll take more time.

The whole source code can be found here :

https://github.com/LaurentDuroisin/ODFAEG
Comment 31 Laurent 2019-04-11 15:00:04 UTC
OK, I've found a temporary workaround: putting extra instructions in the second fragment shader to slow down its execution, and it works...

const std::string fragmentShader2 =
               R"(
               #version 140
               #extension GL_ARB_shader_atomic_counters : require
               #extension GL_ARB_shading_language_420pack : require
               #extension GL_ARB_shader_image_load_store : require
               #extension GL_ARB_shader_storage_buffer_object : require
               #define MAX_FRAGMENTS 75
               struct NodeType {
                  vec4 color;
                  float depth;
                  uint next;
               };
               layout(binding = 0, r32ui) coherent uniform uimage2D headPointers;
               layout(binding = 0, std430) coherent buffer linkedLists {
                   NodeType nodes[];
               };
               void main() {
                  NodeType frags[MAX_FRAGMENTS];
                  NodeType frags2[MAX_FRAGMENTS];
                  int count = 0;
                  uint n = imageLoad(headPointers, ivec2(gl_FragCoord.xy)).r;
                  while( n != 0u && count < MAX_FRAGMENTS) {
                       frags[count] = nodes[n];
                       frags2[count] = frags[count];
                       n = nodes[n].next;
                       imageStore(headPointers, ivec2(gl_FragCoord.xy), uvec4(n, 0, 0, 0));
                       count++;
                  }
                  //merge sort
                  int i, j1, j2, k;
                  int a, b, c;
                  int step = 1;
                  NodeType leftArray[MAX_FRAGMENTS/2]; //for merge sort

                  while (step <= count)
                  {
                      i = 0;
                      while (i < count - step)
                      {
                          ////////////////////////////////////////////////////////////////////////
                          //merge(step, i, i + step, min(i + step + step, count));
                          a = i;
                          b = i + step;
                          c = (i + step + step) >= count ? count : (i + step + step);

                          for (k = 0; k < step; k++)
                              leftArray[k] = frags[a + k];

                          j1 = 0;
                          j2 = 0;
                          for (k = a; k < c; k++)
                          {
                              if (b + j1 >= c || (j2 < step && leftArray[j2].depth > frags[b + j1].depth))
                                  frags[k] = leftArray[j2++];
                              else
                                  frags[k] = frags[b + j1++];
                          }
                          ////////////////////////////////////////////////////////////////////////
                          i += 2 * step;
                      }
                      step *= 2;
                  }
                  vec4 color = vec4(0, 0, 0, 0);
                  for( int i = 0; i < count; i++ )
                  {
                      color = mix( color, frags[i].color, frags[i].color.a);
                      if (frags2[i].color.r > 1 || frags2[i].color.g > 1 || frags2[i].color.b > 1)
                        color = vec4(1, 1, 1, 1);
                  }
                  gl_FragColor = color;
               })";

First, I had to assign n from nodes[n] instead of from frags[count]; I don't understand why.
Secondly, I had to add another array and an if in the last (for) loop where I mix the colors; otherwise I get a black screen...
Comment 32 Andre Klapper 2019-04-11 15:39:08 UTC
Closing as invalid, as nothing was fixed in Mesa code, and this issue tracker is not a support desk for going through hundreds of lines of external random code.
Comment 33 Laurent 2019-04-11 18:05:50 UTC
(In reply to Andre Klapper from comment #32)
> Closing as invalid as nothing was fixed in Mesa code and as this issue
> tracker is not a support desk to go through hundreds of lines of external
> random code.

Of course nothing was fixed in Mesa code; I have no time to waste checking for bugs in thousands of lines of code.
Comment 34 Andre Klapper 2019-04-12 09:53:13 UTC
Right. And we cannot waste time going through your custom code if you do not provide a minimal self-contained test case.
Comment 35 Laurent 2019-04-12 13:03:11 UTC
(In reply to Andre Klapper from comment #34)
> Right. And we cannot waste time going through your custom code if you do not
> provide a minimal self-contained test case.

OK, next time I'll provide a test case. I'll have to do this anyway, because I have a lot of bugs (or crashes) with newer OpenGL features, from OpenGL 3.3 to OpenGL 4.4. (OpenGL 4.5 with GL_ARB_bindless_texture is not supported by my GPU, but normally, according to the specs of my GPU that I have here, all functionality up to OpenGL 4.4 should be supported.)

This is why my shader version is 140 and not 330.

On Windows I have a driver that runs OpenGL 3.3 without any problems, but it's an old driver, some shaders don't seem to run properly, and above all only OpenGL 4.1 is supported by it. The driver on the AMD website doesn't work either; I had to download the driver from the ASUS website.

And I don't have the background to write even a simple driver by myself. (Even though I have the technical specifications of the r600 GPU here, as long as I don't have a simple tutorial for just running a basic shader to display something, without going through OpenCL, which is not supported by my GPU.)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.