Bug 99532

Summary: Compute shader doesn't give right result under some circumstances
Product: Mesa Reporter: Boyan Ding <stu_dby>
Component: Drivers/DRI/nouveauAssignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Nouveau Project <nouveau>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: apitrace of my program
Wrong rendering (histogram blank)
Correct rendering of the last frame
The compute shader binary and assembly compiled with nouveau
The compute shader binary and assembly compiled with proprietary driver
A minimal reproducer
Correct shader image (i965)
Incorrect shader image (nouveau)

Description Boyan Ding 2017-01-25 11:35:33 UTC
Created attachment 129135 [details]
apitrace of my program

When I'm doing a course project in my compute graphics class, I found my program didn't fully work on nouveau while it was working fine on intel and nvidia proprietary drivers. It is a program using deferred shading with some shadow map techniques, and it has an option to draw a "histogram" of depth distribution sampled by a compute shader. I found histogram was showing wrong results (totally blank) on nouveau.

After looking into the problem for a while, I happened to find that if I didn't render the histogram (i.e. didn't use compute shader to sample) on the first frame, the histogram would go wrong, otherwise, it would be okay.

This problem can be reproduced with the attached apitrace file, recorded with simplified program and trimmed to contain only 4 frames. The first 2 frames are mainly initialization and the third frame renders the scene from one angle without generating histogram. The last frame renders the scene from another angle and generates a histogram at bottom-right edge, which is completely blank on nouveau. However, if the trace is further trimmed to contain frame 0,1,3, the histogram will render correctly. (Results of the two will be attached below)

I'm willing provide assistance if needed.
Comment 1 Boyan Ding 2017-01-25 11:36:04 UTC
Created attachment 129136 [details]
Wrong rendering (histogram blank)
Comment 2 Boyan Ding 2017-01-25 11:36:33 UTC
Created attachment 129137 [details]
Correct rendering of the last frame
Comment 3 Ilia Mirkin 2017-01-25 13:18:01 UTC
On GM107 it actually appears to work correctly. However on GK208 I got a blank histogram on the first run, and then a semi-random one on subsequent runs. (And now it's blank again.)

What GPU are you using?
Comment 4 Boyan Ding 2017-01-25 13:24:17 UTC
(In reply to Ilia Mirkin from comment #3)
> On GM107 it actually appears to work correctly. However on GK208 I got a
> blank histogram on the first run, and then a semi-random one on subsequent
> runs. (And now it's blank again.)
> 
> What GPU are you using?

Mine is a GT740M (GK208).
Comment 5 Ilia Mirkin 2017-01-25 13:38:19 UTC
Can you figure out which compute invocation messes up? Looks like you have 2 compute shaders in there, although my bet is on the one that does all the atomic ops on shared memory - that stuff changed between Kepler and Maxwell, which could explain why it works on GM107.
Comment 6 Ilia Mirkin 2017-01-25 13:41:50 UTC
Oh wait - while there are 2 compute shaders, you only use one of them (the one with all the shared atomics). Also need to double-check that it's not an earlier draw messing things up (e.g. that the compute shader is getting the proper inputs). But my bet is on an issue in the compute shader.
Comment 7 Boyan Ding 2017-01-25 13:44:35 UTC
(In reply to Ilia Mirkin from comment #6)
> Oh wait - while there are 2 compute shaders, you only use one of them (the
> one with all the shared atomics). Also need to double-check that it's not an
> earlier draw messing things up (e.g. that the compute shader is getting the
> proper inputs). But my bet is on an issue in the compute shader.

I agree with you, since the same compute shader input was also used in subsequent rendering. If that were wrong, the following render would also be incorrect.
Comment 8 Samuel Pitoiset 2017-01-25 13:47:56 UTC
A missing OP_CCTL maybe? I have something which could help (but it's definitely not the right thing to do).

The possible fix is here:
https://cgit.freedesktop.org/~hakzsam/mesa/commit/?h=atomic_fixes&id=d3a1158acde83d9d7a2a58e431623dd89e59ca65
Comment 9 Samuel Pitoiset 2017-01-25 13:51:25 UTC
That might explain the failure on kepler (and presumably fermi) because maxwell doesn't need that.
Comment 10 Ilia Mirkin 2017-01-25 14:07:09 UTC
(In reply to Samuel Pitoiset from comment #8)
> A missing OP_CCTL maybe? I have something which could help (but it's
> definitely not the right thing to do).
> 
> The possible fix is here:
> https://cgit.freedesktop.org/~hakzsam/mesa/commit/
> ?h=atomic_fixes&id=d3a1158acde83d9d7a2a58e431623dd89e59ca65

Nope. It does change the rendering very slightly - there's now one bar consistently in the histogram - still incorrect.
Comment 11 Boyan Ding 2017-01-26 06:09:57 UTC
Created attachment 129157 [details]
The compute shader binary and assembly compiled with nouveau

I took some time today to capture the compiled shader binaries under nouveau and proprietary driver. It seems there are differences when handling atomic operations. The code generated by nvidia driver (will be attached below) is rather huge, but seems mostly repetitions of several small sections of code.

I don't know about nvidia isa, but hope it helps.
Comment 12 Boyan Ding 2017-01-26 06:10:27 UTC
Created attachment 129158 [details]
The compute shader binary and assembly compiled with proprietary driver
Comment 13 Boyan Ding 2017-02-06 12:24:50 UTC
Now I doubt if the problem really lies in the atomic operations, even in compiler at all. I changed the compute shader a little bit, only letting thread (0, 0) to accumulate its result to the ssbo (the dividing factor in histogram fragment shader changed from 20000 to 20). This bypasses atomic operation and shared memory, but the same problem persists.

I suspect that the input state of compute shader is somehow invalid, and that explains why removing one frame of the trace will magically eliminate the problem.
Comment 14 Boyan Ding 2017-02-07 12:01:11 UTC
Created attachment 129380 [details]
A minimal reproducer

I wrote another much simpler program today to find the real cause of this bug. Turns out that compute shader is unable to sample from depth texture.

The attached program first renders a shape to a FBO. Then a compute shader is used to copy the color and depth textures attached to the former FBO to a shader image, which is drawn to the screen. The depth part is different between nouveau and other drivers.

Compile the program with gcc test.c -lglfw -lGLEW -lGL
Comment 15 Boyan Ding 2017-02-07 12:02:19 UTC
Created attachment 129381 [details]
Correct shader image (i965)
Comment 16 Boyan Ding 2017-02-07 12:03:42 UTC
Created attachment 129382 [details]
Incorrect shader image (nouveau)
Comment 17 Ilia Mirkin 2017-02-12 18:35:57 UTC
A few observations:

- Pretty sure you want GL_TEXTURE_FETCH_BARRIER_BIT. Not that this matters for nvc0.
- In addition to this always working on GM107, it *sometimes* works on my GK208. Something about card state? Or something in compute insufficiently initialized?
- I tried setting the DEPTH_TEXTURE bit in the TIC with no effect.
- The GM107 works just as effectively with both old- and new-style TIC formats.
- Throwing glFlush or glTextureBarrier into the mix made no difference, so it's not a cache issue.
- Switching to DEPTH24 and DEPTH32F made no difference.
Comment 18 Ilia Mirkin 2017-02-13 01:31:19 UTC
Also likely relevant is the fact that all these fail, even on GM107. But the ones for other shader stages do pass. We're clearly missing some setting of some kind.

dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.compute.sampler2dshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.compute.samplercubeshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.compute.sampler2darrayshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.compute.isampler2d,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_expression.compute.sampler2dshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_expression.compute.samplercubeshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_expression.compute.sampler2darrayshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.uniform.compute.sampler2dshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.uniform.compute.samplercubeshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.uniform.compute.sampler2darrayshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.dynamically_uniform.compute.sampler2dshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.dynamically_uniform.compute.samplercubeshadow,Fail
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.dynamically_uniform.compute.sampler2darrayshadow,Fail
Comment 19 Boyan Ding 2017-02-14 02:14:11 UTC
commit 956556b3c30ce3d38d0af795f9383df3bc2cf8a2
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Mon Feb 13 11:14:51 2017 -0500

    nvc0: disable linked tsc mode in compute launch descriptor
    
    Empirically, this makes things work. Presumably this was originally
    copied from the blob, which does make use of linked tsc mode.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99532
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.