Bug 99511 - Compute Shader can't deal with Depth Buffers correctly
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
 
Reported: 2017-01-24 05:51 UTC by Matias N. Goldberg
Modified: 2017-02-11 22:40 UTC

Attachments
Sample repro. Also contains GCN generated ISA for the Compute Shader (146.10 KB, application/x-bzip)
2017-01-24 05:51 UTC, Matias N. Goldberg

Description Matias N. Goldberg 2017-01-24 05:51:07 UTC
Created attachment 129119
Sample repro. Also contains GCN generated ISA for the Compute Shader

OS: Linux Ubuntu 16.04
Kernel 4.7.3
AMD Radeon HD 7770 1GB
Mesa from git 0f8afde7baf2b4764c3832387607021f2d318f6e

After discovering a Mesa bug in my own app, I've managed to isolate it in a repro.

The demo does the following:
1. Render a triangle with a specific depth pattern to an 8x8 FBO depth texture (also outputs colour but this will be ignored). We use gl_FragDepth to achieve this.
2. Use a compute shader to copy the 8x8 depth texture to a GL_R32F texture. The CS copies 4 pixels (a 2x2 block) per thread (there will be a few out-of-bounds reads and writes that, per spec, should be handled correctly by GLSL). It basically acts like a memcpy.
3. Render a triangle to the RenderWindow sampling from that GL_R32F we just wrote to.
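
For reference, the copy in step 2 can be sketched on the CPU as follows. This is not the shader from the attachment; it is a minimal C++ model that assumes image-style semantics for out-of-bounds accesses (loads return 0, stores are dropped). The names Image, loadTexel, storeTexel and copyDepthToR32F, and the thread-grid parameters, are made up for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr int kSize = 8;                        // 8x8 texture, as in the repro
using Image = std::array<float, kSize * kSize>; // one float per texel

// Bounds-checked load: out-of-range texels read as 0 (assumed semantics).
float loadTexel(const Image& img, int x, int y) {
    if (x < 0 || y < 0 || x >= kSize || y >= kSize) return 0.0f;
    return img[static_cast<std::size_t>(y * kSize + x)];
}

// Bounds-checked store: out-of-range writes are silently dropped (assumed semantics).
void storeTexel(Image& img, int x, int y, float value) {
    if (x < 0 || y < 0 || x >= kSize || y >= kSize) return;
    img[static_cast<std::size_t>(y * kSize + x)] = value;
}

// Each simulated thread (tx, ty) copies the 2x2 block starting at (2*tx, 2*ty).
// A grid larger than 4x4 threads overshoots the 8x8 texture, which is where
// the out-of-bounds accesses mentioned above come from.
void copyDepthToR32F(const Image& depth, Image& dst, int threadsX, int threadsY) {
    for (int ty = 0; ty < threadsY; ++ty) {
        for (int tx = 0; tx < threadsX; ++tx) {
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const int x = 2 * tx + dx;
                    const int y = 2 * ty + dy;
                    storeTexel(dst, x, y, loadTexel(depth, x, y));
                }
            }
        }
    }
}
```

With a hypothetical 5x5 thread grid the outermost threads fall outside the texture, yet the destination should still end up as an exact copy of the source, which is the behaviour the repro expects from the driver.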

On Windows, it works as expected and produces the following output: http://imgur.com/MLWzweG
You can see the 8x8 grid pattern (the sampling causes the tiling).

However, on Mesa I only get a black triangle.

The sample works as intended if we try to copy colour. I did not research whether the problem is the GL_R32F destination, or that the source is a depth texture rather than a colour texture.

The sample should be easy to build and run.

A few tips:
1. There is a "#if 1" in main.cpp. Flip it to 0 so that step 3 samples directly from the depth buffer instead of using the compute shader. It will produce the desired output (just for reference), though it's grey instead of red (because of GL specs).

2. Inside the "#if 1", replacing instances of texName[1] & dstTexName[1] with texName[0] & dstTexName[0] will cause the demo to use the colour texture instead of the depth texture, which does work as expected.

Oddities:
Comparing Mesa's generated ISA vs CodeXL makes no sense.

CodeXL generates something like this:
image_load v[X:X], v[0:3], s[X:X] dmask:0xf
image_load v[X:X], v[4:7], s[X:X] dmask:0xf
image_load v[X:X], v[8:11], s[X:X] dmask:0xf
image_load v[X:X], v[12:15], s[X:X] dmask:0xf

image_store v[X:X], v[0:3], s[X:X] dmask:0xf unorm glc
image_store v[X:X], v[4:7], s[X:X] dmask:0xf unorm glc
image_store v[X:X], v[8:11], s[X:X] dmask:0xf unorm glc
image_store v[X:X], v[12:15], s[X:X] dmask:0xf unorm glc

That is, the stores reuse the same register ranges as the loads for their second argument, so the two match.

However Mesa generates:
image_load_mip v[X:X], v[0:3], s[X:X] dmask:0xf
image_load_mip v[X:X], v[4:7], s[X:X] dmask:0xf
image_load_mip v[X:X], v[8:11], s[X:X] dmask:0xf
image_load_mip v[X:X], v[12:15], s[X:X] dmask:0xf

image_store v[X:X], v[0:1], s[X:X] dmask:0xf unorm glc
image_store v[X:X], v[34:35], s[X:X] dmask:0xf unorm glc
image_store v[X:X], v[11:12], s[X:X] dmask:0xf unorm glc
image_store v[X:X], v[21:22], s[X:X] dmask:0xf unorm glc

That is, the register ranges used by Mesa's image_store look totally random, and it stands out that each range spans a difference of 1 (e.g. Mesa's 35-34 = 1) whereas CodeXL's is always 3. Either the dump is decoding the instructions incorrectly, the ISA is wrong, or this is a different but wasteful way to do it correctly (???).
Also, Mesa uses image_load_mip whereas CodeXL uses image_load.

Happy bug hunting
Comment 1 Matias N. Goldberg 2017-02-11 22:40:18 UTC
I'm closing this bug as it is fixed in latest git + LLVM 4.0

Additionally, the sample I uploaded contained a bug:
Line 276:
glBindImageTexture( 0, dstTexName[1], 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8 );

Should have been:
glBindImageTexture( 0, dstTexName[1], 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R32F );
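
The report does not say how the repro's shader declares its image or which values it stores, so the following is only an illustration of why the format argument matters, not a diagnosis of the original failure. Both GL_RGBA8 and GL_R32F occupy 32 bits per texel, but the same stored value ends up as different bits. The sketch assumes the R channel lands in the lowest byte; all function names are made up for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert one channel to 8-bit unsigned normalized: clamp to [0, 1], scale, round.
std::uint8_t toUnorm8(float v) {
    if (!(v > 0.0f)) return 0;   // negative values and NaN clamp to 0
    if (v > 1.0f) v = 1.0f;
    return static_cast<std::uint8_t>(std::lround(v * 255.0f));
}

// Bits of one texel after storing (r, g, b, a) as four unorm8 channels.
// Assumption: R occupies the lowest byte of the 32-bit texel.
std::uint32_t texelBitsRGBA8(float r, float g, float b, float a) {
    return static_cast<std::uint32_t>(toUnorm8(r))
         | (static_cast<std::uint32_t>(toUnorm8(g)) << 8)
         | (static_cast<std::uint32_t>(toUnorm8(b)) << 16)
         | (static_cast<std::uint32_t>(toUnorm8(a)) << 24);
}

// Bits of one texel after storing r as a single 32-bit float.
std::uint32_t texelBitsR32F(float r) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    return bits;
}

// Reinterpret texel bits as the float an R32F read would return.
float bitsAsFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, bitsAsFloat(texelBitsRGBA8(d, 0, 0, 0)) does not give back d for a typical depth value d, whereas bitsAsFloat(texelBitsR32F(d)) does.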

