Bug 99919 - Wrong and unstable image rendering from GLSL fragment shaders
Summary: Wrong and unstable image rendering from GLSL fragment shaders
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: 17.0
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL: https://github.com/mc-imperial/shader...
Depends on:
Reported: 2017-02-23 11:04 UTC by Hugues Evrard
Modified: 2017-03-21 10:11 UTC (History)
5 users (show)

See Also:
i915 platform:
i915 features:

Archive with pair of shaders and their expected renderings (13.74 KB, application/zip)
2017-02-23 11:04 UTC, Hugues Evrard
apitrace of variant shader (71.96 KB, application/octet-stream)
2017-02-24 19:47 UTC, Ilia Mirkin

Description Hugues Evrard 2017-02-23 11:04:19 UTC
Created attachment 129862 [details]
Archive with pair of shaders and their expected renderings

Steps to Reproduce:

1. Obtain and build the latest release of get-image, a simple tool that
   creates a .png image from a fragment shader

2. From a terminal, execute:

   /path/to/get_image/linux_out/install/bin/get_image original.frag

   This will create output.png, which should look like original.png

3. From a terminal, execute:

   /path/to/get_image/linux_out/install/bin/get_image variant.frag

   This will create output.png, which should look like variant.png

Expected Results:

Both shaders should render an image that looks like original.png.  This
is because the only difference between the .frag files is:

- a for loop of 1 iteration

- inside this loop, a "if" which condition is sure to be false at
  runtime (we control the value of injectionSwitch to ensure this)

Actual Results:
The shaders render completely different images, which they should not.
Moreover, the variant.frag file rendering seems unstable, i.e. it does
not always render the same image, but it always renders an image
different from the expected one.
Comment 1 Hugues Evrard 2017-02-23 16:52:45 UTC
Some info on the system where the bug was found:

CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
GPU: "2nd Generation Core Processor Family Integrated Graphics Controller"

OS: Arch Linux: 4.9.6-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux

Mesa: compiled from 17.0.0 source downloaded from website
GCC version: gcc (GCC) 6.3.1 20170109
Comment 2 Ilia Mirkin 2017-02-24 19:47:55 UTC
Created attachment 129905 [details]
apitrace of variant shader

For convenience, attaching a trace of the variant.

Very surprising that this breaks... the diff between shaders is

--- large-v100-webgl-9803840f5a05935c_inv_variant_77/original.frag      2017-02-23 05:18:29.000000000 -0500
+++ large-v100-webgl-9803840f5a05935c_inv_variant_77/variant.frag       2017-02-23 05:18:58.000000000 -0500
@@ -67,12 +70,18 @@
+            for(int i = 0; i < 1; i++) {
             if((s2.x < s1.x || s1.x <= 0.0) && s2.x > 0.0)
                     hd.c = ca.y;
+                    if(injectionSwitch.x > injectionSwitch.y)
+                        {
+                            break;
+                        }
                     hd.d = s2.x;
                     hd.n = s2.yzw;
+            }
     return hd;

(and injectionSwitch is set to 0,1 so that should never be true)

Tested on a SKL GT2, so it's not just some weird SNB problem.
Comment 3 Matt Turner 2017-02-24 19:56:04 UTC
Thanks for the report!

I can reproduce on Haswell, in addition to your report on Sandybridge. I expect this means it will apply to all generations supported by the i965 driver.

The first thing that pops out at me is in the NIR for the variant, we have:

        vec1 32 ssa_0 = undefined
        vec1 32 ssa_1 = undefined
        vec1 32 ssa_2 = undefined
                loop {
                        block block_6:
                        /* preds: block_5 block_15 */
                        vec1 32 ssa_118 = phi block_5: ssa_0, block_15: ssa_130
                        vec1 32 ssa_119 = phi block_5: ssa_1, block_15: ssa_131
                        vec1 32 ssa_120 = phi block_5: ssa_2, block_15: ssa_132

whereas there are no undefined SSA values in the original. Since the undefined values are flowing into the loop from the entrance, I believe this indeed indicates a bug in our control flow analysis in NIR.
Comment 4 Ilia Mirkin 2017-02-24 20:41:32 UTC
As an aside, for gallium-based drivers this seems to generate a checkerboard. (Tested with at least llvmpipe and softpipe.) Both the original and the variant.
Comment 5 Grazvydas Ignotas 2017-02-25 00:34:02 UTC
radeonsi renders original.png, so reassigning.
I can confirm softpipe renders a checkerboard though.
Comment 6 Roland Scheidegger 2017-02-25 00:53:38 UTC
Interesting it would render an incorrect checkerboard both with softpipe and llvmpipe (as these typically don't share any bugs wrt shader execution which aren't shared elsewhere too).
Is the shader NaN-safe?
Comment 7 Roland Scheidegger 2017-02-25 14:34:05 UTC
I think there's a potential issue in the original shader itself.
The opU function has an out parameter HitData hd, which should probably be inout. Otherwise depending on some conditions the value returned will be undefined (as it gets initialized before in the calling function).
Not sure if these conditions are actually hittable and matter here - I've looked at the tgsi code and indeed the tgsi code was different with this change, but it made no difference to the checkerboard pattern being rendered with llvmpipe.
Comment 8 Roland Scheidegger 2017-02-27 02:49:39 UTC
This is not a bug, the shader has undefined output.
I've confirmed the path taken through the shader in the opU() function is always the else clause, then not satisfying the if there, so the returned HitData value is always undefined due to using out instead of inout for the hd parameter.
Therefore the path taken in render() (using the undefined trc.d value) is completely undefined, and of course such undefined results can change randomly if injecting code which seemingly should have no effect.

(As for the checkerboard pattern on llvmpipe/softpipe, I'm not sure but I've verified that with these drivers in the opU() function always the first if clause is taken - hence things are well defined but the result is different. Not sure why exactly the numbers in opU() are different, the shader could use float math which isn't quite stable given the inputs but whatever it's not connected to the shader injection and unless someone says otherwise I'm not convinced it's driver bugs neither.)
Comment 9 Matt Turner 2017-02-27 04:36:44 UTC
(In reply to Roland Scheidegger from comment #8)

Thanks a lot for the analysis. I agree with your assessment.


-HitData opU(out HitData hd, in vec4 s1, in vec4 s2, in vec2 ca)
+HitData opU(inout HitData hd, in vec4 s1, in vec4 s2, in vec2 ca)

indeed changes the shader program generated for original.frag (but not its output), and the same change to variant.frag "fixes" its output to match original.frag.
Comment 10 Hugues Evrard 2017-02-27 06:05:01 UTC
(In reply to Matt Turner from comment #9)
> [...], and the same change to variant.frag "fixes" its output to match
> original.frag.

Thanks for the analysis indeed!

This bug report comes from experiments on fuzzing shaders to chase bugs in graphics drivers. Unfortunately here the original shader that we fuzzed exposes undefined behavior in opU() as explained by Roland, we didn't spot that earlier.

I'll try to confirm the fix suggested by Matt on our setup, but I'm currently travelling so it may take a few days.
Comment 11 Hugues Evrard 2017-03-21 10:11:18 UTC
I confirme that changing the parameter to "inout" fixes the issue on our setup too. Thanks again for the analysis.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.