Bug 101648

Summary: Adding a single-iteration do-while loop in a GLSL shader leads to a different image
Product: Mesa Reporter: Hugues Evrard <h.evrard>
Component: Drivers/DRI/i965Assignee: Francisco Jerez <currojerez>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: alastair.donaldson, h.evrard
Version: 17.1   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: attachment-21457-0.html

Description Hugues Evrard 2017-06-29 14:26:23 UTC
https://github.com/mc-imperial/shader-compiler-bugs/issues/75

Steps to Reproduce:
-------------------------------

0. This is reproducable with Mesa version 17.1.3, on Intel intergated GPUs.

1. Obtain and build the latest release of get-image-glsl, a simple tool
   that creates a .png image from a fragment shader. We need the
   `get_image_glfw` executable.
   https://github.com/graphicsfuzz/get-image-glsl

2. From a terminal, execute (this must be done in the directory where
   both .frag and .json files are):

   `/path/to/get_image_glfw original.frag`

   This will create output.png, which should look like original.png

3. From a terminal, execute (same remark as above):

   `/path/to/get_image_glfw variant.frag`

   This will create output.png, which should look like variant.png

Expected Results:
-------------------------------

Both shaders should render an image that looks like original.png.  This
is because the only difference between the .frag files is a 'do-while'
loop that should execute only once:

```
$ diff -u original.frag variant.frag
--- original.frag	2017-06-28 11:04:21.278754777 +0100
+++ variant.frag	2017-06-28 11:23:16.925591708 +0100
@@ -42,10 +42,12 @@
                 {
                     s += _FLOAT_CONST[2];
                 }
+            do {
             if(float(i) >= limit)
                 {
                     return s;
                 }
+            } while(gl_FragCoord.x < 0.0);
         }
     return s;
 }
```

Actual Results:
-------------------------------
The shaders render completely different images, which they should not.

Mesa code coverage:
-------------------------------
Unzip the `coverage.html.zip` file to get a web page that present the
comparative coverage of Mesa source code between the execution of the
original and the variant shader.

Additional Information:
-------------------------------
Some info on the system where the bug was found:

CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
GPU: "2nd Generation Core Processor Family Integrated Graphics Controller"
OS: Linux 4.8.0-56-generic #61~16.04.1-Ubuntu SMP
Mesa version: 17.1.3
Comment 1 Ilia Mirkin 2017-06-29 14:38:50 UTC
Repro'd with:

OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.1.3

as well as current HEAD (ccfac28835).

Works fine with llvmpipe, softpipe, and swr from the same mesa versions, so the issue is most likely somewhere in i965.
Comment 2 Ilia Mirkin 2017-06-29 18:58:47 UTC
[Just some musings, figured I'd take a closer look... feel free to ignore.]

GLSL seems fine (note use of return_flag):

        (loop (
          (if (expression bool >= (expression float i2f (var_ref i@2) ) (swiz x (var_ref gl_FragCoord) )) (
            (assign  (x) (var_ref return_value)  (var_ref s) ) 
            (assign  (x) (var_ref return_flag)  (constant bool (1)) ) 
            break
          )
          ())

          (if (expression bool >= (swiz x (var_ref gl_FragCoord) )(constant float (0.000000)) ) (
            break
          )
          ())

        ))

        (if (var_ref return_flag) (
          break
        )

And the NIR seems OK:

(return_flag)
                vec1 32 ssa_39 = phi block_9: ssa_7, block_12: ssa_19
                /* succs: block_16 block_17 */
                if ssa_39 {
                        block block_16:
                        /* preds: block_15 */
                        break

And ssa_7 is 0xffffffff as expected (from block_9 which is the one where return_flag is set to 1)

ssa_19 is defined in the outer loop:

                vec1 32 ssa_19 = phi block_0: ssa_1, block_18: ssa_39

Where ssa_1 == 0, and ssa_39 is defined above.

And the register-ized NIR seems fine too at first glance, at least as far as this return business is concerned.

However looking at the SIMD8 shader, the control flow seems off. This is where I get a lot weaker, but I believe that I've found the register for the return_flag, but I don't see any control flow that depends on it. There's also only 1 while loop, rather than nested (it seems).
Comment 3 Ilia Mirkin 2017-06-30 16:51:56 UTC
I guess this makes sense, but commenting out the opt_predicated_break pass in brw_fs.cpp:fs_visitor::optimize fixes it (at least this specific example).
Comment 4 Matt Turner 2017-06-30 18:08:49 UTC
opt_copy_propagate() is at fault. The additional control flow and the predicated break pass somehow leave the IR in a state opt_copy_propagate() mishandles.
Comment 5 Hugues Evrard 2017-07-03 13:22:36 UTC
Could the comparative code coverage between the different runs help to locate the bug root cause?

The report covergage.html is available in the bug file archive, it is also temporary directly accessible at this URL -- yet it is a 35MB web page, so it may freeze your browser for a some seconds before being displayed:
https://www.doc.ic.ac.uk/~hevrard/hidden_glfuzz/coverage_mesa-17-1-3_18May_130_noliterals_colorgrid_modulo_inv_variant_2/index.html

This code coverage compares how many times each line of Mesa source code is executed when rendering each shader. It highlights when line execution counters are different (yellow), and especially when some lines are executed only during the rendering of one of the two shaders (red). Only files with different counters are reported, and they are ranked to display the source files where there is the biggest difference first. The line counters indicates first the number of hits when rendering the *original* shader, then the ones for the *variant* shader.

We can see for instance that the top three source files where line execution differs are:
src/intel/compiler/brw_cfg.cpp
src/intel/brw_predicated_break.cpp
src/compiler/nir_loop_analyze.c

My lack of knowledge of Mesa internals makes it harder for me to feel what's going on, but for you Mesa developers, the red lines in these files may hint at the bug root cause?
Comment 6 Ilia Mirkin 2017-10-26 16:31:42 UTC
Retested on master: commit 109de3049dda6be2a5a3910f777feea0bbf9ce92.

Issue persists with SKL.
Comment 7 Alastair Donaldson 2017-10-26 16:54:39 UTC
Created attachment 135074 [details]
attachment-21457-0.html

I am attending SPLASH this week in Vancouver, and will only have occasional access to email, so my responses may be slower than usual.

Best wishes

Ally
Comment 8 Andriy Khulap 2018-04-02 09:51:19 UTC
I am able to reproduce this issue on Skylake GT2. But it looks like it is already fixed. Tested the fallowing mesa versions:
    Mesa 17.3.6 (git-b3e5a3f35b) - FAIL
    Mesa 17.3.7 (git-8a51f3857c) - FAIL
    mesa-18.0.0-rc1 (e91e68d6a87a14) - OK
    Mesa 18.0.0 (git-dceb1ce807) - OK
    Mesa 18.1.0-devel (git-e7fc18097e) - OK

Bisected and found the fix (tested it by reverting from the latest mesa master):

commit 4d1959e69328cf0d59f0ec7aeea5a2b704ef0c5f
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Fri Oct 13 17:52:00 2017 -0700

    intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.
    
    This addresses a long-standing back-end compiler bug that could lead
    to cross-channel data corruption in loops executed non-uniformly.  In
    some cases live variables extending through a loop divergence point
    (e.g. a non-uniform break) into a convergence point (e.g. the end of
    the loop) wouldn't be considered live along all physical control flow
    paths the SIMD thread could possibly have taken in between due to some
    channels remaining in the loop for additional iterations.
    
    This patch fixes the problem by extending the CFG with physical edges
    that don't exist in the idealized non-vectorized program, but
    represent valid control flow paths the SIMD EU may take due to the
    divergence of logical threads.  This makes sense because the i965 IR
    is explicitly SIMD, and it's not uncommon for instructions to have an
    influence on neighboring channels (e.g. a force_writemask_all header
    setup), so the behavior of the SIMD thread as a whole needs to be
    considered.
    
    No changes in shader-db.
    
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 9 Ilia Mirkin 2018-06-05 13:38:59 UTC
I've re-tested this on SKL-GT2 and can confirm that this specific repro no longer triggers this bug. The commit that was reverse-bisected to certainly seems to make sense given what I saw when I was originally looking as well as what Matt said.

Marking this as FIXED. Thanks Curro!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.