Bug 110953 - Adding a redundant single-iteration do-while loop causes different image to be rendered
Summary: Adding a redundant single-iteration do-while loop causes different image to b...
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: glsl-compiler (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-20 17:18 UTC by Abel Briggs
Modified: 2019-06-25 18:41 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
piglit shader tests, images, diff (2.02 KB, application/zip)
2019-06-20 17:18 UTC, Abel Briggs
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Abel Briggs 2019-06-20 17:18:42 UTC
Created attachment 144602 [details]
piglit shader tests, images, diff

Similar bug reproduced with Intel i965 that was resolved as fixed: https://bugs.freedesktop.org/show_bug.cgi?id=100303

I'm unsure if the error reproduced with the enclosed shaders occur on AMD/Intel - please feel free to move this if it can't be reproduced.

The attached archive contains two shaders, a reference shader and a variant shader, that by construction should render the same image. On the build and PC specified below, however, these shaders render different images. Images of the shaders’ output are also supplied in the archive, as well as the minor diff between the source code of the two shaders.

The difference between the two shaders is that the variant wraps a single-iteration do-while loop around existing code.

46c46
<       do
---
>        if(doSwap)
48,53c48,50
<        if(doSwap)
<         {
<             float temp = data[i];
<             data[i] = data[j];
<             data[j] = temp;
<         }
---
>        float temp = data[i];
>        data[i] = data[j];
>        data[j] = temp;
55d51
<      while(false);

Steps to reproduce:
-------------------------------------------------------------------------------
1. Obtain and build piglit, the Mesa OpenGL test suite runner: https://gitlab.freedesktop.org/mesa/piglit
2. Download the attached archive.
3. From a terminal, execute the supplied tests with the piglit GLES3 shader runner: 
$ bin/shader_runner_gles3 reference.shader_test
$ bin/shader_runner_gles3 variant.shader_test

Expected results:
-------------------------------------------------------------------------------
Both images should produce an image like reference.png, and so both tests should pass. If both fail, it’s likely due to floating point errors that can affect the precise color values - if this is the case, please manually observe that the shader images have significant differences that are not limited to minor floating point variation.

Actual results:
-------------------------------------------------------------------------------
The variant produces a different image (and fails the test) even though it should run exactly the same as the reference shader. This is because the only difference between the reference and variant shaders is that the variant shader wraps a single-iteration do-while loop around existing code, which makes no semantic difference between the two shaders.

Build & PC specs:
-------------------------------------------------------------------------------
CPU: Intel Core i7-5820k 
GPU: nVIDIA GTX 970

OS: Ubuntu 19.04
libdrm: git-5db0f7692d1fdf05f9f6c0c02ffa5a5f4379c1f3 (most recent as of this writing)
Mesa: git-9c19d07b1cdcd22ced0f4e1c147e496b6ff5cf23 (most recent as of this writing)
Xf86-video-nouveau: 1.0.16
Linux kernel version: 5.0.0-16-generic

This bug was found with GraphicsFuzz: https://github.com/google/graphicsfuzz
Comment 1 Ian Romanick 2019-06-20 19:45:17 UTC
I'm not able to reproduce this bug on i965.  It looks like llvmpipe (from Fedora 30, Mesa 19.0.5) produces the variant.png image.  The other bug was in a GLSL IR pass that i965 doesn't use, so this one probably is too.  I'll take a look.
Comment 2 Ian Romanick 2019-06-20 19:53:13 UTC
The difference in the output of "./glsl_compiler --version 300 --dump-lir" for both shaders is below.  It looks like the loop in the second shader is (incorrectly) unrolled twice.

--- /tmp/before.txt	2019-06-20 12:48:58.425392405 -0700
+++ /tmp/after.txt	2019-06-20 12:49:05.983437986 -0700
@@ -72,6 +72,14 @@
           )
           ())
 
+          (if (var_ref checkSwap_retval) (
+            (declare (temporary ) float assignment_tmp@2)
+            (assign  (x) (var_ref assignment_tmp@2)  (array_ref (var_ref data) (var_ref i) ) ) 
+            (assign  (x) (array_ref (var_ref data) (var_ref i) )  (array_ref (var_ref data) (var_ref j) ) ) 
+            (assign  (x) (array_ref (var_ref data) (var_ref j) )  (var_ref assignment_tmp@2) ) 
+          )
+          ())
+
           (assign  (x) (var_ref j)  (expression int + (var_ref j) (constant int (1)) ) ) 
         ))
 
@@ -87,12 +95,12 @@
         (assign  (xyzw) (var_ref _GLF_color)  (var_ref vec_ctor) ) 
       )
       (
-        (declare (temporary ) vec4 vec_ctor@2)
-        (assign  (w) (var_ref vec_ctor@2)  (constant float (1.000000)) ) 
-        (assign  (x) (var_ref vec_ctor@2)  (expression float / (array_ref (var_ref data) (constant int (5)) ) (constant float (10.000000)) ) ) 
-        (assign  (y) (var_ref vec_ctor@2)  (expression float / (array_ref (var_ref data) (constant int (9)) ) (constant float (10.000000)) ) ) 
-        (assign  (z) (var_ref vec_ctor@2)  (expression float / (array_ref (var_ref data) (constant int (0)) ) (constant float (10.000000)) ) ) 
-        (assign  (xyzw) (var_ref _GLF_color)  (var_ref vec_ctor@2) ) 
+        (declare (temporary ) vec4 vec_ctor@3)
+        (assign  (w) (var_ref vec_ctor@3)  (constant float (1.000000)) ) 
+        (assign  (x) (var_ref vec_ctor@3)  (expression float / (array_ref (var_ref data) (constant int (5)) ) (constant float (10.000000)) ) ) 
+        (assign  (y) (var_ref vec_ctor@3)  (expression float / (array_ref (var_ref data) (constant int (9)) ) (constant float (10.000000)) ) ) 
+        (assign  (z) (var_ref vec_ctor@3)  (expression float / (array_ref (var_ref data) (constant int (0)) ) (constant float (10.000000)) ) ) 
+        (assign  (xyzw) (var_ref _GLF_color)  (var_ref vec_ctor@3) ) 
       ))
 
     ))
Comment 3 Ian Romanick 2019-06-20 22:11:27 UTC
Setting debug = true in do_common_optimization (src/compiler/glsl/glsl_parser_extras.cpp), it seems the bad code appears in do_dead_code_unlinked.
Comment 4 Ian Romanick 2019-06-20 22:19:44 UTC
(In reply to Ian Romanick from comment #3)
> Setting debug = true in do_common_optimization
> (src/compiler/glsl/glsl_parser_extras.cpp), it seems the bad code appears in
> do_dead_code_unlinked.

Scratch that.  It is unroll_loops, but unroll_loops doesn't log its progress the way the other optimization passes do.
Comment 5 Ian Romanick 2019-06-20 22:35:39 UTC
The problem is this bit of code loop_unroll_visitor::simple_unroll.

   if (limit_if != first_ir->as_if() || exit_branch_has_instructions)
      iterations++;

The check that the first thing in the loop is an if-statement expects that to only occur when the if-statement is a terminator for the loop (e.g., if (i > 6) break;).  This loop starts with an if-statement that is not a terminator, so the loop iteration count is incorrectly incremented.
Comment 6 Ian Romanick 2019-06-21 05:17:34 UTC
This MR should fix the bug:

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1152

And this MR has a minimal test case:

https://gitlab.freedesktop.org/mesa/piglit/merge_requests/88

I'm running both through Intel's CI on G33 hardware.  That very, very old hardware has OpenGL ES 2.0, but it does not use the NIR loop unrolling path.  It should hit this bug... I'll find out for sure in the morning.  I tried the test and the fix on my local machine using softpipe.

@abel: Can you test the fix on nouveau?
Comment 7 Abel Briggs 2019-06-21 14:38:32 UTC
(In reply to Ian Romanick from comment #6)
> This MR should fix the bug:
> 
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1152
> 
> And this MR has a minimal test case:
> 
> https://gitlab.freedesktop.org/mesa/piglit/merge_requests/88
> 
> I'm running both through Intel's CI on G33 hardware.  That very, very old
> hardware has OpenGL ES 2.0, but it does not use the NIR loop unrolling path.
> It should hit this bug... I'll find out for sure in the morning.  I tried
> the test and the fix on my local machine using softpipe.
> 
> @abel: Can you test the fix on nouveau?

I pulled your repo and ran the shader, and I can confirm that the issue is fixed (the reference and variant now render the same image).
Comment 8 Ian Romanick 2019-06-25 18:41:38 UTC
Fixed by:

commit ee1c69faddb3624ace6548dafaff50549a031380
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Thu Jun 20 15:48:48 2019 -0700

    glsl: Don't increase the iteration count when there are no terminators
    
    Incrementing the iteration count was intended to fix an off-by-one error
    when the first terminator was superseded by a later terminator.  If
    there is no first terminator or later terminator, there is no off-by-one
    error.  Incrementing the loop count creates one.  This can be seen in
    loops like:
    
        do {
            if (something) {
                // No breaks or continues here.
            }
        } while (false);
    
    Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
    Tested-by: Abel Briggs <abelbriggs1@hotmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110953
    Fixes: 646621c66da ("glsl: make loop unrolling more like the nir unrolling path")


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.