Bug 105440 - GEN7: rendering issue on citra
Summary: GEN7: rendering issue on citra
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-11 21:12 UTC by Markus Wick
Modified: 2018-04-04 05:23 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Shader dump without any changes (608.59 KB, text/plain)
2018-03-19 16:33 UTC, vadym
Details
Shader dump with the patch (613.41 KB, text/plain)
2018-03-19 16:33 UTC, vadym
Details
Proposed patch (1.37 KB, patch)
2018-03-23 10:11 UTC, vadym
Details | Splinter Review

Description Markus Wick 2018-03-11 21:12:41 UTC
There is a wrong rendering on citra-emu on ivy bridge and haswell. llvmpipe, skylake and kabylake are verified to not be affected. I've tested 17.3.6 and git master.

Broken screenshot: http://markus.members.selfnet.de/dolphin/citra0000079859.png
apitrace dump: http://markus.members.selfnet.de/dolphin/citra-qt.trace

The first broken draw call is frame 71, call 7343. I'm sorry for the vertex shader :/
Comment 1 vadym 2018-03-12 16:43:10 UTC
Issue is reproducible on Haswell:

OS: Ubuntu 16.04 LTS 64-bit
CPU: Intel® Core™ i5-4310M CPU @ 2.70GHz × 4
GPU: Intel® Haswell Mobile
mesa: OpenGL ES 3.2 Mesa 18.1.0-devel (git-fb5825e7ce)
kernel: 4.13.0-26-generic

Also I tried mesa back to 12.1.0 and issue is reproducible there.
Comment 2 vadym 2018-03-14 13:33:47 UTC
Bisected:

2458ea95c5676807a064f24ec720f12506975402 is the first bad commit
commit 2458ea95c5676807a064f24ec720f12506975402
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Sep 9 14:40:06 2015 -0700

    nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
    
    The old pass blindly inserted a bunch of moves into the shader with no
    concern for whether or not it was really needed.  This adds code to try and
    coalesce into the destination of the instruction providing the value.
    
    Shader-db results for vec4 shaders on Haswell:
    
       total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
       instructions in affected programs:     231230 -> 224563 (-2.88%)
       helped:                                1017
       HURT:                                  2
    
    This approach is heavily based on a different patch by Eduardo Lima Mitev
    <elima@igalia.com>.  Eduardo's patch did this in a separate pass as opposed
    to integrating it into nir_lower_vec_to_movs.
    
    Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

Issue is related to optimizations in try_coalesce() function. With the simple change bug is no longer reproducible:

diff --git a/src/compiler/nir/nir_lower_vec_to_movs.c b/src/compiler/nir/nir_lower_vec_to_movs.c
index 711ddd3..73efa0e 100644
--- a/src/compiler/nir/nir_lower_vec_to_movs.c
+++ b/src/compiler/nir/nir_lower_vec_to_movs.c
@@ -263,8 +263,8 @@ lower_vec_to_movs_block(nir_block *block, nir_function_impl *impl)
          if (!(vec->dest.write_mask & (1 << i)))
             continue;
 
-         if (!(finished_write_mask & (1 << i)))
-            finished_write_mask |= try_coalesce(vec, i);
+         //if (!(finished_write_mask & (1 << i)))
+         //   finished_write_mask |= try_coalesce(vec, i);
 
          if (!(finished_write_mask & (1 << i)))
             finished_write_mask |= insert_mov(vec, i, shader);

Below are the results of Shader-db on Haswell with this patch:

total instructions in shared programs: 241498 -> 241599 (0.04%)
instructions in affected programs: 25728 -> 25829 (0.39%)
helped: 14
HURT: 29

total cycles in shared programs: 142924052 -> 143021242 (0.07%)
cycles in affected programs: 85446208 -> 85543398 (0.11%)
helped: 16
HURT: 64
Comment 3 vadym 2018-03-14 15:01:49 UTC
I also run the test with INTEL_DEBUG=vs option with the patch (comment 2) and without it and got following result:

@@ -76,20 +76,16 @@ impl main {
        vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
        vec4 32 ssa_2 = intrinsic load_input (ssa_1) () (0, 0) /* base=0 */ /* component=0 */   /* in_0 */
        vec4 32 ssa_3 = intrinsic load_uniform (ssa_1) () (0, 64) /* base=0 */ /* range=64 */   /* parameters */
-       vec4 32 ssa_4 = fdot_replicated4 ssa_2, ssa_3
+       r4.x = fdot_replicated4 ssa_2, ssa_3
        vec1 32 ssa_5 = load_const (0x00000010 /* 0.000000 */)
        vec4 32 ssa_6 = intrinsic load_uniform (ssa_5) () (0, 64) /* base=0 */ /* range=64 */   /* parameters */
-       vec4 32 ssa_7 = fdot_replicated4 ssa_2, ssa_6
+       r4.y = fdot_replicated4 ssa_2, ssa_6
        vec1 32 ssa_8 = load_const (0x00000020 /* 0.000000 */)
        vec4 32 ssa_9 = intrinsic load_uniform (ssa_8) () (0, 64) /* base=0 */ /* range=64 */   /* parameters */
-       vec4 32 ssa_10 = fdot_replicated4 ssa_2, ssa_9
+       r4.z = fdot_replicated4 ssa_2, ssa_9
        vec1 32 ssa_11 = load_const (0x00000030 /* 0.000000 */)
        vec4 32 ssa_12 = intrinsic load_uniform (ssa_11) () (0, 64) /* base=0 */ /* range=64 */ /* parameters */
-       vec4 32 ssa_13 = fdot_replicated4 ssa_2, ssa_12
-       r4.x = imov ssa_4.x
-       r4.y = imov ssa_7.x
-       r4.z = imov ssa_10.x
-       r4.w = imov ssa_13.x
+       r4.w = fdot_replicated4 ssa_2, ssa_12
        vec4 32 ssa_15 = intrinsic load_input (ssa_1) () (1, 0) /* base=1 */ /* component=0 */
        vec4 32 ssa_16 = intrinsic load_input (ssa_1) () (2, 0) /* base=2 */ /* component=0 */  /* in_2 */
        r5.x = imov ssa_16.x
Comment 4 Jason Ekstrand 2018-03-19 15:53:52 UTC
The NIR change looks fine to me.  Can you attach full INTEL_DEBUG=vs dumps with the back-end IR as well as the NIR for both with and without the patch?
Comment 5 vadym 2018-03-19 16:33:09 UTC
Created attachment 138205 [details]
Shader dump without any changes
Comment 6 vadym 2018-03-19 16:33:44 UTC
Created attachment 138206 [details]
Shader dump with the patch
Comment 7 vadym 2018-03-19 16:35:52 UTC
Hi, Jason. 

Dumps are attached. Can you please look at it ?
Comment 8 Jason Ekstrand 2018-03-20 05:32:36 UTC
In the two dumps you provided, there's quite a few differences.  It would help if we could narrow it down a bit and figure out which vertex shader is causing the problem.  Hopefully, it's one of the small ones. :-)

I recommend putting a little check in try_coalesce in nir_lower_vec_to_movs.c to look at shader->info.name return early for one shader at a time such as:

if (strcmp(shader->info.name, "GLSL8") == 0)
   return;

That way we can narrow it down to the exact shader that's causing problems.  From there, sometimes it's useful to narrow it down further with something like

if (strcmp(shader->info.name, "GLSL8") == 0) {
   static int i = 0;
   i++;
   if (i > 8 && i < 20)
      return;
}

Until you get down to the smallest change you can manage which still reproduces the bug.
Comment 9 vadym 2018-03-20 12:55:53 UTC
(In reply to Jason Ekstrand from comment #8)
> In the two dumps you provided, there's quite a few differences.  It would
> help if we could narrow it down a bit and figure out which vertex shader is
> causing the problem.  Hopefully, it's one of the small ones. :-)
> 
> I recommend putting a little check in try_coalesce in
> nir_lower_vec_to_movs.c to look at shader->info.name return early for one
> shader at a time such as:
> 
> if (strcmp(shader->info.name, "GLSL8") == 0)
>    return;
> 
> That way we can narrow it down to the exact shader that's causing problems. 
> From there, sometimes it's useful to narrow it down further with something
> like
> 
> if (strcmp(shader->info.name, "GLSL8") == 0) {
>    static int i = 0;
>    i++;
>    if (i > 8 && i < 20)
>       return;
> }
> 
> Until you get down to the smallest change you can manage which still
> reproduces the bug.

Thanks for the hint! Looks like issue is in GLSL18 shader.
Comment 10 vadym 2018-03-23 10:11:27 UTC
Created attachment 138307 [details] [review]
Proposed patch
Comment 11 vadym 2018-03-23 12:54:05 UTC
Found out that issue is presented in GLSL18 shader and it is related to mmov optimization for fadd command. Disabling optimization when src and dest are the same register fixing the issue.
Comment 12 Jason Ekstrand 2018-04-04 05:23:00 UTC
This is fixed with the following commit in master:

commit 800df942eadc5356840f5cbc2ceaa8a65c01ee91
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Fri Mar 23 11:05:04 2018 -0700

    nir/lower_vec_to_movs: Only coalesce if the vec had a SSA destination
    
    Otherwise we may end up trying to coalesce in a case such as
    
    ssa_1 = fadd r1, r2
    r3.x = fneg(r2);
    r3 = vec4(ssa_1, ssa_1.y, ...)
    
    and that would cause us to move the writes to r3 from the vec to the
    fadd which would re-order them with respect to the write from the fneg.
    In order to solve this, we just don't coalesce if the destination of the
    vec is not SSA.  We could try to get clever and still coalesce if there
    are no writes to the destination of the vec between the vec and the ALU
    source.  However, since registers only come from phi webs and indirects,
    the chances of having a vec with a register destination that is actually
    coalescable into its source is very slim.
    
    Shader-db results on Haswell:
    
        total instructions in shared programs: 13657906 -> 13659101 (<.01%)
        instructions in affected programs: 149291 -> 150486 (0.80%)
        helped: 0
        HURT: 592
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105440
    Fixes: 2458ea95c56 "nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible"
    Reported-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
    Tested-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
    Reviewed-by: Matt Turner <mattst88@gmail.com>


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.