Bug 111308 - [Regression, NIR, bisected] Black squares in Unigine Heaven via DXVK
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Reported: 2019-08-06 15:53 UTC by Danylo
Modified: 2019-08-30 19:23 UTC

Attachments
unigine_heaven_artifacts (1.06 MB, image/png)
2019-08-06 15:53 UTC, Danylo
good_vs_bad_shaders_nir (5.40 KB, application/gzip)
2019-08-07 12:40 UTC, Danylo

Description Danylo 2019-08-06 15:53:44 UTC
Created attachment 144958
unigine_heaven_artifacts

On latest master there are black square artifacts when running Unigine Heaven through DXVK under Wine.

It is bisected to:

96fcb3f95bdd53c8c1bdc243c95811acabd3f52c is the first bad commit
commit 96fcb3f95bdd53c8c1bdc243c95811acabd3f52c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Thu Oct 11 14:21:42 2018 -0700

    nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements
    
    This just eliminates tautological / contradictory compares that are used
    for bcsel and other non-if-statement cases.  If-statements are not
    affected because removing flow control can cause the i965 instrution
    scheduler to create some very long live ranges resulting in unncessary
    spilling.  This causes some shaders to fall of a performance cliff.
    
    Since many small if-statements are already flattened to bcsel, this
    optimization covers more than 68% of the possible cases (2417 shaders
    helped for instructions on Skylake vs. 3554).
    
    v2: Reorder and add whitespace to make the relationship between the
    patterns more obvious.  Suggested by Caio.
    
    All Gen7+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 16333474 -> 16322028 (-0.07%)
    instructions in affected programs: 438559 -> 427113 (-2.61%)
    helped: 1765
    HURT: 0
    helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4
    helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82%
    95% mean confidence interval for instructions value: -6.87 -6.10
    95% mean confidence interval for instructions %-change: -4.30% -3.84%
    Instructions are helped.
    
    total cycles in shared programs: 367608554 -> 367511103 (-0.03%)
    cycles in affected programs: 8368829 -> 8271378 (-1.16%)
    helped: 1541
    HURT: 129
    helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39
    helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17%
    HURT stats (abs)   min: 1 max: 973 x̄: 42.25 x̃: 10
    HURT stats (rel)   min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60%
    95% mean confidence interval for cycles value: -64.90 -51.81
    95% mean confidence interval for cycles %-change: -3.89% -3.36%
    Cycles are helped.
    
    total spills in shared programs: 8867 -> 8868 (0.01%)
    spills in affected programs: 18 -> 19 (5.56%)
    helped: 0
    HURT: 1
    
    total fills in shared programs: 21900 -> 21903 (0.01%)
    fills in affected programs: 78 -> 81 (3.85%)
    helped: 0
    HURT: 1
    
    All Gen6 and earlier platforms had similar results. (Sandy Bridge shown)
    total instructions in shared programs: 10829877 -> 10829247 (<.01%)
    instructions in affected programs: 30240 -> 29610 (-2.08%)
    helped: 177
    HURT: 0
    helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3
    helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94%
    95% mean confidence interval for instructions value: -3.93 -3.18
    95% mean confidence interval for instructions %-change: -3.04% -2.32%
    Instructions are helped.
    
    total cycles in shared programs: 154036580 -> 154035437 (<.01%)
    cycles in affected programs: 352402 -> 351259 (-0.32%)
    helped: 96
    HURT: 28
    helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6
    helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46%
    HURT stats (abs)   min: 1 max: 117 x̄: 9.68 x̃: 4
    HURT stats (rel)   min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23%
    95% mean confidence interval for cycles value: -13.40 -5.03
    95% mean confidence interval for cycles %-change: -1.62% -0.53%
    Cycles are helped.
    
    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

:040000 040000 30117ba78cfcc635e7974f4af6e4eff4f16eadf2 7e5be8503a115d930b1dc2591365e1ea0e1d4d5f M	src

The exact optimization which causes it is:

(('flt', 'b(is_not_positive)', 'a(is_gt_zero)'),      True),

The rule itself looks correct. Maybe there is a bug in the range analysis, or a stray NaN sneaked in, or both.
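
To make the rule's assumption concrete, here is a minimal Python sketch. It is not Mesa code: the helper `flt` and the sample values are illustrative assumptions, and `flt` is modelled as an ordered less-than that is False when either operand is NaN. It shows that replacing `b < a` with True is only safe while both range claims hold. If `a` can actually be zero or NaN, the comparison is False and the rewrite changes the result.

```python
import math


def flt(b, a):
    """Assumed model of NIR 'flt': ordered b < a, False if either side is NaN."""
    return b < a


# Both range claims hold: b <= 0 (is_not_positive) and a > 0 (is_gt_zero).
cases_valid = [(-1.0, 2.0), (0.0, 1e-30), (-0.0, 0.5)]
all_true = all(flt(b, a) for b, a in cases_valid)

# The claim about 'a' is violated: a is zero or NaN instead of strictly positive.
broken_zero = flt(0.0, 0.0)
broken_nan = flt(-1.0, math.nan)

print("claims hold, b < a:", all_true)
print("a == 0,      b < a:", broken_zero)
print("a is NaN,    b < a:", broken_nan)
```

So the pattern is fine on its own terms, and any miscompile has to come from a value being tagged `is_gt_zero` when it is not.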
Comment 1 Ian Romanick 2019-08-06 18:02:11 UTC
Checking out 96fcb3f95bd and commenting out that optimization doesn't affect any of the Unigine Heaven shaders that I have in shader-db (which are all from the native Linux OpenGL version).  Do you think you could send me dumps with and without that optimization with the environment variable NIR_PRINT=true?  If I can see what changes in the NIR, that should shed some light on things.
Comment 2 Ian Romanick 2019-08-06 18:05:23 UTC
Another thing to try... does changing 'flt' to '~flt' make any difference?
Comment 3 Danylo 2019-08-07 12:40:02 UTC
Created attachment 144968
good_vs_bad_shaders_nir

> does changing 'flt' to '~flt' make any difference?
No

> Do you think you could send me dumps with and without that optimization with the environment variable NIR_PRINT=true?
I've attached the relevant parts of the optimizations and the final NIR. The whole dump is 300 MB, so I didn't attach it.

Vulkan RenderDoc capture of a frame with the issue: https://mega.nz/#!sR8DGKwD!IHSQv6dWjk-YfyOnWZy36v-STBctmJbqGod19RPVDfg (captured on Mesa master and HD 620). It is a capture of a DX11 capture because the app is 32-bit, so I was unable to capture a Vulkan trace of it directly. Don't mind the green artifacts in the trace; they appear only in the Vulkan trace.

The corruption starts at the call with EID 2821. Select the "NaN/INF" overlay and look at the only output of this draw; NaNs are highlighted in red.
Comment 4 Ian Romanick 2019-08-07 15:31:10 UTC
I think I have an idea what could be happening.  There are a lot of occurrences of a pattern like

    y = exp2(-(x*x)) * small_constant + y;

At the end, y is compared against zero (0 < y), and that comparison is eliminated.  If x*x is sufficiently large, exp2(-(x*x)) will flush to zero, so y can be exactly 0 and the eliminated comparison should have been false.

Does changing

   case nir_op_fexp2:
      r = (struct ssa_result_range){gt_zero, analyze_expression(alu, 0, ht).is_integral};
      break;

to

   case nir_op_fexp2:
      r = (struct ssa_result_range){ge_zero, analyze_expression(alu, 0, ht).is_integral};
      break;

help?  I don't have RenderDoc set up on this system.  I can try that later today if you don't beat me to it.

If that fixes the problem, then fmul and ffma (and possibly others) will need fixes to account for flush-to-zero behavior.
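
The suspected failure can be sketched in Python under stated assumptions. Float32 arithmetic with denormal results flushed to zero is emulated by the hypothetical helpers `to_f32` and `ftz32`, and `small_constant` is an arbitrary value chosen for illustration. This is not how the hardware or NIR implements anything; it only shows that exp2 of a large negative argument, and a product of two strictly positive values, can both come out as exactly zero.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest normal float32 magnitude


def to_f32(v):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def ftz32(v):
    """Round to float32, then flush a denormal result to zero (emulation)."""
    r = to_f32(v)
    if r != 0.0 and abs(r) < FLT_MIN_NORMAL:
        return math.copysign(0.0, r)
    return r


def exp2_ftz(e):
    """exp2 evaluated in double, then narrowed to float32 with flush-to-zero."""
    return ftz32(2.0 ** e)


def accumulate(x, y, small_constant=0.001):
    """One step of y = exp2(-(x*x)) * small_constant + y."""
    xx = ftz32(x * x)
    term = ftz32(exp2_ftz(-xx) * small_constant)
    return ftz32(term + y)


y_small = accumulate(1.0, 0.0)   # exp2(-1) = 0.5, so y ends up positive
y_large = accumulate(11.5, 0.0)  # exp2(-132.25) is denormal in float32

# Without flushing, the same exp2 result is a nonzero denormal.
denormal = to_f32(2.0 ** -132.25)

# fmul has the same hazard: two positive normals, denormal product.
product = ftz32(to_f32(1e-20) * to_f32(1e-20))

print("y for x = 1.0: ", y_small, " 0 < y:", 0.0 < y_small)
print("y for x = 11.5:", y_large, " 0 < y:", 0.0 < y_large)
print("unflushed exp2(-132.25):", denormal)
print("flushed 1e-20 * 1e-20:  ", product)
```

In this model a "greater than zero" result range for exp2 or for a product of positive values does not survive flushing, which is why "greater than or equal to zero" is the safe claim.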
Comment 5 Danylo 2019-08-07 18:55:02 UTC
Interesting observation. I will be able to try it tomorrow.
Comment 6 Danylo 2019-08-08 08:23:16 UTC
Yes, changing the fexp2 range to be >= 0 instead of > 0 solves the issue.
Comment 7 Ian Romanick 2019-08-09 17:24:20 UTC
https://gitlab.freedesktop.org/mesa/piglit/merge_requests/110 contains several test cases that reproduce this problem and some related problems.  I should have an MR for the fixes later today.
Comment 8 Ian Romanick 2019-08-30 19:23:21 UTC
Fixed by commit

commit 33ad2bab4bcb52c0f6be56e2f9cce5f52601a4ea
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Aug 7 08:56:22 2019 -0700

    nir/range-analysis: Adjust result range of exp2 to account for flush-to-zero
    
    Fixes piglit tests (new in piglit!110):
    
        - fs-underflow-exp2-compare-zero.shader_test
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
    Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    
    Most of the shaders affected are, unsurprisingly, in Unigine Heaven.
    
    All Gen6+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 16278207 -> 16278465 (<.01%)
    instructions in affected programs: 11374 -> 11632 (2.27%)
    helped: 0
    HURT: 58
    HURT stats (abs)   min: 2 max: 13 x̄: 4.45 x̃: 4
    HURT stats (rel)   min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
    95% mean confidence interval for instructions value: 3.77 5.13
    95% mean confidence interval for instructions %-change: 2.19% 2.64%
    Instructions are HURT.
    
    total cycles in shared programs: 367134284 -> 367135159 (<.01%)
    cycles in affected programs: 81207 -> 82082 (1.08%)
    helped: 17
    HURT: 36
    helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
    helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
    HURT stats (abs)   min: 4 max: 235 x̄: 66.97 x̃: 16
    HURT stats (rel)   min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
    95% mean confidence interval for cycles value: -20.36 53.38
    95% mean confidence interval for cycles %-change: -1.08% 4.67%
    Inconclusive result (value mean confidence interval includes 0).
    
    No changes on any earlier platforms.

