Created attachment 144958 (unigine_heaven_artifacts)

On latest master there are black square artifacts when running Unigine Heaven through DXVK under Wine. I bisected it to:

96fcb3f95bdd53c8c1bdc243c95811acabd3f52c is the first bad commit
commit 96fcb3f95bdd53c8c1bdc243c95811acabd3f52c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Thu Oct 11 14:21:42 2018 -0700

    nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements

    This just eliminates tautological / contradictory compares that are
    used for bcsel and other non-if-statement cases.  If-statements are
    not affected because removing flow control can cause the i965
    instruction scheduler to create some very long live ranges resulting
    in unnecessary spilling.  This causes some shaders to fall off a
    performance cliff.

    Since many small if-statements are already flattened to bcsel, this
    optimization covers more than 68% of the possible cases (2417 shaders
    helped for instructions on Skylake vs. 3554).

    v2: Reorder and add whitespace to make the relationship between the
    patterns more obvious.  Suggested by Caio.

    All Gen7+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 16333474 -> 16322028 (-0.07%)
    instructions in affected programs: 438559 -> 427113 (-2.61%)
    helped: 1765
    HURT: 0
    helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4
    helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82%
    95% mean confidence interval for instructions value: -6.87 -6.10
    95% mean confidence interval for instructions %-change: -4.30% -3.84%
    Instructions are helped.

    total cycles in shared programs: 367608554 -> 367511103 (-0.03%)
    cycles in affected programs: 8368829 -> 8271378 (-1.16%)
    helped: 1541
    HURT: 129
    helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39
    helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17%
    HURT stats (abs)   min: 1 max: 973 x̄: 42.25 x̃: 10
    HURT stats (rel)   min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60%
    95% mean confidence interval for cycles value: -64.90 -51.81
    95% mean confidence interval for cycles %-change: -3.89% -3.36%
    Cycles are helped.

    total spills in shared programs: 8867 -> 8868 (0.01%)
    spills in affected programs: 18 -> 19 (5.56%)
    helped: 0
    HURT: 1

    total fills in shared programs: 21900 -> 21903 (0.01%)
    fills in affected programs: 78 -> 81 (3.85%)
    helped: 0
    HURT: 1

    All Gen6 and earlier platforms had similar results. (Sandy Bridge shown)
    total instructions in shared programs: 10829877 -> 10829247 (<.01%)
    instructions in affected programs: 30240 -> 29610 (-2.08%)
    helped: 177
    HURT: 0
    helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3
    helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94%
    95% mean confidence interval for instructions value: -3.93 -3.18
    95% mean confidence interval for instructions %-change: -3.04% -2.32%
    Instructions are helped.

    total cycles in shared programs: 154036580 -> 154035437 (<.01%)
    cycles in affected programs: 352402 -> 351259 (-0.32%)
    helped: 96
    HURT: 28
    helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6
    helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46%
    HURT stats (abs)   min: 1 max: 117 x̄: 9.68 x̃: 4
    HURT stats (rel)   min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23%
    95% mean confidence interval for cycles value: -13.40 -5.03
    95% mean confidence interval for cycles %-change: -1.62% -0.53%
    Cycles are helped.

    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

:040000 040000 30117ba78cfcc635e7974f4af6e4eff4f16eadf2 7e5be8503a115d930b1dc2591365e1ea0e1d4d5f M src

The exact optimization that causes it is:

    (('flt', 'b(is_not_positive)', 'a(is_gt_zero)'), True),

which itself looks correct. Maybe there is a bug in the range analysis, and/or a stray NaN sneaked in.
Checking out 96fcb3f95bd and commenting out that optimization doesn't affect any of the Unigine Heaven shaders that I have in shader-db (which are all from the native Linux OpenGL version). Do you think you could send me dumps with and without that optimization with the environment variable NIR_PRINT=true? If I can see what changes in the NIR, that should shed some light on things.
Another thing to try... does changing 'flt' to '~flt' make any difference?
Created attachment 144968 (good_vs_bad_shaders_nir)

> does changing 'flt' to '~flt' make any difference?

No.

> Do you think you could send me dumps with and without that optimization with the environment variable NIR_PRINT=true?

I've attached the relevant parts of the optimization passes and the final NIR; the whole dump is 300 MB, so I didn't attach it.

Vulkan RenderDoc capture of a frame with the issue (captured on Mesa master with an HD 620): https://mega.nz/#!sR8DGKwD!IHSQv6dWjk-YfyOnWZy36v-STBctmJbqGod19RPVDfg

It is a Vulkan capture taken from a DX11 capture, because the app is 32-bit and I was unable to capture a Vulkan trace of it directly. Ignore the green artifacts in the trace; they appear only in the Vulkan trace.

The corruption starts at EID 2821. Select the "NaN/INF" overlay and look at the only output of this draw; NaNs would be highlighted in red.
I think I have an idea of what could be happening. There are many occurrences of a pattern like

    y = exp2(-(x*x)) * small_constant + y;

At the end, y is compared (0 < y), and that comparison is eliminated. If x*x is sufficiently large, exp2(-(x*x)) will flush to zero. The range analysis treats exp2 as always > 0, so y is assumed to be > 0 and 0 < y is folded to true; but when the terms flush to zero, y can actually be 0 (for example, if it started at 0), and the comparison should be false.

Does changing

    case nir_op_fexp2:
       r = (struct ssa_result_range){gt_zero, analyze_expression(alu, 0, ht).is_integral};
       break;

to

    case nir_op_fexp2:
       r = (struct ssa_result_range){ge_zero, analyze_expression(alu, 0, ht).is_integral};
       break;

help? I don't have RenderDoc set up on this system. I can try it later today if you don't beat me to it. If that fixes the problem, then fmul and ffma (and possibly others) will also need fixes to account for flush-to-zero behavior.
Interesting observation. I will be able to try it tomorrow.
Yes, changing fexp2 to be >= 0 instead of > 0 solves the issue.
https://gitlab.freedesktop.org/mesa/piglit/merge_requests/110 contains several test cases that reproduce this problem and some related problems. I should have an MR for the fixes later today.
Fixed by:

commit 33ad2bab4bcb52c0f6be56e2f9cce5f52601a4ea
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Aug 7 08:56:22 2019 -0700

    nir/range-analysis: Adjust result range of exp2 to account for flush-to-zero

    Fixes piglit tests (new in piglit!110):

        - fs-underflow-exp2-compare-zero.shader_test

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
    Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

    Most of the shaders affected are, unsurprisingly, in Unigine Heaven.

    All Gen6+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 16278207 -> 16278465 (<.01%)
    instructions in affected programs: 11374 -> 11632 (2.27%)
    helped: 0
    HURT: 58
    HURT stats (abs)   min: 2 max: 13 x̄: 4.45 x̃: 4
    HURT stats (rel)   min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
    95% mean confidence interval for instructions value: 3.77 5.13
    95% mean confidence interval for instructions %-change: 2.19% 2.64%
    Instructions are HURT.

    total cycles in shared programs: 367134284 -> 367135159 (<.01%)
    cycles in affected programs: 81207 -> 82082 (1.08%)
    helped: 17
    HURT: 36
    helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
    helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
    HURT stats (abs)   min: 4 max: 235 x̄: 66.97 x̃: 16
    HURT stats (rel)   min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
    95% mean confidence interval for cycles value: -20.36 53.38
    95% mean confidence interval for cycles %-change: -1.08% 4.67%
    Inconclusive result (value mean confidence interval includes 0).

    No changes on any earlier platforms.