Summary: | Assertion fails in nir_opt_remove_phis.c during compilation of SPIR-V shader | ||
---|---|---|---|
Product: | Mesa | Reporter: | Alastair Donaldson <afdx> |
Component: | Drivers/Vulkan/intel | Assignee: | Caio Marcelo de Oliveira Filho <caio.oliveira> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | caio.oliveira, cwabbott0, jason, jasuarez, t_arceri |
Version: | git | Keywords: | bisected, regression |
Hardware: | x86-64 (AMD64) | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 111444 | ||
Attachments: |
Amber test exposing the problem
backtrace_for_the_issue |
Description
Alastair Donaldson
2019-07-04 13:47:41 UTC
Created attachment 144705 [details] backtrace_for_the_issue Hi, thanks for bug report. I made a bisect and attached backtrace with debug symbols. Also want to note, that bisect lead to these two commits: [den@den-pc mesa]$ git bisect good There are only 'skip'ped commits left to test. The first bad commit could be any of: dc349f07b5f9c257f68ecf0cbb89f9722a42233a b3c6146925595ec3a7eece3afb9ccaad32906d4c We cannot bisect more! Commit dc349f07b5f9c257f68ecf0cbb89f9722a42233a failed to build, and commit below looked the first "bad" commit then commit b3c6146925595ec3a7eece3afb9ccaad32906d4c (HEAD) Author: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Date: Fri Sep 14 11:41:39 2018 -0700 nir: Copy propagation between blocks Extend the pass to propagate the copies information along the control flow graph. It performs two walks, first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other other optimizations kick. In those cases the copies would be propagated after lowering to SSA. In few HELPs we are actually helping doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 : total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) : total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Ugh... This one gets sticky. The problem is a rather strange sequence of events: 1. nir_opt_dead_cf cleans up the outer `if (true) {`. 2. nir_opt_if figures out the condition inside condition and turns `while (gl_FragCoord.x < 0.0)` into `while (1)`. 3. nir_opt_dead_cf cleans up an if statement with the break that was the only exit from a loop. 4. Because the deleted break was the only exit of the loop, now unreachable code later in the shader still consumes values generated by the loop (dead_cf only cleans up phis, not things which exit the dead code without a phi). 5. Loop unrolling comes along and attempts to partially unroll the loop (I think? I'm not actually sure on this one.) and inserts LCSSA phis which have no sources. 6. nir_opt_remove_phis dies on the existence of phis with zero sources. So what do we do about it? Technically, 4. is ok because the way dominance is defined, unreachable code is vacuously dominated by all other code in the shader. I see a few options: 1. Detect when we've deleted the last exit of a loop and simply delete the entire loop immediately in nir_opt_dead_cf. 2. Don't generate LCSSA phis if the loop exit is unreachable 3. Make nir_opt_remove_phis replace phis with zero sources with ssa_undef instructions 4. All of the above? Thoughs? Would there be any disadvantage to just doing option 3? That sounds like an elegant solution (though I don't know the code). There is one more issue related to the infinity loops without exits: bug111405 I implemented option 1. there and it helped for both issue: https://gitlab.freedesktop.org/asimiklit/mesa/commits/fix/loop_with_no_exits But I am not sure whether it is a correct way to fix it. What do you think about it? Option 1. is under discussion here: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1717 new proposal has been implemented to fix this: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1827 This issue doesn't reproduce on the latest master, was fixed by MR: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1827 MR!1827 has been merged in master |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.