I recently encountered some generated NIR like:

    vec1 ssa_3 = load_const (0xffffffff /* -nan */)
    ...
    /* succs: block_22 block_23 */
    if ssa_153 {
            block block_22:
            /* preds: block_21 */
            /* succs: block_24 */
    } else {
            block block_23:
            /* preds: block_21 */
            vec1 ssa_175 = fadd ssa_171, -ssa_135
            vec1 ssa_176 = fge abs(ssa_175), ssa_145
            /* succs: block_24 */
    }
    block block_24:
    /* preds: block_22 block_23 */
    vec1 ssa_177 = phi block_22: ssa_3, block_23: ssa_176

This is just a bloated way of saying (ssa_153 || (abs(ssa_171 - ssa_135) >=
ssa_145)). Some part of the compiler should be able to recognize this and
generate the more compact code.

The out-of-SSA pass generates:

    /* succs: block_19 block_20 */
    if r12 {
            block block_19:
            /* preds: block_18 */
            r12 = imov ssa_3
            /* succs: block_21 */
    } else {
            block block_20:
            /* preds: block_18 */
            vec1 ssa_172 = fadd r14, -r3
            r12 = fge abs(ssa_172), ssa_145
            /* succs: block_21 */
    }
    block block_21:
    /* preds: block_19 block_20 */

And the i965 backend generates (for BDW):

    mov.nz.f0(8)    null<1>D    g39<8,8,1>D                     { align1 1Q };
    (+f0) if(8)     JIP: 5      UIP: 7                          { align1 1Q };
    END B14 ->B15 ->B16
    START B15 <-B14
    mov(8)          g39<1>UD    0xffffffffUD                    { align1 1Q compacted };
    else(8)         JIP: 4                                      { align1 1Q };
    END B15 ->B17
    START B16 <-B14
    add(8)          g2<1>F      g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
    cmp.ge.f0(8)    g39<1>F     (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
    END B16 ->B17

There are a bunch of improvements that could be made to the generated code.
At the very least, the spurious assignment to g39 in the then-statement
should be removed. Then, depending on the surrounding code, any one of

    add(8)          g2<1>F      g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
    cmp.ge.f0(8)    g40<1>F     (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
    or(8)           g39<1>D     g39<8,8,1>D     g40<8,8,1>D     { align1 1Q compacted };

or

    add(8)          g2<1>F      g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
    cmp.ge.f0(8)    null<1>F    (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
    (+f0) mov(8)    g39<1>UD    0xffffffffUD                    { align1 1Q compacted };

or

    mov.nz.f0(8)    null<1>D    g39<8,8,1>D                     { align1 1Q };
    (-f0) add(8)    g2<1>F      g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
    (-f0) cmp.ge.f0(8) g39<1>F  (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
The code for this already exists. It's called nir_opt_peephole_select. The only problem is that it only triggers if both sides of the if are empty. I've been wanting to add some sort of heuristic to it for some time now. The only problem is that it's really back-end specific.
(In reply to Jason Ekstrand from comment #1)
> The code for this already exists. It's called nir_opt_peephole_select. The
> only problem is that it only triggers if both sides of the if are empty.
> I've been wanting to add some sort of heuristic to it for some time now.
> The only problem is that it's really back-end specific.

Although i965 doesn't use it, I believe this is otherwise fixed by the
following commit. We should probably close this bug:

commit 36f0f0318275f65f8744ec6f9471702e2f58e6d5
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Sep 6 19:45:51 2016 -0700

    nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.

    VC4 was running into a major performance regression from enabling
    control flow in the glmark2 conditionals test, because of short if
    statements containing an ffract.

    This pass seems like it was trying to ensure that we only flattened
    IFs that should be entirely a win by guaranteeing that there would be
    fewer bcsels than there were MOVs otherwise. However, if the number of
    ALU ops is small, we can avoid the overhead of branching (which itself
    costs cycles) and still get a win, even if it means moving real
    instructions out of the THEN/ELSE blocks.

    For now, just turn on aggressive flattening on vc4. i965 will need
    some tuning to avoid regressions. It does look like this may be useful
    to replace freedreno code.

    Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from
    47 fps to 95 fps on vc4.

    vc4 shader-db:
    total instructions in shared programs: 101282 -> 99543 (-1.72%)
    instructions in affected programs: 17365 -> 15626 (-10.01%)
    total uniforms in shared programs: 31295 -> 31172 (-0.39%)
    uniforms in affected programs: 3580 -> 3457 (-3.44%)
    total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
    estimated cycles in affected programs: 26085 -> 24649 (-5.51%)

    v2: Update shader-db output.

    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
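[Editor's note: for context, here is a minimal sketch of how a back end might
wire the more aggressive flattening into its NIR optimization loop after that
commit. The limit value and the surrounding passes are illustrative, not any
specific driver's actual tuning, and nir_opt_peephole_select has grown extra
parameters in later Mesa versions.]

    #include "compiler/nir/nir.h"

    /* Illustrative sketch of a typical driver optimization loop.  Passing a
     * nonzero limit to nir_opt_peephole_select lets it flatten ifs whose
     * branches contain a few real ALU instructions; a limit of 0 keeps the
     * old behavior of only handling trivial/empty branches. */
    static void
    optimize(nir_shader *s)
    {
       bool progress;
       do {
          progress = false;
          NIR_PASS(progress, s, nir_copy_prop);
          NIR_PASS(progress, s, nir_opt_dce);
          NIR_PASS(progress, s, nir_opt_algebraic);
          NIR_PASS(progress, s, nir_opt_peephole_select, 8 /* limit */);
       } while (progress);
    }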
(In reply to Timothy Arceri from comment #2)
> (In reply to Jason Ekstrand from comment #1)
> > The code for this already exists. It's called nir_opt_peephole_select. The
> > only problem is that it only triggers if both sides of the if are empty.
> > I've been wanting to add some sort of heuristic to it for some time now.
> > The only problem is that it's really back-end specific.
>
> Although i965 doesn't use it, I believe this is otherwise fixed by the
> following commit. We should probably close this bug:

Thanks for reminding me about these "don't forget" bugs that I filed. :) I
clearly forgot about them anyway.

I have a patch for this that has been on the list for almost 3 weeks:

https://patchwork.freedesktop.org/patch/233182/

That will just eliminate the flow control (but it might not, since there are
2 instructions... I wish I could remember which shader this was). Even then
we'll get something like

    ssa_177 = ssa_153 ? true : abs(ssa_171 - ssa_135) >= ssa_145

when we really want (ssa_153 || (abs(ssa_171 - ssa_135) >= ssa_145)). I have
another patch series in progress that will take care of that.
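[Editor's note: to make that target concrete, here is a rough nir_builder
sketch of the form we want the compiler to converge on for the if/phi in
comment #0. The helper is hypothetical and not part of any existing pass, the
parameter names merely mirror the SSA values in that dump, and newer Mesa
spells nir_ssa_def as nir_def.]

    #include "compiler/nir/nir_builder.h"

    /* Hypothetical illustration: build
     *    ssa_153 || (abs(ssa_171 - ssa_135) >= ssa_145)
     * directly instead of an if/else feeding a phi. */
    static nir_ssa_def *
    build_flat_condition(nir_builder *b, nir_ssa_def *ssa_153,
                         nir_ssa_def *ssa_171, nir_ssa_def *ssa_135,
                         nir_ssa_def *ssa_145)
    {
       nir_ssa_def *diff = nir_fsub(b, ssa_171, ssa_135);
       nir_ssa_def *cmp  = nir_fge(b, nir_fabs(b, diff), ssa_145);
       return nir_ior(b, ssa_153, cmp);
    }

[Once the flattening pass has produced the ternary (bcsel) form above, an
algebraic rule along the lines of bcsel(a, true, b) -> ior(a, b) is what
would turn it into this.]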
(In reply to Ian Romanick from comment #3)
> (In reply to Timothy Arceri from comment #2)
> > (In reply to Jason Ekstrand from comment #1)
> > > The code for this already exists. It's called nir_opt_peephole_select.
> > > The only problem is that it only triggers if both sides of the if are
> > > empty. I've been wanting to add some sort of heuristic to it for some
> > > time now. The only problem is that it's really back-end specific.
> >
> > Although i965 doesn't use it, I believe this is otherwise fixed by the
> > following commit. We should probably close this bug:
>
> Thanks for reminding me about these "don't forget" bugs that I filed. :) I
> clearly forgot about them anyway.
>
> I have a patch for this that has been on the list for almost 3 weeks:
>
> https://patchwork.freedesktop.org/patch/233182/
>
> That will just eliminate the flow control (but it might not, since there
> are 2 instructions... I wish I could remember which shader this was). Even
> then we'll get something like
>
>     ssa_177 = ssa_153 ? true : abs(ssa_171 - ssa_135) >= ssa_145
>
> when we really want (ssa_153 || (abs(ssa_171 - ssa_135) >= ssa_145)). I
> have another patch series in progress that will take care of that.

I suspect we have been working on similar changes. I've just sent a series
that should cover this also:

https://patchwork.freedesktop.org/series/46827/
In the original message I neglected to mention any specific shader that
exhibited this issue. Allow me to remedy that:
shaders/closed/unigine/superposition/high/5392.shader_test. This shader
generates a very frustrating sequence like:

    vec1 32 ssa_24 = load_const (0x00000000 /* 0.000000 */)
    ...
    vec1 32 ssa_248 = fge32 ssa_237.x, ssa_241
    vec1 32 ssa_249 = uge32 ssa_23, ssa_247
    vec1 32 ssa_250 = iand ssa_249, ssa_248
    /* succs: block_9 block_10 */
    if ssa_250 {
            block block_9:
            /* preds: block_8 */
            ...
            vec1 32 ssa_257 = ior ssa_255, ssa_256
            vec1 32 ssa_258 = inot ssa_257
            /* succs: block_11 */
    } else {
            block block_10:
            /* preds: block_8 */
            /* succs: block_11 */
    }
    block block_11:
    /* preds: block_9 block_10 */
    vec1 32 ssa_259 = phi block_9: ssa_258, block_10: ssa_24
    vec1 32 ssa_260 = inot ssa_259
    /* succs: block_12 block_13 */
    if ssa_260 {
            block block_12:
            /* preds: block_11 */
            break
            /* succs: block_15 */
    } else {
            block block_13:
            /* preds: block_11 */
            /* succs: block_14 */
    }
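[Editor's note: for reference, working through the Boolean algebra (assuming
these values are all canonical NIR booleans, i.e. 0 or ~0, which the
load_const of 0x00000000 for ssa_24 suggests), the whole if/phi/inot chain
collapses to a single loop-break condition:

    ssa_259 = ssa_250 ? inot(ssa_257) : false
    ssa_260 = inot(ssa_259)
            = ssa_250 ? ssa_257 : true
            = !ssa_250 || ssa_257
            = !ssa_250 || ssa_255 || ssa_256

so ideally the flattening pass plus a couple of algebraic rules (a bcsel with
a constant arm, inot of inot) would reduce all of this to an inot of ssa_250
and one additional ior before it ever reaches the back end.]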
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this
link to our GitLab instance:
https://gitlab.freedesktop.org/mesa/mesa/issues/810.