Bug 94747 - Convert phi nodes to logical operations
Summary: Convert phi nodes to logical operations
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: glsl-compiler (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Ian Romanick
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: i965-perf
  Show dependency treegraph
 
Reported: 2016-03-29 19:25 UTC by Ian Romanick
Modified: 2019-01-29 01:08 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ian Romanick 2016-03-29 19:25:59 UTC
I recently encountered some generated NIR like:

        vec1 ssa_3 = load_const (0xffffffff /* -nan */)

        ...

                        /* succs: block_22 block_23 */
                        if ssa_153 {
                                block block_22:
                                /* preds: block_21 */
                                /* succs: block_24 */
                        } else {
                                block block_23:
                                /* preds: block_21 */
                                vec1 ssa_175 = fadd ssa_171, -ssa_135
                                vec1 ssa_176 = fge abs(ssa_175), ssa_145
                                /* succs: block_24 */
                        }
                        block block_24:
                        /* preds: block_22 block_23 */
                        vec1 ssa_177 = phi block_22: ssa_3, block_23: ssa_176

This is just a bloated way of saying (ssa_153 || (abs(ssa_171 - ssa_135) >= ssa_145)).  Some part of the compiler should be able to recognize this and generate the more compact code.

The out-of-SSA pass generates:

                        /* succs: block_19 block_20 */
                        if r12 {
                                block block_19:
                                /* preds: block_18 */
                                r12 = imov ssa_3
                                /* succs: block_21 */
                        } else {
                                block block_20:
                                /* preds: block_18 */
                                vec1 ssa_172 = fadd r14, -r3
                                r12 = fge abs(ssa_172), ssa_145
                                /* succs: block_21 */
                        }
                        block block_21:
                        /* preds: block_19 block_20 */

And the i965 backend generates (for BDW):

mov.nz.f0(8)    null<1>D        g39<8,8,1>D                     { align1 1Q };
(+f0) if(8)     JIP: 5          UIP: 7                          { align1 1Q };
   END B14 ->B15 ->B16
   START B15 <-B14
mov(8)          g39<1>UD        0xffffffffUD                    { align1 1Q compacted };
else(8)         JIP: 4                                          { align1 1Q };
   END B15 ->B17
   START B16 <-B14
add(8)          g2<1>F          g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
cmp.ge.f0(8)    g39<1>F         (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
   END B16 ->B17

There are a bunch of improvements that could be made to the generated code.  At the very least, the spurious assignment to g39 in the then-statement should be removed.

Then, depending on the surrounding code any one of

add(8)          g2<1>F          g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
cmp.ge.f0(8)    g40<1>F         (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
or(8)           g39<1>D         g39<8,8,1>D     g40<8,8,1>D     { align1 1Q compacted };

or

add(8)          g2<1>F          g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
cmp.ge.f0(8)    null<1>F        (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
(+f0) mov(8)    g39<1>UD        0xffffffffUD                    { align1 1Q compacted };

or

mov.nz.f0(8)    null<1>D        g39<8,8,1>D                     { align1 1Q };
(-f0) add(8)          g2<1>F          g41<8,8,1>F     -g34<8,8,1>F    { align1 1Q compacted };
(-f0) cmp.ge.f0(8)    g39<1>F         (abs)g2<8,8,1>F g111<8,8,1>F    { align1 1Q compacted };
Comment 1 Jason Ekstrand 2016-03-31 00:33:57 UTC
The code for this already exists.  It's called nir_opt_peephole_select.  The only problem is that it only triggers if both sides of the if are empty.  I've been wanting to add some sort of heuristic to it for some time now.  The only problem is that it's really back-end specific.
Comment 2 Timothy Arceri 2018-07-18 06:36:42 UTC
As (In reply to Jason Ekstrand from comment #1)
> The code for this already exists.  It's called nir_opt_peephole_select.  The
> only problem is that it only triggers if both sides of the if are empty. 
> I've been wanting to add some sort of heuristic to it for some time now. 
> The only problem is that it's really back-end specific.

Although i965 doesn't use it I believe this is otherwise fixed by the following commit. We should probably close this bug:

commit 36f0f0318275f65f8744ec6f9471702e2f58e6d5
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Sep 6 19:45:51 2016 -0700

    nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.
    
    VC4 was running into a major performance regression from enabling control
    flow in the glmark2 conditionals test, because of short if statements
    containing an ffract.
    
    This pass seems like it was was trying to ensure that we only flattened
    IFs that should be entirely a win by guaranteeing that there would be
    fewer bcsels than there were MOVs otherwise.  However, if the number of
    ALU ops is small, we can avoid the overhead of branching (which itself
    costs cycles) and still get a win, even if it means moving real
    instructions out of the THEN/ELSE blocks.
    
    For now, just turn on aggressive flattening on vc4.  i965 will need some
    tuning to avoid regressions.  It does looks like this may be useful to
    replace freedreno code.
    
    Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
    fps to 95 fps on vc4.
    
    vc4 shader-db:
    total instructions in shared programs: 101282 -> 99543 (-1.72%)
    instructions in affected programs:     17365 -> 15626 (-10.01%)
    total uniforms in shared programs: 31295 -> 31172 (-0.39%)
    uniforms in affected programs:     3580 -> 3457 (-3.44%)
    total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
    estimated cycles in affected programs:     26085 -> 24649 (-5.51%)
    
    v2: Update shader-db output.
    
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Comment 3 Ian Romanick 2018-07-18 21:46:04 UTC
(In reply to Timothy Arceri from comment #2)
> As (In reply to Jason Ekstrand from comment #1)
> > The code for this already exists.  It's called nir_opt_peephole_select.  The
> > only problem is that it only triggers if both sides of the if are empty. 
> > I've been wanting to add some sort of heuristic to it for some time now. 
> > The only problem is that it's really back-end specific.
> 
> Although i965 doesn't use it I believe this is otherwise fixed by the
> following commit. We should probably close this bug:

Thanks for reminding me about these "don't forget" bugs that I filed. :)  I clearly forgot about them anyway.

I have a patch for this that has been on the list for almost 3 weeks.

https://patchwork.freedesktop.org/patch/233182/

That will just eliminate the flow control (but it might not since there are 2 instructions... I wish I could remember which shader this was).  Even then we'll get something like

    ssa_177 = ssa_153 ? true : abs(ssa_171 - ssa_135) >= ssa_145

when we really want (ssa_153 || (abs(ssa_171 - ssa_135) >= ssa_145)).  I have another patch series in progress that will take care of that.
Comment 4 Timothy Arceri 2018-07-19 04:00:41 UTC
(In reply to Ian Romanick from comment #3)
> (In reply to Timothy Arceri from comment #2)
> > As (In reply to Jason Ekstrand from comment #1)
> > > The code for this already exists.  It's called nir_opt_peephole_select.  The
> > > only problem is that it only triggers if both sides of the if are empty. 
> > > I've been wanting to add some sort of heuristic to it for some time now. 
> > > The only problem is that it's really back-end specific.
> > 
> > Although i965 doesn't use it I believe this is otherwise fixed by the
> > following commit. We should probably close this bug:
> 
> Thanks for reminding me about these "don't forget" bugs that I filed. :)  I
> clearly forgot about them anyway.
> 
> I have a patch for this that has been on the list for almost 3 weeks.
> 
> https://patchwork.freedesktop.org/patch/233182/
> 
> That will just eliminate the flow control (but it might not since there are
> 2 instructions... I wish I could remember which shader this was).  Even then
> we'll get something like
> 
>     ssa_177 = ssa_153 ? true : abs(ssa_171 - ssa_135) >= ssa_145
> 
> when we really want (ssa_153 || (abs(ssa_171 - ssa_135) >= ssa_145)).  I
> have another patch series in progress that will take care of that.

I suspect we have been working on similar changes. I've just sent a series that should cover this also.

https://patchwork.freedesktop.org/series/46827/
Comment 5 Ian Romanick 2019-01-29 01:08:44 UTC
In the original message I neglected to mention any specific shader that exhibited this issue.  Allow me to remedy that: shaders/closed/unigine/superposition/high/5392.shader_test.  This shader generates a very frustrating sequence like:

        vec1 32 ssa_24 = load_const (0x00000000 /* 0.000000 */)
        ...

                        vec1 32 ssa_248 = fge32 ssa_237.x, ssa_241
                        vec1 32 ssa_249 = uge32 ssa_23, ssa_247
                        vec1 32 ssa_250 = iand ssa_249, ssa_248
                        /* succs: block_9 block_10 */
                        if ssa_250 {
                                block block_9:
                                /* preds: block_8 */
                                ...
                                vec1 32 ssa_257 = ior ssa_255, ssa_256
                                vec1 32 ssa_258 = inot ssa_257
                                /* succs: block_11 */
                        } else {
                                block block_10:
                                /* preds: block_8 */
                                /* succs: block_11 */
                        }
                        block block_11:
                        /* preds: block_9 block_10 */
                        vec1 32 ssa_259 = phi block_9: ssa_258, block_10: ssa_24
                        vec1 32 ssa_260 = inot ssa_259
                        /* succs: block_12 block_13 */
                        if ssa_260 {
                                block block_12:
                                /* preds: block_11 */
                                break
                                /* succs: block_15 */
                        } else {
                                block block_13:
                                /* preds: block_11 */
                                /* succs: block_14 */
                        }


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.