Bug 77544 - i965: Try to use LINE instructions to perform MAD with immediate arguments
Summary: i965: Try to use LINE instructions to perform MAD with immediate arguments
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other All
Importance: medium enhancement
Assignee: Matt Turner
QA Contact: Intel 3D Bugs Mailing List
Blocks: i965-perf
Reported: 2014-04-16 20:47 UTC by Matt Turner
Modified: 2014-12-07 19:03 UTC
CC List: 1 user

Description Matt Turner 2014-04-16 20:47:34 UTC
The LINE instruction performs a multiply-add (a * b + c) where b and c are immediate arguments. It reads b and c from offsets in src0, so you can load them (if they're representable) as a vector-float (VF) immediate.
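To make the VF trick concrete, here is a minimal C sketch that checks whether b and c fit in an 8-bit restricted float, packs both into one 32-bit VF immediate, and models what MOV and LINE would compute. Everything here is hypothetical and not Mesa's API: the helper names, the assumed VF layout (1 sign bit, 3 exponent bits biased by 3, 4 mantissa bits, exponent field 0 reserved for +/-0.0), and the assumed src0 offsets (b in component 0, c in component 3) are illustrative assumptions, not taken from this bug.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Mesa's API): encode f as an 8-bit restricted
 * float, or return -1 if f is not exactly representable. Assumed layout:
 * bit 7 = sign, bits 6:4 = exponent biased by 3, bits 3:0 = mantissa,
 * with an exponent field of 0 used only for +/-0.0. */
static int float_to_vf(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));

   uint32_t sign = u >> 31;
   uint32_t exponent = (u >> 23) & 0xff;
   uint32_t mantissa = u & 0x7fffff;

   if ((u & 0x7fffffff) == 0)        /* +0.0 or -0.0 */
      return (int)(sign << 7);

   if (mantissa & 0x7ffff)           /* needs more than 4 mantissa bits */
      return -1;

   int vf_exp = (int)exponent - 124; /* rebias: IEEE 127 -> VF 3 */
   if (vf_exp < 1 || vf_exp > 7)     /* out of range; also rejects inf/NaN */
      return -1;

   return (int)((sign << 7) | ((uint32_t)vf_exp << 4) | (mantissa >> 19));
}

/* Decode one VF byte under the same assumed layout. Bytes that
 * float_to_vf never produces are not validated. */
static float vf_to_float(uint8_t vf)
{
   uint32_t u = (uint32_t)(vf & 0x80) << 24;  /* sign */
   if (vf & 0x7f) {
      u |= (uint32_t)(((vf >> 4) & 0x7) + 124) << 23;
      u |= (uint32_t)(vf & 0xf) << 19;
   }
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}

/* Pack b and c into one 32-bit VF immediate so a single MOV loads both.
 * Assumed slots: b in component 0 (low byte), c in component 3 (high
 * byte), components 1 and 2 zero. Returns -1 if either is not VF. */
static int pack_line_operands(float b, float c, uint32_t *imm)
{
   assert(imm != NULL);
   int vb = float_to_vf(b);
   int vc = float_to_vf(c);
   if (vb < 0 || vc < 0)
      return -1;
   *imm = (uint32_t)vb | ((uint32_t)vc << 24);
   return 0;
}

/* Model of "MOV tmp, VF": expand the four packed bytes into four floats. */
static void mov_vf(uint32_t imm, float tmp[4])
{
   for (int i = 0; i < 4; i++)
      tmp[i] = vf_to_float((uint8_t)(imm >> (8 * i)));
}

/* Model of one LINE channel: dst = tmp[0] * src1 + tmp[3], i.e. a * b + c
 * with a = src1 (assumed operand offsets). */
static float line_channel(const float tmp[4], float src1)
{
   return tmp[0] * src1 + tmp[3];
}
```

With b = 2.0 and c = 0.5 both values fit, so one MOV loads them and LINE computes 2.0 * a + 0.5; a constant such as 0.1 needs more than 4 mantissa bits and cannot be packed this way.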

I have a work-in-progress branch that implements this and enables it on Gen 4 and 5.

http://cgit.freedesktop.org/~mattst88/mesa/log/?h=line

The shader-db results are promising once we allow MOV dst, VF to be CSE'd:

<Piles of shaders helped>

HURT:   shaders/gst-gl-text-download-yuy2-uyvy.frag fs8:  58 -> 59 (1.72%)
HURT:   shaders/unigine-tropics/465.shader_test fs8:      699 -> 1304 (86.55%)

LOST:   shaders/unigine-tropics/465.shader_test fs16

total instructions in shared programs: 806295 -> 803540 (-0.34%)
instructions in affected programs:     370163 -> 367408 (-0.74%)
GAINED:                                0
LOST:                                  1
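The percentages in these results are plain relative changes, (after - before) / before * 100. A small C sketch of that arithmetic follows; the function names are illustrative, and shader-db's own report script is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change in percent: (after - before) / before * 100. */
static double percent_change(long before, long after)
{
   assert(before > 0);
   return (double)(after - before) * 100.0 / (double)before;
}

/* Print one line in the style of the totals above. */
static void print_total(const char *label, long before, long after)
{
   printf("%s %ld -> %ld (%.2f%%)\n", label, before, after,
          percent_change(before, after));
}
```

Both totals drop by the same 2755 instructions; only the denominator differs, since unaffected programs add the same count to "before" and "after".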

Some investigation needs to happen to determine what in the world is going on in 465.shader_test.

Follow on work:
 - We should also implement this for the vec4 backend.

 - For floats that aren't representable as VF, consider whether loading them with mov(1) for LINE consumption is an improvement over not using LINE at all. (Probably so?)

 - Consider whether this optimization is beneficial on newer platforms:

SNB doesn't co-issue (so MAD isn't faster than LINE), but perhaps the MOV immediates can be reused more easily for other instructions?

It seems unlikely that IVB+ (which can co-issue) would benefit from preferring LINE (which can't be co-issued) over MAD (which can).
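One way to reason about the mov(1) question is a crude instruction-count model. Gen 4 and 5 have no three-source MAD, so without LINE each a * b + c becomes a MUL and an ADD. The sketch below additionally assumes (hypothetically) that the loads feeding LINE are CSE'd across N uses with the same b and c, that a non-VF pair needs two mov(1)s, and that instruction count is the only cost; it ignores register pressure, scheduling, and cycle counts.

```c
#include <assert.h>

/* Hypothetical lowerings of N uses of a * b + c sharing immediates b, c. */
enum lowering { MUL_ADD, LINE_VF, LINE_MOV1 };

/* Instruction count only; assumes shared loads are CSE'd to one copy. */
static int instruction_count(enum lowering l, int n_uses)
{
   assert(n_uses >= 0);
   if (n_uses == 0)
      return 0;
   switch (l) {
   case MUL_ADD:   return 2 * n_uses;  /* MUL + ADD per use */
   case LINE_VF:   return 1 + n_uses;  /* one MOV VF, one LINE per use */
   case LINE_MOV1: return 2 + n_uses;  /* two mov(1)s, one LINE per use */
   }
   return -1;
}
```

Under these assumptions the VF path ties MUL + ADD at one use and wins from two, while the mov(1) path loses at one use, ties at two, and wins from three, which is consistent with the "(Probably so?)" above only when the immediates are shared.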

Talk to Matt if you want to work on this.
Comment 1 Matt Turner 2014-12-05 00:38:18 UTC
Patches sent to the mailing list.
Comment 2 Matt Turner 2014-12-07 19:03:15 UTC
Committed:

commit a28ad9d4c0d4b95aee8c3b99e9aaa59add21ea9d
Author: Matt Turner <mattst88@gmail.com>
Date:   Thu Apr 3 14:29:30 2014 -0700

    i965/fs: Perform CSE on MOV ..., VF instructions.
    
    Safe from causing optimization loops, since we don't constant propagate
    VF arguments.
    
    (for this and the previous patch):
    total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
    instructions in affected programs:     1616779 -> 1599636 (-1.06%)
    
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit 963a3c7f90672c8d4931606d45e172792caf84ca
Author: Matt Turner <mattst88@gmail.com>
Date:   Tue Apr 1 16:49:13 2014 -0700

    i965/fs: Try to emit LINE instructions on Gen <= 5.
    
    The LINE instruction performs a multiply-add instruction (a * b + c)
    where b and c are scalar arguments. It reads b and c from offsets in
    src0 such that you can load them (if they're representable) as a
    vector-float immediate with a single instruction.
    
    Hurts some programs, but that'll all get better once we CSE the
    vector-float MOVs in the next patch.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
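Commit a28ad9d4 lets CSE merge identical MOV ..., VF instructions, so several LINEs can share one immediate load; its message notes this is safe from optimization loops because VF arguments are not constant-propagated. The C sketch below shows the idea on a hypothetical SSA-like IR (every instruction writes a distinct virtual register, so no kill tracking is needed). It is not Mesa's actual fs CSE pass; the struct, opcodes, and function name are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IR: each instruction writes a distinct virtual register. */
enum opcode { OP_MOV_VF, OP_LINE };

struct inst {
   enum opcode op;
   int dst;       /* virtual register written */
   int src0;      /* register read (LINE only) */
   int src1;      /* register read (LINE only) */
   uint32_t imm;  /* packed VF immediate (MOV_VF only) */
   int removed;   /* set when CSE deletes the instruction */
};

/* Delete each MOV_VF whose immediate an earlier MOV_VF already loaded and
 * point later readers at the earlier register. remap must hold n_regs
 * entries. Returns the number of MOVs removed. */
static int cse_mov_vf(struct inst *insts, int n, int *remap, int n_regs)
{
   int removed = 0;
   for (int r = 0; r < n_regs; r++)
      remap[r] = r;

   for (int i = 0; i < n; i++) {
      struct inst *in = &insts[i];
      if (in->op == OP_LINE) {
         /* Rewrite sources through any earlier replacement. */
         in->src0 = remap[in->src0];
         in->src1 = remap[in->src1];
         continue;
      }
      for (int j = 0; j < i; j++) {
         if (!insts[j].removed && insts[j].op == OP_MOV_VF &&
             insts[j].imm == in->imm) {
            assert(in->dst < n_regs);
            remap[in->dst] = insts[j].dst;
            in->removed = 1;
            removed++;
            break;
         }
      }
   }
   return removed;
}
```

For example, two LINEs that each load the same packed immediate end up reading the first MOV's register, and the second MOV is dropped.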

