Bug 77544

Summary:	i965: Try to use LINE instructions to perform MAD with immediate arguments
Product:	Mesa	Reporter:	Matt Turner <mattst88>
Component:	Drivers/DRI/i965	Assignee:	Matt Turner <mattst88>
Status:	RESOLVED FIXED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	enhancement
Priority:	medium	CC:	juhapekka.heikkila
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	77547

Description Matt Turner 2014-04-16 20:47:34 UTC

The LINE instruction performs a multiply-add operation (a * b + c) where b and c are immediate arguments. It reads b and c from offsets in src0, so you can load them (if they're representable) as a Vector-Float immediate.
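To make "representable as a Vector-Float immediate" concrete, here is a rough Python sketch of the check. It assumes the VF format is four packed 8-bit restricted floats, each with 1 sign bit, 3 exponent bits (bias 3) and 4 mantissa bits, with 0x00/0x80 reserved for +0.0/-0.0. The helper names (float_to_vf, vf_to_float, pack_vf4) are made up for illustration and are not Mesa's API. Placing b in component 0 and c in component 3 is likewise an assumption to check against the hardware documentation.

```python
import struct

def float_to_vf(f):
    """Return the assumed 8-bit VF encoding of f, or None if f does not fit."""
    try:
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
    except OverflowError:
        return None                      # magnitude too large even for float32
    sign = bits >> 31
    if f == 0.0:
        return sign << 7                 # +0.0 -> 0x00, -0.0 -> 0x80
    if bits & 0x7FFFF:
        return None                      # needs more than 4 mantissa bits
    exponent = ((bits >> 23) & 0xFF) - 127 + 3   # re-bias from 127 to 3
    if not 0 <= exponent <= 7:
        return None                      # exponent does not fit in 3 bits
    vf = (sign << 7) | (exponent << 4) | ((bits >> 19) & 0xF)
    if vf & 0x7F == 0:
        return None                      # 0.125 would collide with the zero encoding
    return vf

def vf_to_float(vf):
    """Decode an 8-bit VF value back to a Python float."""
    if vf & 0x7F == 0:
        return -0.0 if vf & 0x80 else 0.0
    sign = -1.0 if vf & 0x80 else 1.0
    return sign * (1 + (vf & 0xF) / 16) * 2.0 ** (((vf >> 4) & 0x7) - 3)

def pack_vf4(values):
    """Pack four floats into one 32-bit immediate (component 0 in the low byte)."""
    encoded = [float_to_vf(v) for v in values]
    if None in encoded:
        return None
    return encoded[0] | encoded[1] << 8 | encoded[2] << 16 | encoded[3] << 24

# a * 1.0 + 2.0: assume b goes in component 0 and c in component 3.
imm = pack_vf4([1.0, 0.0, 0.0, 2.0])
```

Under these assumptions, constants such as 0.5, 1.0, 1.5, 2.0 and 31.0 fit, while 0.1, 0.125 and 32.0 do not.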

I have a work-in-progress branch that implements this, and enables it on Gen 4 and 5.

http://cgit.freedesktop.org/~mattst88/mesa/log/?h=line

The shader-db results are promising once we allow MOV dst, VF to be CSE'd:

<Piles of shaders helped>

HURT:   shaders/gst-gl-text-download-yuy2-uyvy.frag fs8:  58 -> 59 (1.72%)
HURT:   shaders/unigine-tropics/465.shader_test fs8:      699 -> 1304 (86.55%)

LOST:   shaders/unigine-tropics/465.shader_test fs16

total instructions in shared programs: 806295 -> 803540 (-0.34%)
instructions in affected programs:     370163 -> 367408 (-0.74%)
GAINED:                                0
LOST:                                  1

Some investigation needs to happen to determine what in the world is going on in 465.shader_test.
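For readers wondering why CSE of MOV dst, VF matters for the numbers above: every LINE needs its own VF load unless identical loads get merged. The toy Python pass below illustrates the idea only. The instruction tuples, the opcode name MOV_VF and the register names are all hypothetical, it assumes each register is written exactly once, and it is not how the i965 CSE pass is actually structured.

```python
def cse_vf_movs(instructions):
    """Drop repeated loads of the same VF immediate within one block.

    Each instruction is a hypothetical (opcode, dst, sources) tuple.
    """
    seen = {}      # VF immediate -> register that already holds it
    renamed = {}   # register of a dropped MOV -> surviving register
    out = []
    for op, dst, srcs in instructions:
        # Point sources at the surviving register where a MOV was dropped.
        srcs = tuple(renamed.get(s, s) for s in srcs)
        if op == "MOV_VF":
            imm = srcs[0]
            if imm in seen:
                renamed[dst] = seen[imm]   # reuse the earlier load
                continue
            seen[imm] = dst
        out.append((op, dst, srcs))
    return out

program = [
    ("MOV_VF", "r1", (0x40000030,)),
    ("LINE",   "r2", ("r1", "a")),
    ("MOV_VF", "r3", (0x40000030,)),   # same immediate as r1
    ("LINE",   "r4", ("r3", "b")),
]
optimized = cse_vf_movs(program)
```

Here the second MOV_VF is removed and the second LINE reads r1 instead of r3, so two multiply-adds with the same constants cost three instructions rather than four.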

Follow on work:
 - We should also implement this for the vec4 backend.

 - Consider whether using mov(1) to load floats that aren't representable as VF for LINE consumption is an improvement over not using LINE if the floats aren't representable as VF. (Probably so?)

 - Consider whether this optimization is beneficial on newer platforms:

SNB doesn't co-issue (so MAD isn't faster than LINE) but perhaps the MOV immediates can be reused more easily for other instructions?

It seems unlikely that IVB+ (which can co-issue) would benefit from preferring LINE (which can't be co-issued) over MAD (which can).

Talk to Matt if you want to work on this.

Comment 1 Matt Turner 2014-12-05 00:38:18 UTC

Patches sent to the mailing list.

Comment 2 Matt Turner 2014-12-07 19:03:15 UTC

Committed:

commit a28ad9d4c0d4b95aee8c3b99e9aaa59add21ea9d
Author: Matt Turner <mattst88@gmail.com>
Date:   Thu Apr 3 14:29:30 2014 -0700

    i965/fs: Perform CSE on MOV ..., VF instructions.
    
    Safe from causing optimization loops, since we don't constant propagate
    VF arguments.
    
    (for this and the previous patch):
    total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
    instructions in affected programs:     1616779 -> 1599636 (-1.06%)
    
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit 963a3c7f90672c8d4931606d45e172792caf84ca
Author: Matt Turner <mattst88@gmail.com>
Date:   Tue Apr 1 16:49:13 2014 -0700

    i965/fs: Try to emit LINE instructions on Gen <= 5.
    
    The LINE instruction performs a multiply-add instruction (a * b + c)
    where b and c are scalar arguments. It reads b and c from offsets in
    src0 such that you can load them (if they're representable) as a
    vector-float immediate with a single instruction.
    
    Hurts some programs, but that'll all get better once we CSE the
    vector-float MOVs in the next patch.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
