|Summary:||i965: Try to use LINE instructions to perform MAD with immediate arguments|
|Product:||Mesa||Reporter:||Matt Turner <mattst88>|
|Component:||Drivers/DRI/i965||Assignee:||Matt Turner <mattst88>|
|Status:||RESOLVED FIXED||QA Contact:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
Description Matt Turner 2014-04-16 20:47:34 UTC
The LINE instruction performs a multiply-add (a * b + c) where b and c are immediate arguments. It reads b and c from offsets in src0, so you can load them (if they're representable) as a Vector-Float immediate.

I have a work-in-progress branch that implements this and enables it on Gen 4 and 5:
http://cgit.freedesktop.org/~mattst88/mesa/log/?h=line

The shader-db results are promising once we allow MOV dst, VF to be CSE'd:

<Piles of shaders helped>
HURT: shaders/gst-gl-text-download-yuy2-uyvy.frag fs8: 58 -> 59 (1.72%)
HURT: shaders/unigine-tropics/465.shader_test fs8: 699 -> 1304 (86.55%)
LOST: shaders/unigine-tropics/465.shader_test fs16

total instructions in shared programs: 806295 -> 803540 (-0.34%)
instructions in affected programs: 370163 -> 367408 (-0.74%)
GAINED: 0
LOST: 1

Some investigation is needed to determine why 465.shader_test nearly doubles in instruction count in fs8 and loses its fs16 program.

Follow-on work:
- We should also implement this for the vec4 backend.
- Consider whether using mov(1) to load floats that aren't representable as VF for LINE consumption is an improvement over not using LINE at all in that case. (Probably so?)
- Consider whether this optimization is beneficial on newer platforms: SNB doesn't co-issue (so MAD isn't faster than LINE), but perhaps the MOV immediates can be reused more easily for other instructions? It seems unlikely that IVB+ (which can co-issue) would benefit from preferring LINE (which can't be co-issued) over MAD (which can).

Talk to Matt if you want to work on this.
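To make the "if they're representable" condition concrete, here is a rough Python sketch of the check. It is an illustration, not the code from the branch. It assumes the 8-bit restricted float layout of 1 sign bit, 3 exponent bits (bias 3) and 4 mantissa bits, with 0x00 and 0x80 reserved for +/-0.0. The function names, and the choice of lanes 0 and 3 for b and c, are assumptions made for this example.

```python
import struct


def float_to_vf(f):
    """Return the 8-bit VF encoding of f, or None if f is not exactly
    representable (assumed layout: 1 sign, 3 exponent bias 3, 4 mantissa)."""
    try:
        packed = struct.pack("<f", f)
    except OverflowError:
        return None  # too large even for a 32-bit float
    if struct.unpack("<f", packed)[0] != f:
        return None  # not exact as a 32-bit float (also rejects NaN)
    bits = struct.unpack("<I", packed)[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa == 0:
        return sign << 7  # +/-0.0 is special-cased
    if mantissa & 0x7FFFF:
        return None  # needs more than 4 mantissa bits
    vf_exponent = exponent - 127 + 3
    if not 0 <= vf_exponent <= 7:
        return None  # exponent out of range
    vf = (sign << 7) | (vf_exponent << 4) | (mantissa >> 19)
    if vf & 0x7F == 0:
        return None  # this bit pattern is reserved for +/-0.0
    return vf


def vf_to_float(vf):
    """Decode an 8-bit VF value back to a Python float."""
    if vf & 0x7F == 0:
        bits = vf << 24  # +/-0.0
    else:
        bits = ((vf & 0x80) << 24) | (((vf & 0x7F) << 19) + ((127 - 3) << 23))
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pack_line_operands(b, c):
    """Pack b and c into one 32-bit VF immediate, or return None if either
    is not representable. Lane placement (b in lane 0, c in lane 3) is an
    assumption of this sketch."""
    vf_b, vf_c = float_to_vf(b), float_to_vf(c)
    if vf_b is None or vf_c is None:
        return None  # caller would fall back to a plain MAD
    return vf_b | (vf_c << 24)
```

Under these assumptions, a * 2.0 + 0.5 qualifies because both constants encode exactly, while a * 0.1 + 1.0 does not, since 0.1 has no exact VF encoding.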
Comment 1 Matt Turner 2014-12-05 00:38:18 UTC
Patches sent to the mailing list.
Comment 2 Matt Turner 2014-12-07 19:03:15 UTC
Committed:

commit a28ad9d4c0d4b95aee8c3b99e9aaa59add21ea9d
Author: Matt Turner <email@example.com>
Date:   Thu Apr 3 14:29:30 2014 -0700

    i965/fs: Perform CSE on MOV ..., VF instructions.

    Safe from causing optimization loops, since we don't constant propagate VF arguments.

    (for this and the previous patch):
    total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
    instructions in affected programs: 1616779 -> 1599636 (-1.06%)

    Reviewed-by: Ian Romanick <firstname.lastname@example.org>

commit 963a3c7f90672c8d4931606d45e172792caf84ca
Author: Matt Turner <email@example.com>
Date:   Tue Apr 1 16:49:13 2014 -0700

    i965/fs: Try to emit LINE instructions on Gen <= 5.

    The LINE instruction performs a multiply-add instruction (a * b + c) where b and c are scalar arguments. It reads b and c from offsets in src0 such that you can load them (if they're representable) as a vector-float immediate with a single instruction.

    Hurts some programs, but that'll all get better once we CSE the vector-float MOVs in the next patch.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
    Reviewed-by: Ian Romanick <firstname.lastname@example.org>
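For readers unfamiliar with why CSE on the vector-float MOVs recovers the lost instructions, the idea can be sketched on a toy instruction list. This is not the i965 pass: the tuple-based IR, the opcode names and the function name are all hypothetical, and the sketch assumes straight-line code in which each register is written exactly once.

```python
def cse_vf_movs(instructions):
    """Drop repeated loads of the same VF immediate.

    instructions is a list of (opcode, dst, srcs) tuples, where srcs is a
    tuple of register names or immediates. A VF load is written as
    ("mov_vf", dst, (imm,)). Assumes every register is written only once.
    """
    first_reg_for_imm = {}  # immediate value -> register that already holds it
    rename = {}             # removed register -> surviving register
    result = []
    for opcode, dst, srcs in instructions:
        # Redirect any use of a removed register to the surviving one.
        srcs = tuple(rename.get(s, s) for s in srcs)
        if opcode == "mov_vf":
            imm = srcs[0]
            if imm in first_reg_for_imm:
                rename[dst] = first_reg_for_imm[imm]
                continue  # redundant load, skip it
            first_reg_for_imm[imm] = dst
        result.append((opcode, dst, srcs))
    return result
```

With two LINE instructions that use the same pair of constants, the second load is removed and its consumer is redirected to the first register, so each distinct constant pair costs one MOV rather than one MOV per LINE.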