Bug 83658 - Loop unrolling doesn't take into account whether it will decrease or increase performance
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64), OS: Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
Blocks: i965-perf
Reported: 2014-09-09 10:10 UTC by Eero Tamminen
Modified: 2014-10-06 07:44 UTC

Description Eero Tamminen 2014-09-09 10:10:22 UTC
Currently, loop unrolling is done in the frontend. It is based on heuristics rather than on actual data from the backend about whether, and how much, unrolling makes sense for the given hardware: e.g. whether unrolling will run out of registers for SIMD16, how many instructions fit into the instruction cache, and so on.

The backend should provide this kind of information for unrolling, and the frontend should use it, either before unrolling (to control how much to unroll) or after unrolling (to revert it).
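A minimal sketch of what a backend-informed unroll decision could look like, assuming the backend can report a per-loop register and instruction estimate plus per-hardware limits. `LoopInfo`, `BackendLimits`, `choose_unroll_factor` and all numbers are hypothetical; they are not Mesa or i965 APIs, and the linear cost model is a deliberately crude assumption.

```python
from dataclasses import dataclass


@dataclass
class LoopInfo:
    # Hypothetical per-loop numbers the backend could report.
    body_instructions: int   # native instructions in one iteration
    trip_count: int          # constant iteration count known at compile time
    live_registers: int      # peak registers live during one iteration (SIMD16)


@dataclass
class BackendLimits:
    # Hypothetical per-hardware limits; real values depend on the GPU.
    max_registers: int        # registers available to a SIMD16 program
    icache_instructions: int  # instructions that fit in the instruction cache


def choose_unroll_factor(loop: LoopInfo, hw: BackendLimits) -> int:
    """Pick the largest unroll factor that divides the trip count and stays
    within the backend's register and instruction-cache limits.

    Returns 1 (do not unroll) when no divisor of the trip count fits.
    """
    best = 1
    for factor in range(2, loop.trip_count + 1):
        if loop.trip_count % factor:
            continue  # only exact divisors, so no remainder loop is needed
        # Crude assumption: each unrolled copy adds its own live registers and
        # instructions, so both costs grow linearly with the factor.
        if loop.live_registers * factor > hw.max_registers:
            break  # larger factors would exceed the limit too
        if loop.body_instructions * factor > hw.icache_instructions:
            break
        best = factor
    return best
```

The other option mentioned above, unrolling first and reverting when the backend reports a problem, would replace these estimates with the result of actually compiling the unrolled code.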

(The frontend should probably be fixed to move loop-invariant code out of loops before time is devoted to improving unrolling itself.)
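To illustrate what moving loop-invariant code out of a loop means, here is a hypothetical example in Python rather than shader code. The Phong-like diffuse term and the function names are made up for illustration. Both functions compute the same result, but the second normalizes the light direction once instead of on every iteration, which leaves a smaller loop body to unroll.

```python
import math


def shade_naive(light_dir, normals, shininess):
    out = []
    for n in normals:
        # Loop invariant: light_dir does not change between iterations,
        # yet it is normalized again on every iteration.
        length = math.sqrt(sum(c * c for c in light_dir))
        l = [c / length for c in light_dir]
        ndotl = max(0.0, sum(a * b for a, b in zip(n, l)))
        out.append(ndotl ** shininess)
    return out


def shade_hoisted(light_dir, normals, shininess):
    # Same computation with the invariant normalization moved before the loop,
    # so only the per-normal work remains inside the loop body.
    length = math.sqrt(sum(c * c for c in light_dir))
    l = [c / length for c in light_dir]
    out = []
    for n in normals:
        ndotl = max(0.0, sum(a * b for a, b in zip(n, l)))
        out.append(ndotl ** shininess)
    return out
```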
Comment 1 Tapani Pälli 2014-09-09 10:44:44 UTC
FWIW, there is existing code that detects expressions that are constant within a loop (during loop analysis); I think it could be used as a starting point for such an optimization.
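The existing loop-analysis code is not reproduced here. As a rough sketch of the idea only, the following detects loop-invariant instructions in a straight-line loop body using a hypothetical three-address `Instr` representation, assuming every operation is pure (no side effects, no texture or memory access). An operand counts as invariant if it is a constant, is defined only outside the loop, or has a single in-loop definition that precedes the use and is itself invariant.

```python
from collections import Counter
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # variable written by this instruction
    op: str       # operation name (assumed pure: no side effects)
    args: tuple   # operands: variable names (str) or numeric constants


def find_invariants(body):
    """Return indices of instructions in a straight-line loop body whose
    result is the same on every iteration."""
    def_count = Counter(instr.dest for instr in body)
    def_index = {instr.dest: idx for idx, instr in enumerate(body)}
    invariant = set()

    def operand_invariant(arg, use_idx):
        if not isinstance(arg, str):
            return True                      # numeric constant
        if arg not in def_count:
            return True                      # defined only outside the loop
        # Defined inside the loop: invariant only with a single definition
        # that precedes this use (so the value is not carried over from the
        # previous iteration) and is itself invariant.
        idx = def_index[arg]
        return def_count[arg] == 1 and idx < use_idx and idx in invariant

    # One forward pass suffices for straight-line code, because every
    # qualifying definition appears before its uses.
    for idx, instr in enumerate(body):
        if all(operand_invariant(a, idx) for a in instr.args):
            invariant.add(idx)
    return invariant


body = [
    Instr("t0", "mul", ("a", "b")),    # a, b come from outside the loop
    Instr("t1", "add", ("t0", 1)),     # depends only on invariant t0
    Instr("t2", "mul", ("i", "t1")),   # i is the induction variable
    Instr("i", "add", ("i", 1)),
    Instr("s", "add", ("s", "t2")),    # accumulator
]
print(find_invariants(body))  # {0, 1}
```

Real shaders have control flow inside loops, where reaching definitions have to be computed properly; this sketch sidesteps that by assuming a single basic block.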
Comment 2 Eero Tamminen 2014-09-29 11:29:25 UTC
(In reply to comment #1)
> FWIW there exists code to detect expressions that are constant within a loop
> (during loop analysis), I think this could be used as starting point for
> such optimization.

Loop-invariant code handling could also be the root cause of e.g. SynMark2 PSPhong being 3x slower than it should be:
- the loop doesn't get unrolled while all the loop-invariant code remains inside it
- this results in a cascade effect that causes other problems as well, e.g. data that Windows GL handles with push constants and no sampler access turns into (unnecessarily looped) pull-constant loads in Mesa, which makes the shader sampler bound instead of PS ALU bound
Comment 3 Tapani Pälli 2014-10-06 07:44:29 UTC
(In reply to Eero Tamminen from comment #2)
> (In reply to comment #1)
> > FWIW there exists code to detect expressions that are constant within a loop
> > (during loop analysis), I think this could be used as starting point for
> > such optimization.
> 
> Loop invariant code handling could also be the root cause making e.g.
> SynMark2 PSPhong 3x slower than it should be:
> - code doesn't get unrolled with all the loop invariant code inside the loop 
> - this resulting in cascade effect that causes also other problems. e.g.
> stuff which on Windows GL is handled with push & no sampler, changes to
> (unnecessarily looped) pull and makes thing sampler bound with Mesa instead
> of it being PS ALU bound

To me it looks like loop-invariant code is not the problem in the PSPhong example. I will try to write a handwritten loop with invariant code to investigate this optimization opportunity further.

