Bug 101064 - 2-3% performance drop in GpuTest v0.7 FurMark with "nir/copy_prop: Respect the source's number of components"
Summary: 2-3% performance drop in GpuTest v0.7 FurMark with "nir/copy_prop: Respect th...
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-16 16:22 UTC by Eero Tamminen
Modified: 2019-07-12 11:54 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eero Tamminen 2017-05-16 16:22:54 UTC
This commit regressed GpuTest v0.7 FurMark performance by 2-3% on GEN9+:
--------------------------------------------------------------------
commit 3c312be7b3e95ec7540e98abed1b6f3cc8d31b2a
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Thu Mar 2 17:10:24 2017 -0800
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Tue Mar 14 07:36:20 2017 -0700

    nir/copy_prop: Respect the source's number of components
    
    In the near future we are going to require that the num_components in a
    src dereference match the num_components of the SSA value being
    dereferenced.  To do that, we need copy_prop to not remove our MOVs from
    a larger SSA value into an instruction that uses fewer channels.
    
    Because we suddenly have to know how many components each source has,
    this makes the pass a bit more complicated.  Fortunately, copy
    propagation is the only pass that cares about the number of components
    are read by any given source so it's fairly contained.
    
    Shader-db results on Sky Lake:
    
       total instructions in shared programs: 13318947 -> 13320265 (0.01%)
       instructions in affected programs: 260633 -> 261951 (0.51%)
       helped: 324
       HURT: 1027
--------------------------------------------------------------------

On few of the GEN8+ platforms, that commit also:
* Regressed SynMark v7 PSBump2 performance by ~1%
* Increased SynMark v7 PSBump8 performance by ~1%


Timothy's commits 2 months later:
* i965: remove GLSL IR optimisation loop
* nir: shuffle constants to the top

Compensated for the FurMark drop (SynMark tests were not affected by them).

If those commits fixed what this regressed, this bug should be closed.  If they improved something else, it may be better to keep this open.

(Sorry that it took so long to bisect this, the change was too small to be bisected on devices with slightly larger variance.)
Comment 1 Eero Tamminen 2019-04-16 12:50:28 UTC
(In reply to Eero Tamminen from comment #0)
> This commit regressed GpuTest v0.7 FurMark performance by 2-3% on GEN9+:
> --------------------------------------------------------------------
> commit 3c312be7b3e95ec7540e98abed1b6f3cc8d31b2a
> Author:     Jason Ekstrand <jason.ekstrand@intel.com>
> AuthorDate: Thu Mar 2 17:10:24 2017 -0800
> Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
> CommitDate: Tue Mar 14 07:36:20 2017 -0700
> 
>     nir/copy_prop: Respect the source's number of components
>     
>     In the near future we are going to require that the num_components in a
>     src dereference match the num_components of the SSA value being
>     dereferenced.  To do that, we need copy_prop to not remove our MOVs from
>     a larger SSA value into an instruction that uses fewer channels.
>     
>     Because we suddenly have to know how many components each source has,
>     this makes the pass a bit more complicated.  Fortunately, copy
>     propagation is the only pass that cares about the number of components
>     are read by any given source so it's fairly contained.
>     
>     Shader-db results on Sky Lake:
>     
>        total instructions in shared programs: 13318947 -> 13320265 (0.01%)
>        instructions in affected programs: 260633 -> 261951 (0.51%)
>        helped: 324
>        HURT: 1027
> --------------------------------------------------------------------
> 
> On few of the GEN8+ platforms, that commit also:
> * Regressed SynMark v7 PSBump2 performance by ~1%
> * Increased SynMark v7 PSBump8 performance by ~1%
> 
> 
> Timothy's commits 2 months later:
> * i965: remove GLSL IR optimisation loop
> * nir: shuffle constants to the top
> 
> Compensated for the FurMark drop (SynMark tests were not affected by them).
> 
> If those commits fixed what this regressed, this bug should be closed.  If
> they improved something else, it may be better to keep this open.

There has been lately further improvements to FurMark perf, so current performance is higher than the originally regressed one.

Is the original regression still valid or should this be closed as fixed?
Comment 2 Eero Tamminen 2019-07-12 11:54:24 UTC
(In reply to Eero Tamminen from comment #1)
> (In reply to Eero Tamminen from comment #0)
> > This commit regressed GpuTest v0.7 FurMark performance by 2-3% on GEN9+:
...
> > Timothy's commits 2 months later:
> > * i965: remove GLSL IR optimisation loop
> > * nir: shuffle constants to the top
> > 
> > Compensated for the FurMark drop (SynMark tests were not affected by them).
> > 
> > If those commits fixed what this regressed, this bug should be closed.  If
> > they improved something else, it may be better to keep this open.
> 
> There has been lately further improvements to FurMark perf, so current
> performance is higher than the originally regressed one.
> 
> Is the original regression still valid or should this be closed as fixed?

Marking this 2017 bug as FIXED.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.