Bug 86140

Summary: [BYT, HSW, BDW] SynMark2_v6_0_0_OglDrvShComp performance reduced ~60%
Product: Mesa
Reporter: wendy.wang
Component: Drivers/DRI/i965
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: RESOLVED WONTFIX
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: christophe.prigent, eero.t.tamminen, valtteri.rantala
Version: unspecified
Keywords: bisected
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments: Xorg.0.log
Callgraph of where Mesa shader compiler spends its time
Callee map of where Mesa shader compiler spends its time
callgraph zoomed to fsvisitor CPU usage
callgraph zoomed to common optimizations CPU usage

Description wendy.wang 2014-11-11 09:19:56 UTC
Created attachment 109264 [details]
Xorg.0.log

Environment:
-----------------------------------
Platform:           BDW
Libdrm:             (master)libdrm-2.4.58-4-g00847fa48b83a85b0cb882594a12ed1511f780db
Mesa:               (master)f3b709c0ac073cd0ec90a3a0d91d1ee94668e043
Xserver:            (master)xorg-server-1.16.99.901-3-g63bb5c5ef16edf652179770294dcca4fc07dc992
Xf86_video_intel:   (master)2.99.916-139-ge96520327bd2ef4fbc7b7b5169a17b966d9f85f3
Cairo:              (master)a03f2ff72054c9530f98738aac729354a3f56102
Libva:              (master)ccd93de5a707e92a629cccd595757c8d436fa3cc
Libva_intel_driver: (master)24cba20a119c96556ae4dc9a90043896ea70e567
Kernel:             (drm-intel-nightly)b921a5e434f541c6d96378838df47b4259cc3489


Bug detailed description:
---------------------------------------------
SynMark2_v6_0_0_OglDrvShComp performance is reduced by ~60%.
The issue occurs on BYT, HSW and BDW.

This is a Mesa regression; bisecting shows the following first bad commit:

Commit: a16ca4ac6a356e02c6aa03c1e305f613a4e23202
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Oct 29 20:56:07 2014 -0700

    glsl: Skip loop-too-large heuristic if indexing arrays of a certain size

    A pattern in certain shaders is:

       uniform vec4 colors[NUM_LIGHTS];

       for (int i = 0; i < NUM_LIGHTS; i++) {
          ...use colors[i]...
       }

    In this case, the application author expects the shader compiler to
    unroll the loop.  By doing so, it replaces variable indexing of the
    array with constant indexing, which is more efficient.

    This patch extends the heuristic to see if arrays accessed within the
    loop are indexed by an induction variable, and if the array size exactly
    matches the number of loop iterations.  If so, the application author
    probably intended us to unroll it.  If not, we rely on the existing
    loop-too-large heuristic.

    Improves performance in a phong shading microbenchmark by 2.88x, and a
    shadow mapping microbenchmark by 1.63x.  Without variable indexing, we
    can upload the small uniform arrays as push constants instead of pull
    constants, avoiding shader memory access.  Affects several games, but
    doesn't appear to impact their performance.

    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Matt Turner <mattst88@gmail.com>
    Acked-by: Kristian Høgsberg <krh@bitplanet.net>
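
The decision described in the commit message above can be sketched roughly as follows. This is only an illustration of the idea, not Mesa's code: the `Loop` and `ArrayAccess` types, the `should_unroll` function, the instruction-count size check and the threshold value are all hypothetical names and assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold for the "loop too large" check (not Mesa's real value).
MAX_UNROLLED_INSTRUCTIONS = 32


@dataclass
class ArrayAccess:
    array_size: int                 # declared number of array elements
    indexed_by_induction_var: bool  # True for accesses like colors[i]


@dataclass
class Loop:
    iterations: int                 # known trip count of the loop
    body_instructions: int          # assumed size measure of one iteration
    accesses: List[ArrayAccess] = field(default_factory=list)


def should_unroll(loop: Loop, max_instructions: int = MAX_UNROLLED_INSTRUCTIONS) -> bool:
    """Return True if this sketch of the heuristic would unroll the loop."""
    # New rule from the commit: an array indexed by the induction variable
    # whose size exactly matches the trip count skips the size check.
    for access in loop.accesses:
        if access.indexed_by_induction_var and access.array_size == loop.iterations:
            return True
    # Otherwise fall back to a size-based check (assumed form: total
    # unrolled instruction count must stay under the threshold).
    return loop.iterations * loop.body_instructions <= max_instructions


# A large loop over "uniform vec4 colors[8]" is unrolled under the new rule,
# while the same loop without a matching array is rejected as too large.
lights = Loop(iterations=8, body_instructions=100, accesses=[ArrayAccess(8, True)])
other = Loop(iterations=8, body_instructions=100, accesses=[ArrayAccess(16, True)])
print(should_unroll(lights), should_unroll(other))
```

Under these assumptions, loops that were previously rejected as too large are now unrolled, so the compiler has more code to process afterwards. That would be consistent with a compile-time benchmark such as OglDrvShComp slowing down.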



Reproduce steps:
---------------------------------------------
1. xinit &
2. ./Synmark2 OglDrvShComp


Xorg.0.log file attached.
Comment 1 Eero Tamminen 2014-11-11 12:05:05 UTC
A shader compiler speed regression is expected, as with that commit the compiler does more work (in the frontend for the commit's logic, and in the backend for the code that got unrolled). I.e. this commit is not going to be revered.

However, the compiler is now too slow, so it needs to be optimized.
Comment 2 Eero Tamminen 2014-11-11 12:05:27 UTC
not s/revered/reverted/
Comment 3 Eero Tamminen 2014-11-17 15:30:49 UTC
Created attachment 109627 [details]
Callgraph of where Mesa shader compiler spends its time
Comment 4 Eero Tamminen 2014-11-17 15:33:50 UTC
Created attachment 109628 [details]
Callee map of where Mesa shader compiler spends its time
Comment 5 Eero Tamminen 2014-11-17 15:34:47 UTC
Created attachment 109629 [details]
callgraph zoomed to fsvisitor CPU usage
Comment 6 Eero Tamminen 2014-11-17 15:35:23 UTC
Created attachment 109630 [details]
callgraph zoomed to common optimizations CPU usage
Comment 7 shuo.wang 2014-11-26 02:55:55 UTC
This bug also exists on Mesa 10.4 RC1.
Comment 8 Matt Turner 2016-11-02 06:34:31 UTC
Possibly inevitable.
Comment 9 Eero Tamminen 2016-11-02 14:48:29 UTC
After the indicated commit, there have been further slowdowns as the compiler does more optimizations, but there have also been occasional (smaller) improvements.

I'm most worried about compilation speed for spilled shaders; DrvShComp includes one of those (from the ShMapPcf subtest).
Comment 10 Eero Tamminen 2017-09-08 07:42:46 UTC
I'm closing this bug as WONTFIX as it's obsolete.

* There have been a lot of additional shader optimizations, which have slowed down shader compilation further
* Loop unrolling has moved from GLSL to NIR, which sped it up a lot, *and* made the attached profiling data invalid
* After that, there have been additional speedups too
* We've switched from SynMark v6 to v7, which made the tests included in this composite test heavier
