Bug 90895 - [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex & ALU and many SynMark cases performance reduced by 10-23%
Summary: [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex & ALU and many Syn...
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: high major
Assignee: Matt Turner
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-08 08:58 UTC by ye.tian
Modified: 2015-06-19 03:24 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Xorg.0.log info (17.52 KB, text/plain)
2015-06-08 08:58 UTC, ye.tian
Details
dmesg info (123.38 KB, text/plain)
2015-06-08 08:58 UTC, ye.tian
Details
p.patch (800 bytes, patch)
2015-06-09 21:32 UTC, Matt Turner
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description ye.tian 2015-06-08 08:58:14 UTC
Created attachment 116360 [details]
Xorg.0.log info

System Environment:       
Platform:  BSW
Libdrm: (master)libdrm-2.4.61-13-g97be70b45eccc37e98a1cecf360593f36956ea42
Mesa:  (master)b639ed2f1b170d1184c6d94c88c826c51ffc8726
Xserver: (master)xorg-server-1.17.0-158-gfa12f2c150b2f50de9dac4a2b09265f13af353af
Xf86_video_intel: (master)2.99.917-342-g46d74edf991f315319236ba81c6e357e0cb0dddc
Cairo:(master)2cf2d8e340a325adb205baf8e4bd64e1d1858008
Libva:  (master)50d936cb527ba6da82ff9ee482f64e1ac070e1df
Libva_intel_driver: (master)d0c22df7094788a58d3282a2c25483d26a3cca54
Kernel: (drm-intel-nightly)d4f412886ec9694658ab17092c3f70569a0405f9

Bug detailed description:
--------------------------------------------------
SynMark2_v6_0_0_OglBatch0 to OglBatch4 performance reduce by 23% with xinit on BSW. It’s Mesa regression. Bisect delay...
Good :  (master)cb277cde6f2a210b0515cd04269964fd409307e9
Bad: (master)b639ed2f1b170d1184c6d94c88c826c51ffc8726

Please see Xrog.

Reproduce steps:
----------------------------
1, xinit& 
2, ./synmark2 OglBatch0
Comment 1 ye.tian 2015-06-08 08:58:34 UTC
Created attachment 116361 [details]
dmesg info
Comment 2 ye.tian 2015-06-09 06:59:42 UTC
By bisected, shows the first mesa commit is b639ed2.

commit b639ed2f1b170d1184c6d94c88c826c51ffc8726
Author:     Ben Widawsky <benjamin.widawsky@intel.com>
AuthorDate: Thu Jun 4 23:59:23 2015 -0700
Commit:     Ben Widawsky <benjamin.widawsky@intel.com>
CommitDate: Fri Jun 5 14:25:47 2015 -0700

    i965: Add gen8 fast clear perf debug

    In an ideal world I would just implement this instead of adding the perf debug.
    There are some errata involved which lead me to believe it won't be so simple as
    flipping a few bits.

    There is room to add a thing for Gen9s flexibility, but since I am actively
    working on that I have opted to ignore it.

    Example:
    Multi-LOD fast clear - giving up (256x128x8).

    v2: Use braces for if statements because they are multiple lines (Ken)

    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 3 ye.tian 2015-06-09 08:45:19 UTC
The impact of the case:
GLBenchmark V3.0 gl_alu and gl_alu_offscreen  reduce by 17%
GLBenchmark V3.0 gl_trex and gl_trex_off      reduce by 11% 
SynMark2_v6_0_0 OglDeferred  OglGeomTriList OglGeomTriStrip OglShMapVsm reduce by 10%
Synmark2_v6_0_0 OglGeomPoint  OglVSDiffuse1 OglVSDiffuse8  OglVStangent   reduce by 14%
GLbenchmark 2.7.0 EgyptHD_FixedTime        reduce by 11%
GLbenchmark 2.7.0 TRex_FixedTimeStep       reduce by 17%
Comment 4 wendy.wang 2015-06-09 08:50:00 UTC
The bad commit cause many Performance drop on IVB/HSW/BSW platforms.

The impacted cases are:
Glbenchmark v3.0 cases: gl_alu/gl_alu_offscreen/gl_trex/gl_trex_offscreen
Synmark2_v6 Batch0--Batch3

We verified Synmark OglBatch cases on BSW platform,
Bad commit Mesa has lower FPS, his parent commit show higher FPS.

Verified Glbenchmark v3.0 cases: gl_alu/gl_alu_offscreen on HSW platform:
Bad commit Mesa has lower FPS:
glb3016_gl_alu 501.9
glb3016_gl_alu_off 661.8
his parent commit show higher FPS:
glb3016_gl_alu 599.8
glb3016_gl_alu_off 752.3

good commit:
commit 77a44512d9ed56be5e53ebf09e917b5aeeba0189
Author:     Ben Widawsky <benjamin.widawsky@intel.com>
AuthorDate: Thu Jun 4 22:05:13 2015 -0700
Commit:     Ben Widawsky <benjamin.widawsky@intel.com>
CommitDate: Fri Jun 5 14:25:47 2015 -0700

    i965: Add buffer sizes to perf debug of fast clears

    When we cannot do the optimized fast clear it's important to know the buffer
    size since a small buffer will have much less performance impact.

    A follow-on patch could restrict printing the message to only certain sizes.

    Example:
    Failed to fast clear 1400x1056 depth because of scissors.  Possible 5% performance win if avoided.

    Recommended-by: Kenneth Graunke <kenneth@whitecape.org>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>


Bad commit:
commit b639ed2f1b170d1184c6d94c88c826c51ffc8726
Author:     Ben Widawsky <benjamin.widawsky@intel.com>
AuthorDate: Thu Jun 4 23:59:23 2015 -0700
Commit:     Ben Widawsky <benjamin.widawsky@intel.com>
CommitDate: Fri Jun 5 14:25:47 2015 -0700

    i965: Add gen8 fast clear perf debug

    In an ideal world I would just implement this instead of adding the perf debug.
    There are some errata involved which lead me to believe it won't be so simple as
    flipping a few bits.

    There is room to add a thing for Gen9s flexibility, but since I am actively
    working on that I have opted to ignore it.

    Example:
    Multi-LOD fast clear - giving up (256x128x8).

    v2: Use braces for if statements because they are multiple lines (Ken)

    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 5 Eero Tamminen 2015-06-09 09:57:31 UTC
Looking at intel_debug.h, the second "if" in perf_debug() macro is lacking "unlikely" annotation and I wonder could either of the ifs contain the other as perf_debug is not set in many places:
$ grep 'perf_debug *=' $(find -type f)
./src/mesa/drivers/dri/i915/intel_context.c:      intel->perf_debug = true;
./src/mesa/drivers/dri/i965/brw_context.c:      brw->perf_debug = true;
./src/mesa/drivers/dri/i965/intel_debug.c:      brw->perf_debug = true;
?

Looking further into perf_debug member and INTEL_DEBUG usage, the unlikely() annotation usage is inconsistent:
$ grep '> *perf_debug.*)' $(find -type f)
...
$ grep '(.*INTEL_DEBUG.*)' $(find -type f) | grep -v unlikely

Maybe missing annotations could be added everywhere at the same time?


Seems weird that the indicated change causes such a huge performance impact though.  Performance change itself is true, we're seeing it also in addition to QA.
Comment 6 Tapani Pälli 2015-06-09 12:09:36 UTC
(In reply to Eero Tamminen from comment #5)
> Seems weird that the indicated change causes such a huge performance impact
> though.  Performance change itself is true, we're seeing it also in addition
> to QA.

Do you see this change also with IVB and HSW (like comment #4 mentions)?
Comment 7 Matt Turner 2015-06-09 21:32:26 UTC
Created attachment 116405 [details] [review]
p.patch

Try this?

Just adds unlikely() around brw->perf_debug in perf_debug()
Comment 8 ye.tian 2015-06-10 01:49:59 UTC
(In reply to Matt Turner from comment #7)
> Created attachment 116405 [details] [review] [review]
> p.patch
> 
> Try this?
> 
> Just adds unlikely() around brw->perf_debug in perf_debug()

It still low fps.
Comment 9 Eero Tamminen 2015-06-10 15:25:39 UTC
Did some quick tests on this with GfxBench ALU test, on HSW GT3e.  Weird bug...

Reverting Ben's commit fixes the regression, although on HSW the cost per frame should only be couple of ifs (for gen8) and skipping over perf output code on unused branch.

Just replacing the perf_debug() calls with puts() calls improves perf, so maybe the issue is somehow related to code inside the ifs (size of it?), although that code doesn't get executed.

Adding missing unlikely() to the if in perf_debug() macro gave similar improvement.  Additionally removing gen8 ifs didn't change perf.

Then I tested most of the indicated tests on BSW, HSW GT3e and HSW GT2. Just adding unlikely() to the second if in perf_debug() macro improved the performance several percents in the tests on all these platforms.
Comment 10 Ben Widawsky 2015-06-11 14:08:43 UTC
I'm not sure what to do for this one. I really can't justify spending too much time on this since we know a revert fixes the problem - but I really feel this is telling us something important we don't yet understand. I'm afraid if we just revert it, nobody will bother figuring out what's wrong.

Eero, do you think you could do a little more analysis on this, and if we get nowhere, we can revert?

(I was actually hoping to add some of that gen8 code anyway since it should help XCOM).
Comment 11 Matt Turner 2015-06-18 23:16:09 UTC
Missing braces.

Patch incoming.
Comment 12 Matt Turner 2015-06-18 23:46:33 UTC
Pushed as

commit 22af95af8316f2888a3935cdf774ff0997b3dd42
Author: Matt Turner <mattst88@gmail.com>
Date:   Thu Jun 18 16:14:50 2015 -0700

    i965: Add missing braces around if-statement.
    
    Fixes a performance problem caused by commit b639ed2f.
Comment 13 ye.tian 2015-06-19 03:24:30 UTC
Test it on BSW with latest mesa, this issue has go away.
So verified it.


bug/show.html.tmpl processed on Feb 19, 2017 at 11:48:27.
(provided by the Example extension).