Bug 86242

Summary: [BDW] Unigine-valley_1_0 performance reduced ~10%
Product: Mesa
Reporter: wendy.wang
Component: Drivers/DRI/i965
Assignee: Kenneth Graunke <kenneth>
Status: RESOLVED WONTFIX
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: low
CC: christophe.prigent, eero.t.tamminen
Version: unspecified
Keywords: bisected
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments: Xor.0.log

Description wendy.wang 2014-11-13 08:21:22 UTC
Created attachment 109389 [details]
Xor.0.log

Environment:
-----------------------------------
Platform:BDW GT2 E2
 Libdrm:		(master)libdrm-2.4.58-4-g00847fa48b83a85b0cb882594a12ed1511f780db
 Mesa:		(master)f7819650979d1fa5339af3eacfa1af1090bf53e8
 Xserver:		(master)xorg-server-1.16.99.901-3-g63bb5c5ef16edf652179770294dcca4fc07dc992
 Xf86_video_intel:		(master)2.99.916-127-gcc3b8a542ecb1ba873efefaeab630fa8f69b5b96
 Cairo:		(master)adbeb3d53c6c6e8ddcc63988200da4c5c9627717
 Libva:		(master)ccd93de5a707e92a629cccd595757c8d436fa3cc
 Libva_intel_driver:		(master)24cba20a119c96556ae4dc9a90043896ea70e567
 Kernel:   (drm-intel-nightly)782bafb46cc12737b16e5007583bd7b534c6202a

Bug detailed description:
---------------------------------------------
unigine-valley_1_0 performance reduced ~10% on BDW

This is a Mesa regression. Bisecting shows that the commit below is the first bad commit:

commit 7423cc891b4d6fcc63bfeb79cc1d711ce81122bd
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Oct 22 08:58:58 2014 -0700

    i965: Implement the PMA stall fix.

    Certain non-promoted depth cases typically incur stalls.  In very
    specific cases, we can enable a workaround which improves performance.

    Improves performance in GLBenchmark 2.7 TRex by 1.17762% +/- 0.448765%
    (n=75) at 1280x720 on Broadwell GT3.

    Haswell has this feature as well, but we can't currently write registers
    from userspace batches (and we'd incur additional software batch
    scanning overhead as well), so we haven't enabled it.  Broadwell allows
    us to write CACHE_MODE_1.  Backporters beware: the formula and flushing
    incantation differs between Haswell and Broadwell.

    v2: Move pma_stall_bits from brw->state to brw itself (requested by
        Kristian Høgsberg).

    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>



Reproduce steps:
---------------------------------------------
1. xinit &
2. ./unigine-valley_1_0

Xor.0.log attached.
Comment 1 Eero Tamminen 2014-11-14 09:55:35 UTC
Tested the effect of just this particular patch, on BDW GT2, for all GLBenchmark / GfxBench, GpuTest, Unigine demos and SynMark test-cases (3 runs each):

-6% Unigine Valley (fullscreen GL)
 0% GLB 3.0 T-Rex (fullscreen GL)
+2% GLB 2.7 Egypt (windowed GLES)
+4% GLB 2.7 T-Rex (windowed GLES)

On other tests there was no effect that wouldn't have been within their normal variances.
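
As a rough illustration of what "within normal variance" can mean for a handful of runs, the sketch below compares two sets of FPS samples. The function name, the threshold `k`, and the sample values in the usage note are all hypothetical; this is not the method used for the numbers above.

```python
from statistics import mean, stdev


def change_vs_noise(before, after, k=2.0):
    """Compare two lists of FPS samples (at least two runs each).

    Returns (percent_change, exceeds_noise). The change is flagged only
    when the difference of the means is larger than k times the larger
    sample standard deviation. The k=2.0 threshold is an assumption made
    for this sketch, not a rule taken from the bug report.
    """
    delta = mean(after) - mean(before)
    noise = max(stdev(before), stdev(after))
    percent_change = 100.0 * delta / mean(before)
    return percent_change, abs(delta) > k * noise
```

For example, with made-up samples `[50.0, 50.2, 49.8]` before and `[47.0, 47.1, 46.9]` after, the drop is flagged as exceeding the noise, while `[50.1, 49.9, 50.3]` after is not.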


Wendy, there have been mentions of PMA issue having much larger benefits (on HSW) in Valve games and Lightsmark.  Could you mail me information about the performance before and after the indicated Mesa commit in Lightsmark and all the games?

NOTE: You need to do that test with the fixed X intel driver (which came after this commit), so that that issue doesn't distort the numbers.
Comment 2 wendy.wang 2014-11-18 06:10:12 UTC
Tested based on Eero's suggestion. Comparison results on BDW GT2 E2:

Test                 FPS before (parent commit)   FPS after (bad commit)   (Before-After)/Before
doom3                50.1                         49.5                      1.20%
etqw_1_10            36.5                         36.0                      1.37%
etqw-demo            33.2                         32.9                      0.90%
lightsmark           124.85                       123.87                    0.78%
openarena            35.9                         35.3                      1.67%
padman               151.9                        147.6                     2.83%
smokin-guns          129.5                        131.0                    -1.16%
urbanterror          96.9                         96.4                      0.52%
warsow01             126.7                        124.7                     1.58%
xonotic07            166.31                       164.26                    1.23%
cs                   77.34                        77.51                    -0.22%
tf2                  44.17                        44.63                    -1.04%
hl2                  88.55                        89.73                    -1.33%
portal               63.48                        63.43                     0.08%
unigine-heaven_4_0   6.18                         6.08                      1.62%
unigine-valley_1_0   5.44                         4.99                      8.27%
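
The (Before-After)/Before column can be recomputed from the two FPS columns. A minimal sketch follows; the function name is made up for illustration and the two sample rows are copied from the FPS values above.

```python
def relative_drop(before_fps, after_fps):
    """Return (Before-After)/Before as a percentage.

    Positive means FPS dropped after the commit; negative means it rose.
    """
    return 100.0 * (before_fps - after_fps) / before_fps


# Two rows taken from the FPS columns of the table: (before, after).
samples = {
    "unigine-valley_1_0": (5.44, 4.99),
    "tf2": (44.17, 44.63),
}

for name, (before, after) in samples.items():
    print(f"{name}: {relative_drop(before, after):.2f}%")
```

With this sign convention, the tf2 row comes out negative because its FPS went up after the commit.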


Bad commit:
 commit 7423cc891b4d6fcc63bfeb79cc1d711ce81122bd
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Oct 22 08:58:58 2014 -0700

    i965: Implement the PMA stall fix.

Parent of bad commit:
commit 8ccf54ab098032da4652b314761c04f7724a7277
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Oct 22 08:58:57 2014 -0700

    i965: Add #defines for Broadwell HiZ workarounds in CACHE_MODE_1.

    This patch adds macros needed for the HiZ PMA stall optimization.

    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>


Commit after bad commit:
commit 6107557f8fa34e0b7191813792be43eaa03aed19
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Oct 22 08:58:59 2014 -0700

    i965: Re-enable Z16 on Gen8+.

    Improves performance in GLBenchmark 2.7 TRex by 3.88889% +/- 0.336383%
    (n=80) at 1280x720 on Broadwell GT3.  Together with the previous patch,
    it improves performance by 5.42738% +/- 0.541971% (n=10) at 1920x1080.

    Note that without the PMA stall fix, this would instead decrease
    performance by 22%.

    v2: Update comment (noticed by Kristian Høgsberg).

    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>


The other graphics software components were pinned to the versions below (2014-11-17):
Libdrm:		(master)libdrm-2.4.58-4-g00847fa48b83a85b0cb882594a12ed1511f780db
 Xserver:		(master)xorg-server-1.16.99.901-3-g63bb5c5ef16edf652179770294dcca4fc07dc992
 Xf86_video_intel:		(master)2.99.916-145-g6c2707d7bbc0ebb422be66618b6f78887c46446e
 Cairo:		(master)121f384c0e231c9c5d9c937b216d342bfc7810a6
 Libva:		(master)f9309a6f44b51bb2c463a6a16d3ccf3edc6e6c7a
 Libva_intel_driver:		(master)8e34fb34ed402811e512f9d41b14345f3795bac5
 Kernel:   (drm-intel-nightly)e49ebf9ed863e9522260ebd7bd0338ef5641c0e6
Comment 3 Kenneth Graunke 2014-11-18 09:04:43 UTC
(In reply to Eero Tamminen from comment #1)
> Wendy, there have been mentions of PMA issue having much larger benefits (on
> HSW) in Valve games and Lightsmark.  Could you mail me information about the
> performance before and after the indicated Mesa commit in Lightsmark and all
> the games?

Eero, I never actually tested PMA on Lightsmark or Valve games...I wouldn't expect it to make a big impact.  It sounds like you're thinking of the shadow_c performance fix, which is different.

I'm okay with reverting the PMA stuff if it's a loss overall.  I'm kind of surprised though...
