Bug 88103 - [BSW Bisected] RC6 timeout mode patch causes random GPU freq drops and up to -40% performance regressions
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: highest major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-06 11:26 UTC by wendy.wang
Modified: 2017-10-06 14:32 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg_taiji.txt (124.87 KB, text/plain)
2015-01-06 11:26 UTC, wendy.wang

Description wendy.wang 2015-01-06 11:26:12 UTC
Created attachment 111843
dmesg_taiji.txt

System Environment:
--------------------------
Platform: BSW
OS Distribution: Ubuntu 14.04

Description:
--------------------------
When running the GLES 3DMMES 2.0 taiji case, the bad commit causes the taiji FPS to decrease by ~40%.

Regression:
--------------------------
Yes, bisected on drm-intel-next-queued branch:

First bad commit is 
commit 5a0afd4b78ec23f27f5d486ac3d102c2e8d66bd7 
Author: Deepak S <deepak.s@linux.intel.com>
Date:   Sat Dec 13 11:43:27 2014 +0530

    drm/i915/chv: Use timeout mode for RC6 on chv

    Higher RC6 residency is observed using timeout mode
    instead of EI mode. It's Recommended to use TO Method for RC6.

    v2: Add comment about timeout threshold. (Tom)

    Signed-off-by: Deepak S <deepak.s@linux.intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Reproduce steps:
1. xinit &
2. ./3DMMES2_taiji.sh
Comment 1 Eero Tamminen 2015-01-07 07:57:54 UTC
What other tests did this particular commit affect?
Comment 2 wendy.wang 2015-01-08 08:32:02 UTC
(In reply to Eero Tamminen from comment #1)
> What other tests did this particular commit affect?

We are investigating and will update on 8th Jan.
Comment 3 zhipeng.Zheng 2015-01-09 08:13:25 UTC
Tested with the cases below:

lightsmark 
openarena 
warsow01 
GpuTest_v05_triangle_Windowed 
cs 
portal  
hoverjet 
TRex_FixedTimeStepOffscreen 
FillTestC24Z16_Offscreen 
manhattan
Comment 4 zhipeng.Zheng 2015-01-12 01:24:34 UTC
We found that the following cases regressed because of the bad commit. The regression percentage is listed for each case:

lightsmark                       -13%
openarena                        -33%
warsow01                         -9%
cs                               -28%
portal                           -28%
hoverjet                         -16%
TRex_FixedTimeStepOffscreen      -22%
FillTestC24Z16_Offscreen         -12%
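
For reference, a regression percentage like the ones above can be computed as the relative FPS change against a baseline run. This is only a minimal sketch: the function name and the FPS values are hypothetical, and it assumes the baseline is a kernel without the bad commit, which the comment does not state explicitly.

```python
def percent_change(baseline_fps: float, test_fps: float) -> float:
    """Return the change of test_fps relative to baseline_fps, in percent.

    Negative values mean the tested kernel is slower than the baseline.
    """
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (test_fps - baseline_fps) / baseline_fps * 100.0


if __name__ == "__main__":
    # Hypothetical FPS values, chosen only to illustrate the arithmetic.
    baseline = 60.0  # assumed: kernel without the bad commit
    tested = 40.2    # assumed: kernel with the bad commit
    print(f"{percent_change(baseline, tested):+.1f}%")
```

Keeping the sign convention explicit (test relative to baseline) matters here, because later comments in this bug report percentages relative to the first bad commit instead.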
Comment 5 Rodrigo Vivi 2015-01-12 21:16:23 UTC
http://lists.freedesktop.org/archives/intel-gfx/2015-January/058300.html

Could you please test the revert?
Also, I'd like to see the difference of power savings measure with and without this revert.
Comment 6 wendy.wang 2015-01-14 07:49:11 UTC
Tested with the revert commit: all the impacted cases recover to normal FPS values.

lightsmark    -13%
openarena     -33%
warsow01      -9%
cs            -28%
portal        -28%
hoverjet      -16%
TRex_FixedTimeStepOffscreen      -22%
FillTestC24Z16_Offscreen         -12%
unigine-valley_1_0                -17%
EgyptHD_FixedTimeOffscreen       -14%
manhattan                        -22%
OglDrvCtx_6 60.54                -24%

However, the bad commit is also very important, since it was meant to fix an RC6 residency issue.

So we need developer comments on whether these performance regressions require reverting the bad commit.
Comment 7 Eero Tamminen 2015-01-16 08:25:33 UTC
This seems to cause GPU frequency to fluctuate randomly.   Some test-runs go at full speed, some fluctuate, for test cases that are fully GPU bound.

I've also seen the effect in several other tests (e.g. Unigine Valley and SynMark Deferred).

This kind of behavior breaks performance testing, so *it needs to be reverted* until the GPU freq issue is fixed.
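
One way to observe this kind of fluctuation is to sample the GPU frequency while a GPU-bound test is running. The sketch below is an illustration, not the method used in this bug: it assumes the i915 driver exposes the frequency at `/sys/class/drm/card0/gt_cur_freq_mhz` (the card index may differ), and the helper names are hypothetical.

```python
import statistics
import time
from pathlib import Path

# Assumption: i915 sysfs node reporting the GPU frequency in MHz.
# The card index (card0) may differ on other systems.
FREQ_PATH = Path("/sys/class/drm/card0/gt_cur_freq_mhz")


def sample_freq_mhz(path=FREQ_PATH, samples=50, interval_s=0.1):
    """Read the frequency file `samples` times, sleeping `interval_s` between reads."""
    readings = []
    for _ in range(samples):
        readings.append(int(Path(path).read_text().strip()))
        time.sleep(interval_s)
    return readings


def summarize(readings):
    """Return min, max, mean and the number of distinct frequencies seen."""
    return {
        "min": min(readings),
        "max": max(readings),
        "mean": statistics.fmean(readings),
        "distinct": len(set(readings)),
    }


if __name__ == "__main__":
    if FREQ_PATH.exists():
        print(summarize(sample_freq_mhz()))
    else:
        print(f"{FREQ_PATH} not found; adjust the path for this system")
```

For a fully GPU-bound workload, a summary where `min` stays well below `max` across repeated runs would be consistent with the frequency drops described above.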
Comment 9 Rodrigo Vivi 2015-01-19 19:23:34 UTC
Wendy, could you please try Ville's patch and check whether it fixes performance and still allows good RC6 residency?

Otherwise I'll continue the patch I had started here to change mode and threshold according to idle and boost requests...
Comment 10 wendy.wang 2015-01-20 05:38:38 UTC
Verified Ville's patch on BSW B1+ FAB2 RVP

1. RC6 residency is good:
RC6 residency when idle: 100%
RC6 residency under media workload: 79.7%

2. The previously impacted FPS results also improved, except for the OglDrvCtx case. The percentages below compare Ville's fix patch against the first bad commit:

lightsmark                            28.00%
openarena                              2.60%
warsow01                               6.55%
unigine-valley_1_0                    26.44%
cs                                    44.76%
portal                                58.72%
3DMMES2_taiji                         -2.76%
3DMMES2_hoverjet                      -1.87%
glb270_TRex_FixedTimeStepOffscreen     2.78%
glb270_EgyptHD_FixedTimeOffscreen     -1.54%
glb270_FillTestC24Z16_Offscreen        4.17%
glb3016_manhattan                     33.33%
OglDrvCtx_6                           -9.96%
OglVSInstancing_6                     -0.21%
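
As an illustration of how an RC6 residency percentage like the ones in item 1 can be derived, the sketch below divides the growth of a cumulative RC6 counter by the elapsed wall-clock time. It is not necessarily how the numbers above were measured: it assumes the i915 counter at `/sys/class/drm/card0/power/rc6_residency_ms` is available, and the function names are hypothetical.

```python
import time
from pathlib import Path

# Assumption: i915 sysfs counter of cumulative time spent in RC6, in milliseconds.
# The card index (card0) may differ on other systems.
RC6_PATH = Path("/sys/class/drm/card0/power/rc6_residency_ms")


def residency_percent(rc6_start_ms, rc6_end_ms, elapsed_ms):
    """Share of the elapsed window spent in RC6, in percent."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return 100.0 * (rc6_end_ms - rc6_start_ms) / elapsed_ms


def measure(path=RC6_PATH, window_s=5.0):
    """Sample the counter before and after a sleep and return the residency."""
    start_rc6 = int(Path(path).read_text().strip())
    t0 = time.monotonic()
    time.sleep(window_s)
    end_rc6 = int(Path(path).read_text().strip())
    elapsed_ms = (time.monotonic() - t0) * 1000.0
    return residency_percent(start_rc6, end_rc6, elapsed_ms)


if __name__ == "__main__":
    if RC6_PATH.exists():
        print(f"RC6 residency: {measure():.1f}%")
    else:
        print(f"{RC6_PATH} not found; adjust the path for this system")
```

Running `measure()` once on an idle system and once during a workload would give two figures comparable in spirit to the idle and media-workload values reported above.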
Comment 11 Jani Nikula 2015-01-21 17:54:18 UTC
commit eca5aff51ea6c4d9b4936d94c61707cee4b12902
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Jan 19 13:50:49 2015 +0200

    drm/i915: Configure GEN6_RP_DOWN_TIMEOUT on CHV
Comment 12 wendy.wang 2015-01-28 05:41:40 UTC
Bug has been verified as PASS using latest -nightly branch kernel.
Comment 13 Elizabeth 2017-10-06 14:32:32 UTC
Closing old verified.