Bug 90112 - [BSW bisected] OglGSCloth/Lightsmark/CS/ Portal/ Half Life 2 games performance decreased by 15%-45%
Summary: [BSW bisected] OglGSCloth/Lightsmark/CS/ Portal/ Half Life 2 games performanc...
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2015-04-20 07:25 UTC by Ding Heng
Modified: 2017-06-27 15:50 UTC
CC List: 3 users

See Also:
i915 platform: BSW/CHT
i915 features: display/atomic


Attachments
dmesg (123.46 KB, text/plain)
2015-04-20 07:25 UTC, Ding Heng
xorg log (14.92 KB, text/plain)
2015-04-20 07:26 UTC, Ding Heng
Don't downclock if clients are waiting for GPU results (1.22 KB, patch)
2015-04-20 07:35 UTC, Chris Wilson
Use infinite wait instead of set-domain for explicit throttling (1.36 KB, patch)
2015-04-21 08:05 UTC, Chris Wilson
Use coarse throttling first (1.59 KB, patch)
2015-04-21 08:06 UTC, Chris Wilson
Always apply RPS boosts for severely delayed work (2.95 KB, patch)
2015-04-21 08:50 UTC, Chris Wilson
rej files when install patch (968 bytes, text/plain)
2015-04-22 05:16 UTC, Ding Heng
call trace dmesg (3.35 KB, text/plain)
2015-05-05 02:32 UTC, Ding Heng
trace.bz2 (3.02 KB, text/plain)
2015-05-06 07:43 UTC, Ding Heng
output.txt (5.11 KB, text/plain)
2015-05-06 07:44 UTC, Ding Heng
dmesg (123.44 KB, text/plain)
2015-05-06 07:44 UTC, Ding Heng
dmesg_0509 (55.43 KB, text/plain)
2015-05-11 08:27 UTC, Ding Heng
xorg log 0509 (19.83 KB, text/plain)
2015-05-11 08:27 UTC, Ding Heng


Description Ding Heng 2015-04-20 07:25:42 UTC
Created attachment 115206 [details]
dmesg

System Environment:
--------------------------
Regression: yes

Platform: BSW
kernel: drm-intel-nightly: 5ea91de4ff45adb60031853d64314c3405378fbd(2015-04-15)

Bisect result 

1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e is the first bad commit
commit 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:32 2015 +0100

    drm/i915: Deminish contribution of wait-boosting from clients

    With boosting for missed pageflips, we have a much stronger indication
    of when we need to (temporarily) boost GPU frequency to ensure smooth
    delivery of frames. So now only allow each client to perform one RPS boost
    in each period of GPU activity due to stalling on results.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Deepak S <deepak.s@linux.intel.com>
    Reviewed-by: Deepak S <deepak.s@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 1712eed4c716325d29cf054c64636c06a8e5fb14 ff5070b8d3d083464faa37a4712e2a9b3084fb24 M drivers 
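
The policy described in the commit message above (each client may trigger only one RPS boost per period of GPU activity) can be illustrated with a toy model. This is a sketch for reasoning about the behaviour only: the class and method names are hypothetical and it is not the i915 implementation.

```python
# Toy model only: names are illustrative, not taken from the i915 driver.
class RpsModel:
    def __init__(self):
        # Clients that already used their boost in the current busy period.
        self.boosted_clients = set()
        self.boosts_applied = 0

    def client_wait(self, client_id):
        """A client stalls on GPU results; return True if a boost is granted."""
        if client_id in self.boosted_clients:
            return False
        self.boosted_clients.add(client_id)
        self.boosts_applied += 1
        return True

    def gpu_idle(self):
        """The GPU went idle: the busy period ends and boosts are re-armed."""
        self.boosted_clients.clear()


model = RpsModel()
granted = [model.client_wait("game") for _ in range(5)]
model.gpu_idle()
granted_after_idle = model.client_wait("game")
print(granted, granted_after_idle)
```

In this model a client that stalls repeatedly within one busy period only gets the first boost, so a workload that previously leaned on repeated wait-boosts would be expected to spend more time at lower frequencies.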


Bug detailed description:
-----------------------------

lightsmark performance decreased by about 15-20%.


Reproduce steps:
---------------------------

Run the Lightsmark test case and check the performance difference.
Comment 1 Ding Heng 2015-04-20 07:26:01 UTC
Created attachment 115207 [details]
xorg log
Comment 2 Chris Wilson 2015-04-20 07:35:45 UTC
Created attachment 115208 [details] [review]
Don't downclock if clients are waiting for GPU results

Please try this.
Comment 3 Chris Wilson 2015-04-20 08:48:41 UTC
The other aspect to be aware of is that the RPS selection is obviously suboptimal for this workload on BSW.
Comment 4 Ding Heng 2015-04-21 07:10:55 UTC
(In reply to Chris Wilson from comment #2)
> Created attachment 115208 [details] [review] [review]
> Don't downclock if clients are waiting for GPU results
> 
> Please try this.

I applied this patch to nightly-2015-04-15 (d600654ab94b325f253e267422dcf60302120ea0), but the result is not stable: I ran the test case 3 times and only 1 result was near the expected result.
Comment 5 Chris Wilson 2015-04-21 08:05:21 UTC
Created attachment 115242 [details] [review]
Use infinite wait instead of set-domain for explicit throttling
Comment 6 Chris Wilson 2015-04-21 08:06:20 UTC
Created attachment 115243 [details] [review]
Use coarse throttling first

This patch should work around the change in behaviour for very, very slow render clients. But it would be more interesting to measure the impact of the libdrm patch first.
Comment 7 Chris Wilson 2015-04-21 08:10:34 UTC
(In reply to Chris Wilson from comment #6)
> Created attachment 115243 [details] [review] [review]
> Use coarse throttling first
> 
> This patch should work around the change in behaviour for very, very slow
> render clients. But it would be more interesting to measure the impact of
> the libdrm patch first.

Also requires

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index 5a9207a..4dc54e5 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -174,8 +174,10 @@ intel_dri2_flush_with_flags(__DRIcontext *cPriv,
    if (flags & __DRI2_FLUSH_DRAWABLE)
       intel_resolve_for_dri2_flush(brw, dPriv);
 
-   if (reason == __DRI2_THROTTLE_SWAPBUFFER)
+   if (reason == __DRI2_THROTTLE_SWAPBUFFER) {
+      brw->need_flush_throttle = true;
       brw->need_swap_throttle = true;
+   }
    if (reason == __DRI2_THROTTLE_FLUSHFRONT)
       brw->need_flush_throttle = true;
Comment 8 Chris Wilson 2015-04-21 08:50:52 UTC
Created attachment 115245 [details] [review]
Always apply RPS boosts for severely delayed work

This should do the same as the mesa patch with less fuss.
Comment 9 Ding Heng 2015-04-22 05:15:37 UTC
(In reply to Chris Wilson from comment #8)
> Created attachment 115245 [details] [review] [review]
> Always apply RPS boosts for severely delayed work
> 
> This should do the same as the mesa patch with less fuss.

I see that attachment 115243 [details] [review] is struck through. Do I still need that patch, or should I just apply the 3 patches you left on this page plus the patch you mentioned in comment 7? Also, the patch in comment 8 failed to apply on the latest nightly branch.
Comment 10 Ding Heng 2015-04-22 05:16:56 UTC
Created attachment 115258 [details]
rej files when install patch
Comment 11 wendy.wang 2015-04-23 05:43:51 UTC
Because of the 1st bad commit:
Lightsmark v2008 perf dropped by 16%
CS game perf dropped by 36%
Half life2 perf dropped by 45%
Portal game perf dropped by 28%
Comment 12 Chris Wilson 2015-04-23 13:33:29 UTC
I've put the patches up to and including the RPS boost for laggards at:

http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=ac4c854260bc4c9117733c48d442d550a9e15036
Comment 13 Chris Wilson 2015-04-27 08:05:54 UTC
Updated patches at http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137
Comment 14 Ding Heng 2015-04-28 06:04:28 UTC
(In reply to Chris Wilson from comment #13)
> Updated patches at
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137

I failed to apply this patch on the latest nightly branch, and the same goes for the patch in comment 12. I also tried to modify the code by hand following your patch, but some of the variables and structs could not be found in the latest code.
Comment 15 Chris Wilson 2015-04-28 06:52:48 UTC
(In reply to Ding Heng from comment #14)
> (In reply to Chris Wilson from comment #13)
> > Updated patches at
> > http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137
> 
> I failed to install this patch on latest nightly branch. So does the patch
> in comment 12. I also tried to modify the code refer to your patch, but some
> of the variable or struct could not be found in the latest code.

It's not a single patch, but a branch.
Comment 16 Ding Heng 2015-04-28 08:21:58 UTC
(In reply to Chris Wilson from comment #15)
> (In reply to Ding Heng from comment #14)
> > (In reply to Chris Wilson from comment #13)
> > > Updated patches at
> > > http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137
> > 
> > I failed to install this patch on latest nightly branch. So does the patch
> > in comment 12. I also tried to modify the code refer to your patch, but some
> > of the variable or struct could not be found in the latest code.
> 
> It's not a single patch, but a branch.

What's this branch name? How could I verify this bug with your patch?
Comment 17 Chris Wilson 2015-04-28 09:04:20 UTC
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112

git://people.freedesktop.org/~ickle/linux-2.6 bug90112
Comment 18 Ding Heng 2015-04-29 07:54:11 UTC
(In reply to Chris Wilson from comment #17)
> http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112
> 
> git://people.freedesktop.org/~ickle/linux-2.6 bug90112

I downloaded this branch, compiled it, and installed your patches for mesa and libdrm; there was still no performance increase.
Comment 19 Chris Wilson 2015-04-29 08:18:35 UTC
Bleh. Do you have a graph of GPU frequency for the run?

Try "trace-cmd record -e i915 ./benchmark; trace-cmd report | bzip2 > trace.bz2" and attach the trace.bz2.
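
As a coarser complement to the trace, a frequency graph can also be approximated by polling sysfs while the benchmark runs. The following is a minimal sketch, assuming the GPU is card0 and that the i915 sysfs file gt_cur_freq_mhz is present; the function name and the sampling parameters are illustrative.

```python
import time
from pathlib import Path

# Assumption: the GPU is card0 and exposes the i915 sysfs frequency file.
DEFAULT_FREQ_FILE = Path("/sys/class/drm/card0/gt_cur_freq_mhz")


def sample_gpu_freq(freq_file=DEFAULT_FREQ_FILE, samples=10, interval_s=0.1):
    """Return a list of (seconds_since_start, mhz) tuples read from freq_file."""
    start = time.monotonic()
    readings = []
    for _ in range(samples):
        mhz = int(Path(freq_file).read_text().strip())
        readings.append((time.monotonic() - start, mhz))
        time.sleep(interval_s)
    return readings


# Only poll when the file actually exists on this machine.
if __name__ == "__main__" and DEFAULT_FREQ_FILE.exists():
    for t, mhz in sample_gpu_freq():
        print(f"{t:.2f}s {mhz} MHz")
```

Polling like this only gives a coarse picture; short boosts between two samples are missed, which the i915 trace events would still show.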
Comment 20 Eero Tamminen 2015-04-29 14:38:07 UTC
The kernel P-state driver's "powersave" governor can currently do funky stuff (switch tasks from a high-frequency core to a low-frequency core for no apparent reason) on BSW for workloads that are both CPU & GPU bound and TDP limited, as I think the indicated Source engine games are.  And a CPU running at low speed can also cause the GPU to run at low speed.

Unless one is tracking both GPU & CPU frequencies and task migration in tests, it might be better to check these kinds of optimizations first with:
- the test pinned to a single core with the "taskset" command, and/or
- both CPU & GPU fixed to a (non-turbo) speed (in the BSW C0 case, one could try e.g. 1.5 GHz for the CPU and 500 MHz for the GPU)
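
The first suggestion (pinning the test to one core) can also be scripted. Below is a minimal sketch that has a similar effect to "taskset -c <cpu>", assuming Linux; run_pinned is a hypothetical helper name and the CPU number is an arbitrary choice.

```python
import os
import subprocess


def run_pinned(argv, cpu=0, **run_kwargs):
    """Run argv restricted to one CPU, similar in effect to `taskset -c <cpu>`."""
    def pin():
        # Executed in the child just before exec; pid 0 means "this process".
        os.sched_setaffinity(0, {cpu})
    return subprocess.run(argv, preexec_fn=pin, **run_kwargs)


# Example call (placeholder command): run_pinned(["./benchmark"], cpu=0)
```

Pinning only removes task migration as a variable; it does not fix the CPU or GPU frequency, which would still need to be controlled separately.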
Comment 21 Ding Heng 2015-05-04 09:04:48 UTC
(In reply to Chris Wilson from comment #19)
> Bleh. Do you have a graph of GPU frequency for the run?
> 
> Try "trace-cmd record -e i915 ./benchmark; trace-cmd report | bzip2 >
> trace.bz2" and attach the trace.bz2.


"benchmark" in this command means the command that reproduces this issue, right? I tried this and found that the command causes a call trace.
Comment 22 Chris Wilson 2015-05-04 12:54:47 UTC
(In reply to Ding Heng from comment #21)
> "benchmark" in this command means the command to reproduce  this issue,
> right? I tried this and found this command will cause call trace.

Right, but what call trace?
Comment 23 Ding Heng 2015-05-05 02:32:19 UTC
Created attachment 115538 [details]
call trace dmesg

It seems the dmesg output before the call trace has been cleared. I can't get more than this.
Comment 24 Chris Wilson 2015-05-05 08:20:41 UTC
(In reply to Ding Heng from comment #23)
> Created attachment 115538 [details]
> call trace dmesg
> 
> seems the dmesg before call trace has been cleared. I can't get more  than
> this.

http://patchwork.freedesktop.org/patch/48529/
Comment 25 Eero Tamminen 2015-05-05 14:34:07 UTC
According to Wendy, drop in SynMark GSCloth test on *BYT* is also due to this change.
Comment 26 Chris Wilson 2015-05-05 19:53:16 UTC
(In reply to Eero Tamminen from comment #25)
> According to Wendy, drop in SynMark GSCloth test on *BYT* is also due to
> this change.

At least that is one I can test. In all honesty, it just means that we were reliant on the waitboost mechanism too much i.e. we were not submitting work fast enough to keep the GPU busy enough to maintain high clocks.
Comment 27 Ding Heng 2015-05-06 07:43:10 UTC
Created attachment 115581 [details]
trace.bz2

The call trace still exists; please refer to the latest dmesg. output.txt shows the output of the command.
Comment 28 Ding Heng 2015-05-06 07:44:17 UTC
Created attachment 115582 [details]
output.txt
Comment 29 Ding Heng 2015-05-06 07:44:39 UTC
Created attachment 115583 [details]
dmesg
Comment 30 Chris Wilson 2015-05-06 08:17:40 UTC
(In reply to Ding Heng from comment #27)
> Created attachment 115581 [details]
> trace.bz2
> 
> call trace still exist, please refer to the latest dmesg. Output.txt shows
> the outpput of the command.

The calltraces are noise from modesetting errors, shouldn't be impacting the benchmark.

You managed to bzip the output of running trace-cmd on the benchmark and not the output of "trace-cmd report"
Comment 31 Chris Wilson 2015-05-06 19:59:14 UTC
Ok, I have seen an interesting drop on byt with OglGSCloth. First look says it is not a GPU frequency issue - coarse sampling of the frequency implies that it remains throughout the test. But the render %busy along with completion interrupts are both higher for the preceding commit, confirming the higher throughput measured by the test.

Have test system, I can dig.
Comment 32 Chris Wilson 2015-05-07 15:21:03 UTC
Finally! It's mutex contention on the rps.hw_lock.
Comment 33 Chris Wilson 2015-05-08 16:54:00 UTC
New patches pushed to git://people.freedesktop.org/~ickle/linux-2.6 branch bug90112 (http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112)
Comment 34 Ding Heng 2015-05-11 08:26:55 UTC
(In reply to Chris Wilson from comment #33)
> New patches pushed to git://people.freedesktop.org/~ickle/linux-2.6 branch
> bug90112 (http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112)

Still no performance increase. Please refer to the latest Xorg log and dmesg.
Comment 35 Ding Heng 2015-05-11 08:27:26 UTC
Created attachment 115691 [details]
dmesg_0509
Comment 36 Ding Heng 2015-05-11 08:27:48 UTC
Created attachment 115692 [details]
xorg log 0509
Comment 37 Chris Wilson 2015-06-01 13:37:49 UTC
The issue I was able to reproduce on BYT should be fixed in -nightly. So please confirm, and then test BSW.
Comment 38 Ding Heng 2015-06-08 06:53:09 UTC
(In reply to Chris Wilson from comment #37)
> The issue I was able to reproduce on BYT should be fixed in -nightly. So
> please confirm, and then test BSW.

Which case did you use to verify this issue? What's the result? I didn't see a performance increase with the latest kernel. For example, the result is still 39 FPS when I test with Lightsmark, while it was about 47 FPS before the first bad commit.
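
For reference, the two Lightsmark figures quoted above translate into a relative drop as follows. This is plain arithmetic on the numbers from this comment; the helper name is illustrative.

```python
def percent_drop(good_fps, bad_fps):
    """Relative drop of bad_fps versus good_fps, in percent."""
    return (good_fps - bad_fps) / good_fps * 100.0


# Figures quoted in the comment above: about 47 FPS before, 39 FPS after.
print(f"{percent_drop(47, 39):.1f}%")  # prints 17.0%
```

That is within the 15-20% range given for Lightsmark in the original description.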
Comment 39 Chris Wilson 2015-06-08 07:48:50 UTC
byt OglGSCloth

Note that chv doesn't use the full RPS autotuning. You can try

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 5eed3caba483..ef733d164cec 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4120,7 +4120,7 @@ static bool valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
                if (vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val))
                        return false;
 
-               if (!IS_CHERRYVIEW(dev_priv))
+               if (1)
                        gen6_set_rps_thresholds(dev_priv, val);
 
                dev_priv->rps.cur_freq = val;

and see if that makes any difference
Comment 40 Ding Heng 2015-06-09 02:50:27 UTC
(In reply to Chris Wilson from comment #39)
> byt OglGSCloth
> 
> Note that chv doesn't use the full RPS autotuning. You can try
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 5eed3caba483..ef733d164cec 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4120,7 +4120,7 @@ static bool valleyview_set_rps(struct drm_i915_private
> *dev_priv, u8 val)
>                 if (vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val))
>                         return false;
>  
> -               if (!IS_CHERRYVIEW(dev_priv))
> +               if (1)
>                         gen6_set_rps_thresholds(dev_priv, val);
>  
>                 dev_priv->rps.cur_freq = val;
> 
> and see if that makes any difference

I can see OglGSCloth performance increased by about 10% on BYT. But lightsmark performance is still lower than before on BSW.
Comment 41 Eero Tamminen 2015-06-10 15:32:10 UTC
On BSW, it's probably better to also test this kind of issue with the ACPI ondemand governor, in case the issue is related to the process scheduler / power management.

Please test the performance of the problematic commit and the commit preceding it after booting the kernel with the following boot option: "intel_pstate=disable".  How large is the BSW perf difference with that configuration?
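
Before comparing results it may be worth confirming that the boot option actually took effect. A minimal sketch, assuming a Linux system with the usual procfs and cpufreq sysfs files; the helper names are illustrative, and the files are simply reported as None when absent.

```python
from pathlib import Path


def pstate_disabled(cmdline):
    """True if the kernel command line contains intel_pstate=disable."""
    return "intel_pstate=disable" in cmdline.split()


def read_small_file(path):
    """Return the stripped contents of a small procfs/sysfs file, or None."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_small_file("/proc/cmdline") or ""
    print("intel_pstate=disable on cmdline:", pstate_disabled(cmdline))
    cpufreq = "/sys/devices/system/cpu/cpu0/cpufreq"
    print("scaling_driver:", read_small_file(f"{cpufreq}/scaling_driver"))
    print("scaling_governor:", read_small_file(f"{cpufreq}/scaling_governor"))
```

With intel_pstate disabled, the reported scaling driver would typically be acpi-cpufreq rather than intel_pstate, and the governor should read ondemand for the test suggested above.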
Comment 42 Ding Heng 2015-06-11 06:28:32 UTC
(In reply to Eero Tamminen from comment #41)
> On BSW, it's probably better to do testing of this kind of issues also with
> ACPI ondemand governor, in case issue is related to process scheduler /
> power management.
> 
> Please test performance of the problematic commit and commit preceeding it
> after booting kernel with following kernel bootup option:
> "intel_pstate=disable".  How large the BSW perf difference is with that
> configuration?

On BSW, there is still about a 17% performance difference between the first bad commit and its parent commit. It seems that adding intel_pstate=disable to the kernel options didn't make any difference.
Comment 43 Eero Tamminen 2015-06-11 11:28:13 UTC
(In reply to Ding Heng from comment #42)
> On BSW, there is still about 17% performance difference between the first
> bad commit and its parent commit. Seems adding intel_pstate=disable in
> kernel option didn't make any difference.

In which test-case?  Comment 11 states that difference in HL2 was 45%...
Comment 44 Humberto Israel Perez Rodriguez 2015-08-11 16:31:48 UTC
Hi Ding Heng, could you please provide the information requested in the last comment?
Thanks
Comment 45 wendy.wang 2015-08-12 08:31:49 UTC
GFX QA has been transferred to France, and Ding Heng has moved off this project.
Wendy is temporarily covering gfx performance until France takes over gfx performance testing, and will try to update this bug tomorrow after retesting with the intel_pstate=disable parameter.
Comment 46 wendy.wang 2015-08-14 08:34:19 UTC
Adding the intel_pstate=disable parameter does not fix this bug on BSW.
Tested with the CS game:
(bad - parent) vs. parent commit: -33%

Bad next-queued kernel commit:
1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:32 2015 +0100

    drm/i915: Deminish contribution of wait-boosting from clients


parent next-queued kernel commit of 1854d5ca0
commit 6ad790c0f5ac55fd13f322c23519f0d6f0721864
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:31 2015 +0100

drm/i915: Boost GPU frequency if we detect outstanding pageflips
Comment 47 Chris Wilson 2015-08-15 08:44:06 UTC
(In reply to wendy.wang from comment #46)
> Add intel_pstate=disable parameter does not fix this bug on BSW
> test with cs game,
> (bad- parent) vs. parent commit: -33%
> 
> Bad next-queued kernel commit:
> 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Apr 7 16:20:32 2015 +0100
> 
>     drm/i915: Deminish contribution of wait-boosting from clients
> 
> 
> parent next-queued kernel commit of 1854d5ca0
> commit 6ad790c0f5ac55fd13f322c23519f0d6f0721864
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Apr 7 16:20:31 2015 +0100
> 
> drm/i915: Boost GPU frequency if we detect outstanding pageflips

The open question was whether the regression remains after 
commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 21 21:01:47 2015 +0100

    drm/i915: Use spinlocks for checking when to waitboost
Comment 48 wendy.wang 2015-08-24 07:47:48 UTC
(In reply to Chris Wilson from comment #47)
> (In reply to wendy.wang from comment #46)
> > Add intel_pstate=disable parameter does not fix this bug on BSW
> > test with cs game,
> > (bad- parent) vs. parent commit: -33%
> > 
> > Bad next-queued kernel commit:
> > 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Apr 7 16:20:32 2015 +0100
> > 
> >     drm/i915: Deminish contribution of wait-boosting from clients
> > 
> > 
> > parent next-queued kernel commit of 1854d5ca0
> > commit 6ad790c0f5ac55fd13f322c23519f0d6f0721864
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Apr 7 16:20:31 2015 +0100
> > 
> > drm/i915: Boost GPU frequency if we detect outstanding pageflips
> 
> The open question was whether the regression remains after 
> commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu May 21 21:01:47 2015 +0100
> 
>     drm/i915: Use spinlocks for checking when to waitboost
After commit 8d3afd7d0e666b932e6fa15901e6280fe829a786, the FPS of the failed cases did not recover to the previous good performance; there is still a -12% gap vs. the good FPS.
Tested on BSW with the CS game.
Comment 49 wendy.wang 2015-08-27 02:14:57 UTC
Further update:
When this issue was opened, the FPS drops were as follows:
Lightsmark v2008 perf dropped by 16% vs. good commit
CS game perf dropped by 36% vs. good commit
Half life2 perf dropped by 45% vs. good commit
Portal game perf dropped by 28% vs. good commit

After commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 21 21:01:47 2015 +0100

     drm/i915: Use spinlocks for checking when to waitboost

Lightsmark v2008 perf dropped by 2% vs. good commit
CS game perf dropped by 12% vs. good commit
Half life2 perf dropped by 13% vs. good commit
Portal game perf dropped by 11% vs. good commit
Comment 50 Chris Wilson 2015-08-27 07:40:33 UTC
That's more consistent with mesa relying on wait-boosting to overcome its inability to submit batches fast enough. If you trace the gpufreq do you see it dip below max often?
Comment 51 wendy.wang 2015-09-02 03:42:58 UTC
(In reply to Chris Wilson from comment #50)
> That's more consistent with mesa relying on wait-boosting to overcome its
> inability to submit batches fast enough. If you trace the gpufreq do you see
> it dip below max often?

Yes. Testing commit 8d3afd7d0e666b932e6fa15901e6280fe829a786 on BSW with the Half-Life 2 case, the actual/current gpufreq equals the min GPU freq most of the time; only rarely does it rise above the min GPU freq or reach the max GPU freq.
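
One way to put a number on "most of the time" is to bucket sampled frequencies into at-min, in-between, and at-max. The sketch below assumes a list of sampled frequencies in MHz is already available; the sample values are hypothetical and are not measurements from this bug.

```python
from collections import Counter


def freq_residency(samples_mhz, min_mhz, max_mhz):
    """Return the fraction of samples at the minimum, in between, and at the maximum."""
    if not samples_mhz:
        raise ValueError("no samples")
    counts = Counter(
        "min" if f <= min_mhz else "max" if f >= max_mhz else "between"
        for f in samples_mhz
    )
    total = len(samples_mhz)
    return {k: counts[k] / total for k in ("min", "between", "max")}


# Hypothetical samples for illustration only, not measurements from this bug.
example = [200, 200, 200, 200, 200, 200, 200, 400, 600, 200]
print(freq_residency(example, min_mhz=200, max_mhz=600))
```

Reporting the three fractions for the bad commit and its parent would make the comparison less dependent on eyeballing the current frequency.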
Comment 52 Chris Wilson 2015-09-02 15:19:47 UTC
The theory is that http://cgit.freedesktop.org/~ickle/mesa/log/?h=brw-batch should help.
Comment 53 wendy.wang 2015-09-15 02:59:12 UTC
(In reply to Chris Wilson from comment #52)
> The theory is that http://cgit.freedesktop.org/~ickle/mesa/log/?h=brw-batch
> should help.

Hello Chris,
We failed to clone your branch:
[root@x-ivb2 ickle]# tsocks git clone git://people.freedesktop.org/~ickle/mesa 
Cloning into 'mesa'... 
remote: Counting objects: 642602, done. 
remote: Compressing objects: 100% (101583/101583), done. 
remote: Total 642602 (delta 544069), reused 635991 (delta 537501) 
Receiving objects: 100% (642602/642602), 148.71 MiB | 27.00 KiB/s, done. 
Resolving deltas: 100% (544069/544069), done. 
warning: remote HEAD refers to nonexistent ref, unable to checkout.
Comment 54 cprigent 2015-11-17 17:51:39 UTC
Bug scrub:
Hi Chris,
Could you help Wendy get access to this tree?
Thanks
Comment 55 Chris Wilson 2015-11-17 21:28:57 UTC
Sure, it is a remote:

git remote add <id> <tree>

then it will only pull down the delta and not the full tree from slow fdo.
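
Spelled out, the remote-based workflow might look like the sketch below. It assumes an existing local clone of mesa; the remote name "ickle" is an arbitrary choice, while the URL and the brw-batch branch are the ones quoted earlier in this thread. The helper names are illustrative.

```python
import subprocess


def remote_fetch_commands(remote_name, url, branch):
    """Build the git commands to add a remote and fetch a single branch from it."""
    return [
        ["git", "remote", "add", remote_name, url],
        ["git", "fetch", remote_name, branch],
        ["git", "checkout", "-b", branch, f"{remote_name}/{branch}"],
    ]


def run_in_repo(commands, repo_dir):
    """Run each command inside an existing clone, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


# "ickle" is an arbitrary remote name; URL and branch are the ones quoted above.
cmds = remote_fetch_commands(
    "ickle", "git://people.freedesktop.org/~ickle/mesa", "brw-batch"
)
for argv in cmds:
    print(" ".join(argv))
```

Calling run_in_repo(cmds, "/path/to/mesa") would execute them in that clone (the path is a placeholder). Fetching a named branch also sidesteps the "remote HEAD refers to nonexistent ref" warning from the plain clone, since no default branch needs to be checked out.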
Comment 56 Jari Tahvanainen 2016-10-06 12:21:27 UTC
Proposing this bug to be resolved+closed due to commit 8d3afd7. Please comment if you disagree (or agree).

IMHO: Confirming the regression or the fix by re-running the tests related to these old bugs will not have ROI.

 --- Git Log data ---
    commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Thu May 21 21:01:47 2015 +0100
     drm/i915: Use spinlocks for checking when to waitboost
     In commit 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
     Author: Chris Wilson <chris@chris-wilson.co.uk>
     Date:   Tue Apr 7 16:20:32 2015 +0100
         drm/i915: Deminish contribution of wait-boosting from clients
     we removed an atomic timer based check for allowing waitboosting and
     moved it below the mutex taken during RPS. However, that mutex can be
     held for long periods of time on Vallyview/Cherryview as communication
     with the PCU is slow. As clients may frequently wait for results (e.g.
     such as tranform feedback) we introduced contention between the client
     and the RPS worker. We can take advantage of the RPS worker, by
     switching the wait boost decision to use spin locks and defer the
     actual reclocking to the worker.
     Fixes a regression of up to 45% on Baytrail and Baswell!
     v2 (Daniel):
     - Use max_freq_softlimit instead of the not-yet-merged boost
       frequency.
     - Don't inject a fake irq into the boost work, instead treat
       client_boost as just another legit waker.
     v3: Drop the now unused mask (Chris).
     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
     Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
 --- Eof Git Log ---
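
The idea in the quoted commit message (make the boost decision under a briefly held lock and leave the slow reclocking to a worker) can be sketched in user space as follows. This is a toy model under stated assumptions: the class, its fields, and the sleep standing in for slow PCU communication are all hypothetical, and a Python threading.Lock only stands in for a kernel spinlock.

```python
import queue
import threading
import time


class RpsSketch:
    """Toy model: clients decide under a short lock, a worker does the slow reclock."""

    def __init__(self, slow_reclock_s=0.05):
        self._lock = threading.Lock()  # stands in for the spinlock: held only briefly
        self._boost_pending = False
        self._work = queue.Queue()
        self._slow_reclock_s = slow_reclock_s
        self.reclocks = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def client_boost(self):
        """Called by a waiting client; never performs the slow hardware access."""
        with self._lock:
            if self._boost_pending:
                return False  # a boost is already queued, nothing more to do
            self._boost_pending = True
        self._work.put("boost")
        return True

    def _run(self):
        while True:
            item = self._work.get()
            if item is None:
                break
            time.sleep(self._slow_reclock_s)  # stands in for slow PCU communication
            self.reclocks += 1
            with self._lock:
                self._boost_pending = False

    def stop(self):
        """Drain queued work, then stop the worker thread."""
        self._work.put(None)
        self._worker.join()


rps = RpsSketch()
results = [rps.client_boost() for _ in range(100)]
rps.stop()
print(results.count(True), rps.reclocks)
```

The point of the sketch is that client_boost() returns without ever waiting for the slow reclock, whereas taking one long-held mutex around both the decision and the hardware access would make every waiting client queue up behind the worker.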
Comment 57 Chris Wilson 2016-10-06 12:28:57 UTC
The RPS tuning hasn't changed and we have users (such as kodi) who have complained about the frequency selection on bsw. So I think there is some merit in fixing RPS issues on bsw.
Comment 58 Jari Tahvanainen 2017-03-28 10:05:13 UTC
Hello Chris, any news or plans on this? I'm also wondering if there are microbenchmarks that one can execute in order to see when it would be a good time to do more laborious tests with games (e.g. CS). The Lightsmark "use case" already seems to be back to its original level.
Comment 59 Chris Wilson 2017-03-28 10:07:52 UTC
The challenge here is generating realistic loads (including microsleeps). Note that we have now applied all the outstanding ideas wrt RPS on BSW (to make kodi happy), but we are still none the wiser if we are as good across all benchmarks as we have historically been.

I am happy to close this if no one is able to reproduce the old benchmarks to indicate whether or not we are still regressing.
Comment 60 Chris Wilson 2017-04-08 14:30:10 UTC
We appear to be content...
Comment 61 Ricardo 2017-06-27 15:50:47 UTC
closing bug

