Summary: | [BSW bisected] OglGSCloth/Lightsmark/CS/Portal/Half Life 2 games performance decreased by 15%-45%
---|---
Product: | DRI
Reporter: | Ding Heng <hengx.ding>
Component: | DRM/Intel
Assignee: | Chris Wilson <chris>
Status: | CLOSED WORKSFORME
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | major
Priority: | high
CC: | christophe.prigent, eero.t.tamminen, intel-gfx-bugs
Version: | DRI git
Keywords: | bisected, regression
Hardware: | Other
OS: | All
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=90134
Whiteboard: |
i915 platform: | BSW/CHT
i915 features: | display/atomic
Description
Ding Heng
2015-04-20 07:25:42 UTC
Created attachment 115207 [details]
xorg log
Created attachment 115208 [details] [review]
Don't downclock if clients are waiting for GPU results

Please try this.

The other aspect to be aware of is that the RPS selection is obviously suboptimal for this workload on BSW.

(In reply to Chris Wilson from comment #2)
> Created attachment 115208 [details] [review] [review]
> Don't downclock if clients are waiting for GPU results
>
> Please try this.

I installed this patch on nightly-2015-04-15 d600654ab94b325f253e267422dcf60302120ea0 and the results are not stable: I ran this case 3 times, but only 1 result was near the expected result.

Created attachment 115242 [details] [review]
Use infinite wait instead of set-domain for explicit throttling

Created attachment 115243 [details] [review]
Use coarse throttling first

This patch should work around the change in behaviour for very, very slow render clients. But it would be more interesting to measure the impact of the libdrm patch first.

(In reply to Chris Wilson from comment #6)
> Created attachment 115243 [details] [review] [review]
> Use coarse throttling first
>
> This patch should work around the change in behaviour for very, very slow
> render clients. But it would be more interesting to measure the impact of
> the libdrm patch first.
Also requires

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index 5a9207a..4dc54e5 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -174,8 +174,10 @@ intel_dri2_flush_with_flags(__DRIcontext *cPriv,
    if (flags & __DRI2_FLUSH_DRAWABLE)
       intel_resolve_for_dri2_flush(brw, dPriv);
 
-   if (reason == __DRI2_THROTTLE_SWAPBUFFER)
+   if (reason == __DRI2_THROTTLE_SWAPBUFFER) {
+      brw->need_flush_throttle = true;
       brw->need_swap_throttle = true;
+   }
 
    if (reason == __DRI2_THROTTLE_FLUSHFRONT)
       brw->need_flush_throttle = true;

Created attachment 115245 [details] [review]
Always apply RPS boosts for severely delayed work

This should do the same as the mesa patch with less fuss.

(In reply to Chris Wilson from comment #8)
> Created attachment 115245 [details] [review] [review]
> Always apply RPS boosts for severely delayed work
>
> This should do the same as the mesa patch with less fuss.

I see that attachment 115243 [details] [review] is struck through. Do I still need that patch, or should I just install the 3 patches you left on this page and the patch you mentioned in comment 7? Besides, the patch in comment 8 failed to apply on the latest nightly branch.

Created attachment 115258 [details]
rej files when install patch
Because of the 1st bad commit:
Lightsmark v2008 perf dropped by 16%
CS game perf dropped by 36%
Half life2 perf dropped by 45%
Portal game perf dropped by 28%

I've put the patches up to and including the RPS boost for laggards at:
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=ac4c854260bc4c9117733c48d442d550a9e15036

Updated patches at
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137

(In reply to Chris Wilson from comment #13)
> Updated patches at
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137

I failed to apply this patch on the latest nightly branch, and the same goes for the patch in comment 12. I also tried to modify the code by hand following your patch, but some of the variables and structs could not be found in the latest code.

(In reply to Ding Heng from comment #14)
> (In reply to Chris Wilson from comment #13)
> > Updated patches at
> > http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137
>
> I failed to apply this patch on the latest nightly branch, and the same goes
> for the patch in comment 12. I also tried to modify the code by hand
> following your patch, but some of the variables and structs could not be
> found in the latest code.

It's not a single patch, but a branch.

(In reply to Chris Wilson from comment #15)
> (In reply to Ding Heng from comment #14)
> > (In reply to Chris Wilson from comment #13)
> > > Updated patches at
> > > http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137
> >
> > I failed to apply this patch on the latest nightly branch, and the same goes
> > for the patch in comment 12. I also tried to modify the code by hand
> > following your patch, but some of the variables and structs could not be
> > found in the latest code.
>
> It's not a single patch, but a branch.

What is the branch name? How can I verify this bug with your patch?
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112

git://people.freedesktop.org/~ickle/linux-2.6 bug90112

(In reply to Chris Wilson from comment #17)
> http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112
>
> git://people.freedesktop.org/~ickle/linux-2.6 bug90112

I downloaded this branch, compiled it, and installed your patches for mesa and libdrm; there was still no performance increase.

Bleh. Do you have a graph of GPU frequency for the run?

Try "trace-cmd record -e i915 ./benchmark; trace-cmd report | bzip2 > trace.bz2" and attach the trace.bz2.

The kernel P-state driver's "powersave" governor can currently do funky stuff (switch tasks from a high-frequency core to a low-frequency core for no apparent reason) on BSW for workloads that are both CPU & GPU bound and TDP limited, as I think the indicated Source engine games are. A CPU running at low speed can also cause the GPU to run at low speed.

Unless one is tracking both GPU & CPU frequencies and task migration in tests, it might be better to check these kinds of optimizations first with:
- the test being fixed on a single core with the "taskset" command, and/or
- both CPU & GPU being fixed to a (non-turbo) speed (in the BSW C0 case, one could try e.g. 1.5 GHz for the CPU and 500 MHz for the GPU)

(In reply to Chris Wilson from comment #19)
> Bleh. Do you have a graph of GPU frequency for the run?
>
> Try "trace-cmd record -e i915 ./benchmark; trace-cmd report | bzip2 >
> trace.bz2" and attach the trace.bz2.

"benchmark" in this command means the command to reproduce this issue, right? I tried this and found that this command causes a call trace.

(In reply to Ding Heng from comment #21)
> "benchmark" in this command means the command to reproduce this issue,
> right? I tried this and found that this command causes a call trace.

Right, but what call trace?

Created attachment 115538 [details]
call trace dmesg
seems the dmesg before call trace has been cleared. I can't get more than this.
(In reply to Ding Heng from comment #23)
> Created attachment 115538 [details]
> call trace dmesg
>
> seems the dmesg before call trace has been cleared. I can't get more than
> this.

http://patchwork.freedesktop.org/patch/48529/

According to Wendy, the drop in the SynMark GSCloth test on *BYT* is also due to this change.

(In reply to Eero Tamminen from comment #25)
> According to Wendy, the drop in the SynMark GSCloth test on *BYT* is also
> due to this change.

At least that is one I can test. In all honesty, it just means that we were relying on the waitboost mechanism too much, i.e. we were not submitting work fast enough to keep the GPU busy enough to maintain high clocks.

Created attachment 115581 [details]
trace.bz2
The call trace still exists; please refer to the latest dmesg. Output.txt shows the output of the command.
Created attachment 115582 [details]
output.txt
Created attachment 115583 [details]
dmesg
(In reply to Ding Heng from comment #27)
> Created attachment 115581 [details]
> trace.bz2
>
> The call trace still exists; please refer to the latest dmesg. Output.txt
> shows the output of the command.

The call traces are noise from modesetting errors and shouldn't be impacting the benchmark.

You managed to bzip the output of running trace-cmd on the benchmark and not the output of "trace-cmd report".

Ok, I have seen an interesting drop on byt with OglGSCloth. First look says it is not a GPU frequency issue - coarse sampling of the frequency implies that it remains throughout the test. But the render %busy along with completion interrupts are both higher for the preceding commit, confirming the higher throughput measured by the test.

Have test system, I can dig.

Finally! It's mutex contention on the rps.hw_lock.

New patches pushed to git://people.freedesktop.org/~ickle/linux-2.6 branch bug90112 (http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112)

(In reply to Chris Wilson from comment #33)
> New patches pushed to git://people.freedesktop.org/~ickle/linux-2.6 branch
> bug90112 (http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug90112)

Still no performance increase. Please refer to the latest Xorg log and dmesg.

Created attachment 115691 [details]
dmesg_0509
Created attachment 115692 [details]
xorg log 0509
The issue I was able to reproduce on BYT should be fixed in -nightly. So please confirm, and then test BSW.

(In reply to Chris Wilson from comment #37)
> The issue I was able to reproduce on BYT should be fixed in -nightly. So
> please confirm, and then test BSW.

Which case did you use to verify this issue? What was the result? I didn't see a performance increase with the latest kernel. For example, the result is still 39 FPS when I test with lightsmark, while it was about 47 FPS before the first bad commit.

byt OglGSCloth

Note that chv doesn't use the full RPS autotuning. You can try

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 5eed3caba483..ef733d164cec 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4120,7 +4120,7 @@ static bool valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
        if (vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val))
                return false;
 
-       if (!IS_CHERRYVIEW(dev_priv))
+       if (1)
                gen6_set_rps_thresholds(dev_priv, val);
 
        dev_priv->rps.cur_freq = val;

and see if that makes any difference

(In reply to Chris Wilson from comment #39)
> byt OglGSCloth
>
> Note that chv doesn't use the full RPS autotuning. You can try
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 5eed3caba483..ef733d164cec 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4120,7 +4120,7 @@ static bool valleyview_set_rps(struct drm_i915_private
> *dev_priv, u8 val)
>         if (vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val))
>                 return false;
>
> -       if (!IS_CHERRYVIEW(dev_priv))
> +       if (1)
>                 gen6_set_rps_thresholds(dev_priv, val);
>
>         dev_priv->rps.cur_freq = val;
>
> and see if that makes any difference

I can see OglGSCloth performance increased by about 10% on BYT. But lightsmark performance is still lower than before on BSW.
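
For reference, the Lightsmark figures quoted above (about 47 FPS before the first bad commit, 39 FPS after it) can be turned into a relative drop. This is only a worked example under the assumption that the drop is measured against the good result; the thread does not say how the reported percentages were computed.

```latex
\frac{\mathrm{FPS}_{\text{good}} - \mathrm{FPS}_{\text{bad}}}{\mathrm{FPS}_{\text{good}}}
  = \frac{47 - 39}{47}
  = \frac{8}{47}
  \approx 0.170 = 17.0\%
```

Under that assumption the result is close to the 16% Lightsmark drop reported earlier in this bug.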
On BSW, it's probably better to also test this kind of issue with the ACPI ondemand governor, in case the issue is related to the process scheduler / power management.

Please test the performance of the problematic commit and the commit preceding it after booting the kernel with the following kernel boot option: "intel_pstate=disable". How large is the BSW perf difference with that configuration?

(In reply to Eero Tamminen from comment #41)
> On BSW, it's probably better to also test this kind of issue with the ACPI
> ondemand governor, in case the issue is related to the process scheduler /
> power management.
>
> Please test the performance of the problematic commit and the commit
> preceding it after booting the kernel with the following kernel boot option:
> "intel_pstate=disable". How large is the BSW perf difference with that
> configuration?

On BSW, there is still about a 17% performance difference between the first bad commit and its parent commit. It seems that adding intel_pstate=disable to the kernel options didn't make any difference.

(In reply to Ding Heng from comment #42)
> On BSW, there is still about a 17% performance difference between the first
> bad commit and its parent commit. It seems that adding intel_pstate=disable
> to the kernel options didn't make any difference.

In which test case? Comment 11 states that the difference in HL2 was 45%...

Hi Ding Heng, could you please provide the information requested in the last comment? Thanks

GFX QA has been transferred to France, and Dinghengx has moved off this project. Wendy is temporarily covering gfx performance until France takes over gfx performance testing, and will try to update this bug tomorrow after retesting with the intel_pstate=disable parameter.

Adding the intel_pstate=disable parameter does not fix this bug on BSW.
Tested with the CS game, (bad - parent) vs. parent commit: -33%

Bad next-queued kernel commit:
1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:32 2015 +0100

    drm/i915: Deminish contribution of wait-boosting from clients

Parent next-queued kernel commit of 1854d5ca0:
commit 6ad790c0f5ac55fd13f322c23519f0d6f0721864
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:31 2015 +0100

    drm/i915: Boost GPU frequency if we detect outstanding pageflips

(In reply to wendy.wang from comment #46)
> Adding the intel_pstate=disable parameter does not fix this bug on BSW.
> Tested with the CS game, (bad - parent) vs. parent commit: -33%
>
> Bad next-queued kernel commit:
> 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Apr 7 16:20:32 2015 +0100
>
>     drm/i915: Deminish contribution of wait-boosting from clients
>
> Parent next-queued kernel commit of 1854d5ca0:
> commit 6ad790c0f5ac55fd13f322c23519f0d6f0721864
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Apr 7 16:20:31 2015 +0100
>
>     drm/i915: Boost GPU frequency if we detect outstanding pageflips

The open question was whether the regression remains after

commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu May 21 21:01:47 2015 +0100

    drm/i915: Use spinlocks for checking when to waitboost

(In reply to Chris Wilson from comment #47)
> (In reply to wendy.wang from comment #46)
> > Adding the intel_pstate=disable parameter does not fix this bug on BSW.
> > Tested with the CS game, (bad - parent) vs. parent commit: -33%
> >
> > Bad next-queued kernel commit:
> > 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Tue Apr 7 16:20:32 2015 +0100
> >
> >     drm/i915: Deminish contribution of wait-boosting from clients
> >
> > Parent next-queued kernel commit of 1854d5ca0:
> > commit 6ad790c0f5ac55fd13f322c23519f0d6f0721864
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Tue Apr 7 16:20:31 2015 +0100
> >
> >     drm/i915: Boost GPU frequency if we detect outstanding pageflips
>
> The open question was whether the regression remains after
>
> commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu May 21 21:01:47 2015 +0100
>
>     drm/i915: Use spinlocks for checking when to waitboost

After commit 8d3afd7d0e666b932e6fa15901e6280fe829a786, the FPS of the failing cases did not recover to the previous good performance; there is still a -12% gap vs. the good FPS. Tested on BSW with the CS game.

More updates:
When this issue was opened, the FPS drops were as below:
Lightsmark v2008 perf dropped by 16% vs. good commit
CS game perf dropped by 36% vs. good commit
Half life2 perf dropped by 45% vs. good commit
Portal game perf dropped by 28% vs. good commit

After
commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu May 21 21:01:47 2015 +0100

    drm/i915: Use spinlocks for checking when to waitboost

Lightsmark v2008 perf dropped by 2% vs. good commit
CS game perf dropped by 12% vs. good commit
Half life2 perf dropped by 13% vs. good commit
Portal game perf dropped by 11% vs. good commit

That's more consistent with mesa relying on wait-boosting to overcome its inability to submit batches fast enough. If you trace the gpufreq do you see it dip below max often?

(In reply to Chris Wilson from comment #50)
> That's more consistent with mesa relying on wait-boosting to overcome its
> inability to submit batches fast enough. If you trace the gpufreq do you see
> it dip below max often?

Yes. Testing commit 8d3afd7d0e666b932e6fa15901e6280fe829a786 on BSW with the Half Life 2 case, the actual/current gpufreq is equal to the min GPU freq most of the time; only rarely does it rise above the min GPU freq or reach the max GPU freq.

The theory is that http://cgit.freedesktop.org/~ickle/mesa/log/?h=brw-batch should help.

(In reply to Chris Wilson from comment #52)
> The theory is that http://cgit.freedesktop.org/~ickle/mesa/log/?h=brw-batch
> should help.

Hello Chris,
We failed to clone your branch:

[root@x-ivb2 ickle]# tsocks git clone git://people.freedesktop.org/~ickle/mesa
Cloning into 'mesa'...
remote: Counting objects: 642602, done.
remote: Compressing objects: 100% (101583/101583), done.
remote: Total 642602 (delta 544069), reused 635991 (delta 537501)
Receiving objects: 100% (642602/642602), 148.71 MiB | 27.00 KiB/s, done.
Resolving deltas: 100% (544069/544069), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.

Bug scrub: Hi Chris, could you help Wendy get access to this tree? Thanks

Sure, it is a remote:
git remote add <id> <tree>
then it will only pull down the delta and not the full tree from slow fdo.

Proposing this bug to be resolved+closed due to commit 8d3afd7. Please comment if you disagree (or agree).
IMHO, confirming the regression or the fix by running tests related to these old bugs will not have ROI.

--- Git Log data ---
commit 8d3afd7d0e666b932e6fa15901e6280fe829a786
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu May 21 21:01:47 2015 +0100

    drm/i915: Use spinlocks for checking when to waitboost

    In commit 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date: Tue Apr 7 16:20:32 2015 +0100

        drm/i915: Deminish contribution of wait-boosting from clients

    we removed an atomic timer based check for allowing waitboosting and moved it below the mutex taken during RPS. However, that mutex can be held for long periods of time on Vallyview/Cherryview as communication with the PCU is slow. As clients may frequently wait for results (e.g. such as tranform feedback) we introduced contention between the client and the RPS worker.

    We can take advantage of the RPS worker, by switching the wait boost decision to use spin locks and defer the actual reclocking to the worker.

    Fixes a regression of up to 45% on Baytrail and Baswell!

    v2 (Daniel):
    - Use max_freq_softlimit instead of the not-yet-merged boost frequency.
    - Don't inject a fake irq into the boost work, instead treat client_boost as just another legit waker.

    v3: Drop the now unused mask (Chris).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--- Eof Git Log ---

The RPS tuning hasn't changed and we have users (such as kodi) who have complained about the frequency selection on bsw. So I think there is some merit in fixing RPS issues on bsw.

Hello Chris, any news or plans on this? I'm also wondering if there are microbenchmarks that one can execute in order to see when it would be a good time to do more laborious tests with games (e.g. CS). The Lightsmark "use case" already seems to be back to its original level.

The challenge here is generating realistic loads (including microsleeps). Note that we have now applied all the outstanding ideas wrt RPS on BSW (to make kodi happy), but we are still none the wiser if we are as good across all benchmarks as we have historically been. I am happy to close this if no one is able to reproduce the old benchmarks indicating whether or not we are still regressing.

We appear to be content... closing bug
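
The commit message for 8d3afd7 quoted above describes the fix only in words: the client's wait-boost decision is taken under a short spinlock, and the slow reclocking is deferred to the RPS worker, which is the only path that takes the long-held mutex. The following is a minimal user-space sketch of that pattern, not the i915 code. It assumes C11 atomics and POSIX threads, and every name in it (struct rps_sketch, client_request_boost, rps_worker_run and so on) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names come from i915. */
struct rps_sketch {
    atomic_flag client_lock; /* short-held spinlock guarding boost_pending */
    pthread_mutex_t hw_lock; /* long-held mutex guarding the slow reclock */
    bool boost_pending;
    int cur_freq;
    int max_freq;
};

static void sketch_spin_lock(atomic_flag *lock)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void sketch_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

static int rps_sketch_init(struct rps_sketch *rps, int min_freq, int max_freq)
{
    assert(min_freq <= max_freq);
    atomic_flag_clear(&rps->client_lock);
    rps->boost_pending = false;
    rps->cur_freq = min_freq;
    rps->max_freq = max_freq;
    return pthread_mutex_init(&rps->hw_lock, NULL);
}

/*
 * Client path: only records that a boost is wanted. It never takes hw_lock,
 * so it cannot stall behind a slow reclock in progress.
 * Returns true if this call newly queued a boost.
 */
static bool client_request_boost(struct rps_sketch *rps)
{
    bool queued;

    sketch_spin_lock(&rps->client_lock);
    queued = !rps->boost_pending;
    rps->boost_pending = true;
    sketch_spin_unlock(&rps->client_lock);
    return queued;
}

/*
 * Worker path: consumes the pending request under the spinlock, then does
 * the slow part under hw_lock. Returns true if it reclocked.
 */
static bool rps_worker_run(struct rps_sketch *rps)
{
    bool boost;

    sketch_spin_lock(&rps->client_lock);
    boost = rps->boost_pending;
    rps->boost_pending = false;
    sketch_spin_unlock(&rps->client_lock);

    if (!boost)
        return false;

    pthread_mutex_lock(&rps->hw_lock);
    rps->cur_freq = rps->max_freq; /* stands in for the slow frequency request */
    pthread_mutex_unlock(&rps->hw_lock);
    return true;
}
```

In this sketch a client calls client_request_boost() and returns immediately, while a separate worker thread would call rps_worker_run() whenever it is woken. The point of the split is that the client's critical section stays a few instructions long regardless of how long the worker holds hw_lock.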