Since v4.20, the i915 driver hasn't been waitboosting the GPU frequently enough to prevent shell animations in gnome-shell from stuttering, whereas on v4.19 I don't see any issues. Continuously opening and closing the activities overview in gnome-shell on v4.19 keeps the GPU on my laptop around an 800-1000 MHz RPS boost:

RPS enabled? 1
GPU busy? yes [1 requests]
CPU waiting? 1
Boosts outstanding? 0
Interactive? 1
Frequency requested 983
  min hard:300, soft:300; max soft:1050, hard:1050
  idle:300, efficient:300, boost:1050
systemd-logind [789]: 55 boosts
Xwayland [5962]: 1 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Kernel (anonymous) boosts: 63
RPS Autotuning (current "high power" window):
  Avg. up: 100% [above threshold? 85%]
  Avg. down: 67% [below threshold? 60%]

v4.20 and later only boost the GPU briefly under load, then fall back to around 300 MHz (taken from 5.0.0-0.rc2):

RPS enabled? 1
GPU busy? yes [2 requests]
CPU waiting? 0
Boosts outstanding? 0
Interactive? 1
Frequency requested 317, actual 317
  min hard:300, soft:300; max soft:1050, hard:1050
  idle:300, efficient:300, boost:1050
systemd-logind [909]: 26 boosts
Xwayland [1814]: 1 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Kernel (anonymous) boosts: 67
RPS Autotuning (current "high power" window):
  Avg. up: 3% [above threshold? 85%]
  Avg. down: 47% [below threshold? 60%]

Stuttering observed with gnome-shell 3.30.2 on Fedora 29; everything worked fine on kernel v4.19.15. Display configuration is a single built-in 4K LCD, driven by a Kaby Lake HD Graphics 620:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
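For anyone reproducing this, the interesting numbers can be pulled out of RPS dumps like the two quoted above with a small script. This is an illustrative sketch only: the field names come from the dump text itself, while the function name and dictionary keys are mine.

```python
import re
from collections import Counter

def parse_rps_dump(text):
    """Extract requested frequency, interactive flag, and boost counts
    from an i915 RPS dump in the format quoted above."""
    info = {"boosts": Counter()}
    m = re.search(r"Frequency requested (\d+)", text)
    if m:
        info["requested_mhz"] = int(m.group(1))
    m = re.search(r"Interactive\? (\d)", text)
    if m:
        info["interactive"] = bool(int(m.group(1)))
    # Per-client boost lines look like "Xwayland [5962]: 1 boosts";
    # sum repeated entries for the same client name.
    for name, count in re.findall(r"^(\S+) \[\d+\]: (\d+) boosts", text, re.M):
        info["boosts"][name] += int(count)
    m = re.search(r"Kernel \(anonymous\) boosts: (\d+)", text)
    if m:
        info["kernel_boosts"] = int(m.group(1))
    return info

# Abbreviated sample taken from the v4.19 dump above.
sample = (
    "RPS enabled? 1\n"
    "Interactive? 1\n"
    "Frequency requested 983\n"
    "systemd-logind [789]: 55 boosts\n"
    "Xwayland [5962]: 1 boosts\n"
    "Xwayland [5962]: 0 boosts\n"
    "Kernel (anonymous) boosts: 63\n"
)
info = parse_rps_dump(sample)
```

Running both dumps through something like this makes the contrast obvious at a glance: 983 MHz requested with Avg. up at 100% on v4.19 versus 317 MHz with Avg. up at 3% on v4.20.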
One positive aspect here is that it seems both systems do employ the "interactive" mode: that is a fixed set of EI thresholds biased to upclocking the GPU. That is positive as it implies that the low frequencies may be more to do with low EI busyness than anything else. The last tweak to waitboosting that I recall was the very same interactive mode,

commit 027063b1606fea6df15c270e5f2a072d1dfa8fef [v4.19]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 31 14:26:29 2018 +0100

    drm/i915: Interactive RPS mode

and before that

commit e9af4ea2b9e7e5d3caa6354be14de06b678ed0fa [v4.17]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 18 13:16:09 2018 +0000

    drm/i915: Avoid waitboosting on the active request

So I think this may have been an accident. :)

Taking a wild guess:

commit 08e3e21a24d23db6a4adca90f7cb40d69e09d35c
Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Fri Aug 3 16:24:43 2018 -0700

    drm/i915: kill resource streamer support

(It's both new to v4.20 and had unexpected perf implications.)
(In reply to Chris Wilson from comment #1)
> One positive aspect here is that it seems both systems do employ the
> "interactive" mode: that is a fixed set of EI thresholds biased to
> upclocking the GPU. That is positive as it implies that the low frequencies
> may be more to do with low EI busyness than anything else. The last tweak to
> waitboosting that I recall was the very same interactive mode,
>
> commit 027063b1606fea6df15c270e5f2a072d1dfa8fef [v4.19]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Jul 31 14:26:29 2018 +0100
>
>     drm/i915: Interactive RPS mode
>
> and before that
>
> commit e9af4ea2b9e7e5d3caa6354be14de06b678ed0fa [v4.17]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Jan 18 13:16:09 2018 +0000
>
>     drm/i915: Avoid waitboosting on the active request
>
> So I think this may have been an accident. :)
>
> Taking a wild guess:
>
> commit 08e3e21a24d23db6a4adca90f7cb40d69e09d35c
> Author: Lucas De Marchi <lucas.demarchi@intel.com>
> Date:   Fri Aug 3 16:24:43 2018 -0700
>
>     drm/i915: kill resource streamer support
>
> (It's both new to v4.20 and had unexpected perf implications.)

Unfortunately, reverting 08e3e21a24d23db6a4adca90f7cb40d69e09d35c doesn't seem to have made any noticeable difference. Would you like me to try anything else?
Two more caught my eye in the diff as potentials,

commit 90098efacc4c3e2e4f6262a657d6b520ecfb2555
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Wed Dec 5 11:33:24 2018 +0000

    drm/i915: Introduce per-engine workarounds

(it looks like you need to jump to just before

commit 009367791f31afa0842854e7ea0acc9edf70ccaf
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Wed Dec 5 11:33:23 2018 +0000

    drm/i915: Record GT workarounds in a list

as it doesn't look like a clean revert)

and

commit 11abf0c5a021af683b8fe12b0d30fb1226d60e0f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 14 09:00:15 2018 +0100

    drm/i915: Limit the backpressure for i915_request allocation

Other than that, not much appears to have happened in i915-land between v4.19 and v4.20 :|
The other angle I was thinking about is that maybe we are simply not hitting i915_request_wait as often (and so not saying "please boost me!"). Compare and contrast:

perf record -a -c 1 -e i915:i915_request_wait_begin
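Once recorded on each kernel, the events can be tallied from `perf script` text output with something like the following. The sample lines below are fabricated to show the expected shape (process name in the first column, tracepoint name later in the line), which is an assumption based on perf's usual default layout, not output from this machine.

```python
from collections import Counter

def count_wait_begins(perf_script_output):
    """Count i915:i915_request_wait_begin tracepoint hits per process
    from `perf script` text output."""
    counts = Counter()
    for line in perf_script_output.splitlines():
        if "i915:i915_request_wait_begin" in line:
            comm = line.split()[0]  # first column: the process name
            counts[comm] += 1
    return counts

# Fabricated sample resembling `perf script` output.
sample = """\
gnome-shell  1234 [000]  100.000001: i915:i915_request_wait_begin: ...
Xwayland     1814 [001]  100.000500: i915:i915_request_wait_begin: ...
gnome-shell  1234 [000]  100.001200: i915:i915_request_wait_begin: ...
"""
counts = count_wait_begins(sample)
```

Comparing the per-process totals across v4.19 and v4.20 captures would show directly whether fewer waits (and hence fewer boost requests) are occurring.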
https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=prescheduler contains

commit 7b581cf26a4042e9bbb8410a31647e41cacafada
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jan 25 18:01:44 2019 +0000

    drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()

    As time goes by, usage of generic ioctls such as drm_syncobj and
    sync_file are on the increase, bypassing i915-specific ioctls like
    GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls
    as we track the file/client and account the waitboosting to them.
    However, since commit 7b92c1bd0540 ("drm/i915: Avoid keeping
    waitboost active for signaling threads"), we have not been applying
    the client ratelimiting on waitboosts, and so that information has
    only been used for debug tracking.

    Push the application of waitboosting down to the common
    i915_request_wait, and apply it to all foreign fence waits as well.

which might make a difference if the system has switched over to the sync_file/syncobj interfaces in preference to the i915 ioctls.
Reporter, can you verify the issue with latest drmtip?
(In reply to Lakshmi from comment #6)
> Reporter, can you verify the issue with latest drmtip?

Yes, Lyude already has. And we've found so far that it isn't missing waitboosts: we've unconditionally applied the boost to all request waits, and the performance is still jittery on -tip.

My suspicion lies with something else inducing latency, causing the GPU to idle long enough for rps downclocking to take hold; that's likely to be vblanks (nothing else matters?). Optimistically, Lyude might be able to bisect, but it'll be slow, tedious and prone to mistaking good/bad results. And there may well be more than one change leading to this effect. For now, though, the investigation mostly says it's not waitboosting per se that's changed between the versions.
Based on traces provided by Lyude, we can say that in v4.20 the rps downclocking is much more rapid than in v4.19, and that is due to

commit 0d55babc8392754352f1058866dd4182ae587d11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 2 11:06:28 2018 +0100

    drm/i915: Drop stray clearing of rps->last_adj

    We used to reset last_adj to 0 on crossing a power domain boundary, to
    slow down our rate of change. However, commit 60548c554be2 ("drm/i915:
    Interactive RPS mode") accidentally caused it to be reset on every
    frequency update, nerfing the fast response granted by the slow start
    algorithm.

    Fixes: 60548c554be2 ("drm/i915: Interactive RPS mode")
    Testcase: igt/pm_rps/mix-max-config-loaded
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180802100631.31305-1-chris@chris-wilson.co.uk

(Just waiting on Lyude having the opportunity to confirm that.)

What to do? This does mean that the GPU load is too insubstantial to justify high clocks by itself, but high clocks are required for a smooth UX. I am thinking along the lines of not using the rapid downclocking within the HIGH_POWER zone (and not the rapid upclocking within the LOW_POWER zone?), and, given the observation that we seem to stick to the interactive mode, that should be enough to keep us at high clocks.

I dread putting a powermeter on this... We shall have to look at the rapl figures for a standardised Lyude-test.
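The effect of the stray clearing on the slow-start algorithm can be sketched with a toy model. This is not the driver's actual code; the step size and limits are illustrative (taken from the 300-1050 MHz range quoted earlier), and `next_freq` is a name of my own. The point is only that doubling `last_adj` on each consecutive adjustment ramps clocks quickly, while resetting it to 0 every update (the v4.20 behaviour) degrades every change to a single fixed step.

```python
def next_freq(cur, last_adj, busy, step=25, lo=300, hi=1050):
    """Toy slow-start model: consecutive adjustments in the same
    direction double in size, so clocks ramp quickly under sustained
    load (or drain quickly when idle)."""
    if busy:
        adj = last_adj * 2 if last_adj > 0 else step
    else:
        adj = last_adj * 2 if last_adj < 0 else -step
    return max(lo, min(hi, cur + adj)), adj

# last_adj preserved across updates (v4.19 behaviour): geometric ramp.
freq, adj, steps_kept = 300, 0, 0
while freq < 1050:
    freq, adj = next_freq(freq, adj, busy=True)
    steps_kept += 1

# last_adj cleared on every update (the v4.20 regression): fixed steps.
freq, steps_reset = 300, 0
while freq < 1050:
    freq, _ = next_freq(freq, 0, busy=True)
    steps_reset += 1
```

With these illustrative numbers the preserved-`last_adj` path reaches max clocks in 5 updates, while the cleared path needs 30, which matches the "much more rapid downclocking / sluggish response" seen in the traces.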
*Rings bell, fires confetti* Looks like this is the patch! Additionally, I ended up writing some scripts yesterday because I got tired of eyeballing things.

Before reverting 0d55babc8392754352f1058866dd4182ae587d11:

35 measurements
Average: 33.65657142857143 FPS
FPS observed: 20.8 - 46.87 FPS
Percentage under 60 FPS: 100.0%
Percentage under 55 FPS: 100.0%
Percentage under 50 FPS: 100.0%
Percentage under 45 FPS: 97.14285714285714%
Percentage under 40 FPS: 97.14285714285714%
Percentage under 35 FPS: 45.714285714285715%
Percentage under 30 FPS: 11.428571428571429%
Percentage under 25 FPS: 2.857142857142857%

After reverting:

30 measurements
Average: 49.833666666666666 FPS
FPS observed: 33.85 - 60.0 FPS
Percentage under 60 FPS: 86.66666666666667%
Percentage under 55 FPS: 70.0%
Percentage under 50 FPS: 53.333333333333336%
Percentage under 45 FPS: 20.0%
Percentage under 40 FPS: 6.666666666666667%
Percentage under 35 FPS: 6.666666666666667%
Percentage under 30 FPS: 0%
Percentage under 25 FPS: 0%

Visibly as well, the stutter is significantly improved and seems to be back to what it was on v4.19. Hooray!
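The summaries above can be produced from raw FPS samples with a few lines of Python. This is a sketch of the kind of script described, not Lyude's actual code; the function name and thresholds mirror the output format shown.

```python
def summarize_fps(samples):
    """Summarize a list of FPS measurements in the same shape as the
    reports above: count, average, observed range, and the percentage
    of samples falling under each threshold."""
    stats = {
        "count": len(samples),
        "average": sum(samples) / len(samples),
        "min": min(samples),
        "max": max(samples),
    }
    for threshold in (60, 55, 50, 45, 40, 35, 30, 25):
        below = sum(1 for s in samples if s < threshold)
        stats[f"under_{threshold}"] = 100.0 * below / len(samples)
    return stats

# Tiny worked example with made-up samples.
stats = summarize_fps([20.0, 30.0, 40.0, 50.0])
```

Collecting a few dozen samples per kernel and diffing the summaries is far less error-prone than eyeballing animation smoothness.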
See https://patchwork.freedesktop.org/series/56740/
commit 2a8862d2f3da4f2576c34f66647127b3bb77c316 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 19 12:22:03 2019 +0000

    drm/i915: Reduce the RPS shock

    Limit deboosting and boosting to keep ourselves at the extremes when
    in the respective power modes (i.e. slowly decrease frequencies while
    in the HIGH_POWER zone and slowly increase frequencies while in the
    LOW_POWER zone). On idle, we will hit the timeout and drop to the
    next level quickly, and conversely if busy we expect to hit a
    waitboost and rapidly switch into max power.

    This should improve the UX by keeping the GPU clocks higher than they
    ostensibly should be (based on simple busyness) by switching into the
    INTERACTIVE mode (due to waiting for pageflips) and increasing clocks
    via waitboosting. This will incur some additional power; our saving
    grace should be rc6 and powergating to keep the extra current draw
    in check.

    Food for future thought would be deadline scheduling? If we know
    certain contexts (high-priority compositors) absolutely must hit the
    next vblank, then we can raise the frequencies ahead of time. Part of
    this is covered by per-context frequencies, where userspace is given
    control over the frequency range they want the GPU to execute at (for
    largely the same problem as this, where the workload is very latency
    sensitive but at the EI level appears mostly idle). Indeed, the
    per-context series does extend the modeset boosting to include a
    frequency range tweak, which seems applicable to solving this jittery
    UX behaviour.
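The clamping described in the first paragraph of that commit message can be sketched as follows. This is a toy model, not the driver code: the zone names mirror the HIGH_POWER/LOW_POWER zones mentioned above, while the function name and step size are mine.

```python
HIGH_POWER, LOW_POWER = "high", "low"

def clamp_adj(zone, adj, step=25):
    """Toy model of 'Reduce the RPS shock': while in HIGH_POWER, permit
    only small downward steps (no rapid deboost); while in LOW_POWER,
    permit only small upward steps (no rapid upclock). Large swings are
    left to waitboost (up) and the idle timeout (down)."""
    if zone == HIGH_POWER and adj < -step:
        return -step  # deboost slowly while interactive/busy
    if zone == LOW_POWER and adj > step:
        return step   # upclock slowly while mostly idle
    return adj
```

The asymmetry is the point of the patch: each zone keeps the frequency pinned near its own extreme, and only the explicit boost/timeout paths cross zones quickly.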
    Reported-by: Lyude Paul <lyude@redhat.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109408
    References: 0d55babc8392 ("drm/i915: Drop stray clearing of rps->last_adj")
    References: 60548c554be2 ("drm/i915: Interactive RPS mode")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Lyude Paul <lyude@redhat.com>
    Cc: Eero Tamminen <eero.t.tamminen@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Michel Thierry <michel.thierry@intel.com>

    Quoting Lyude Paul:
    > Before reverting 0d55babc8392754352f1058866dd4182ae587d11: [4.20]
    >
    > 35 measurements [of gnome-shell animations]
    > Average: 33.65657142857143 FPS
    > FPS observed: 20.8 - 46.87 FPS
    > Percentage under 60 FPS: 100.0%
    > Percentage under 55 FPS: 100.0%
    > Percentage under 50 FPS: 100.0%
    > Percentage under 45 FPS: 97.14285714285714%
    > Percentage under 40 FPS: 97.14285714285714%
    > Percentage under 35 FPS: 45.714285714285715%
    > Percentage under 30 FPS: 11.428571428571429%
    > Percentage under 25 FPS: 2.857142857142857%
    >
    > After reverting: [4.19 behaviour]
    >
    > 30 measurements
    > Average: 49.833666666666666 FPS
    > FPS observed: 33.85 - 60.0 FPS
    > Percentage under 60 FPS: 86.66666666666667%
    > Percentage under 55 FPS: 70.0%
    > Percentage under 50 FPS: 53.333333333333336%
    > Percentage under 45 FPS: 20.0%
    > Percentage under 40 FPS: 6.666666666666667%
    > Percentage under 35 FPS: 6.666666666666667%
    > Percentage under 30 FPS: 0%
    > Percentage under 25 FPS: 0%
    >
    > Patched:
    >
    > 42 measurements
    > Average: 46.05428571428571 FPS
    > FPS observed: 1.82 - 59.98 FPS
    > Percentage under 60 FPS: 88.09523809523809%
    > Percentage under 55 FPS: 61.904761904761905%
    > Percentage under 50 FPS: 45.23809523809524%
    > Percentage under 45 FPS: 35.714285714285715%
    > Percentage under 40 FPS: 33.33333333333333%
    > Percentage under 35 FPS: 19.047619047619047%
    > Percentage under 30 FPS: 7.142857142857142%
    > Percentage under 25 FPS: 4.761904761904762%

    Tested-by: Lyude Paul <lyude@redhat.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-13-chris@chris-wilson.co.uk