Bug 109408 - [Kabylake] RPS waitboost regression since v4.20
Summary: [Kabylake] RPS waitboost regression since v4.20
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-21 21:58 UTC by Lyude Paul
Modified: 2019-02-20 20:36 UTC
CC List: 1 user

See Also:
i915 platform: KBL
i915 features: power/GT


Description Lyude Paul 2019-01-21 21:58:57 UTC
Since v4.20, it seems that i915 hasn't been waitboosting the GPU frequently enough to prevent animations in gnome-shell from stuttering, whereas on v4.19 I don't seem to have any issues.

Continuously opening and closing the activities overview in gnome-shell on v4.19 seems to keep the GPU on my laptop at around an 800-1000 MHz RPS boost:

RPS enabled? 1
GPU busy? yes [1 requests]
CPU waiting? 1
Boosts outstanding? 0
Interactive? 1
Frequency requested 983
  min hard:300, soft:300; max soft:1050, hard:1050
  idle:300, efficient:300, boost:1050
systemd-logind [789]: 55 boosts
Xwayland [5962]: 1 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Xwayland [5962]: 0 boosts
Kernel (anonymous) boosts: 63

RPS Autotuning (current "high power" window):
  Avg. up: 100% [above threshold? 85%]
  Avg. down: 67% [below threshold? 60%]

By contrast, v4.20 and later only seem to boost the GPU briefly under load before falling back to around 300 MHz (output taken from 5.0.0-0.rc2):

RPS enabled? 1
GPU busy? yes [2 requests]
CPU waiting? 0
Boosts outstanding? 0
Interactive? 1
Frequency requested 317, actual 317
  min hard:300, soft:300; max soft:1050, hard:1050
  idle:300, efficient:300, boost:1050
systemd-logind [909]: 26 boosts
Xwayland [1814]: 1 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Xwayland [1814]: 0 boosts
Kernel (anonymous) boosts: 67

RPS Autotuning (current "high power" window):
  Avg. up: 3% [above threshold? 85%]
  Avg. down: 47% [below threshold? 60%]

I see the stuttering with gnome-shell 3.20.2 on Fedora 29; it worked fine on kernel v4.19.15. The display configuration is a single built-in 4K LCD, driven by a Kabylake HD Graphics 620:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
Comment 1 Chris Wilson 2019-01-21 22:07:56 UTC
One positive aspect here is that both systems seem to employ the "interactive" mode: a fixed set of evaluation-interval (EI) thresholds biased towards upclocking the GPU. That is encouraging, as it implies the low frequencies may have more to do with low busyness over each EI than anything else. The last tweak to waitboosting that I recall was that very interactive mode:

commit 027063b1606fea6df15c270e5f2a072d1dfa8fef [v4.19]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 31 14:26:29 2018 +0100

    drm/i915: Interactive RPS mode

and before that

commit e9af4ea2b9e7e5d3caa6354be14de06b678ed0fa [v4.17]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 18 13:16:09 2018 +0000

    drm/i915: Avoid waitboosting on the active request

So I think this may have been an accident. :)

Taking a wild guess:

commit 08e3e21a24d23db6a4adca90f7cb40d69e09d35c
Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Fri Aug 3 16:24:43 2018 -0700

    drm/i915: kill resource streamer support

(It's both new to v4.20 and had unexpected perf implications.)
Comment 2 Lyude Paul 2019-01-22 20:18:26 UTC
(In reply to Chris Wilson from comment #1)
> Taking a wild guess:
> 
> commit 08e3e21a24d23db6a4adca90f7cb40d69e09d35c
> Author: Lucas De Marchi <lucas.demarchi@intel.com>
> Date:   Fri Aug 3 16:24:43 2018 -0700
> 
>     drm/i915: kill resource streamer support
> 
> (It's both new to v4.20 and had unexpected perf implications.)

Unfortunately, reverting 08e3e21a24d23db6a4adca90f7cb40d69e09d35c doesn't seem to have made any noticeable difference. Would you like me to try anything else?
Comment 3 Chris Wilson 2019-01-22 20:45:59 UTC
Two more commits caught my eye in the diff as potential culprits:

commit 90098efacc4c3e2e4f6262a657d6b520ecfb2555
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Wed Dec 5 11:33:24 2018 +0000

    drm/i915: Introduce per-engine workarounds

(It looks like you'll need to go back to just before

commit 009367791f31afa0842854e7ea0acc9edf70ccaf
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Wed Dec 5 11:33:23 2018 +0000

    drm/i915: Record GT workarounds in a list

as it doesn't look like a clean revert)

and

commit 11abf0c5a021af683b8fe12b0d30fb1226d60e0f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 14 09:00:15 2018 +0100

    drm/i915: Limit the backpressure for i915_request allocation


Other than that, not much appears to have happened in i915-land between v4.19 and v4.20 :|
Comment 4 Chris Wilson 2019-01-28 08:52:37 UTC
The other angle I was thinking about is that maybe we simply aren't hitting i915_request_wait as often (and so aren't saying "please boost me!").

Could you compare and contrast the output of "perf record -a -c 1 -e i915:i915_request_wait_begin" on both kernels?
Comment 5 Chris Wilson 2019-01-30 11:16:48 UTC
https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=prescheduler

contains

commit 7b581cf26a4042e9bbb8410a31647e41cacafada
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jan 25 18:01:44 2019 +0000

    drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
    
    As time goes by, usage of generic ioctls such as drm_syncobj and
    sync_file are on the increase bypassing i915-specific ioctls like
    GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as
    we track the file/client and account the waitboosting to them. However,
    since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for
    signaling threads"), we no longer have been applying the client
    ratelimiting on waitboosts and so that information has only been used
    for debug tracking.
    
    Push the application of waitboosting down to the common
    i915_request_wait, and apply it to all foreign fence waits as well.

which might make a difference if the system has switched over to sync_file/syncobj interfaces in preference to the i915 ioctls.
Comment 6 Lakshmi 2019-02-13 12:00:57 UTC
Reporter, can you verify the issue with the latest drmtip?
Comment 7 Chris Wilson 2019-02-13 12:06:01 UTC
(In reply to Lakshmi from comment #6)
> Reporter, can you verify the issue with the latest drmtip?

Yes, Lyude already has. So far we've found that missing waitboosts are not the cause: we unconditionally applied the boost to all request waits, and performance is still jittery on -tip. I suspect something else is introducing latency that leaves the GPU idle long enough for RPS downclocking to take hold; that's likely to be vblanks (does anything else matter?). Optimistically, Lyude might be able to bisect, but it will be slow, tedious and prone to misjudging good/bad results. There may also be more than one change contributing to this effect.

But for now, the investigation mostly says it's not waitboosting per se that has changed between the versions.
Comment 8 Chris Wilson 2019-02-15 15:24:50 UTC
Based on traces provided by Lyude, we can say that in 4.20 the rps downclocking is much more rapid than in 4.19, and that is due to

commit 0d55babc8392754352f1058866dd4182ae587d11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 2 11:06:28 2018 +0100

    drm/i915: Drop stray clearing of rps->last_adj
    
    We used to reset last_adj to 0 on crossing a power domain boundary, to
    slow down our rate of change. However, commit 60548c554be2 ("drm/i915:
    Interactive RPS mode") accidentally caused it to be reset on every
    frequency update, nerfing the fast response granted by the slow start
    algorithm.
    
    Fixes: 60548c554be2 ("drm/i915: Interactive RPS mode")
    Testcase: igt/pm_rps/mix-max-config-loaded
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180802100631.31305-1-chris@chris-wilson.co.uk

(Just waiting on Lyude having the opportunity to confirm that.)

What to do? This does mean that the GPU load is too insubstantial to justify high clocks by itself, but high clocks are required for a smooth UX.

I am thinking along the lines of not using rapid downclocking within the HIGH_POWER zone (and not using rapid upclocking within the LOW_POWER zone?). Given the observation that we seem to stick to interactive mode, that should be enough to keep us at high clocks.

I dread putting a power meter on this... We shall have to look at the RAPL figures for a standardised Lyude-test.
Comment 9 Lyude Paul 2019-02-15 18:39:08 UTC
*Rings bell, fires confetti* Looks like this is the patch! I also ended up writing some scripts yesterday because I got tired of eyeballing things:

Before reverting 0d55babc8392754352f1058866dd4182ae587d11:

35 measurements
Average: 33.65657142857143 FPS
FPS observed: 20.8 - 46.87 FPS
Percentage under 60 FPS: 100.0%
Percentage under 55 FPS: 100.0%
Percentage under 50 FPS: 100.0%
Percentage under 45 FPS: 97.14285714285714%
Percentage under 40 FPS: 97.14285714285714%
Percentage under 35 FPS: 45.714285714285715%
Percentage under 30 FPS: 11.428571428571429%
Percentage under 25 FPS: 2.857142857142857%

After reverting:

30 measurements
Average: 49.833666666666666 FPS
FPS observed: 33.85 - 60.0 FPS
Percentage under 60 FPS: 86.66666666666667%
Percentage under 55 FPS: 70.0%
Percentage under 50 FPS: 53.333333333333336%
Percentage under 45 FPS: 20.0%
Percentage under 40 FPS: 6.666666666666667%
Percentage under 35 FPS: 6.666666666666667%
Percentage under 30 FPS: 0%
Percentage under 25 FPS: 0%

Visually, the stutter is also significantly improved and seems to be back to what it was on 4.19. Hooray!
Comment 10 Chris Wilson 2019-02-15 19:02:24 UTC
See https://patchwork.freedesktop.org/series/56740/
Comment 11 Chris Wilson 2019-02-20 20:36:45 UTC
commit 2a8862d2f3da4f2576c34f66647127b3bb77c316 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 19 12:22:03 2019 +0000

    drm/i915: Reduce the RPS shock
    
    Limit deboosting and boosting to keep ourselves at the extremes
    when in the respective power modes (i.e. slowly decrease frequencies
    while in the HIGH_POWER zone and slowly increase frequencies while
    in the LOW_POWER zone). On idle, we will hit the timeout and drop
    to the next level quickly, and conversely if busy we expect to
    hit a waitboost and rapidly switch into max power.
    
    This should improve the UX experience by keeping the GPU clocks higher
    than they ostensibly should be (based on simple busyness) by switching
    into the INTERACTIVE mode (due to waiting for pageflips) and increasing
    clocks via waitboosting. This will incur some additional power, our
    saving grace should be rc6 and powergating to keep the extra current
    draw in check.
    
    Food for future thought would be deadline scheduling? If we know certain
    contexts (high priority compositors) absolutely must hit the next vblank
    then we can raise the frequencies ahead of time. Part of this is covered
    by per-context frequencies, where userspace is given control over the
    frequency range they want the GPU to execute at (for largely the same
    problem as this, where the workload is very latency sensitive but at the
    EI level appears mostly idle). Indeed, the per-context series does
    extend the modeset boosting to include a frequency range tweak which
    seems applicable to solving this jittery UX behaviour.
    
    Reported-by: Lyude Paul <lyude@redhat.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109408
    References: 0d55babc8392 ("drm/i915: Drop stray clearing of rps->last_adj")
    References: 60548c554be2 ("drm/i915: Interactive RPS mode")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Lyude Paul <lyude@redhat.com>
    Cc: Eero Tamminen <eero.t.tamminen@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Michel Thierry <michel.thierry@intel.com>
    
    Quoting Lyude Paul:
    > Before reverting 0d55babc8392754352f1058866dd4182ae587d11: [4.20]
    >
    > 35 measurements [of gnome-shell animations]
    > Average: 33.65657142857143 FPS
    > FPS observed: 20.8 - 46.87 FPS
    > Percentage under 60 FPS: 100.0%
    > Percentage under 55 FPS: 100.0%
    > Percentage under 50 FPS: 100.0%
    > Percentage under 45 FPS: 97.14285714285714%
    > Percentage under 40 FPS: 97.14285714285714%
    > Percentage under 35 FPS: 45.714285714285715%
    > Percentage under 30 FPS: 11.428571428571429%
    > Percentage under 25 FPS: 2.857142857142857%
    >
    > After reverting: [4.19 behaviour]
    >
    > 30 measurements
    > Average: 49.833666666666666 FPS
    > FPS observed: 33.85 - 60.0 FPS
    > Percentage under 60 FPS: 86.66666666666667%
    > Percentage under 55 FPS: 70.0%
    > Percentage under 50 FPS: 53.333333333333336%
    > Percentage under 45 FPS: 20.0%
    > Percentage under 40 FPS: 6.666666666666667%
    > Percentage under 35 FPS: 6.666666666666667%
    > Percentage under 30 FPS: 0%
    > Percentage under 25 FPS: 0%
    >
    > Patched:
    > 42 measurements
    > Average: 46.05428571428571 FPS
    > FPS observed: 1.82 - 59.98 FPS
    > Percentage under 60 FPS: 88.09523809523809%
    > Percentage under 55 FPS: 61.904761904761905%
    > Percentage under 50 FPS: 45.23809523809524%
    > Percentage under 45 FPS: 35.714285714285715%
    > Percentage under 40 FPS: 33.33333333333333%
    > Percentage under 35 FPS: 19.047619047619047%
    > Percentage under 30 FPS: 7.142857142857142%
    > Percentage under 25 FPS: 4.761904761904762%
    
    Tested-by: Lyude Paul <lyude@redhat.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-13-chris@chris-wilson.co.uk

