Created attachment 140459: Monitoring script and results
We've had reports of the Kodi GUI "stuttering" since LibreELEC switched from kernel 4.14.49 to 4.17.3.
I ran a script that monitors /sys/kernel/debug/dri/0/i915_rps_boost_info and noticed that with 4.17.3 the GPU never boosts out of the "low power" window. It's the same when watching a video - the GPU is now almost always stuck in the "low power" window.
git bisecting the kernel between 4.14.49 and 4.17.3 reveals e9af4ea2b9e7e5d3caa6354be14de06b678ed0fa ("drm/i915: Avoid waitboosting on the active request") as the bad commit.
In the attached file you can see the script, and also the results of testing many different LibreELEC/Kodi builds (corresponding to the builds here).
With this script the first 30 seconds of a 4K video (Elysium trailer, on a Skylake NUC6i5SYH outputting at 1080p) is played from local storage in Kodi. At the end of the script a summary is output which includes the percentage of "LOW" (ie. low power) and "!LOW" (ie. high + mixed) window samples. The script is executed 5 times per build.
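The LOW / !LOW summary described above can be approximated as follows. This is a minimal sketch, not the attached script: `classify_window` and `summarize` are hypothetical names, and the regular expression assumes the debugfs file reports the current window in a line of the form `RPS Autotuning (current "low power" window):`, which may differ between kernel versions.

```python
import re
from collections import Counter

# Assumed line format: RPS Autotuning (current "low power" window):
WINDOW_RE = re.compile(r'current "([^"]+)" window')


def classify_window(text):
    """Return "LOW" or "!LOW" for one snapshot, or None if no window is reported."""
    match = WINDOW_RE.search(text)
    if match is None:
        return None  # snapshot carried no autotuning line, so it is not counted
    return "LOW" if match.group(1) == "low power" else "!LOW"


def summarize(snapshots):
    """Return the percentage of LOW and !LOW samples over all classified snapshots."""
    counts = Counter(c for c in map(classify_window, snapshots) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {"LOW": 0.0, "!LOW": 0.0}
    return {key: 100.0 * counts[key] / total for key in ("LOW", "!LOW")}
```

In practice each snapshot would be the text read from /sys/kernel/debug/dri/0/i915_rps_boost_info at a fixed interval (typically as root) while the video plays.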
What is very clear is that from build 180616 (ie. #0616) onwards - which is when the LibreELEC kernel switched from 4.14.49 to 4.17.3 - the number of "!LOW" samples is reduced to almost zero - ie. the GPU no longer boosts, remaining locked at the "idle" frequency.
To rule out any changes in Kodi that might also be responsible I rebuilt build 180702 replacing only the 4.17.3 kernel with 4.14.49, creating build 180702a, where now the GPU boosts as expected.
Reverting the bad commit from kernel 4.17.3 resulted in build 180702b, and again the GPU is now boosting as expected.
Kernel 4.18-rc1 also has this non-boosting issue (same as 4.17.3).
With this commit reverted, the reports of "stutter" in Kodi with 4.17.3 (and now also 4.17.4) are resolved[5, 6, 7].
Can this commit be reverted, or is there a better solution that doesn't hobble the GPU?
What it is telling us is that the GPU is never utilised enough for it to upclock by itself: Kodi should not need more power to deliver the frames on time, and boosting is overkill.
One thing to confirm is which boost Kodi is depending on. So try reverting just the intel_display chunk so that it always boosts after the vblank miss.
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 681e0710a467..0d4c61e53a6c 100644
@@ -12913,7 +12913,8 @@ static int do_rps_boost(struct wait_queue_entry *_wait,
 	 * is reasonable to assume that it will complete before the next
 	 * vblank without our intervention, so leave RPS alone.
-	if (!i915_request_started(rq))
+	if (!i915_request_started(rq) ||
+	    rq != i915_gem_active_peek(&rq->timeline->last_request))
might be an interesting compromise. But what I think we want is more of a nudge.
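The difference between the existing condition and the compromise in the diff above can be compared with a small model. This is a sketch in Python, not kernel code: `Request`, `boost_after_commit` and `boost_compromise` are hypothetical names, and the two boolean fields merely stand in for `i915_request_started(rq)` and for `rq` being the last request on its timeline.

```python
from dataclasses import dataclass


@dataclass
class Request:
    started: bool           # models i915_request_started(rq)
    last_on_timeline: bool  # models rq being the last request on its timeline


def boost_after_commit(rq):
    """Condition on the removed line: boost only if the request has not started."""
    return not rq.started


def boost_compromise(rq):
    """Condition on the added lines: also boost when more work is queued behind rq."""
    return (not rq.started) or (not rq.last_on_timeline)
```

Under this model, a request that is already running but has further requests queued behind it is boosted only by the compromise condition.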
Created attachment 140462
Created attachment 140463
Since it's not the vblank waitboosting in effect here, it must be the i915_gem_object_wait (used by glFinish et al): try the second nudge on top of the first.
Apologies for the delay in responding, I've been collecting information.
I produced 4 test builds for users that are experiencing the visual stutter. These builds are based on 4.17.4 where e9af4ea2b9e7 is not reverted:
#0703b, includes patch "reverting the intel_display chunk so that it always boost after the vblank miss."
#0703c, includes patch "might be an interesting compromise"
#0703d, includes patch "nudge"
#0703e, includes patches "nudge" + "nudge2"
Results (from 2 users):
#0703b: visual stutter still present
#0703c: no visual stutter, but audio issues (both users) - delayed and/or stuttering audio
#0703d: no visual stutter and no audio issues
#0703e: no visual stutter and no audio issues
Regarding the audio issues, one user hasn't been able to reproduce and the other won't be able to test for a while, so this could be a red herring.
Hopefully this information is useful when deciding in which direction to go!
4. http://ix.io/1fYX (4.17.4 compatible version of id=140463)
Created attachment 140513
Same nudge as before (v1), but with a bugfix so worth double checking that fixing the bug didn't bring the problem back.
Thanks Chris. I've posted a new #0708b build with "Tidier nudge" and will post again once I have feedback.
With #0708b on a Skylake NUC when playing 30 seconds of the Elysium 4K video, I see the GPU boosting frequently throughout playback (sampling i915_rps_boost_info every 0.05 seconds). Other less "stressful" videos (eg. 1080p HEVC) barely boost at all.
The GPU also boosts while navigating the Kodi GUI, so fingers crossed.
If anyone also has a wattsup/killawatt or other power meter, I would be interested in knowing the total power consumed with waitboost-vs-nudge.
Kodi can measure the GPU power consumed using perf + RAPL, e.g. as in https://cgit.freedesktop.org/drm/igt-gpu-tools/tree/overlay/power.c, which I think might be of interest.
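For a rough software-side comparison without an external meter, average power can be derived from two readings of a cumulative RAPL energy counter. This is a sketch under assumptions: it reads the Linux powercap sysfs interface rather than perf events as the linked overlay/power.c does, the zone path `/sys/class/powercap/intel-rapl:0` (package domain) is an assumption that varies by machine, reading it may require root, and `average_power_watts`, `read_uj` and `measure` are hypothetical helper names.

```python
import time
from pathlib import Path

# Assumed zone: package 0. The available zones and subzones vary by machine.
ZONE = Path("/sys/class/powercap/intel-rapl:0")


def average_power_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power from two cumulative energy readings in microjoules."""
    delta = end_uj - start_uj
    if delta < 0:
        delta += max_range_uj  # counter wrapped around once during the interval
    return delta / 1e6 / seconds


def read_uj(name):
    """Read one integer attribute (microjoules) from the assumed RAPL zone."""
    return int((ZONE / name).read_text())


def measure(seconds=30.0):
    """Sample the energy counter before and after `seconds` of playback."""
    max_range = read_uj("max_energy_range_uj")
    t0 = time.monotonic()
    start = read_uj("energy_uj")
    time.sleep(seconds)
    end = read_uj("energy_uj")
    elapsed = time.monotonic() - t0
    return average_power_watts(start, end, elapsed, max_range)
```

Running `measure()` during the same 30 seconds of playback on a waitboost build and on a nudge build would give two figures to compare, though a package-level counter includes CPU power as well as GPU power.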
> Same nudge as before (v1), but with a bugfix so worth double checking that fixing the bug didn't bring the problem back.
I added "Tidier nudge" in test build #0708b and received confirmation[1,2] that it is working well.
commit 60548c554be2830d29d2533dad0ac8133347ee51 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <email@example.com>
Date: Tue Jul 31 14:26:29 2018 +0100
drm/i915: Interactive RPS mode
RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.
Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediately go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.
To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses its vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent upclocking,
faster than we can provide in a few evaluation intervals.)
v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.
v4: Fix deferred application of interactive mode
v6: Group the mutex with its principal in a substruct
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <firstname.lastname@example.org>
Cc: Joonas Lahtinen <email@example.com>
Cc: Tvrtko Ursulin <firstname.lastname@example.org>
Cc: Radoslaw Szwichtenberg <email@example.com>
Cc: Ville Syrjälä <firstname.lastname@example.org>
Reviewed-by: Joonas Lahtinen <email@example.com>
Reporter, can you verify that the issue is resolved for you on latest drm-tip?
I'll have to get back to you on that, as I've just today had a report of stutter from a user after they upgraded from 4.17.14 to 4.18.4 (both kernels include this fix). I'm trying to get more information.
I don't normally build with drm-tip, is testing with 4.18.y sufficient?
(In reply to freedesktop from comment #11)
> I don't normally build with drm-tip, is testing with 4.18.y sufficient?
Currently this commit is not available in 4.18.4 yet, but you can get it from Linus Torvalds' mainline tree. Most likely this change would be available from 4.19.
> Currently this commit is not available in 4.18.4 yet
Yes, we've been including the v4 commit on top of the stable kernel releases in the LibreELEC build system since it was initially proposed - we're currently creating test builds based on 4.18.4 (now 4.18.5, actually).
> Most likely this change would be available from 4.19.
It looks like the commit (or a variation of it) is now included in 4.19-rc1. I'll produce a 4.19-rc1 test build of LibreELEC and Kodi later this week.
So far I've only had one user refer to ongoing "stutter" issues, which makes me suspect it may not be related to this issue, and I'm happy for this issue to be closed as "resolved". I'll reopen at a later date should it be necessary (hopefully not!)
Many thanks to everyone involved.
Reporter, were you able to reproduce this? Can you confirm if I can close this bug?
No feedback for more than a month.
I assume this bug has been fixed. Closing this bug.