Created attachment 142419
parameter file for sample_multi_transcode
* Ubuntu 18.04
* git head build of drm-tip kernel
* git head build of Mesa & X and their main deps
* git head build of Intel MediaSDK and its main dependencies
Good drm-tip version:
a4e9f377a9: 2018-11-03 01:29:29: drm-tip: 2018y-11m-03d-01h-28m-29s UTC integration manifest
Bad drm-tip version:
1a4a6dafa1: 2018-11-05 16:07:52: drm-tip: 2018y-11m-05d-16h-07m-05s UTC integration manifest
* Run (mostly) CPU bound GfxBench v4 Driver2 test
* Run the MediaSDK provided tool with the attached parameter file (it runs 50 streams that lower the H264 video frame rate, bit rate and size, and add filtering):
sample_multi_transcode -par inputs.par
* Sum FPS of all streams together
Outcome on HW that is TDP limited:
* Test-case 1 performance drops 15%
* Test-case 2 performance drops 5%
* Performance of other CPU bound GPU tests also regresses, but less
Outcome on HW that isn't TDP limited:
* RAPL reports marginally larger CPU power consumption for test-case 1
* RAPL reports 1.5-2.5x higher CPU power consumption for test-case 2
There were no performance improvements in other tests we run on these devices.
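For anyone wanting to reproduce the RAPL numbers, here is a minimal sketch of deriving average power from the Linux powercap sysfs counters. It assumes the package domain is exposed as `intel-rapl:0` (the core domain is typically a sub-directory of it); the domain path and function names are illustrative, not the tooling used for the measurements above, and reading `energy_uj` may require root on newer kernels.

```python
from pathlib import Path
import time

# Assumption: package-0 RAPL domain as exposed by the powercap framework.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power between two energy_uj samples, allowing one counter wrap."""
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The energy counter wrapped around between the two samples.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / seconds


def read_uj(name):
    """Read one integer microjoule value from the RAPL domain directory."""
    return int((RAPL_DIR / name).read_text())


def measure(seconds=10.0):
    """Sample the counter twice and return average watts over the interval."""
    max_range = read_uj("max_energy_range_uj")
    start = read_uj("energy_uj")
    t0 = time.monotonic()
    time.sleep(seconds)
    end = read_uj("energy_uj")
    return average_watts(start, end, time.monotonic() - t0, max_range)
```

Running `measure()` once per kernel while the same test-case is active gives directly comparable numbers for the good and bad builds.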
Large CPU power usage increase without perf change is visible on:
* SKL i5-6600K (GT2)
* KBL i7-7500U (GT2)
* KBL i7-8809G (GT2)
(And on a pre-production CFL-S device we had)
TDP limit caused performance to drop (with increased CPU usage) on:
* KBL i7-7567U (GT3e)
* SKL i7-6770HQ (GT4e)
There was one device where performance increased along with the much higher CPU power usage, but only by 1-2% and only in test-case 2:
* SKL i5-6260U (GT3e)
Neither perf nor power usage changed on BXT devices, so I guess this change concerns only Core devices.
On BDW GT2 the CPU usage increase was clearly smaller than on GEN9 Core devices (and there was no noticeable performance change). MediaSDK doesn't support older devices, so I don't have data from them.
Drm-tip seems to have rebased from v4.19 to v4.20-rc1 during that 1 day interval.
Yeah, can you bisect Eero ;)
(In reply to Jani Saarinen from comment #2)
> Yeah, can you bisect Eero ;)
I don't have anything set up that would automate kernel bisecting well enough (reboots, boot failures, handling drm-tip rebases etc).
However, if you have a few commits in that range in mind, I could manually check whether they give good or bad performance.
And I can of course (internally) provide ready-made SW setup and reserve suitable HW for whomever is going to look into this.
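For reference, the manual checking described here amounts to a binary search over the commit range. The sketch below is only a model of that procedure: `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands for "build and boot that commit, run the test-case, compare against the good baseline". It assumes a linear history with a single good-to-bad transition, which drm-tip rebases do not guarantee.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit.

    commits: ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. is_bad(commit) is a manual or scripted check.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each call to `is_bad` costs one build, boot and test run, so the number of steps grows with the logarithm of the range length.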
I haven't yet tried the exact tests as cited here, all I've found so far is a remarkable improvement from 08e3e21a24d23db6a4adca90f7cb40d69e09d35c ("drm/i915: kill resource streamer support") in the -rc1 merge.
The report would suggest we were looking for a pstate or scheduler change.
(In reply to Chris Wilson from comment #4)
> I haven't yet tried the exact tests as cited here, all I've found so far is
> a remarkable improvement from 08e3e21a24d23db6a4adca90f7cb40d69e09d35c
> ("drm/i915: kill resource streamer support") in the -rc1 merge.
In CPU bound Driver2 GL tests? On which device?
(In reply to Eero Tamminen from comment #5)
> (In reply to Chris Wilson from comment #4)
> > I haven't yet tried the exact tests as cited here, all I've found so far is
> > a remarkable improvement from 08e3e21a24d23db6a4adca90f7cb40d69e09d35c
> > ("drm/i915: kill resource streamer support") in the -rc1 merge.
> In CPU bound Driver2 GL tests? On which device?
kbl + glxgears; basic context switch exercise.
In light of the rc1 controversy, do you have spectre/meltdown mitigations enabled on your test systems?
We don't specifically enable any mitigations, just use drm-tip kernel defaults.
It seems to have enabled an additional one when it was rebased to 4.20-rc1:
Spectre V2 : Spectre v2 cross-process SMT mitigation: Enabling STIBP
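Besides dmesg, the running kernel reports its Spectre v2 status in sysfs, which makes it easy to check per boot whether STIBP is on. A minimal sketch follows; the exact wording of the status string differs between kernel versions, so the parsing and the `stibp_state` helper name are assumptions.

```python
from pathlib import Path

SPECTRE_V2 = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def stibp_state(status):
    """Return the STIBP setting from a spectre_v2 status string.

    Returns None if STIBP is not mentioned, and "enabled" if it is listed
    without an explicit value.
    """
    for part in status.split(","):
        part = part.strip()
        if part.startswith("STIBP"):
            _, _, value = part.partition(":")
            return value.strip() or "enabled"
    return None


def current_stibp_state():
    """Read the status of the running kernel."""
    return stibp_state(SPECTRE_V2.read_text().strip())
```

Comparing `current_stibp_state()` between the good and bad kernels shows whether the rebase changed the mitigation on a given machine.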
Threading in the listed test-cases:
* MediaSDK 50 stream transcode case has 250 threads
* I thought GfxBench Driver2 doesn't thread, as only single CPU is busy, but it actually uses 3 threads of which 2 use as much CPU as they can, and apparently kernel just sticks them to same core, so they seem hyperthreaded
-> I think that SMT mitigation is very likely the cause of the drop, instead of i915.
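The thread counts and CPU placement above can be checked from procfs. This is a rough sketch, not the method used for the numbers in this comment: it reads `/proc/<pid>/task/<tid>/stat` and uses the field positions documented in proc(5) (14 = utime, 15 = stime, 39 = processor); the helper names are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def parse_task_stat(line):
    """Return (comm, utime + stime in clock ticks, last CPU) for one stat line."""
    # comm may contain spaces and parentheses, so split on the LAST ')'.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()  # fields[0] is field 3 (state) in proc(5)
    cpu_ticks = int(fields[11]) + int(fields[12])  # fields 14 and 15
    last_cpu = int(fields[36])  # field 39
    return comm, cpu_ticks, last_cpu


def threads_per_cpu(pid):
    """Count the threads of a process by the CPU each one last ran on."""
    counts = Counter()
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            _, _, cpu = parse_task_stat(stat.read_text())
        except OSError:
            continue  # the thread exited while we were scanning
        counts[cpu] += 1
    return counts
```

Whether two of the reported CPUs are hyperthread siblings can then be looked up in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`.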
Could you point out suitable drm-tip commit IDs before and after enabling the mitigation so that I could verify it?
STIBP fixes in drm-tip v4.20-rc5 fix the performance of the CPU bound 3D cases (test-case 1).
However, neither those fixes nor disabling the Spectre mitigation completely from the kernel command line (checked by David) has any impact on the Media performance regression (test-case 2).
David will try to bisect the Media perf regression.
Reverting 01bad1c6896db021db82042e71c2bf1f97cc026b seems to resolve at least part of the performance regression; the CPU power usage seems to have been a separate issue.
(In reply to David Weinehall from comment #9)
> Reverting 01bad1c6896db021db82042e71c2bf1f97cc026b seems to resolve at least
> part of the performance regression
Author: Rafael J. Wysocki <email@example.com>
Committer: Rafael J. Wysocki <firstname.lastname@example.org>
cpuidle: poll_state: Revise loop termination condition
If need_resched() returns "false", breaking out of the loop in
poll_idle() will cause a new idle state to be selected, so in fact
it usually doesn't make sense to spin in it longer than the target
residency of the second state. [Note that the "polling" state is
used only if there is at least one "real" state defined in addition
to it, so the second state is always there.] On the other hand,
breaking out of it early (say in case the next state is disabled)
shouldn't hurt as it is polling anyway.
For this reason, make the loop in poll_idle() break if the CPU has
been spinning longer than the target residency of the second state
(the "polling" state can only be state[0]).
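To make the described termination condition concrete, here is a toy Python model of the loop. It is not the kernel code: the function and parameter names are illustrative, and the injectable clock exists only so the behaviour can be exercised without real spinning.

```python
import time


def poll_idle(need_resched, second_state_residency_ns, now_ns=time.monotonic_ns):
    """Spin until work arrives or the time limit expires.

    Returns True if the loop ended because need_resched() became true, and
    False if the CPU spun longer than the target residency of the second
    idle state, at which point a new idle state would be selected.
    """
    start = now_ns()
    while not need_resched():
        if now_ns() - start > second_state_residency_ns:
            return False
    return True
```

In this model a smaller residency limit means the loop gives up polling sooner and idle state selection runs more often.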
> the CPU power usage seems to have been a separate issue.
Before the issue, CPU core(s) power usage was less than GPU power usage, and afterwards it was >2x GPU power usage, in a (mostly) *GPU limited* Media test-case.
The STIBP fix didn't improve the >2x CPU power usage increase in the Media test-case (2) at all, so I think at this point it's more interesting than the small perf drop in it. It's likely to explain the rest of the perf drop too.