drm-tip RC kernel integration drops offscreen 3D test performance to 1/3rd:

kernel git://anongit.freedesktop.org/drm-tip at 10de1e17faaab452782e5a1baffd1b30a639a261
2017-07-18_10-09-14 drm-tip: 2017y-07m-18d-10h-08m-42s UTC integration manifest

This happens because the kernel no longer raises GPU speed from minimum (to maximum) with those tests, as it should. These tests include offscreen versions of the following:
* GfxBench 4.0 ALU2, Tessellation, T-Rex, CarChase (neither Manhattan nor the CPU-bound driver tests were affected)
* GLBenchmark 2.7 Egypt, T-Rex and Fill tests (Fill least)

This issue is BYT specific.
Maxing out the GPU (at least GAM) results in 50% C0 activity. I'm guessing our czclk is incorrect, or slightly different from the C0 cycles:

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 32c62442c9d8..4b36ed2290b9 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1076,7 +1076,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 	time = ktime_us_delta(now.ktime, prev->ktime);
-	time *= dev_priv->czclk_freq;
+	time *= dev_priv->czclk_freq / 2;
 	/* Workload can be split between render + media,
 	 * e.g. SwapBuffers being blitted in X after being rendered in
In the last nightly [1] things got worse:

* The Unigine (onscreen) Heaven & Valley demos now also run mostly at minimum GPU frequency. I think that as a result the CPU side also runs slower, as demo startup and scene changes take longer, but it's hard to say which is cause and which is effect.
* There are now GPU hangs (in 2 different test-cases).

The last nightly was run with the modesetting driver instead of the Intel DDX, but I don't think that's related, as there's no similar Unigine demo issue in Mesa testing.

[1] I.e. from changes between:

kernel git://anongit.freedesktop.org/drm-tip at dbfb2f62576e1c3550d10398b097589959356db3
2017-08-21_08-14-04 drm-tip: 2017y-08m-21d-08h-13m-34s UTC integration manifest

kernel git://anongit.freedesktop.org/drm-tip at 017fec5c2e57672a8c2a350376070e6c6a5ae950
2017-08-22_16-23-32 drm-tip: 2017y-08m-22d-16h-23m-11s UTC integration manifest
Mika spotted that tkr_raw was ticking 2x faster than tkr_mono. The culprit is:

commit fc6eead7c1e2e5376c25d2795d4539fdacbc0648
Author: John Stultz <john.stultz@linaro.org>
Date:   Mon May 22 17:20:20 2017 -0700

    time: Clean up CLOCK_MONOTONIC_RAW time handling

    Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
    remove the duplicitive tk->raw_time.tv_nsec, which can be
    stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
    for monotonic time).

    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Miroslav Lichvar <mlichvar@redhat.com>
    Cc: Richard Cochran <richardcochran@gmail.com>
    Cc: Prarit Bhargava <prarit@redhat.com>
    Cc: Stephen Boyd <stephen.boyd@linaro.org>
    Cc: Kevin Brodsky <kevin.brodsky@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Daniel Mentz <danielmentz@google.com>
    Tested-by: Daniel Mentz <danielmentz@google.com>
    Signed-off-by: John Stultz <john.stultz@linaro.org>
Applied to topic/core-for-CI:

commit 5567b808e5681f742856245bc1e34d40475cb89d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 25 14:46:41 2017 +0100

    Revert "time: Clean up CLOCK_MONOTONIC_RAW time handling"

    This reverts commit fc6eead7c1e2e5376c25d2795d4539fdacbc0648.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102336
Potential fix posted for testing here: https://lkml.org/lkml/2017/8/25/792
Swapped out the revert for

commit 177776dba04e4e02d46ec46d7927580eaeb106b6
Author: John Stultz <john.stultz@linaro.org>
Date:   Fri Aug 25 15:57:04 2017 -0700

    time: Fix ktime_get_raw() issues caused by incorrect base accumulation

    In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
    handling"), I mistakenly added the following:

        /* Update the monotonic raw base */
        seconds = tk->raw_sec;
        nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
        tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

    Which adds the raw_sec value and the shifted down raw xtime_nsec to
    the base value. This is problematic, as when calling ktime_get_raw(),
    we add the tk->tkr_raw.xtime_nsec and current offset, shift it down
    and add it to the raw base. This results in the shifted down
    tk->tkr_raw.xtime_nsec being added twice.

    My mistake was that I was matching the monotonic base logic above:

        seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
        nsec = (u32) tk->wall_to_monotonic.tv_nsec;
        tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

    Which adds the wall_to_monotonic.tv_nsec value, but not the
    tk->tkr_mono.xtime_nsec value, to the base.

    The result of this is that ktime_get_raw() users (which are all
    internal users) see the raw time move faster than it should (the
    rate at which can vary with the current size of tkr_raw.xtime_nsec),
    which has resulted in at least problems with graphics rendering
    performance.

    To fix this, we simplify the tkr_raw.base accumulation to only
    accumulate the raw_sec portion, and do not include the
    tkr_raw.xtime_nsec portion, which will be added at read time.

in topic/core-for-CI.
Now fixed in x86/tip.

If you want to, report some details of the hang; otherwise there's not much I can do.

The reason why it suddenly became worse is all down to switching to modesetting, which only used rcs, and so we end up with interesting behaviour: X gets stuck behind extra frames and doesn't report the buffer as idle as early (so the client would be waiting on the CPU fence being signaled rather than waiting on the GPU, and so be invisible to the waitboost mechanism). Interesting effect.
(In reply to Chris Wilson from comment #7)
> Now fixed in x86/tip.

Verified, all the regressions are now fixed, thanks!

> If you want to report some details of the hang, otherwise there's not much
> I can do.

Was the issue BYT specific, like the related perf drop was?

> The reason why it suddenly became worse is all due to switching to
> modesetting, which only used rcs and so we end up with interesting
> behaviour with X getting stuck behind extra frames, and not reporting the
> buffer as idle as early (so the client would be waiting on the CPU fence
> being signaled and not waiting on the GPU, so being invisible to the
> waitboost mechanism). Interesting effect.

The modesetting drop isn't BYT specific.

When Martin tested modesetting ~1 year ago against the Intel DDX, modesetting was the same speed or slightly better in 3D cases (and lost in 2D ones).

However, now switching from the Intel DDX to modesetting drops onscreen 3D test-case perf on all platforms: for Unigine demos (on anything after BYT), GfxBench test-cases and high-FPS cases.

Any idea why?
(In reply to Eero Tamminen from comment #8)
> (In reply to Chris Wilson from comment #7)
> > Now fixed in x86/tip.
>
> Verified, all the regressions are now fixed, thanks!
>
> > If you want to report some details of the hang, otherwise there's not
> > much I can do.
>
> Was the issue BYT specific, like the related perf drop was?

The hangs? Haven't seen any, so I don't yet know what's going on there. The perf issue here is for BYT only (as it is the only one that cannot use the HW EI thresholds, so we calculate those manually).

> > The reason why it suddenly became worse is all due to switching to
> > modesetting, which only used rcs and so we end up with interesting
> > behaviour with X getting stuck behind extra frames, and not reporting
> > the buffer as idle as early (so the client would be waiting on the CPU
> > fence being signaled and not waiting on the GPU, so being invisible to
> > the waitboost mechanism). Interesting effect.
>
> Modesetting drop isn't BYT specific.
>
> When Martin tested modesetting ~1 year ago against Intel DDX, modesetting
> was same speed or slightly better in 3D cases (and lost in 2D ones).
>
> However, now switching from Intel DDX to modesetting drops onscreen 3D
> test-cases perf on all platforms for Unigine demos (on anything after
> BYT), GfxBench test-cases and high FPS cases.
>
> Any idea why?

Hmm, from my inspection on e.g. Unigine, -modesetting was and still is around 10% slower. As you are very aware, there are so many different factors at play, but under ideal circumstances the DDX is irrelevant to game throughput/latency. Do you have a specific workload that I can use as an example to see what changes may have impacted it?
(In reply to Chris Wilson from comment #9)
> (In reply to Eero Tamminen from comment #8)
> > Was the issue BYT specific, like the related perf drop was?

Sorry, I meant the fix.

> The hangs? Haven't seen any, so I don't yet know what's going on there.
> The perf issue here is for byt only (as it is the only one that cannot use
> the HW EI thresholds and so we calculate those manually).

Thanks, so it really was BYT specific. That wasn't clear from the fix description. :-)

> > When Martin tested modesetting ~1 year ago against Intel DDX,
> > modesetting was same speed or slightly better in 3D cases (and lost in
> > 2D ones).
> >
> > However, now switching from Intel DDX to modesetting drops onscreen 3D
> > test-cases perf on all platforms for Unigine demos (on anything after
> > BYT), GfxBench test-cases and high FPS cases.
> >
> > Any idea why?
>
> Hmm, from my inspection on e.g. Unigine -modesetting was and still is
> around 10% slower.

Good to know. Interesting that Martin got different results back then.

> As you are very aware, there are so many different factors at play -- but
> under ideal circumstances, the ddx is irrelevant to game
> throughput/latency.

Yes, the only thing it does is the copy for non-vsynced fullscreen content and some vblank etc. synchronization.

> Do you have a specific workload that I can use as an example to see what
> changes may have impacted it?

GpuTest Triangle and SynMark Batch0 are the simplest cases. Now that I have BXT data, Triangle actually went up on BYT & BXT with modesetting, although on the other tested machines it went down. However, the cases I'm more concerned about are the Unigine tests: because they are low FPS, the issue looks more like a MOCS / tiling issue than just some (less interesting) high-FPS overhead. I'd suggest checking things with either SKL or KBL (say, GT2).
Let's close this.