Summary: | [ilk] IPS frequently downclocks the GPU when playing quakelive | ||
---|---|---|---|
Product: | DRI | Reporter: | dimon |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | ILK | i915 features: | power/Other |
Description
dimon
2015-04-22 10:26:19 UTC
So I think something like: http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly should help lots.

Updated patch at http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137

Still waiting for the machine to be cool enough to actually test GPU boosts!

Now with an extra twist:

commit 5b14d1431782861d1bf5cfd7b295b93bb7269d0a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 27 09:50:48 2015 +0100

    drm/i915: Allow RPS waitboosting to use max GPU frequency

    Ignore the restriction imposed by the user for when the GPU is stalling
    the clients and dropping frames. We will return back to the user limits
    immediately once the stall is over.

and we ignore the intel_rps limits. Sssh, don't tell anyone!

Hi Chris,

I did some tests with your current bug90137 branch. Last commit: 3529bfdeb1891efe9f87145f64316ae1b66cbd3b

Unfortunately the GPU doesn't reclock at all; it always stays in the slowest P-state. I monitored it through /sys/kernel/debug/dri/0/i915_frequency_info

On 3.19.5 the GPU reclocks, but in a 'strange' way.

Furthermore, IPS does not seem to work properly on my machine.

> cat /sys/kernel/debug/dri/0/i915_emon_status
GMCH temp: 53
Chipset power: 2916
GFX power: 21803
Total power: 24719

This is on an idling machine. The value of GFX power seems to be wrong; it is way too high for an idling machine. IPS gets this value from the i915 driver and bases its decisions on it, so reclocking can't work properly without correct power values.

The power calculation is done in __i915_gfx_val, which does some magic based on empirical values. Is there any corresponding documentation available? Then I could look into what's actually going wrong on my machine.

The IPS values are not documented as they were considered to be highly proprietary information by the hardware designers.

Hmm, I missed the tracepoint.
Can you apply:

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d4d8305c73f5..f84b33bfb724 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3736,6 +3736,9 @@ static bool __ironlake_set_rps(struct drm_i915_private *dev_priv, u8 val)
 	if (WARN_ON(val > dev_priv->rps.max_freq))
 		return false;
 
+	if (val == dev_priv->rps.cur_freq)
+		return true;
+
 	assert_spin_locked(&mchdev_lock);
 
 	if (wait_for_atomic((I915_READ(MEMSWCTL) & MEMCTL_CMD_STS) == 0, 10)) {
@@ -3744,6 +3747,8 @@ static bool __ironlake_set_rps(struct drm_i915_private *dev_priv, u8 val)
 	}
 
 	dev_priv->rps.cur_freq = val;
+	trace_intel_gpu_freq_change(val);

or just add a printk there.

If you could review the freq<->delay conversion in the patch, I would be grateful - as that is where the bug most likely lies. We think in frequency for setting the RPS, but the hw talks delays.

I had some freezes but...

After startx of X

[   31.698489] intel_gpu_freq_change: 10
[   32.572807] intel_gpu_freq_change: 0
[   37.164415] intel_gpu_freq_change: 10
[   38.569815] intel_gpu_freq_change: 0

starting glxgears n-times - no new transition logs, low framerates

Ok, that says all you had were busy/idle calls and no interrupts. Or the interrupt change requests were invalid.

In i915_irq.c, line 2050 we queue the RPS worker from the interrupt. That would be a good place for the first printk to verify we get interrupts. The next interesting function is ilk_compute_pm_iir() where we respond to the interrupt and see which way to reclock the GPU.

Oh... gen6_pm_rps_work():

	if (!dev_priv->rps.interrupts_enabled)
		return;

that'll explain the lack of reclocking!
Pushed v2 of intel_ips + waitboosting patches to #bug90137, head is now

commit 3365a5a11c9f38cde398ffdd4e1dc66cbc31b4c7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 28 14:02:28 2015 +0100

    drm/i915: Remove KMS Kconfig option

still no reclocking :)

Hmm, I had just pushed another update to fix a related bug later in that branch, 5de1f1c1bdc5de41cca48164ee234b6800f8f904

Time to update again, and start sprinkling printks. A couple of other debug files of interest:

i915_drpc_info: hardware limits
i915_frequency_info: current GPU frequency
i915_rps_boost_info: who's boosting when

On my ilk, I have reclocking again.

Great news, your ilk machine is running, that's a big advantage.

I did some printk work... The problem is I'm not getting any DE_PCU_EVENT interrupts. They are unmasked but not set in deiir.

Here are some logs:

 static void ivb_display_irq_handler(struct drm_device *dev, u32 de_iir)
@@ -2145,7 +2158,8 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 	}
 
 	de_iir = I915_READ(DEIIR);
-	if (de_iir) {
+	printk("de_iir 0x%x deimr 0x%x deier 0x%x\n", de_iir, I915_READ(DEIMR), I915_READ(DEIER));
+	if (de_iir) {

[  619.658081] ilk_display_irq_handler
[  619.658832] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.663854] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.664917] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.665909] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.666984] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.669067] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.673124] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.674803] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.674806] ilk_display_irq_handler
[  619.676151] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.677224] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.680201] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.681298] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.682252] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.683301] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.691442] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.691448] ilk_display_irq_handler
[  619.708108] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.708128] ilk_display_irq_handler
[  619.724845] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.724899] ilk_display_irq_handler
[  619.741503] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.741530] ilk_display_irq_handler

Wow, after a fresh reboot reclocking worked for a short period of time.
I was getting DE_PCU_EVENTs, pstates alternated between 0 and 10.
Then it stopped generating DE_PCU_EVENTs and stopped reclocking.

Enabled, unmasked, but never trigger. Huh.

I guess add a sanity check that ironlake_enable_drps() is indeed being called. Though that should be shown "SW mode enabled" in /sys/kernel/debug/dri/0/i915_drpc_info

* spots EI intervals and thresholds in there.

At least I'm getting to know this code better, in theory at least.

Can you compare

92bb36c80e561f82b1f4b63cc269a71833137841 - baseline
94afa384a6cddb565d7944c79aa3a2d536e4fe54 - preparation
86a8ed9560350c7ca12d2fb7e8d694a2ea4a60cc - conversion to gen6+ RPS routines

and verify where the bug gets introduced.

(In reply to dimon from comment #13)
> Wow, after a fresh reboot reclocking worked for a short period of time.
> I was getting DE_PCU_EVENTs, pstates alternated between 0 and 10.
> Then it stopped generating DE_PCU_EVENTs and stopped reclocking.

Ah, that's probably my runtime pm experiments to disable the interrupts when idle. If you jump back to 57c6b89256000f038a5bbc65eb745a6ad3858a6c does it continue to work even after idle?

Some more data...

In the case when there are no DE_PCU_EVENTs I'm getting this sequence.
[ 1786.650289] 1 __ironlake_set_rps 0 min 0 max 10
[ 1788.651514] 1 __ironlake_set_rps 0 min 0 max 10
[ 1790.652747] 1 __ironlake_set_rps 0 min 0 max 10

start glxgears

[ 1791.092472] 1 __ironlake_set_rps 10 min 0 max 10
[ 1791.106327] gpu busy, RCS change rejected

stop glxgears

[ 1798.657697] 1 __ironlake_set_rps 0 min 0 max 10
[ 1800.678924] 1 __ironlake_set_rps 0 min 0 max 10
[ 1802.660146] 1 __ironlake_set_rps 0 min 0 max 10
[ 1804.668038] 1 __ironlake_set_rps 0 min 0 max 10
[ 1808.670501] 1 __ironlake_set_rps 0 min 0 max 10

this sequence repeats on every glxgears start

(In reply to Chris Wilson from comment #14)
> Enabled, unmasked, but never trigger. Huh.
>
> I guess add a sanity check that ironlake_enable_drps() is indeed being
> called. Though that should be shown "SW mode enabled" in
> /sys/kernel/debug/dri/0/i915_drpc_info

Sw mode is enabled

> * spots EI intervals and thresholds in there.
>
> At least I'm getting to know this code better, in theory at least.
>
> Can you compare
>
> 92bb36c80e561f82b1f4b63cc269a71833137841 - baseline
> 94afa384a6cddb565d7944c79aa3a2d536e4fe54 - preparation
> 86a8ed9560350c7ca12d2fb7e8d694a2ea4a60cc - conversion to gen6+ RPS routines
>
> and verify where the bug gets introduced.

86a8ed9560350c7ca12d2fb7e8d694a2ea4a60cc

> (In reply to dimon from comment #13)
> > Wow, after a fresh reboot reclocking worked for a short period of time.
> > I was getting DE_PCU_EVENTs, pstates alternated between 0 and 10.
> > Then it stopped generating DE_PCU_EVENTs and stopped reclocking.
>
> Ah, that's probably my runtime pm experiments to disable the interrupts when
> idle. If you jump back to 57c6b89256000f038a5bbc65eb745a6ad3858a6c does it
> continue to work even after idle?

no, it doesn't

We seem to have neglected this bug a bit. Apologies. Does the problem persist with latest kernels?
I am going to close this bug since there has not been a response for several months. If the problem still exists, please open a new bug with current information and we can look at it.

The situation has not changed.

(In reply to Chris Wilson from comment #19)
> The situation has not changed.

Hello Dimon, Chris,
Any update on this case?
Thank you.

Hi Elizabeth,

I don't have the time now to test it on the latest kernel. Thx for asking.

bugzilla-daemon@freedesktop.org writes:

> https://bugs.freedesktop.org/show_bug.cgi?id=90137
>
> Elizabeth <elizabethx.de.la.torre.mena@intel.com> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> Status|REOPENED |NEEDINFO
>
> --- Comment #20 from Elizabeth <elizabethx.de.la.torre.mena@intel.com> ---
> (In reply to Chris Wilson from comment #19)
>> The situation has not changed.
> Hello Dimon, Chris,
> Any update on this case?
> Thank you.
>
> --
> You are receiving this mail because:
> You reported the bug.

It's still a valid bug, we haven't done anything to progress the ips interaction. My first plan is to still try and get ilk using the same rps framework as we have for gen6+, so that it is coupled into the waitboosting framework. Tamo.

Friendly ping, Chris, Dimon, any progress? Thank you.

Fwiw, latest code drop is https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=ilk-ips

That restores rs/rc6 on ilk and gives us waitboosting. However, it may cause more frequent thermal throttling.

First of all, sorry about the spam.

This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether this bug is still valid or not.

If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see if the issue is still valid there? If you cannot see the issue there, please comment on the bug.

Has anyone tested the branch mentioned in comment #24?

Closing, please re-open if it occurs again.