Bug 90137

Summary: [ilk] IPS frequently downclocks the GPU when playing quakelive
Product: DRI
Reporter: dimon
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform: ILK
i915 features: power/Other

Description dimon 2015-04-22 10:26:19 UTC
This leads to an inconsistent/delayed mouse movement in the game.
Comment 1 Chris Wilson 2015-04-22 11:23:04 UTC
I think something like http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly should help a lot.
Comment 2 Chris Wilson 2015-04-26 22:30:37 UTC
Updated patch at http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug90137

Still waiting for the machine to be cool enough to actually test GPU boosts!
Comment 3 Chris Wilson 2015-04-27 08:56:29 UTC
Now with an extra twist:

commit 5b14d1431782861d1bf5cfd7b295b93bb7269d0a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 27 09:50:48 2015 +0100

    drm/i915: Allow RPS waitboosting to use max GPU frequency
    
    Ignore the restriction imposed by the user for when the GPU is stalling
    the clients and dropping frames. We will return back to the user limits
    immediately once the stall is over.

and we ignore the intel_rps limits. Sssh, don't tell anyone!
Comment 4 dimon 2015-04-28 15:25:32 UTC
Hi Chris,

I did some tests with your current bug90137 branch.
Last commit: 3529bfdeb1891efe9f87145f64316ae1b66cbd3b

Unfortunately the GPU doesn't reclock at all; it always stays in the slowest P-state.
I monitored it through /sys/kernel/debug/dri/0/i915_frequency_info
On 3.19.5 the GPU reclocks, but in a 'strange' way.

Furthermore, IPS does not seem to work properly on my machine.
> cat /sys/kernel/debug/dri/0/i915_emon_status
GMCH temp: 53
Chipset power: 2916
GFX power: 21803
Total power: 24719

This is on an idling machine. The value of GFX power seems to be wrong; it is way too high for an idling machine.
IPS gets this value from the i915 driver and bases its decisions on it, so reclocking can't work properly without correct power values.
The power calculation is done in __i915_gfx_val, which does some magic based on empirical values.
Is there any corresponding documentation available? Then I could look into what is actually going wrong on my machine.
Comment 5 Chris Wilson 2015-04-28 16:00:41 UTC
The IPS values are not documented as they were considered to be highly proprietary information by the hardware designers.

Hmm, I missed the tracepoint. Can you apply:

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d4d8305c73f5..f84b33bfb724 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3736,6 +3736,9 @@ static bool __ironlake_set_rps(struct drm_i915_private *dev_priv, u8 val)
        if (WARN_ON(val > dev_priv->rps.max_freq))
                return false;
 
+       if (val == dev_priv->rps.cur_freq)
+               return true;
+
        assert_spin_locked(&mchdev_lock);
 
        if (wait_for_atomic((I915_READ(MEMSWCTL) & MEMCTL_CMD_STS) == 0, 10)) {
@@ -3744,6 +3747,8 @@ static bool __ironlake_set_rps(struct drm_i915_private *dev_priv, u8 val)
        }
 
        dev_priv->rps.cur_freq = val;
+       trace_intel_gpu_freq_change(val);

or just add a printk there. If you could review the freq<->delay conversion in the patch, I would be grateful - as that is where the bug most likely lies. We think in frequency for setting the RPS, but the hw talks delays.
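
To make the freq<->delay concern concrete, here is a minimal sketch of an inverted mapping with a round-trip property that a review could check. This is not the conversion used in the patch or in the i915 driver: the struct rps_range type, its fields, and the linear relation are all hypothetical, chosen only to show that a larger frequency index must yield a smaller delay and that the two helpers must be exact inverses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical linear inversion, NOT taken from the i915 driver.
 * freq:  P-state index in [0, nstates - 1]; larger means faster.
 * delay: value programmed into the hardware; larger means slower.
 */
struct rps_range {
	uint8_t slowest_delay; /* assumed delay for freq index 0 */
	uint8_t nstates;       /* assumed number of P-states */
};

/* Convert a frequency index into the delay value the hardware expects. */
static uint8_t freq_to_delay(const struct rps_range *r, uint8_t freq)
{
	assert(freq < r->nstates);
	assert(r->slowest_delay >= r->nstates - 1); /* no underflow */
	return (uint8_t)(r->slowest_delay - freq);
}

/* Convert a hardware delay value back into a frequency index. */
static uint8_t delay_to_freq(const struct rps_range *r, uint8_t delay)
{
	assert(delay <= r->slowest_delay);
	assert(r->slowest_delay - delay < r->nstates);
	return (uint8_t)(r->slowest_delay - delay);
}
```

The property worth checking in any real conversion is the same as here: delay_to_freq(freq_to_delay(f)) == f for every valid f, and the delay decreases as f increases.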
Comment 6 dimon 2015-04-28 16:36:28 UTC
I had some freezes but...

After startx of X
[   31.698489] intel_gpu_freq_change: 10
[   32.572807] intel_gpu_freq_change: 0
[   37.164415] intel_gpu_freq_change: 10
[   38.569815] intel_gpu_freq_change: 0


starting glxgears n-times - no new transition logs, low framerates
Comment 7 Chris Wilson 2015-04-28 20:04:01 UTC
Ok, that says all you had were busy/idle calls and no interrupts. Or the interrupt change requests were invalid.

In i915_irq.c, line 2050 we queue the RPS worker from the interrupt. That would be a good place for the first printk to verify we get interrupts. The next interesting function is ilk_compute_pm_iir() where we respond to the interrupt and see which way to reclock the GPU.

Oh... 

gen6_pm_rps_work():
if (!dev_priv->rps.interrupts_enabled) return;

that'll explain the lack of reclocking!
Comment 8 Chris Wilson 2015-04-28 21:14:11 UTC
Pushed v2 of intel_ips + waitboosting patches to #bug90137, head is now

commit 3365a5a11c9f38cde398ffdd4e1dc66cbc31b4c7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 28 14:02:28 2015 +0100

    drm/i915: Remove KMS Kconfig option
Comment 9 dimon 2015-04-28 22:12:49 UTC
still no reclocking :)
Comment 10 Chris Wilson 2015-04-28 22:18:38 UTC
Hmm, I had just pushed another update to fix a related bug later in that branch, 5de1f1c1bdc5de41cca48164ee234b6800f8f904

Time to update again, and start sprinkling printks.
Comment 11 Chris Wilson 2015-04-29 09:24:26 UTC
A couple of other debug files of interest:

i915_drpc_info: hardware limits
i915_frequency_info: current GPU frequency
i915_rps_boost_info: who's boosting when

On my ilk, I have reclocking again.
Comment 12 dimon 2015-04-29 13:01:58 UTC
Great news, your ilk machine is running, that's a big advantage.

I did some printk work...
The problem is I'm not getting any DE_PCU_EVENT interrupts.
They are unmasked but not set in deiir.

Here are some logs:

static void ivb_display_irq_handler(struct drm_device *dev, u32 de_iir)
@@ -2145,7 +2158,8 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
        }
 
        de_iir = I915_READ(DEIIR);
-       if (de_iir) {
+        printk("de_iir 0x%x deimr 0x%x deier 0x%x\n", de_iir, I915_READ(DEIMR), I915_READ(DEIER));
+        if (de_iir) {


[  619.658081] ilk_display_irq_handler
[  619.658832] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.663854] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.664917] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.665909] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.666984] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.669067] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.673124] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.674803] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.674806] ilk_display_irq_handler
[  619.676151] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.677224] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.680201] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.681298] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.682252] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.683301] de_iir 0x0 deimr 0x714bfb7b deier 0xeb48585
[  619.691442] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.691448] ilk_display_irq_handler
[  619.708108] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.708128] ilk_display_irq_handler
[  619.724845] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.724899] ilk_display_irq_handler
[  619.741503] de_iir 0x4000080 deimr 0x714bfb7b deier 0xeb48585
[  619.741530] ilk_display_irq_handler
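
The register values in the log above can be decoded with a small helper like the sketch below. The bit positions are assumptions recalled from the ILK display interrupt definitions in i915_reg.h of that era (DE_PCU_EVENT as bit 25, DE_PLANEA_FLIP_DONE as bit 26, DE_PIPEA_VBLANK as bit 7) and should be verified against the tree being tested; struct de_irq_state and the two helper functions are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against i915_reg.h in your tree. */
#define DE_PLANEA_FLIP_DONE (1u << 26)
#define DE_PCU_EVENT        (1u << 25)
#define DE_PIPEA_VBLANK     (1u << 7)

/* Snapshot of the three display engine interrupt registers. */
struct de_irq_state {
	uint32_t iir; /* DEIIR: interrupts currently pending */
	uint32_t imr; /* DEIMR: a set bit masks that source */
	uint32_t ier; /* DEIER: a set bit enables that source */
};

/* A source can fire only if it is enabled in DEIER and not masked in DEIMR. */
static bool de_irq_armed(const struct de_irq_state *s, uint32_t bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
	return (s->ier & bit) != 0 && (s->imr & bit) == 0;
}

/* A source is pending if its bit is set in DEIIR. */
static bool de_irq_pending(const struct de_irq_state *s, uint32_t bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
	return (s->iir & bit) != 0;
}
```

Under those assumed bit positions, deier 0xeb48585 with deimr 0x714bfb7b leaves DE_PCU_EVENT armed, while de_iir 0x4000080 contains only the plane A flip-done and pipe A vblank bits, which matches the observation that the PCU event is unmasked but never set.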
Comment 13 dimon 2015-04-29 13:19:38 UTC
Wow, after a fresh reboot reclocking worked for a short period of time.
I was getting DE_PCU_EVENTs, pstates alternated between 0 and 10.
Then it stopped generating DE_PCU_EVENTs and stopped reclocking.
Comment 14 Chris Wilson 2015-04-29 13:28:37 UTC
Enabled, unmasked, but never trigger. Huh.

I guess add a sanity check that ironlake_enable_drps() is indeed being called. Though that should be shown "SW mode enabled" in /sys/kernel/debug/dri/0/i915_drpc_info

* spots EI intervals and thresholds in there.

At least I'm getting to know this code better, in theory at least.

Can you compare

92bb36c80e561f82b1f4b63cc269a71833137841 - baseline
94afa384a6cddb565d7944c79aa3a2d536e4fe54 - preparation
86a8ed9560350c7ca12d2fb7e8d694a2ea4a60cc - conversion to gen6+ RPS routines

and verify where the bug gets introduced.


(In reply to dimon from comment #13)
> Wow, after a fresh reboot reclocking worked for a short period of time.
> I was getting DE_PCU_EVENTs, pstates alternated between 0 and 10.
> Then it stopped generating DE_PCU_EVENTs and stopped reclocking.

Ah, that's probably my runtime pm experiments to disable the interrupts when idle. If you jump back to 57c6b89256000f038a5bbc65eb745a6ad3858a6c does it continue to work even after idle?
Comment 15 dimon 2015-04-29 17:25:45 UTC
Some more data...

In the case when there are no DE_PCU_EVENTs I'm getting this sequence.

[ 1786.650289] 1 __ironlake_set_rps 0 min 0 max 10
[ 1788.651514] 1 __ironlake_set_rps 0 min 0 max 10
[ 1790.652747] 1 __ironlake_set_rps 0 min 0 max 10
start glxgears
[ 1791.092472] 1 __ironlake_set_rps 10 min 0 max 10
[ 1791.106327] gpu busy, RCS change rejected
stop glxgears
[ 1798.657697] 1 __ironlake_set_rps 0 min 0 max 10
[ 1800.678924] 1 __ironlake_set_rps 0 min 0 max 10
[ 1802.660146] 1 __ironlake_set_rps 0 min 0 max 10
[ 1804.668038] 1 __ironlake_set_rps 0 min 0 max 10
[ 1808.670501] 1 __ironlake_set_rps 0 min 0 max 10

this sequence repeats on every glxgears start
Comment 16 dimon 2015-04-30 14:13:55 UTC
(In reply to Chris Wilson from comment #14)
> Enabled, unmasked, but never trigger. Huh.
> 
> I guess add a sanity check that ironlake_enable_drps() is indeed being
> called. Though that should be shown "SW mode enabled" in
> /sys/kernel/debug/dri/0/i915_drpc_info

Sw mode is enabled

> 
> * spots EI intervals and thresholds in there.
> 
> At least I'm getting to know this code better, in theory at least.
> 
> Can you compare
> 
> 92bb36c80e561f82b1f4b63cc269a71833137841 - baseline
> 94afa384a6cddb565d7944c79aa3a2d536e4fe54 - preparation
> 86a8ed9560350c7ca12d2fb7e8d694a2ea4a60cc - conversion to gen6+ RPS routines
> 
> and verify where the bug gets introduced.

86a8ed9560350c7ca12d2fb7e8d694a2ea4a60cc
 
> 
> (In reply to dimon from comment #13)
> > Wow, after a fresh reboot reclocking worked for a short period of time.
> > I was getting DE_PCU_EVENTs, pstates alternated between 0 and 10.
> > Then it stopped generating DE_PCU_EVENTs and stopped reclocking.
> 
> Ah, that's probably my runtime pm experiments to disable the interrupts when
> idle. If you jump back to 57c6b89256000f038a5bbc65eb745a6ad3858a6c does it
> continue to work even after idle?

no, it doesn't
Comment 17 Jani Nikula 2016-04-21 12:32:11 UTC
We seem to have neglected this bug a bit. Apologies.

Does the problem persist with latest kernels?
Comment 18 Ricardo 2017-02-21 16:07:41 UTC
I am going to close this bug since there has been no response for several months. If the problem still exists, please open a new bug with current information and we can look at it.
Comment 19 Chris Wilson 2017-02-21 16:11:11 UTC
The situation has not changed.
Comment 20 Elizabeth 2017-07-28 22:34:10 UTC
(In reply to Chris Wilson from comment #19)
> The situation has not changed.
Hello Dimon, Chris, 
Any update on this case?
Thank you.
Comment 21 dimon 2017-08-19 00:11:30 UTC
Hi Elizabeth,

I don't have the time now to test it on the latest kernel.
Thx for asking.

bugzilla-daemon@freedesktop.org writes:

> https://bugs.freedesktop.org/show_bug.cgi?id=90137
>
> Elizabeth <elizabethx.de.la.torre.mena@intel.com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|REOPENED                    |NEEDINFO
>
> --- Comment #20 from Elizabeth <elizabethx.de.la.torre.mena@intel.com> ---
> (In reply to Chris Wilson from comment #19)
>> The situation has not changed.
> Hello Dimon, Chris, 
> Any update on this case?
> Thank you.
>
Comment 22 Chris Wilson 2017-08-19 10:05:58 UTC
It's still a valid bug, we haven't done anything to progress the ips interaction. My first plan is to still try and get ilk using the same rps framework as we have for gen6+, so that it is coupled into the waitboosting framework. Tamo.
Comment 23 Elizabeth 2017-12-08 23:45:10 UTC
Friendly ping, Chris, Dimon, any progress? Thank you.
Comment 24 Chris Wilson 2017-12-11 11:37:24 UTC
Fwiw, latest code drop is https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=ilk-ips

That restores rs/rc6 on ilk and gives us waitboosting. However, it may cause more frequent thermal throttling.
Comment 25 Jani Saarinen 2018-03-29 07:10:22 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot reproduce it there, please comment on the bug.
Comment 26 Jani Saarinen 2018-04-25 06:24:51 UTC
Has anyone tested the branch mentioned in comment #24?
Comment 27 Jani Saarinen 2018-05-04 07:54:32 UTC
Closing; please re-open if it occurs again.
