Bug 99345 - [KBL] [drm:skl_set_cdclk] *ERROR* failed to inform PCU about cdclk change
Summary: [KBL] [drm:skl_set_cdclk] *ERROR* failed to inform PCU about cdclk change
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-10 13:07 UTC by Andrey
Modified: 2017-03-20 15:20 UTC (History)
1 user (show)

See Also:
i915 platform: KBL
i915 features:


Attachments
dmesg (282.93 KB, text/plain)
2017-01-10 13:07 UTC, Andrey
no flags Details
Fix PCU CDCLK change notification (8.45 KB, patch)
2017-01-10 13:22 UTC, Imre Deak
no flags Details | Splinter Review
dmesg_patch_applied (888.07 KB, text/plain)
2017-01-10 14:03 UTC, Andrey
no flags Details
Fix with increased timeout (8.45 KB, patch)
2017-01-10 14:41 UTC, Imre Deak
no flags Details | Splinter Review
dmesg_patch_applied_v2 (775.21 KB, text/plain)
2017-01-10 16:15 UTC, Andrey
no flags Details

Description Andrey 2017-01-10 13:07:15 UTC
Created attachment 128854 [details]
dmesg

Commands
   # xset dpms force suspend && xset dpms force on
sometimes trigger the following error in dmesg:
    [drm:skl_set_cdclk] *ERROR* failed to inform PCU about cdclk change

Sometimes "*ERROR* CPU pipe A FIFO underrun" follows, in that case screen remains black. See https://bugs.freedesktop.org/show_bug.cgi?id=99344


dmesg with drm.debug=1e attached.

Kernel version - 4.9.2
Hardware - Dell XPS 13 9360
Comment 1 Imre Deak 2017-01-10 13:22:31 UTC
Created attachment 128855 [details] [review]
Fix PCU CDCLK change notification

Looks like bug 97929. Could you try the attached patch, I'm planning to send this - rebased - version to -stable.
Comment 2 Andrey 2017-01-10 14:03:36 UTC
Created attachment 128856 [details]
dmesg_patch_applied

Seems it triggers less often with a patch, but it didn't help.
Comment 3 Imre Deak 2017-01-10 14:41:49 UTC
Created attachment 128859 [details] [review]
Fix with increased timeout

(In reply to Andrey from comment #2)
> Created attachment 128856 [details]
> dmesg_patch_applied
> 
> Seems it triggers less often with a patch, but it didn't help.

Thanks. Looks like the poll succeeds at times only after ~10ms. Could you try and provide a dmesg again with the attached patch that increases the timeout?
Comment 4 Andrey 2017-01-10 16:15:42 UTC
Created attachment 128863 [details]
dmesg_patch_applied_v2

This new patch has a minor conflict against 4.9.2 kernel (I fixed it up locally),
but it fixes the issue for me.
Also, I don't see FIFO underruns anymore, so https://bugs.freedesktop.org/show_bug.cgi?id=99344 probably should be closed too.
Comment 5 Imre Deak 2017-01-12 13:33:21 UTC
(In reply to Andrey from comment #4)
> Created attachment 128863 [details]
> dmesg_patch_applied_v2
> 
> This new patch has a minor conflict against 4.9.2 kernel (I fixed it up
> locally),
> but it fixes the issue for me.
> Also, I don't see FIFO underruns anymore, so
> https://bugs.freedesktop.org/show_bug.cgi?id=99344 probably should be closed
> too.

Ok, please let me know if you see the 'failed to inform PCU' error with this patch after longer use. Would be also good to know the maximum poll duration you see based on the timestamp diff of the lines like the following two:

[   65.704858] [drm:skl_pcode_request] PCODE timeout, retrying with preemption disabled
[   65.707102] [drm:intel_update_cdclk] Current CD clock rate: 337500 kHz, VCO: 8100000 kHz, ref: 24000 kHz

In your first dmesg it was ~10ms in the second ~4ms.

Thanks.
Comment 6 Andrey 2017-01-17 09:26:25 UTC
(In reply to Imre Deak from comment #5)
> (In reply to Andrey from comment #4)
> Ok, please let me know if you see the 'failed to inform PCU' error with this
> patch after longer use. Would be also good to know the maximum poll duration
> you see based on the timestamp diff of the lines like the following two:
> 
> [   65.704858] [drm:skl_pcode_request] PCODE timeout, retrying with
> preemption disabled
> [   65.707102] [drm:intel_update_cdclk] Current CD clock rate: 337500 kHz,
> VCO: 8100000 kHz, ref: 24000 kHz
> 
> In your first dmesg it was ~10ms in the second ~4ms.
> 


I've been running suspend-resume display script for couple hours and haven't 
seen any 'failed to inform PCU' errors. 

There were 46 PCODE timeouts. The maximum timestamp diff was 34ms:
[ 6689.473863] [drm:skl_pcode_request] PCODE timeout, retrying with preemption disabled
[ 6689.508758] [drm:intel_update_cdclk] Current CD clock rate: 337500 kHz, VCO: 8100000 kHz, ref: 24000 kHz

There was also one 15ms. All other timeouts >=4ms.
Comment 7 yann 2017-02-20 16:03:40 UTC
Reference to Imre's patch to increase timeout value: https://patchwork.freedesktop.org/series/19961/
Comment 8 Imre Deak 2017-03-01 11:19:07 UTC
The fix is pushed to drm-tip.
Comment 9 Imre Deak 2017-03-20 14:54:47 UTC
Andrey, could you still provide the output of the following to get a better idea of the firmware version you are using?:
# rdmsr 0x8b
Comment 10 Andrey 2017-03-20 15:20:15 UTC
(In reply to Imre Deak from comment #9)
> Andrey, could you still provide the output of the following to get a better
> idea of the firmware version you are using?:
> # rdmsr 0x8b

0x8b is microcode version, right?
Today it's 0x48:
 $ rdmsr 0x8b
 4800000000

But there was an update since I reported this bug.
And according to dmesg the revision was 0x42:
[    1.087759] microcode: sig=0x806e9, pf=0x80, revision=0x42


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.