Bug 99345

Summary: [KBL] [drm:skl_set_cdclk] *ERROR* failed to inform PCU about cdclk change
Product: DRI Reporter: Andrey <ryabinin.a.a>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: KBL i915 features:
Attachments:
Description Flags
dmesg
none
Fix PCU CDCLK change notification
none
dmesg_patch_applied
none
Fix with increased timeout
none
dmesg_patch_applied_v2 none

Description Andrey 2017-01-10 13:07:15 UTC
Created attachment 128854 [details]
dmesg

Commands
   # xset dpms force suspend && xset dpms force on
sometimes trigger the following error in dmesg:
    [drm:skl_set_cdclk] *ERROR* failed to inform PCU about cdclk change

Sometimes "*ERROR* CPU pipe A FIFO underrun" follows, in that case screen remains black. See https://bugs.freedesktop.org/show_bug.cgi?id=99344


dmesg with drm.debug=1e attached.

Kernel version - 4.9.2
Hardware - Dell XPS 13 9360
Comment 1 Imre Deak 2017-01-10 13:22:31 UTC
Created attachment 128855 [details] [review]
Fix PCU CDCLK change notification

Looks like bug 97929. Could you try the attached patch, I'm planning to send this - rebased - version to -stable.
Comment 2 Andrey 2017-01-10 14:03:36 UTC
Created attachment 128856 [details]
dmesg_patch_applied

Seems it triggers less often with a patch, but it didn't help.
Comment 3 Imre Deak 2017-01-10 14:41:49 UTC
Created attachment 128859 [details] [review]
Fix with increased timeout

(In reply to Andrey from comment #2)
> Created attachment 128856 [details]
> dmesg_patch_applied
> 
> Seems it triggers less often with a patch, but it didn't help.

Thanks. Looks like the poll succeeds at times only after ~10ms. Could you try and provide a dmesg again with the attached patch that increases the timeout?
Comment 4 Andrey 2017-01-10 16:15:42 UTC
Created attachment 128863 [details]
dmesg_patch_applied_v2

This new patch has a minor conflict against 4.9.2 kernel (I fixed it up locally),
but it fixes the issue for me.
Also, I don't see FIFO underruns anymore, so https://bugs.freedesktop.org/show_bug.cgi?id=99344 probably should be closed too.
Comment 5 Imre Deak 2017-01-12 13:33:21 UTC
(In reply to Andrey from comment #4)
> Created attachment 128863 [details]
> dmesg_patch_applied_v2
> 
> This new patch has a minor conflict against 4.9.2 kernel (I fixed it up
> locally),
> but it fixes the issue for me.
> Also, I don't see FIFO underruns anymore, so
> https://bugs.freedesktop.org/show_bug.cgi?id=99344 probably should be closed
> too.

Ok, please let me know if you see the 'failed to inform PCU' error with this patch after longer use. Would be also good to know the maximum poll duration you see based on the timestamp diff of the lines like the following two:

[   65.704858] [drm:skl_pcode_request] PCODE timeout, retrying with preemption disabled
[   65.707102] [drm:intel_update_cdclk] Current CD clock rate: 337500 kHz, VCO: 8100000 kHz, ref: 24000 kHz

In your first dmesg it was ~10ms in the second ~4ms.

Thanks.
Comment 6 Andrey 2017-01-17 09:26:25 UTC
(In reply to Imre Deak from comment #5)
> (In reply to Andrey from comment #4)
> Ok, please let me know if you see the 'failed to inform PCU' error with this
> patch after longer use. Would be also good to know the maximum poll duration
> you see based on the timestamp diff of the lines like the following two:
> 
> [   65.704858] [drm:skl_pcode_request] PCODE timeout, retrying with
> preemption disabled
> [   65.707102] [drm:intel_update_cdclk] Current CD clock rate: 337500 kHz,
> VCO: 8100000 kHz, ref: 24000 kHz
> 
> In your first dmesg it was ~10ms in the second ~4ms.
> 


I've been running suspend-resume display script for couple hours and haven't 
seen any 'failed to inform PCU' errors. 

There were 46 PCODE timeouts. The maximum timestamp diff was 34ms:
[ 6689.473863] [drm:skl_pcode_request] PCODE timeout, retrying with preemption disabled
[ 6689.508758] [drm:intel_update_cdclk] Current CD clock rate: 337500 kHz, VCO: 8100000 kHz, ref: 24000 kHz

There was also one 15ms. All other timeouts >=4ms.
Comment 7 yann 2017-02-20 16:03:40 UTC
Reference to Imre's patch to increase timeout value: https://patchwork.freedesktop.org/series/19961/
Comment 8 Imre Deak 2017-03-01 11:19:07 UTC
The fix is pushed to drm-tip.
Comment 9 Imre Deak 2017-03-20 14:54:47 UTC
Andrey, could you still provide the output of the following to get a better idea of the firmware version you are using?:
# rdmsr 0x8b
Comment 10 Andrey 2017-03-20 15:20:15 UTC
(In reply to Imre Deak from comment #9)
> Andrey, could you still provide the output of the following to get a better
> idea of the firmware version you are using?:
> # rdmsr 0x8b

0x8b is microcode version, right?
Today it's 0x48:
 $ rdmsr 0x8b
 4800000000

But there was an update since I reported this bug.
And according to dmesg the revision was 0x42:
[    1.087759] microcode: sig=0x806e9, pf=0x80, revision=0x42

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.