Bug 102243 - [IGT][BDW] spurious dmesg "Switching to FCLK failed" vairous kms_cursor_crc and/or kms_flip tests
Summary: [IGT][BDW] spurious dmesg "Switching to FCLK failed" vairous kms_cursor_crc a...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Marta Löfstedt
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-16 09:38 UTC by Marta Löfstedt
Modified: 2017-09-20 12:12 UTC (History)
1 user (show)

See Also:
i915 platform: BDW
i915 features: display/Other


Attachments
patch used for printing time taken by waiting (3.11 KB, patch)
2017-08-24 11:37 UTC, Marta Löfstedt
no flags Details | Splinter Review
dmesg snipped with logs where timeout is less than 1us (12.26 KB, text/plain)
2017-08-24 11:42 UTC, Marta Löfstedt
no flags Details
dmesg snipped with logs where timeout is more than 1us (12.55 KB, text/plain)
2017-08-24 11:42 UTC, Marta Löfstedt
no flags Details
times waiting for fclk update to stick (4.43 KB, text/plain)
2017-08-24 11:53 UTC, Marta Löfstedt
no flags Details
times waiting for fclk update to stick with more debug prints (1.19 KB, text/plain)
2017-08-24 11:58 UTC, Marta Löfstedt
no flags Details
suggestion for fix (908 bytes, patch)
2017-08-24 13:00 UTC, Marta Löfstedt
no flags Details | Splinter Review

Description Marta Löfstedt 2017-08-16 09:38:00 UTC
When running a list of all kms tests I am getting the following dmesg: 
"[drm:bdw_set_cdclk [i915]] *ERROR* Switching to FCLK failed"

The failure rate is about 1/5 runs and the affected tests so far are:
igt@kms_cursor_crc@cursor-256x256-suspend
igt@kms_cursor_crc@cursor-128x128-suspend
igt@kms_flip@vblank-vs-suspend
igt@kms_flipv@blank-vs-dpms-suspend-interruptible
igt@kms_flip@flip-vs-dpms-interruptible

I have not seen this on drm-tip 4.12-0.rc7+, but I now get this staring from ~4.13-0-rc3+
Comment 1 Marta Löfstedt 2017-08-17 06:01:11 UTC
Now also reproduced drm-tip@5118c7fa9 igt@eda8cc9f 1/5 runs of kms-all testlist.
Comment 2 Marta Löfstedt 2017-08-21 07:09:42 UTC
Also reproduce 170818 on:
igt@kms_flipv@blank-vs-dpms-suspend
gt@kms_flipv@flip-vs-modeset-interruptible

1/25 runs.
Comment 3 Marta Löfstedt 2017-08-21 07:58:16 UTC
I also have gotten once:
[drm:gen8_irq_handler [i915]] *ERROR* The master control interrupt lied (DE PIPE)!
Comment 4 Marta Löfstedt 2017-08-23 07:35:38 UTC
run 10 times with increased timeout:

+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -673,7 +673,7 @@ static void bdw_set_cdclk(struct drm_i915_private *dev_priv,
        I915_WRITE(LCPLL_CTL, val);
 
        if (wait_for_us(I915_READ(LCPLL_CTL) &
-                       LCPLL_CD_SOURCE_FCLK_DONE, 1))
+                       LCPLL_CD_SOURCE_FCLK_DONE, 1000))
                DRM_ERROR("Switching to FCLK failed\n");

The issue can not be reproduced.
Comment 5 Marta Löfstedt 2017-08-24 11:37:50 UTC
Created attachment 133739 [details] [review]
patch used for printing time taken by waiting

I have done some runs while spinning a list of all igt@kms_cursor_crc subtests, while printing the time spent in "wait_for_us". Attached patch was used for the printing.
I increased timeout to 10us and hit the issue (i.e. waited longer that 1us) 28 times during 1247 invocations of the bdw_set_cdclk.
Comment 6 Marta Löfstedt 2017-08-24 11:42:29 UTC
Created attachment 133741 [details]
dmesg snipped with logs where timeout is less than 1us
Comment 7 Marta Löfstedt 2017-08-24 11:42:58 UTC
Created attachment 133742 [details]
dmesg snipped with logs where timeout is more than 1us
Comment 8 Marta Löfstedt 2017-08-24 11:53:34 UTC
Created attachment 133743 [details]
times waiting for fclk update to stick
Comment 9 Marta Löfstedt 2017-08-24 11:58:55 UTC
Created attachment 133744 [details]
times waiting for fclk update to stick with more debug prints
Comment 10 Marta Löfstedt 2017-08-24 12:02:48 UTC
I have been able to reproduce the issue with more logs but the reproduction rate appear significantly lower 2/306 runs.
I seen no difference in the dmesg snippets where we have waited < 1 us or above 1 us.

The average waited time without logs: 263 ns
The average waited time with logs: 263 ns
Comment 11 Marta Löfstedt 2017-08-24 12:55:02 UTC
(In reply to Marta Löfstedt from comment #10)
> I have been able to reproduce the issue with more logs but the reproduction
> rate appear significantly lower 2/306 runs.
> I seen no difference in the dmesg snippets where we have waited < 1 us or
> above 1 us.
> 
> The average waited time without logs: 263 ns
> The average waited time with logs: 263 ns

Sorry bad copy-paste:
The average waited time with logs: 259 ns
Comment 12 Marta Löfstedt 2017-08-24 13:00:17 UTC
Created attachment 133746 [details] [review]
suggestion for fix

Added patch for a suggested fix, but I need to research a bit more before sending to list.
Comment 13 Marta Löfstedt 2017-09-06 10:22:40 UTC
reproduce 1/10 runs with same list as above with:

+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -665,6 +665,7 @@ static void bdw_set_cdclk(struct drm_i915_private *dev_priv,
                return;
        }
 
+       pm_qos_update_request(&dev_priv->pm_qos, 0);
        val = I915_READ(LCPLL_CTL);
        val |= LCPLL_CD_SOURCE_FCLK;
        I915_WRITE(LCPLL_CTL, val);
@@ -672,7 +673,8 @@ static void bdw_set_cdclk(struct drm_i915_private *dev_priv,
        if (wait_for_us(I915_READ(LCPLL_CTL) &
                        LCPLL_CD_SOURCE_FCLK_DONE, 1))
                DRM_ERROR("Switching to FCLK failed\n");
-
+       pm_qos_update_request(&dev_priv->pm_qos, PM_QOS_DEFAULT_VALUE);
Comment 14 Marta Löfstedt 2017-09-07 10:53:39 UTC
Tested this which doesn't work either.

+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -669,10 +669,13 @@ static void bdw_set_cdclk(struct drm_i915_private *dev_priv,
        val |= LCPLL_CD_SOURCE_FCLK;
        I915_WRITE(LCPLL_CTL, val);
 
+       preempt_disable();
        if (wait_for_us(I915_READ(LCPLL_CTL) &
                        LCPLL_CD_SOURCE_FCLK_DONE, 1))
                DRM_ERROR("Switching to FCLK failed\n");
 
+       preempt_enable();
Comment 15 Marta Löfstedt 2017-09-08 09:34:56 UTC
sent up the workaround with increased polltime:
https://patchwork.freedesktop.org/patch/175688/

All though I believe 10us would be enough, after discussing with Imre I decided to go for 100us.
Comment 16 Marta Löfstedt 2017-09-12 09:43:28 UTC
My workaround has been merged.

commit 3164888a40469c102b5d6d1b756c7646e7eb19e7
Author: Marta Lofstedt <marta.lofstedt@intel.com>
Date:   Fri Sep 8 16:28:29 2017 +0300

    drm/i915: Increase poll time for BDW FCLK_DONE
Comment 17 Marta Löfstedt 2017-09-13 06:56:17 UTC
This is now tested OK.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.