Bug 102359

Summary: [BAT][KBL] *ERROR* failed to enable link training
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/Intel Assignee: shashank.sharma <shashank.sharma>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: highest CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: KBL i915 features: display/DP

Description Martin Peres 2017-08-22 14:43:34 UTC
On CI_DRM_2989, the machine fi-kbl-7260u produced the following warning when running igt@gem_exec_suspend@basic-s3:

[  299.955251] [drm:intel_enable_shared_dpll [i915]] enabling DPLL 1
[  299.957459] [drm:intel_power_well_enable [i915]] enabling DDI B IO power well
[  299.958752] [drm:intel_dp_set_signal_levels [i915]] Using signal levels 00000000
[  299.958796] [drm:intel_dp_set_signal_levels [i915]] Using vswing level 0
[  299.958835] [drm:intel_dp_set_signal_levels [i915]] Using pre-emphasis level 0
[  299.958875] [drm:intel_dp_program_link_training_pattern [i915]] Using DP training pattern TPS1
[  299.995390] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -5
[  299.995415] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  299.995433] [drm:intel_dp_start_link_train [i915]] [CONNECTOR:58:DP-1] Link Training failed at link rate = 162000, lane count = 1
[  299.995451] [drm:intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful

This error can also be triggered easily when running full IGT on shard-kbl: https://intel-gfx-ci.01.org/tree/drm-tip/shards.html

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2989/fi-kbl-7260u/igt@gem_exec_suspend@basic-s3.html
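
For anyone triaging similar logs, here is a minimal sketch that pulls the connector, link rate and lane count out of the "Link Training failed" line shown above. It assumes the dmesg format is exactly the one in this report; the function name find_link_training_failures and the returned dictionary keys are made up for illustration and are not part of i915 or IGT.

```python
import re

# Matches the failure line as it appears in this report, e.g.
# [  299.995433] [drm:intel_dp_start_link_train [i915]] [CONNECTOR:58:DP-1]
#   Link Training failed at link rate = 162000, lane count = 1
# Assumption: other kernel versions may word this line differently.
FAILURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:intel_dp_start_link_train \[i915\]\]\s+"
    r"\[CONNECTOR:(?P<id>\d+):(?P<name>[^\]]+)\]\s+"
    r"Link Training failed at link rate = (?P<rate>\d+), "
    r"lane count = (?P<lanes>\d+)"
)


def find_link_training_failures(dmesg_text):
    """Return one dict per 'Link Training failed' line found in dmesg_text."""
    failures = []
    for line in dmesg_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            failures.append({
                "timestamp": float(match.group("ts")),
                "connector": match.group("name"),
                "link_rate": int(match.group("rate")),
                "lane_count": int(match.group("lanes")),
            })
    return failures


if __name__ == "__main__":
    sample = (
        "[  299.995415] [drm:intel_dp_start_link_train [i915]] "
        "*ERROR* failed to enable link training\n"
        "[  299.995433] [drm:intel_dp_start_link_train [i915]] "
        "[CONNECTOR:58:DP-1] Link Training failed at "
        "link rate = 162000, lane count = 1\n"
    )
    for failure in find_link_training_failures(sample):
        print(failure)
```

Feeding it a saved dmesg from a CI run makes it easy to see which connectors and link settings were affected across several runs.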
Comment 1 Jani Saarinen 2017-09-05 11:51:17 UTC
Latest investigations are directing more to LSPCON.
Comment 2 Manasi 2017-09-05 18:51:30 UTC
@Shashank let me know if I can help, I have the KBL system set up here as well.
Comment 3 Marta Löfstedt 2017-10-09 10:28:02 UTC
Note that I also file KBL-shards issues showing this dmesg pattern under this bug:
	
[  150.524032] Setting dangerous option reset - tainting kernel
[  150.524993] Setting dangerous option reset - tainting kernel
[  166.763592] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[  166.836780] [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[  166.972001] Setting dangerous option reset - tainting kernel

See:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3190/shard-kbl1/igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3190/shard-kbl1/igt@kms_cursor_legacy@long-nonblocking-modeset-vs-cursor-atomic.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3190/shard-kbl1/igt@kms_busy@extended-modeset-hang-oldfb-render-B.html
Comment 4 shashank.sharma@intel.com 2017-10-11 08:28:11 UTC
We have discussed this failure with MCA. They will investigate the issue further and update us soon, and we are helping them to reproduce it.
Comment 5 shashank.sharma@intel.com 2017-10-12 10:56:06 UTC
We are not seeing these failures after this series:
https://patchwork.freedesktop.org/series/31639/

The results are here:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5967/shards.html 
Let's wait for this series to be merged.

Please note that the message "[drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns" does not cause a link training failure, and the display is still correct after it appears.
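
To make that distinction easier when reading CI logs, here is a rough sketch that sorts DRM *ERROR* lines into the ones reported in this bug as real link training failures and the idle-pattern timeout described above as harmless. The FATAL and BENIGN lists only contain the messages quoted in this bug, and classify_drm_errors is a made-up helper name; treating the timeout as benign is based solely on the observation in this comment.

```python
# Message fragments taken from the logs quoted in this bug.
# Assumption: the lists are illustrative, not a complete catalogue.
FATAL = (
    "failed to enable link training",
    "Link Training Unsuccessful",
)
BENIGN = (
    "Timed out waiting for DP idle patterns",
)


def classify_drm_errors(dmesg_text):
    """Group the text after '*ERROR*' into fatal, benign and other."""
    result = {"fatal": [], "benign": [], "other": []}
    for line in dmesg_text.splitlines():
        if "*ERROR*" not in line:
            continue
        # Keep only the human-readable message after the marker.
        message = line.split("*ERROR*", 1)[1].strip()
        if any(fragment in message for fragment in FATAL):
            result["fatal"].append(message)
        elif any(fragment in message for fragment in BENIGN):
            result["benign"].append(message)
        else:
            result["other"].append(message)
    return result


if __name__ == "__main__":
    sample = (
        "[  166.836780] [drm:intel_dp_set_idle_link_train [i915]] "
        "*ERROR* Timed out waiting for DP idle patterns\n"
        "[  299.995415] [drm:intel_dp_start_link_train [i915]] "
        "*ERROR* failed to enable link training\n"
    )
    print(classify_drm_errors(sample))
```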
Comment 6 shashank.sharma@intel.com 2017-10-12 14:50:35 UTC
(In reply to shashank.sharma@intel.com from comment #5)
> We are not seeing these failures after this series:
> https://patchwork.freedesktop.org/series/31639/
> 
> The results are here:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5967/shards.html 
> Lets wait for merge of this series
> 
> Please note that [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out
> waiting for DP idle patterns doesn't cause Link training failure, and the
> display is correct even after this message.

All tests in this series:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5967/shards-all.html
Comment 7 Jani Nikula 2017-10-13 09:21:05 UTC
Probably fixed by these commits in drm-intel-next-queued:

commit f687e25a7a245952349f1f9f9cc238ac5a3be258
Author: Shashank Sharma <shashank.sharma@intel.com>
Date:   Thu Oct 12 22:10:08 2017 +0530

    drm: Add retries for lspcon mode detection

commit d18aef0f75436abb95894a230b504432df26c167
Author: Shashank Sharma <shashank.sharma@intel.com>
Date:   Tue Oct 10 15:37:43 2017 +0530

    drm/i915: Don't give up waiting on INVALID_MODE

commit a2fc4bd61e7ec3bb1f7c8b3d47272be813f88aea
Author: Shashank Sharma <shashank.sharma@intel.com>
Date:   Tue Oct 10 15:37:44 2017 +0530

    drm/i915: Add retries for LSPCON detection
Comment 8 Marta Löfstedt 2017-10-13 12:38:34 UTC
The fix was integrated in CI_DRM_3228, and the issue has not reproduced since. However, I would like to see results from a couple more runs before I call it verified.
Comment 9 Jani Saarinen 2017-10-13 13:12:07 UTC
Let's resolve this, since the patches are merged.
We will keep following it, though, and mark it verified only after a few more runs.
