Bug 92878 - eDP link clock recovery fails with *ERROR* too many full retries, give up
Summary: eDP link clock recovery fails with *ERROR* too many full retries, give up
Status: CLOSED DUPLICATE of bug 96436
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-09 23:58 UTC by Jeronimo Martinez
Modified: 2017-07-24 22:44 UTC
CC List: 8 users

See Also:
i915 platform: BDW
i915 features: display/DP


Attachments
dmesg(drm.debug=0xe) lspci lshw dmidecode (41.58 KB, text/plain)
2015-11-09 23:58 UTC, Jeronimo Martinez
no flags
plain text dmesg (148.20 KB, text/plain)
2015-11-10 10:51 UTC, Jani Nikula
no flags
dmesg(drm.debug=0xe) 20151110 (148.20 KB, text/plain)
2015-11-11 01:10 UTC, Jeronimo Martinez
no flags
dmesg(drm.debug=0xe) 20151110 (148.69 KB, text/plain)
2015-11-11 01:15 UTC, Jeronimo Martinez
no flags
dmesg 20160203 with patch for logging link state (156.64 KB, text/plain)
2016-02-03 22:14 UTC, Jeronimo Martinez
no flags
The patch (883 bytes, patch)
2016-02-03 22:18 UTC, Jeronimo Martinez
no flags
dmesg for v4.7-rc1 (Clevo P651RA, annotated with commands and drm.debug=0xe) (131.52 KB, text/plain)
2016-06-01 22:38 UTC, Peter Wu
no flags

Description Jeronimo Martinez 2015-11-09 23:58:45 UTC
Created attachment 119525
dmesg(drm.debug=0xe) lspci lshw dmidecode

I have a Toshiba Satellite P50t-C-104 with an Intel Broadwell i7-5500U CPU, which has integrated graphics, and an NVIDIA 950M GPU. For the moment I'm just trying to get the Intel card working. All the Linux distributions I have tried fail to initialize kernel modesetting, even after updating to the official 4.3 kernel. It does work with the latest drm-intel-nightly compiled from git, but a lot of functionality is missing.

The biggest problem is that, if the backlight goes off (either by suspending the laptop or by waiting for the screen to power off after a few minutes of no use), then when the backlight goes on again the screen stays black until I reboot.

Other issues are:
 - I had to switch back to the uxa "Accel method" in xorg.conf; with the default, sna, most of the text characters on the desktop weren't drawn.
 - Function keys don't work.
 
- system architecture: x86_64
- kernel version: 4.3.0-drm-intel-nightly-20151109+
   commit 62699a66482e7b56e8bfc0683d11d509b022147d
   Author: Matt Roper <matthew.d.roper@intel.com>
   Date:   Mon Nov 9 10:34:57 2015 -0800
    drm-intel-nightly: 2015y-11m-09d-18h-34m-23s UTC integration manifest

- Linux distribution: Linux Mint 17.2
- Display connector: laptop lcd screen
Comment 1 Jani Nikula 2015-11-10 10:51:08 UTC
Created attachment 119536
plain text dmesg
Comment 2 Jani Nikula 2015-11-10 11:20:48 UTC
AFAICT this has nothing to do with backlight.

From the logs:

[    0.900606] [drm:intel_dp_get_dpcd] DPCD: 12 14 84 41 00 00 01 01 02 00 00 00 00 0b 00

[    0.900903] [drm:intel_dp_get_dpcd] Display Port TPS3 support: source yes, sink no
[    0.901190] [drm:intel_dp_print_rates] source rates: 162000, 270000, 540000
[    0.901192] [drm:intel_dp_print_rates] sink rates: 162000, 270000, 540000
[    0.901194] [drm:intel_dp_print_rates] common rates: 162000, 270000, 540000

[  244.397787] [drm:intel_dp_start_link_train [i915]] *ERROR* 5.4 Gbps link rate without HBR2/TPS3 support

The display reports it has DPCD rev 1.2, and supports max link rate of 5.4 Gbps per lane i.e. HBR2. However, it reports it does not support TPS3, even though that support is mandatory for downstream devices that support HBR2.
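
A minimal C sketch of how the first DPCD bytes in the log decode. The offsets and bit values follow the DP 1.2 receiver capability field (the same macro names appear in the kernel's drm_dp_helper.h, and the TPS3 test matches what drm_dp_tps3_supported() checks); the `sink_caps` struct and both helper functions are hypothetical and only here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DPCD receiver capability offsets and bits (DP 1.2). */
#define DP_DPCD_REV            0x000
#define DP_MAX_LINK_RATE       0x001
#define DP_MAX_LANE_COUNT      0x002
#define DP_MAX_LANE_COUNT_MASK 0x1f
#define DP_TPS3_SUPPORTED      (1 << 6)
#define DP_ENHANCED_FRAME_CAP  (1 << 7)

/* Hypothetical summary of the sink capabilities. */
struct sink_caps {
	unsigned rev_major, rev_minor;
	int max_link_rate_khz;   /* i915-style value: 270000 == 2.7 Gbps */
	int max_lanes;
	bool tps3;
	bool enhanced_frame;
};

/* 0x06 -> 162000, 0x0a -> 270000, 0x14 -> 540000 */
static int dp_bw_code_to_khz(uint8_t bw_code)
{
	return bw_code * 27000;
}

static struct sink_caps decode_dpcd(const uint8_t *dpcd, size_t len)
{
	struct sink_caps c;

	assert(len > DP_MAX_LANE_COUNT);
	c.rev_major = dpcd[DP_DPCD_REV] >> 4;
	c.rev_minor = dpcd[DP_DPCD_REV] & 0xf;
	c.max_link_rate_khz = dp_bw_code_to_khz(dpcd[DP_MAX_LINK_RATE]);
	c.max_lanes = dpcd[DP_MAX_LANE_COUNT] & DP_MAX_LANE_COUNT_MASK;
	/* TPS3 is only meaningful for DPCD rev 1.2 and later. */
	c.tps3 = dpcd[DP_DPCD_REV] >= 0x12 &&
		 (dpcd[DP_MAX_LANE_COUNT] & DP_TPS3_SUPPORTED);
	c.enhanced_frame = dpcd[DP_MAX_LANE_COUNT] & DP_ENHANCED_FRAME_CAP;
	return c;
}
```

Fed the bytes from the log (12 14 84 41 ...), this reports DPCD 1.2, 540000 kHz and four lanes, but no TPS3: byte 2 is 0x84, so bit 7 (enhanced framing) is set while bit 6 (TPS3) is clear.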

*sigh*
Comment 3 Jani Nikula 2015-11-10 11:28:13 UTC
Please try this patch on top of current drm-intel-nightly:

diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
index 88887938e0bf..ea83d02b3528 100644
--- a/drivers/gpu/drm/i915/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
@@ -234,8 +234,10 @@ intel_dp_link_training_channel_equalization(struct intel_dp *intel_dp)
 	if (intel_dp_source_supports_hbr2(intel_dp) &&
 	    drm_dp_tps3_supported(intel_dp->dpcd))
 		training_pattern = DP_TRAINING_PATTERN_3;
-	else if (intel_dp->link_rate == 540000)
+	else if (intel_dp->link_rate == 540000) {
+		training_pattern = DP_TRAINING_PATTERN_3;
 		DRM_ERROR("5.4 Gbps link rate without HBR2/TPS3 support\n");
+	}
 
 	/* channel equalization */
 	if (!intel_dp_set_link_train(intel_dp,
Comment 4 Jeronimo Martinez 2015-11-11 01:09:30 UTC
I applied the patch on top of

commit 6d186c7f0397fde53b04cf916d9b1692a5ace304
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Tue Nov 10 22:33:47 2015 +0200

    drm-intel-nightly: 2015y-11m-10d-20h-33m-26s UTC integration manifest


But I don't see any difference; the screen is still black after resuming from suspend. Please find the new dmesg attached.
Comment 5 Jeronimo Martinez 2015-11-11 01:10:30 UTC
Created attachment 119557
dmesg(drm.debug=0xe) 20151110
Comment 6 Jeronimo Martinez 2015-11-11 01:15:59 UTC
Created attachment 119558
dmesg(drm.debug=0xe) 20151110

Correct dmesg log, ignore previous one
Comment 7 Jesse Barnes 2015-11-24 17:36:36 UTC
Looks like we are trying to drive it that way:

[   81.896111] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
[   81.896122] [drm:intel_dp_start_link_train [i915]] *ERROR* 5.4 Gbps link rate without HBR2/TPS3 support

I guess we could try assuming that sinks w/o TPS3 can only support up to 2.7 Gbps.
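
To make that idea concrete, a hypothetical C sketch: intersect the source and sink rate lists (the "common rates" line in comment 2 is such an intersection) and, if the sink lacks TPS3, drop every rate above 270000 kHz. The function names are made up for illustration; this is not i915 code, and comment 9 suggests HBR2 with TPS2 is possible, so the cap may not be needed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Intersect two ascending rate lists (kHz) into out; returns the count. */
static size_t intersect_rates(const int *src, size_t nsrc,
			      const int *sink, size_t nsink, int *out)
{
	size_t i = 0, j = 0, k = 0;

	assert(src && sink && out);
	while (i < nsrc && j < nsink) {
		if (src[i] == sink[j]) {
			out[k++] = src[i];
			i++;
			j++;
		} else if (src[i] < sink[j]) {
			i++;
		} else {
			j++;
		}
	}
	return k;
}

/* Hypothetical policy: without TPS3, keep only rates up to 2.7 Gbps.
 * Filters in place and returns the new count. */
static size_t clamp_rates_without_tps3(int *rates, size_t n, bool sink_tps3)
{
	size_t k = 0;

	if (sink_tps3)
		return n;
	for (size_t i = 0; i < n; i++)
		if (rates[i] <= 270000)
			rates[k++] = rates[i];
	return k;
}
```

With the rates from this report, the intersection keeps all three values and the cap leaves only 162000 and 270000, which could be too little bandwidth for a high-resolution panel.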
Comment 8 cprigent 2015-11-24 17:37:11 UTC
Bug scrub:
Assigned to Jani Nikula
Comment 9 Jani Nikula 2015-11-24 17:41:50 UTC
(In reply to Jesse Barnes from comment #7)
> Looks like we are trying to drive it that way:
> 
> [   81.896111] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR*
> too many full retries, give up
> [   81.896122] [drm:intel_dp_start_link_train [i915]] *ERROR* 5.4 Gbps link
> rate without HBR2/TPS3 support
> 
> I guess we could try assuming that sinks w/o TPS3 can only support up to 2.7

I asked Sivakumar, he says it's possible to do HBR2 with TPS2.
Comment 10 Jeronimo Martinez 2015-12-01 22:48:59 UTC
The laptop has a 4K display, so I think it needs to support 5.4 Gbps. In any case the first error in the log is in the training clock recovery phase. 
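
A back-of-the-envelope check of that reasoning, as a C sketch. Assumptions, none of which come from the logs: 24 bits per pixel and a pixel clock of 533250 kHz, a common CVT reduced-blanking timing for 3840x2160 at 60 Hz (the real value is in the panel's EDID). The four lanes come from DPCD byte 0x84. After 8b/10b coding each lane carries 8 data bits per link symbol, and the i915-style rate value (270000 for 2.7 Gbps) is the symbol clock in kHz; minor overheads such as downspread are ignored. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Payload capacity of the link in kbit/s (8 data bits per symbol per lane). */
static uint64_t link_payload_kbps(int link_rate_khz, int lanes)
{
	assert(lanes == 1 || lanes == 2 || lanes == 4);
	return (uint64_t)link_rate_khz * 8 * lanes;
}

/* Raw pixel data rate of a mode in kbit/s. */
static uint64_t mode_kbps(int pixel_clock_khz, int bpp)
{
	return (uint64_t)pixel_clock_khz * bpp;
}

static bool mode_fits(int pixel_clock_khz, int bpp, int link_rate_khz, int lanes)
{
	return mode_kbps(pixel_clock_khz, bpp) <=
	       link_payload_kbps(link_rate_khz, lanes);
}
```

Under these assumptions the mode needs about 12.8 Gbps of payload, against 8.64 Gbps for four lanes at 2.7 Gbps and 17.28 Gbps at 5.4 Gbps, so a 4K panel at 60 Hz would indeed need HBR2.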

Is there anything I can do to help with this issue, like testing patches or debugging? This laptop is basically unusable for me until this bug is resolved.
Comment 11 Jeronimo Martinez 2015-12-07 21:29:21 UTC
I tried a newer kernel version (4.4.0-rc3 commit af938ab8...) and started getting the "eDP link clock recovery fails" error while booting, rather than after resuming from suspend. I then bisected the i915-related changes and found commit 7383123.. to be the one that introduced the issue; if I use i915.fastboot=1 with the latest kernel, it boots again. If fastboot can skip the training process, could the same be done when resuming?
Comment 12 Ander Conselvan de Oliveira 2016-02-03 14:11:35 UTC
Could you apply attachment 118562 (from bug 90963) and attach a dmesg with debug again? This would show exactly what the sink requests when the driver sets vswing to level 2 and pre-emphasis to level 1.

This could be related to an issue Sivakumar once mentioned: the driver handles the request from the sink wrongly, using the maximum supported pre-emphasis for the requested voltage swing instead of using the requested pre-emphasis with the maximum supported vswing.
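
For reference, a C sketch of one straightforward way to turn the per-lane adjust requests into a single drive setting: take the highest request across lanes, clamp the voltage swing to the source's maximum, then clamp pre-emphasis to what that swing allows (DP 1.2 only defines swing/pre-emphasis combinations whose levels sum to 3 or less). The struct, the function names and the `source_vswing_max` parameter are hypothetical; this illustrates the request semantics under discussion, not the actual i915 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct drive_setting {
	int vswing;               /* voltage swing level 0..3 */
	int preemph;              /* pre-emphasis level 0..3 */
	bool max_swing_reached;
	bool max_preemph_reached;
};

/* ADJUST_REQUEST bytes (DPCD 0x206/0x207): two lanes per byte, each lane
 * with 2 bits of voltage swing followed by 2 bits of pre-emphasis. */
static int req_vswing(const uint8_t adj[2], int lane)
{
	return (adj[lane >> 1] >> ((lane & 1) * 4)) & 0x3;
}

static int req_preemph(const uint8_t adj[2], int lane)
{
	return (adj[lane >> 1] >> ((lane & 1) * 4 + 2)) & 0x3;
}

static struct drive_setting pick_drive_setting(const uint8_t adj[2], int lanes,
					       int source_vswing_max)
{
	struct drive_setting s = {0};

	assert(lanes >= 1 && lanes <= 4);
	assert(source_vswing_max >= 0 && source_vswing_max <= 3);

	/* Use the highest request across all active lanes. */
	for (int lane = 0; lane < lanes; lane++) {
		if (req_vswing(adj, lane) > s.vswing)
			s.vswing = req_vswing(adj, lane);
		if (req_preemph(adj, lane) > s.preemph)
			s.preemph = req_preemph(adj, lane);
	}

	/* Clamp swing to what the source can drive. */
	if (s.vswing >= source_vswing_max) {
		s.vswing = source_vswing_max;
		s.max_swing_reached = true;
	}

	/* Keep the requested pre-emphasis unless the chosen swing forbids it. */
	int preemph_max = 3 - s.vswing;
	if (s.preemph >= preemph_max) {
		s.preemph = preemph_max;
		s.max_preemph_reached = true;
	}
	return s;
}
```

For a request of 0x66 0x66 (every lane asking for vswing 2 and pre-emphasis 1), a source whose maximum swing is level 3 would drive exactly vswing 2 / pre-emphasis 1 and flag pre-emphasis as maxed out for that swing level.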
Comment 13 Jeronimo Martinez 2016-02-03 22:14:52 UTC
Created attachment 121502
dmesg 20160203 with patch for logging link state
Comment 14 Jeronimo Martinez 2016-02-03 22:18:46 UTC
Created attachment 121503
The patch

I changed the patch, because the original link training code was moved to a separate file. It is applied on top of:

commit 1ff67c58aa3d7cacd37451397c740b8df27994e6
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Wed Feb 3 10:22:53 2016 -0800

    drm-intel-nightly: 2016y-02m-03d-18h-22m-38s UTC integration manifest
Comment 15 Ander Conselvan de Oliveira 2016-02-04 08:06:57 UTC
(In reply to Ander Conselvan de Oliveira from comment #12)
> This could be related to an issue Sivakumar  once mentioned, that the driver
> handles the request from the sink wrong, using the maximum supported
> pre-emphasis for the requested voltage swing, instead of using the requested
> pre-emphasis with the maximum support vswing.

[  204.649626] [drm:intel_dp_set_signal_levels] Using signal levels 00000000
[  204.649627] [drm:intel_dp_set_signal_levels] Using vswing level 0
[  204.649628] [drm:intel_dp_set_signal_levels] Using pre-emphasis level 0
[  204.650229] [drm:intel_dp_link_training_clock_recovery] link status: 00 00 80 00 66 66
[  204.650230] [drm:intel_dp_set_signal_levels] Using signal levels 08000000
[  204.650231] [drm:intel_dp_set_signal_levels] Using vswing level 2
[  204.650231] [drm:intel_dp_set_signal_levels] Using pre-emphasis level 1
[  204.650813] [drm:intel_dp_link_training_clock_recovery] link status: 00 00 00 00 66 66

Unfortunately that's not the issue. The sink really does request vswing 2 and pre-emphasis 1. One interesting thing is that in the first try the LINK_STATUS_UPDATED bit is set; it is clear in all other attempts. I wonder if the device expects us to iterate at max voltage.
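
A C sketch that decodes the six link-status bytes printed above, which mirror DPCD 0x202 to 0x207: two bytes of per-lane status nibbles, the lane-align byte (bit 7 is LINK_STATUS_UPDATED), sink status, and the two adjust-request bytes (0x66 is vswing 2 / pre-emphasis 1 per lane). Bit values follow the DP spec; the helper names are hypothetical. It confirms that neither sample has CR_DONE on any lane and that only the first has LINK_STATUS_UPDATED set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DP_LANE_CR_DONE        (1 << 0)
#define DP_LINK_STATUS_UPDATED (1 << 7)

/* link_status[] holds DPCD 0x202..0x207 in the order the log prints them.
 * Returns the 4-bit status nibble for one lane. */
static uint8_t lane_status(const uint8_t link_status[6], int lane)
{
	assert(lane >= 0 && lane < 4);
	return (link_status[lane >> 1] >> ((lane & 1) * 4)) & 0xf;
}

/* Clock recovery succeeds only when every active lane reports CR_DONE. */
static bool clock_recovery_ok(const uint8_t link_status[6], int lanes)
{
	for (int lane = 0; lane < lanes; lane++)
		if (!(lane_status(link_status, lane) & DP_LANE_CR_DONE))
			return false;
	return true;
}

/* Bit 7 of DPCD 0x204 (third byte of the dump). */
static bool link_status_updated(const uint8_t link_status[6])
{
	return link_status[2] & DP_LINK_STATUS_UPDATED;
}
```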

Another possibility is that the hardware is just not programmed correctly, so the signal the sink expects is not going through the link. That would help explain bug 93517 too.
Comment 16 Jeronimo Martinez 2016-03-29 21:30:05 UTC
Is there any way to test that hypothesis? 

If the hardware is not programmed correctly, the Windows driver knows how to get around that: it works fine there, with no problem powering the display off and on.
Comment 17 Marco Trevisan (Treviño) 2016-04-20 23:00:18 UTC
I'm also getting this when docking/undocking my T460p (Skylake) a few times:

[78290.365783] thinkpad_acpi: docked into hotplug port replicator
[78291.075896] [drm:intel_dp_link_training_clock_recovery [i915_bpo]] *ERROR* too many voltage retries, give up
[78291.092347] [drm:intel_wait_ddi_buf_idle [i915_bpo]] *ERROR* Timeout waiting for DDI BUF D idle bit
Comment 18 Peter Wu 2016-06-01 22:38:48 UTC
Created attachment 124247
dmesg for v4.7-rc1 (Clevo P651RA, annotated with commands and drm.debug=0xe)

Sometimes (with kernels 4.5.x, 4.6 and 4.7-rc1), when my eDP screen goes to sleep, the screen stays black while the backlight is on. DP link training errors show up in dmesg. This happens on both a Clevo P651RA and a P671RA (similar models except for the screen size), both with a resolution of 1920x1080.

According to the P671RA user, switching between a text console and the graphical console sometimes works (it works for me too). On my P651RA, these commands have also revived the screen so far (twice):

cd /sys/class/drm/card0-eDP-1
echo off > status; sleep .1; echo detect > status

For testing purposes, you can use this instead of waiting for the sleep:
xset dpms force suspend

I can fully reproduce the issue with:
while :; do
    xset dpms force suspend
    read -s -p WAIT
    echo REVIVE
    echo off > /sys/class/drm/card0-eDP-1/status
    sleep .1
    echo detect > /sys/class/drm/card0-eDP-1/status
    echo DONE
    read -s -p NEXT
done

When "WAIT" is displayed, press a key. Screen will be black. Press Enter and notice "REVIVE" followed by "DONE". Then the screen will be back. Finally press Enter at "NEXT" to repeat the test. This was tested on v4.7-rc1 + unrelated PCI/PM and nouveau patches on Arch Linux (Xorg 1.18.3, xf86-video-intel 1:2.99.917+645+g88733a7-1).


Jeronimo, if your "function key" issues are related to brightness adjustment, can you try these patches:
https://lists.freedesktop.org/archives/intel-gfx/2016-March/090593.html
https://lists.freedesktop.org/archives/intel-gfx/2016-May/096910.html

otherwise please submit your machine to https://bugs.launchpad.net/bugs/752542
Comment 19 yann 2016-06-08 17:38:05 UTC
Consolidating "clock recovery fails with too many full/voltage retries" bugs into one bug
So continue to track this in bug 96436

*** This bug has been marked as a duplicate of bug 96436 ***

