Bug 96614 - [BAT BDW] *ERROR* failed to enable link training/failed to start channel equalization
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) All
Importance: highest blocker
Assignee: Manasi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 96913
Depends on:
Blocks:
 
Reported: 2016-06-21 12:55 UTC by Imre Deak
Modified: 2017-07-24 22:41 UTC

See Also:
i915 platform: BDW
i915 features: display/DP, power/suspend-resume


Attachments
dmesg (469.83 KB, application/gzip)
2016-06-21 12:55 UTC, Imre Deak
kernel log when issue occurs (61.06 KB, text/plain)
2016-08-22 17:44 UTC, Clayton Craft

Description Imre Deak 2016-06-21 12:55:28 UTC
Created attachment 124637
dmesg

http://gfxci.rb.intel.com/archive/results/CI_IGT_test/RO_CI_DRM_531/ro-bdw-i7-5557U/

After suspend-to-ram and resume on an external DP output:

[  291.867864] [drm:intel_enable_shared_dpll] enable LCPLL 1350 (active 1, on? 0) for crtc 26
[  291.867865] [drm:intel_enable_shared_dpll] enabling LCPLL 1350
[  291.870603] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7145000c
...
[  291.950407] [drm:drm_dp_dpcd_access] too many retries, giving up
...
[  292.119813] [drm:intel_dp_sink_dpms] failed to enable sink power state
...
[  292.373085] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
...
[  292.458254] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
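
The sequence in this excerpt (AUX channel timeout, DPCD retries exhausted, sink power-state failure, then the two link-training errors) can be pulled out of a full dmesg with a small filter. The following is only a minimal sketch and not part of any i915 tooling; the name `failure_timeline` and the pattern list are illustrative assumptions, taken solely from the messages quoted in this report.

```python
import re

# Messages quoted in this report. Matching is case-insensitive because the
# capitalisation of "too many retries" differs between the logs quoted here.
PATTERNS = [
    "dp_aux_ch timeout",
    "too many retries",
    "failed to enable sink power state",
    "failed to enable link training",
    "failed to start channel equalization",
]

# Matches the "[  291.870603] " timestamp prefix printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def failure_timeline(dmesg_text):
    """Return (timestamp, matched_pattern) tuples in log order."""
    events = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip "..." elisions and lines without a timestamp
        stamp, message = float(m.group(1)), m.group(2).lower()
        for pattern in PATTERNS:
            if pattern in message:
                events.append((stamp, pattern))
                break
    return events


sample = """\
[  291.870603] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7145000c
[  291.950407] [drm:drm_dp_dpcd_access] too many retries, giving up
[  292.373085] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
"""

for stamp, pattern in failure_timeline(sample):
    print(f"{stamp:.6f} {pattern}")
```

Matching on the message text rather than on the reporting function keeps the filter usable across kernels where the function names differ.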
Comment 1 Daniel Vetter 2016-07-14 20:04:24 UTC
BAT is supposed to be highest priority, adjusting ...
Comment 2 Manasi 2016-07-28 18:25:42 UTC
What are the steps to reproduce? Could you also specify the SHA in drm_intel_nightly where this bug was seen?
Comment 3 Clayton Craft 2016-08-22 17:44:02 UTC
Created attachment 125952
kernel log when issue occurs

I can reproduce this consistently on a Skylake CPU (i7-6770HQ w/ Iris Pro 580), using the drm_intel_nightly kernel. Here's the SHA: 64ade7575a59bb90334f4d0e6bb18cfc4e332ed3

I've also attached the kernel log, since the system is responsive over SSH when the issue occurs (but the graphical display never loads on resume).
Comment 4 Mika Kahola 2016-08-23 07:03:14 UTC
*** Bug 96913 has been marked as a duplicate of this bug. ***
Comment 5 Clayton Craft 2016-08-24 21:36:15 UTC
Please let me know if there's anything else you need (logs, additional experiments, etc) to help debug/resolve this! Not being able to resume completely from suspend is brutal!
Comment 6 Jim Bride 2016-08-25 21:12:34 UTC
I tried to reproduce this on my SKL NUC running 4.8.0-rc3 pulled today using the following:

jbride@jbride-snuc:~/src/intel-gpu-tools/tests(2001)$ sudo ./kms_pipe_crc_basic --run-subtest suspend-read-crc-pipe-A
IGT-Version: 1.15-g6f002ab (x86_64) (Linux: 4.8.0-rc3-082516+ x86_64)
rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Aug 25 21:08:37 2016
suspend-read-crc-pipe-A: Testing connector DP-1 using pipe A
Subtest suspend-read-crc-pipe-A: SUCCESS (1.197s)

And to make sure things were ok:
jbride@jbride-snuc:~/src/intel-gpu-tools/tests(2003)$ dmesg | grep ERR
[    0.332913] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM

It would be nice to know the complete configuration of the systems that reproduce this bug, including the make and model of any displays involved (external and internal.)
Comment 7 Clayton Craft 2016-08-26 03:12:15 UTC
System: NUC6i7KYK
External display: Dell UltraSharp U2515H
xorg-server version: 1.8.4
mesa: 12.0.1
xorg-video-intel DDX driver: NOT installed

I installed intel-gpu-tools 1.15 but was unable to find the exact test you ran. I am invoking S3/suspend via systemd (i.e. systemctl suspend)

I will sync with the latest linux-drm-nightly again just to see if it was magically fixed, but if you could tell me how to run the suspend test you ran below I could give that a shot too. Also, let me know if there's anything useful I could run out of the intel-gpu-tools kit that would help...
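
For reproducing outside IGT, an RTC-timed suspend like the one visible in the IGT output in comment 6 (`rtcwake: wakeup from "mem"`) can be scripted. This is a hedged sketch only: it assumes the util-linux `rtcwake` tool is installed and that the script runs as root, and the helper names `rtcwake_command` and `suspend_and_collect` are made up for illustration. It is not the IGT test itself.

```python
import subprocess


def rtcwake_command(seconds=15, mode="mem"):
    """Build an rtcwake invocation: suspend to `mode`, wake after `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return ["rtcwake", "-m", mode, "-s", str(seconds)]


def suspend_and_collect(seconds=15):
    """Suspend, wait for the RTC wakeup, then return *ERROR* lines from dmesg.

    Needs root and blocks until the machine has resumed.
    """
    subprocess.run(rtcwake_command(seconds), check=True)
    log = subprocess.run(
        ["dmesg"], check=True, capture_output=True, text=True
    ).stdout
    return [line for line in log.splitlines() if "*ERROR*" in line]


if __name__ == "__main__":
    # Only print the command here; call suspend_and_collect() to really suspend.
    print(" ".join(rtcwake_command()))
```

Filtering on `*ERROR*` mirrors the `dmesg | grep ERR` check used in comment 6, narrowed to the marker the i915 messages in this report carry.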
Comment 8 Clayton Craft 2016-08-26 23:18:48 UTC
After chatting with Jim B, I was able to determine what my problem was. For one, I had actually plugged in HDMI and NOT DP but was fooled by the i915 modesetting driver calling the HDMI port a "DP" port. I was then pointed to some LSPCON patches but was unable to successfully boot the kernel with those patches applied (no access to serial console on this platform, no display output, no go).
After plugging in a DP cable to my monitor and setting the monitor to DP 1.2 (1.1 didn't work...), I was able to successfully load X AND resume from suspend. 

In summary, HDMI requires more than the LSPCON patches to function, and DP 1.2 seems to work OK.
Comment 9 dog 2016-09-14 15:53:30 UTC
What are the next steps required with this bug?  Does the BAT test still fail?  Given Comment #8, should this bug be closed and a new one opened if needed, focused on HDMI/LSPCON?
Comment 10 Jim Bride 2016-09-14 15:56:02 UTC
I can't speak for the BAT itself, but I do believe that this is an HDMI / LSPCON issue rather than a DP one.
Comment 11 Clayton Craft 2016-09-14 16:57:41 UTC
I was never able to get the LSPCON patches Jim mentioned to build nicely with the latest (at the time) linux-drm-nightly kernel, and gave up because DP was not showing the issue. 
As far as I could tell, the issue seemed isolated to HDMI. The only open question I couldn't answer was whether the LSPCON patches would have made any difference there.
Comment 12 Manasi 2016-09-14 22:17:35 UTC
To try this on the latest kernel, I have tested suspend-resume on the recent kernel 4.8.0-rc5 on my BDW NUC and I don't see link training failures or DPCD access retries. The IGT suspend-resume test passes:

[root@manasi-bdwnuc tests]# ./kms_pipe_crc_basic --run-subtest suspend-read-crc-pipe-A
IGT-Version: 1.15-g9d18866 (x86_64) (Linux: 4.8.0-rc5-din-031116_0912+ x86_64)
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Sep 14 22:22:21 2016
suspend-read-crc-pipe-A: Testing connector DP-2 using pipe A
Subtest suspend-read-crc-pipe-A: SUCCESS (1.044s)

@Clayton: Could you tell me the configuration you were testing with HDMI that got detected as DP and never worked for you with LSPCON?

The LSPCON patches would make a difference only if the LSPCON chip (Megachips) is present on the motherboard and is enabled in the firmware. This is basically an active level shifter for HDMI signals to DP. 

Please let me know the configurations you tried with HDMI and any debug logs if you have them. I am following up on whether the final LSPCON patches have landed in our driver.

Regards
Manasi
Comment 13 Jani Nikula 2016-09-15 09:18:12 UTC
(In reply to Manasi from comment #12)
> The LSPCON patches would make a difference only if the LSPCON chip
> (Megachips) is present on the motherboard and is enabled in the firmware.
> This is basically an active level shifter for HDMI signals to DP.

To be more specific, LSPCON stands for Level Shifter / Protocol Converter, with two modes. Level shifter for DP dual-mode to drive HDMI 1.4 signal. Protocol Converter to convert DP to HDMI 2.0.
Comment 14 Jari Tahvanainen 2016-10-31 13:55:05 UTC
Please check whether the CI_DRM test
https://intel-gfx-ci.01.org/CI/CI_DRM_1757/fi-skl-6770hq/igt@gem_exec_suspend@basic-s3.html
has the same failure (and possibly the same root cause).

It is triggered with a different test
/opt/igt/tests/gem_exec_suspend --run-subtest basic-S3
but the failure symptoms in dmesg match those given by Imre
...
[  193.405060] [drm:verify_single_dpll_state.isra.78 [i915]] DPLL 3
[  193.405103] [drm:intel_enable_shared_dpll [i915]] enable DPLL 1 (active 1, on? 0) for crtc 27
[  193.405125] [drm:intel_enable_shared_dpll [i915]] enabling DPLL 1
[  193.416309] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff
...
[   29.132536] [drm:lspcon_init [i915]] No LSPCON detected, found type 1 HDMI
[   29.132556] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
[   29.132579] [drm:intel_ddi_init [i915]] *ERROR* LSPCON init failed on port B
...
[  195.052248] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff
[  195.052256] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110
[  195.052268] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  195.052278] [drm:intel_dp_program_link_training_pattern [i915]] Using DP training pattern TPS3
[  195.060914] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff
...
[  195.328772] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff
[  195.328801] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110
[  195.328812] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
[  195.337300] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff
...
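
To check that claim mechanically, the two excerpts can be reduced to sets of failure messages and intersected. This is a minimal sketch under stated assumptions: the `signatures` helper and the `SIGNATURES` list are hypothetical names, and the sample lines are copied from the description and from this comment. The comparison uses the message text, because "failed to enable link training" is reported by `intel_dp_link_training_clock_recovery` in the BDW log but by `intel_dp_start_link_train` in the SKL log.

```python
# Failure messages quoted in this bug; compared in lower case because
# "too many retries" is capitalised differently in the two logs.
SIGNATURES = (
    "dp_aux_ch timeout",
    "too many retries",
    "failed to enable sink power state",
    "failed to enable link training",
    "failed to start channel equalization",
)


def signatures(log_text):
    """Return the set of known failure messages that occur in log_text."""
    found = set()
    for line in log_text.lower().splitlines():
        for sig in SIGNATURES:
            if sig in line:
                found.add(sig)
    return found


bdw_log = """\
[  291.870603] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7145000c
[  291.950407] [drm:drm_dp_dpcd_access] too many retries, giving up
[  292.119813] [drm:intel_dp_sink_dpms] failed to enable sink power state
[  292.373085] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
[  292.458254] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
"""

skl_log = """\
[  195.052248] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff
[  195.052256] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110
[  195.052268] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  195.328812] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
"""

common = signatures(bdw_log) & signatures(skl_log)
only_bdw = signatures(bdw_log) - signatures(skl_log)
print("common:", sorted(common))
print("only in BDW excerpt:", sorted(only_bdw))
```

A shared set of messages only shows that the symptoms match; it does not by itself establish a common root cause.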
Comment 15 Imre Deak 2016-11-23 19:51:44 UTC
There were two separate issues mixed in this bug: the BDW one, which vanished from CI logs on the same machine, and the SKL one, which is hopefully solved by the recent LSPCON patches. Closing for now.

