100745 – amdgpu fails to wake up DisplayPort DELL monitors with 'clock recovery failed'

Summary: amdgpu fails to wake up DisplayPort DELL monitors with 'clock recovery failed'

Status:	RESOLVED MOVED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/AMDgpu
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Default DRI bug account

Reported:	2017-04-21 04:01 UTC by mr.nuke.me
Modified:	2019-11-19 08:15 UTC
CC List:	3 users

Attachments
log around the time the problem happens (with excessive debug info) (549.77 KB, text/plain) 2017-04-21 04:45 UTC, mr.nuke.me
dmesg log error (4.05 KB, text/plain) 2017-11-22 18:37 UTC, Benjamin Bellec
gnome-shell coredump after amdgpu displayport link status failed (1.45 KB, text/plain) 2018-01-29 11:07 UTC, Dimitrios Liappis

Description mr.nuke.me 2017-04-21 04:01:35 UTC

On a Fedora 25 system, under kernel 4.10.9, I have an RX480 with three Dell P2715Q monitors connected via displayport.

1. The machine is left alone, until the monitors are put into sleep mode.
2. The mouse is moved until the monitors show signs of coming up.

It is expected that all monitors come up cleanly and an unlock screen is presented.

What actually happens is that not all monitors come up. Some monitors indicate that no signal is coming. Which monitor or monitors fail to come up is non-deterministic.

Every time this happens, dmesg shows exactly three entries of the form:
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed

It doesn't matter how many of the three monitors come up, dmesg always shows this message three times.
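A minimal sketch for counting these entries in a saved dmesg dump, in case it helps others compare logs. The function name and the sample text are made up for illustration; only the message format is taken from this report.

```python
import re
from collections import Counter

# Matches the amdgpu (non-DC) link-training errors quoted in this report.
LINK_TRAIN_ERROR = re.compile(
    r"\[drm:amdgpu_atombios_dp_link_train \[amdgpu\]\] \*ERROR\* (?P<msg>.+?)\s*$"
)


def count_link_train_errors(dmesg_text):
    """Return a Counter mapping each link-training error message to its count."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LINK_TRAIN_ERROR.search(line)
        if match:
            counts[match.group("msg")] += 1
    return counts


# Hypothetical sample shaped like the messages in this report.
sample = "\n".join([
    "[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed",
    "[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed",
    "some unrelated kernel message",
])
sample_counts = count_link_train_errors(sample)
print(sample_counts)
```

To use it on a real log, read the file (for example the output of `dmesg` redirected to a text file) and pass its contents to count_link_train_errors().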

I've modified the failure point to print the return value of drm_dp_dpcd_read_link_status(), and it comes back as -5, which I believe is -EIO.
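For reference, errno 5 on Linux is EIO (I/O error). A quick way to look up a numeric return value like this one, sketched in Python; the helper name describe_kernel_error is hypothetical and the description text comes from the local C library, so it can vary by platform.

```python
import errno
import os


def describe_kernel_error(ret):
    """Translate a negative kernel-style return value into (name, description)."""
    code = -ret
    # errno.errorcode maps numeric codes to symbolic names such as "EIO".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


print(describe_kernel_error(-5))
```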

Also, switching to VT2 via Ctrl-Alt-F2 brings up all the monitors with a 100% success rate. Switching back to VT1 may:
* present a working unlock screen (20% of the time)
* present an unlock screen with Xorg locked up in a poll() call (50% of the time)
* completely crash Xorg (20% of the time)
* lock up the machine (10% of the time)
This procedure crashes Wayland 100% of the time.

Comment 1 Edward O'Callaghan 2017-04-21 04:12:44 UTC

OK, I had a short look into this.

It seems that

 amdgpu_atombios_dp_aux_transfer() calls amdgpu_atombios_dp_process_aux_chan()

which gets a ucReplyStatus of either 2 or 3 back from atombios.

If you could please attach dmesg logs after running

 # echo 0xf > /sys/module/drm/parameters/debug

and waiting for the situation to reoccur that would be most useful.
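For anyone wondering what 0xf turns on: the drm debug parameter is a bitmask of message categories. A small sketch that decodes it, assuming the four lowest category bits as defined in the kernel's DRM headers (CORE, DRIVER, KMS, PRIME); newer kernels define further bits, which this sketch only reports as unknown. The function name is made up for illustration.

```python
# Lowest four drm.debug category bits (assumption: taken from the kernel's
# DRM headers; higher bits exist on newer kernels and are not listed here).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask):
    """Return (list of known category names set in mask, leftover unknown bits)."""
    known = [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]
    unknown = mask & ~sum(DRM_DEBUG_BITS)  # summing a dict sums its keys
    return known, unknown


print(decode_drm_debug(0xF))
```

With 0xf all four of these categories are enabled, which is why the resulting log is so verbose.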

Comment 2 mr.nuke.me 2017-04-21 04:45:45 UTC

Created attachment 130957
log around the time the problem happens (with excessive debug info)

Comment 3 Edward O'Callaghan 2017-04-21 06:36:04 UTC

(In reply to mr.nuke.me from comment #2)
> Created attachment 130957
> log around the time the problem happens (with excessive debug info)

Yes, OK, so we are indeed hitting 'ucReplyStatus == 2' from atombios. Someone from AMD will have to determine what the problem is, because atombios is a closed component.
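To tie this back to the -5 seen in the description, here is a toy model of the relationship as it reads from this thread. It is an assumption, not the driver code: it only encodes that a reply status of 2 or 3 ends up as -EIO for the caller, and says nothing about other status values. All names are hypothetical.

```python
import errno

# Assumption drawn from this thread: atombios reply status 2 or 3 surfaces
# to the caller as -EIO (-5). Other values are deliberately not modelled.
FAILING_REPLY_STATUS = {2, 3}


def aux_reply_to_errno(reply_status):
    """Return -EIO for the failing statuses discussed here, else None."""
    if reply_status in FAILING_REPLY_STATUS:
        return -errno.EIO
    return None


print(aux_reply_to_errno(2))
```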

Comment 4 Harry Wentland 2017-04-21 13:20:10 UTC

Are the monitors set to DP input or to auto-select? If they are on auto-select, does setting the input to DP help?

I've seen auto-select mode have problems with DP many times, especially with scenarios like coming back from DPMS or S3 resume.

Comment 5 mr.nuke.me 2017-04-22 04:05:26 UTC

The P2715Q does not have an auto-select mode. The monitors are always listening on the same input.

Comment 6 Eddie Ringle 2017-10-06 00:36:30 UTC

Wanted to add that I'm seeing this issue now under a similar setup. I've seen it in the past, but the last few kernel releases have been pretty smooth. Once I upgraded to GNOME 3.26, however, both 4.13 and now 4.14-rc3 are displaying this issue.

I'm on Arch (using Wayland primarily), with a Fury X and three Dell P2415Q monitors, also connected via DisplayPort. I have MST disabled on all three, since this model has issues hitting 4K@60Hz with it enabled (even Dell has documented this).

The same "displayport link status failed" and "clock recovery failed" messages appear for me, also three times in a row. More often than not this leads to gnome-shell crashing. I see it most often when I try to wake my computer from sleep. Other times, when putting it to sleep, one monitor will stay powered and show a backlit blank screen.

Comment 7 Benjamin Bellec 2017-11-22 18:37:51 UTC

Created attachment 135668
dmesg log error

Comment 8 Benjamin Bellec 2017-11-22 18:41:28 UTC

I hit the same problem today after enabling amdgpu.dc=1
The screen doesn't light up at all if I boot the kernel with amdgpu.dc=1

Config is:
Fedora 27 + kernel 4.15.0-0.rc0.git7.1.fc28.x86_64
Radeon R9 380X
Dell U2414H


dmesg error is:
kernel: [drm:dm_logger_write [amdgpu]] *ERROR* perform_clock_recovery_sequence: Link Training Error, could not get CR after 100 tries.

Comment 9 Michel Dänzer 2017-11-23 08:55:43 UTC

(In reply to Benjamin Bellec from comment #8)
> I hit the same problem today after enabling amdgpu.dc=1
> The screen doesn't light up at all if I boot the kernel with amdgpu.dc=1

AFAICT this report is about the non-DC code, please file your own report about the issue with DC.

Comment 10 Dimitrios Liappis 2018-01-29 11:06:25 UTC

This has been a real problem for me as well for some time now, with amdgpu (Radeon RX560), Fedora 27, gnome-shell and a Dell P2715Q monitor. It happens on both Xorg and Wayland.

> kernel: [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
> kernel: [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
> kernel: [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
> kernel: [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed

The nuisance here is this almost always crashes gnome-shell. Attached coredump excerpt.

I used to be able to circumvent the gnome-shell crash by disabling dpms ("xset -dpms" and/or "xset dpms force off") but this doesn't seem to help anymore.

Comment 11 Dimitrios Liappis 2018-01-29 11:07:40 UTC

Created attachment 137017
gnome-shell coredump after amdgpu displayport link status failed

Comment 12 Michel Dänzer 2018-01-29 11:30:36 UTC

(In reply to Dimitrios Liappis from comment #10)
> The nuisance here is this almost always crashes gnome-shell. Attached
> coredump excerpt.

FWIW, that's most likely a gnome-shell/mutter bug.

Comment 13 Dimitrios Liappis 2018-02-04 20:03:02 UTC

(In reply to Michel Dänzer from comment #12)
> 
> FWIW, that's most likely a gnome-shell/mutter bug.

Thank you, indeed this is a mutter bug. I tracked it down to https://bugzilla.gnome.org/show_bug.cgi?id=789501, which describes a specific patch for a monitor-manager/kms bug that fixes it.

Comment 14 Kimmo 2018-08-05 17:07:44 UTC

Confirming a similar problem still occurs (screen stays black while trying to resume from suspend)
2x DELL u2415h + display port daisy chain + RX480 (amdgpu 18.0.1-2)

Comment 15 Kimmo 2018-08-06 21:24:06 UTC

(In reply to Kimmo from comment #14)
> Confirming a similar problem still occurs (screen stays black while trying to
> resume from suspend)
> 2x DELL u2415h + display port daisy chain + RX480 (amdgpu 18.0.1-2)

Actually, I need to correct myself. The suspend problem seems to be fixed and has been working OK for me so far. The problem seems to occur when the Dell monitor is allowed to shut down by itself due to inactivity, but I'm not sure whether it is related to amdgpu. I'm using KDE Plasma desktop 5.13.3. Sorry for the inconvenience.

Comment 16 Martin Peres 2019-11-19 08:15:44 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/158.