Bug 100949 - Black screen, DP link training errors
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
 
Reported: 2017-05-05 18:43 UTC by Luya Tshimbalanga
Modified: 2019-11-19 08:16 UTC

Attachments
dmesg boot from X550Z (109.22 KB, text/plain)
2017-05-05 18:43 UTC, Luya Tshimbalanga
dmesg sorted by drm (9.65 KB, text/plain)
2017-05-08 08:12 UTC, Luya Tshimbalanga

Description Luya Tshimbalanga 2017-05-05 18:43:49 UTC
Created attachment 131232
dmesg boot from X550Z

A recent kernel broke power management handling on a CIK/SI hybrid laptop. The affected build is the latest 4.9.25-20170502 from https://cgit.freedesktop.org/~agd5f/linux/?h=amd-staging-4.9

Notable highlights:

[   25.671428] kfd kfd: Allocated 3944480 bytes on gart for device(1002:130d)
[   25.671449] kfd kfd: error getting iommu info. is the iommu enabled?
[   25.671493] kfd kfd: Error initializing iommuv2 for device (1002:130d)
[   25.671558] Creating topology SYSFS entries
[   25.671642] kfd kfd: device (1002:130d) NOT added due to errors

[  695.257618] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[  695.257654] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed

For a while now, suspend/resume has locked up with a black screen, forcing a hard reset. The laptop is an Asus X550ZE with dual GPUs: Kaveri and Hainan.
Comment 1 Michel Dänzer 2017-05-08 07:54:05 UTC
Can you bisect?
Comment 2 Luya Tshimbalanga 2017-05-08 08:12:50 UTC
Created attachment 131255
dmesg sorted by drm

dmesg sorted by drm attached. Noticeable problems are:

[   28.135157] [drm] Internal thermal controller without fan control

[ 9815.904855] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[ 9815.904891] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed


Narrowing down to amdgpu, which details the hardware:
$ dmesg | grep amdgpu
[   23.120326] [drm] amdgpu kernel modesetting enabled.
[   23.618949] fb: switching to amdgpudrmfb from EFI VGA
[   23.619716] amdgpu 0000:00:01.0: VM size (-1) must be a power of 2
[   23.619975] amdgpu 0000:00:01.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[   23.619977] amdgpu 0000:00:01.0: GTT: 3072M 0x0000000040000000 - 0x00000000FFFFFFFF
[   23.620094] [drm] amdgpu: 1024M of VRAM memory ready
[   23.620095] [drm] amdgpu: 3072M of GTT memory ready.
[   23.721480] amdgpu 0000:00:01.0: amdgpu: using MSI.
[   23.721510] [drm] amdgpu: irq initialized.
[   23.721519] [drm] amdgpu: dpm initialized
[   23.724341] [drm] amdgpu atom DIG backlight initialized
[   24.430744] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000040000010, cpu addr 0xffffa02179c2d010
[   24.430790] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000040000020, cpu addr 0xffffa02179c2d020
[   24.430843] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000040000030, cpu addr 0xffffa02179c2d030
[   24.430921] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000040000040, cpu addr 0xffffa02179c2d040
[   24.430971] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000040000050, cpu addr 0xffffa02179c2d050
[   24.431066] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000040000060, cpu addr 0xffffa02179c2d060
[   24.431123] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000040000070, cpu addr 0xffffa02179c2d070
[   24.431170] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000040000080, cpu addr 0xffffa02179c2d080
[   24.431221] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000040000090, cpu addr 0xffffa02179c2d090
[   24.761420] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x00000000400000a0, cpu addr 0xffffa02179c2d0a0
[   24.761749] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0xffffa02179c2d0b0
[   24.900778] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x000000000068cd30, cpu addr 0xffffb70d02238d30
[   24.993027] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x00000000400000d0, cpu addr 0xffffa02179c2d0d0
[   24.993099] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x00000000400000e0, cpu addr 0xffffa02179c2d0e0
[   26.092550] fbcon: amdgpudrmfb (fb0) is primary device
[   27.935642] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[   27.956506] [drm] Initialized amdgpu 3.16.0 20150101 for 0000:00:01.0 on minor 0
[   27.956682] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[   27.957234] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2
[   28.041269] amdgpu 0000:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
[   28.041274] amdgpu 0000:01:00.0: GTT: 3072M 0x0000000080000000 - 0x000000013FFFFFFF
[   28.041370] [drm] amdgpu: 2048M of VRAM memory ready
[   28.041372] [drm] amdgpu: 3072M of GTT memory ready.
[   28.042108] amdgpu 0000:01:00.0: PCIE GART of 3072M enabled (table at 0x0000000000040000).
[   28.042308] amdgpu 0000:01:00.0: amdgpu: using MSI.
[   28.042355] [drm] amdgpu: irq initialized.
[   28.135174] [drm] amdgpu: dpm initialized
[   28.265927] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000010, cpu addr 0xffffa0217c2eb010
[   28.266183] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000020, cpu addr 0xffffa0217c2eb020
[   28.266262] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000030, cpu addr 0xffffa0217c2eb030
[   28.266352] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000040, cpu addr 0xffffa0217c2eb040
[   28.266546] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000050, cpu addr 0xffffa0217c2eb050
[   28.266982] [drm] enabling PCIE gen 3 link speeds, disable with amdgpu.pcie_gen2=0
[   30.289599] [drm] Initialized amdgpu 3.16.0 20150101 for 0000:01:00.0 on minor 1
[   30.485628] audit: type=1130 audit(1494213691.999:65): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-backlight@backlight:amdgpu_bl0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3115.901737] [drm] enabling PCIE gen 3 link speeds, disable with amdgpu.pcie_gen2=0
[ 3117.273936] amdgpu 0000:01:00.0: PCIE GART of 3072M enabled (table at 0x0000000000040000).
[ 3136.430662] [drm] enabling PCIE gen 3 link speeds, disable with amdgpu.pcie_gen2=0
[ 3137.813144] amdgpu 0000:01:00.0: PCIE GART of 3072M enabled (table at 0x0000000000040000).
[ 9810.613716] [drm] enabling PCIE gen 3 link speeds, disable with amdgpu.pcie_gen2=0
[ 9811.994909] amdgpu 0000:01:00.0: PCIE GART of 3072M enabled (table at 0x0000000000040000).
[ 9815.206164] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[ 9815.206197] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
[ 9815.904855] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[ 9815.904891] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
[ 9815.982844] [drm] enabling PCIE gen 3 link speeds, disable with amdgpu.pcie_gen2=0
[ 9817.339313] amdgpu 0000:01:00.0: PCIE GART of 3072M enabled (table at 0x0000000000040000).

Hope the information is useful.
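
The grep output above can be narrowed further. The following is a minimal Python sketch, not part of the original report, that lists each DP link-training error together with the delay since the most recent `PCIE GART of ... enabled` line. It assumes that such a line marks a GPU re-initialization (for example a resume); that interpretation is an assumption, and the function names are illustrative only. The sample lines are copied from the log above.

```python
import re

# Matches the leading dmesg timestamp, e.g. "[ 9815.206164] rest of line"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def parse(lines):
    """Yield (timestamp_seconds, message) for each timestamped dmesg line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield float(m.group(1)), m.group(2)

def link_train_errors(lines):
    """Return (error_time, seconds_since_last_gart_enable, message) tuples.

    Assumption: a "PCIE GART of ... enabled" line marks a GPU re-init.
    The delay is None if no such line precedes the error.
    """
    last_gart = None
    found = []
    for ts, msg in parse(lines):
        if "PCIE GART of" in msg and "enabled" in msg:
            last_gart = ts
        elif "amdgpu_atombios_dp_link_train" in msg and "*ERROR*" in msg:
            delay = None if last_gart is None else round(ts - last_gart, 3)
            found.append((ts, delay, msg))
    return found

# Three lines taken verbatim from the dmesg output in this comment.
SAMPLE = """\
[ 9811.994909] amdgpu 0000:01:00.0: PCIE GART of 3072M enabled (table at 0x0000000000040000).
[ 9815.206164] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[ 9815.206197] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
"""

for ts, delay, msg in link_train_errors(SAMPLE.splitlines()):
    print(ts, delay, msg)
```

To run it against a full log, the saved output of `dmesg` could be passed to `link_train_errors` as a list of lines instead of `SAMPLE`.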
Comment 3 Michel Dänzer 2017-05-22 07:46:18 UTC
I'm seeing the same *ERROR* lines on suspend/resume on one of my laptops, and the laptop panel stays black. Does that match your problem, or what did you mean by "broke the handling of power management"?
Comment 4 Luya Tshimbalanga 2017-05-22 08:41:19 UTC
> I'm seeing the same *ERROR* lines on suspend/resume on one of my laptops, and 
> the laptop panel stays black. Does that match your problem, or what did you 
> mean by "broke the handling of power management"?

Yes, that matches the problem I encountered.
Regarding "broke the handling of power management": I noticed more fan noise than usual, but it may be something else. It would be nice to optimize power management if possible.
Comment 5 Michel Dänzer 2017-05-22 08:49:05 UTC
And you're saying suspend/resume broke around May 2nd for you? Do you happen to remember a commit before that where it worked for you?
Comment 6 Luya Tshimbalanga 2017-05-22 16:00:36 UTC
The last commit that worked prior to the breakage was from 20170410. I just updated to the recent 20170511, which seems to partially resolve the issue, but the error messages below still remain.

[22328.935171] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[22328.935217] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
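
Bisecting between the working 20170410 state and the broken 20170502 build amounts to a binary search over the commits in between. The following is a minimal Python sketch of that search, not a record of anything done in this report. The intermediate build labels and the `is_bad` check are hypothetical placeholders; in practice the check would be a manual suspend/resume test of each build, and `git bisect` automates the search on the kernel tree.

```python
def first_bad(builds, is_bad):
    """Return the first build for which is_bad() is true.

    Assumes builds are ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and every build after the first bad one is
    also bad (the usual bisection assumption).
    """
    lo, hi = 0, len(builds) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid                 # breakage is at mid or earlier
        else:
            lo = mid                 # breakage is after mid
    return builds[hi]

# Only "20170410" (good) and "20170502" (bad) come from this report;
# the "candidate-*" labels are hypothetical placeholders.
builds = ["20170410", "candidate-a", "candidate-b", "candidate-c", "20170502"]

# Stand-in for a manual suspend/resume test: pretend the regression
# first appears in the placeholder build "candidate-b".
pretend_bad = {"candidate-b", "candidate-c", "20170502"}
culprit = first_bad(builds, lambda b: b in pretend_bad)
print(culprit)
```

Each iteration halves the remaining range, so even a few hundred commits need fewer than ten test boots.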
Comment 7 Michel Dänzer 2017-05-23 02:54:58 UTC
I'm still unclear as to the symptoms you're seeing. E.g., are the *ERROR* messages always accompanied by a black laptop panel? And does this only happen on suspend/resume, or also under other circumstances?
Comment 8 Luya Tshimbalanga 2017-05-23 16:23:11 UTC
Yes, those *ERROR* messages happened on suspend/resume, followed by the black screen. The same errors also occurred when leaving the laptop idle for a while.
Comment 9 Michel Dänzer 2017-05-24 06:53:40 UTC
Thanks for the clarification. Sounds like it's the same issue I'm seeing.
Comment 10 Dmitry 2018-12-11 16:43:43 UTC
The xf86-video-ati driver is probably in use. Kaveri does not work with this driver via DP. I have a black screen too.
Comment 11 Luya Tshimbalanga 2018-12-12 01:44:47 UTC
I no longer own the Asus X550Z, which broke a few months ago; it has been replaced by a 15-inch HP Envy x360 with a Ryzen 2500U. Feel free to continue or close this report.
Comment 12 Martin Peres 2019-11-19 08:16:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/163.

