Bug 111913

Summary: AMD Navi10 GPU powerplay issues when using two DisplayPort connectors
Product: DRI
Reporter: Timur Kristóf <venemo>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
Severity: normal
Priority: not set
CC: stefan
Version: DRI git
Hardware: All
OS: Linux (All)

Description Timur Kristóf 2019-10-07 08:17:38 UTC
I have a Sapphire AMD RX 5700 XT graphics card (the reference design).

When I use DisplayPort-0 and DisplayPort-1 connectors with two Dell U2718Q monitors, I get the following errors in dmesg.

[   39.270292] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18) 	param: 0x00000006 response 0xffffffc2
[   39.270294] amdgpu: [powerplay] Failed to export SMU metrics table!
[   41.787785] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14) 	param: 0x00000080 response 0xffffffc2
[   44.288882] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18) 	param: 0x00000006 response 0xffffffc2
[   44.288883] amdgpu: [powerplay] Failed to export SMU metrics table!
[   46.822924] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14) 	param: 0x00000080 response 0xffffffc2
[   49.237444] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14) 	param: 0x00000080 response 0xffffffc2
[   49.353456] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14) 	param: 0x00000080 response 0xffffffc2
[   49.353457] amdgpu: [powerplay] Failed to export SMU metrics table!
[   66.676957] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   69.095868] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   71.634371] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   74.166180] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   76.526822] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   78.804664] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   81.348866] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   83.881667] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   86.335097] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   88.763558] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   91.209310] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[   93.740772] amdgpu: [powerplay] failed send message: NumOfDisplays (64) 	param: 0x00000002 response 0xffffffc2
[  128.174369] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xffffffc2
[  130.597545] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xffffffc2
[  133.135948] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xffffffc2
[  135.676398] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xffffffc2
[  138.213539] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xffffffc2
[  140.753936] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xffffffc2
[  143.034004] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xffffffc2
[  145.313763] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xffffffc2
[  151.255023] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xffffffc2
[  153.531678] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xffffffc2
[  156.054509] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xffffffc2
[  156.054516] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xfffffffb
[  156.057923] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xfffffffb
[  156.057930] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00000000 response 0xfffffffb
[  156.057932] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xfffffffb
[  156.057938] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31) 	param: 0x00020000 response 0xfffffffb
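Every failure in the log above carries one of two response values, 0xffffffc2 or 0xfffffffb. Assuming (from a reading of the 5.4-era SMU v11 driver code, not something confirmed in this report) that the value printed after "response" is the driver's negative errno return code rather than a raw SMU register value, the codes can be decoded as below. The helper names decode_line and to_signed32 are hypothetical; this is only a sketch for tallying such lines.

```python
from __future__ import annotations

import re

# Linux errno numbers (include/uapi/asm-generic/errno-base.h and errno.h).
# Only the two codes that appear in this bug's log are listed.
LINUX_ERRNO_NAMES = {5: "EIO", 62: "ETIME"}

# Matches e.g. "failed send message: NumOfDisplays (64) \tparam: 0x00000002 response 0xffffffc2"
LOG_RE = re.compile(
    r"failed send message: (?P<msg>\w+) \((?P<index>\d+)\)\s+"
    r"param: (?P<param>0x[0-9a-fA-F]+) response (?P<resp>0x[0-9a-fA-F]+)"
)


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_line(line: str) -> tuple[str, int, int, str | None] | None:
    """Return (message, param, signed response, errno name), or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    resp = to_signed32(int(m.group("resp"), 16))
    # Assumption: a negative response is a negated Linux errno value.
    name = LINUX_ERRNO_NAMES.get(-resp, "unknown") if resp < 0 else None
    return m.group("msg"), int(m.group("param"), 16), resp, name
```

For the GetMaxDpmFreq line at 133.135948, decode_line returns ("GetMaxDpmFreq", 131072, -62, "ETIME"). If the assumption holds, 0xffffffc2 is -62 (ETIME, the driver timed out waiting for the SMU) and 0xfffffffb is -5 (EIO), which would mean that from 156.05 s onward the SMU answered with an error status instead of not answering at all.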
Comment 1 Timur Kristóf 2019-10-07 08:22:50 UTC
After looking at the problem a bit further, it seems that the problem occurs when using any two DisplayPort connectors, but does not happen when using just one DisplayPort connector and the HDMI connector.
Comment 2 Timur Kristóf 2019-10-07 08:28:02 UTC
Forgot to mention: this happened with kernel 5.4-rc1 and Mesa 19.2.
Comment 3 Stefan Rehm 2019-10-07 18:46:10 UTC
I can confirm this. My card is a PowerColor Radeon RX 5700 XT Red Dragon. As soon as I connect a second monitor, I get the same errors in dmesg that Timur Kristóf described. Unfortunately, the HDMI workaround does not seem to work in my case: it does not matter whether the monitors are connected via DP or HDMI.

One important fact: the problem started with kernel 5.4-rc1 and persists in 5.4-rc2, but 5.3 works fine (except for the problem with the high idle power consumption, but that is a different story :))!
Comment 4 Stefan Rehm 2019-10-07 19:06:29 UTC
Just to clarify: this is not just a "cosmetic" issue. The computer is barely usable. Applications take extremely long to start and/or run slowly. Also, the files in sysfs (/sys/class/drm/card0/device/pp_*) don't return anything anymore, and lm_sensors reports N/A for all sensors except the fan speed.
Comment 5 Andrew Sheldon 2019-10-07 22:52:15 UTC
Are both monitors 60 Hz? I've seen this occur with 2x60 Hz setups, but not with other combinations of refresh rates. It seems to be similar to the issues with 75 Hz in a single-monitor configuration.

For me, other combinations of dual-monitor refresh rates don't exhibit the issue (although there are other problems, as discussed in https://bugs.freedesktop.org/show_bug.cgi?id=111482).
Comment 6 Stefan Rehm 2019-10-08 12:02:05 UTC
Yes, both monitors run at 60 Hz.
Comment 7 Timur Kristóf 2019-10-08 13:56:57 UTC
(In reply to Andrew Sheldon from comment #5)
> Are both monitors 60hz? I've seen this occur with 2x60hz setups, but not
> with other combinations of refresh rates. It seems to be similar to issues
> with 75hz in a single monitor configuration.


In my case, both are Dell U2718Q monitors, the resolution is 4K (3840x2160), and the refresh rate is 60Hz on both monitors.
Comment 8 Stefan Rehm 2019-10-08 15:39:43 UTC
In my case the resolution of both monitors is 2560x1440
Comment 9 Andrew Sheldon 2019-10-09 00:24:16 UTC
(In reply to Stefan Rehm from comment #8)
> In my case the resolution of both monitors is 2560x1440

You could try overclocking (or underclocking) one or both monitors to see if the bug still exists, using: https://github.com/kevinlekiller/cvt_modeline_calculator_12

I recommend using the "-b" option which uses reduced blanking V2 mode, but you could experiment with different options.

Then to use it:

xrandr --newmode <modeline name> <modeline details from cvt>

xrandr --addmode <monitor output> <modeline name>

xrandr --output <monitor output> --mode <modeline name>

The modeline name can be whatever you like.

You'll probably have to launch X with one of the monitors disconnected (as the bug may trigger before you can apply the modeline change). I believe the amdgpu DDX has support for specifying modelines, but I don't know the syntax off the top of my head.
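The three xrandr steps from this comment can be generated from a single modeline. A minimal sketch, assuming the calculator prints a line of the form `Modeline "<name>" <details>` like the stock `cvt` tool does; the function name xrandr_commands is hypothetical, and the sample 2560x1440 reduced-blanking modeline in the usage note is an illustrative assumption, while DisplayPort-1 is the output name used in this report.

```python
from __future__ import annotations

import shlex


def xrandr_commands(output: str, cvt_modeline: str) -> list[list[str]]:
    """Turn a 'Modeline "<name>" <details>' line into three xrandr argv lists."""
    # shlex strips the quotes around the mode name and collapses extra spaces.
    tokens = shlex.split(cvt_modeline)
    if len(tokens) < 3 or tokens[0] != "Modeline":
        raise ValueError("expected a line starting with 'Modeline'")
    name, details = tokens[1], tokens[2:]
    return [
        # 1. Register the new mode with the X server.
        ["xrandr", "--newmode", name, *details],
        # 2. Make the mode available on the chosen output.
        ["xrandr", "--addmode", output, name],
        # 3. Switch the output to the new mode.
        ["xrandr", "--output", output, "--mode", name],
    ]
```

For example, `xrandr_commands("DisplayPort-1", 'Modeline "2560x1440R"  241.50  2560 2608 2640 2720  1440 1443 1448 1481 +hsync -vsync')` yields the three commands, and each list can be passed to `subprocess.run(cmd, check=True)`. Building argument lists instead of shell strings avoids quoting problems with the mode name.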
Comment 10 Stefan Rehm 2019-10-09 18:39:55 UTC
Correction: the exact refresh rate reported by xrandr is 59.95 Hz.

I took Andrew Sheldon's advice and experimented a bit with refresh rates and resolutions. It turns out that the problem does not occur at lower resolutions, even when both displays operate at 60 Hz.
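The 59.95 Hz figure follows from the mode timing: the refresh rate is the pixel clock divided by the total number of pixels per frame, blanking included. The timing below is an assumption (the common CVT reduced-blanking 2560x1440 mode, 241.50 MHz with htotal 2720 and vtotal 1481); the report does not state the monitors' actual timings, and refresh_hz is a hypothetical helper name.

```python
def refresh_hz(pixel_clock_mhz: float, htotal: int, vtotal: int) -> float:
    """Vertical refresh rate = pixel clock / (total pixels per frame).

    htotal and vtotal include the blanking intervals, not just the
    visible area.
    """
    return pixel_clock_mhz * 1e6 / (htotal * vtotal)


# Assumed timing: the common CVT reduced-blanking 2560x1440 mode
# (241.50 MHz, htotal 2720, vtotal 1481); not taken from the report.
rate = refresh_hz(241.50, 2720, 1481)
print(f"{rate:.2f} Hz")  # prints "59.95 Hz"
```

The same formula shows why lower resolutions at the same refresh rate need a proportionally lower pixel clock.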
Comment 11 Stefan Rehm 2019-10-13 13:07:48 UTC
git bisect shows that commit fb6959ae50176758a073687dbb081d26521f4576 ("Embed DCN2 SOC bounding box") is the first commit to trigger the bug.
Comment 12 Stefan Rehm 2019-10-19 11:24:10 UTC
If I change

dcn2_0_soc.dram_clock_change_latency_us

in "rivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c" from 404.0 to 10.0 (the value used in kernel 5.3), the messages disappear and the system behaves normally again. However, as long as

/sys/class/drm/card0/device/power_dpm_force_performance_level

is set to "auto", I am now seeing massive flickering. Forcing it to low or high fixes that.

According to the kernel 5.3 sources, the value of 10.0 for dram_clock_change_latency_us is a hack. Can anyone elaborate on this?
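The flicker workaround from this comment (forcing the level to low or high instead of auto) is a plain sysfs write. A minimal sketch, assuming the GPU is card0 as in this report and that the script runs as root; get_dpm_level and set_dpm_level are hypothetical helper names, and only the three levels mentioned here are accepted, even though amdgpu understands other values (such as manual).

```python
from pathlib import Path

# Path from this report; the card index may differ on multi-GPU systems.
DPM_LEVEL_PATH = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# Only the levels discussed in this bug; amdgpu accepts more.
KNOWN_LEVELS = {"auto", "low", "high"}


def get_dpm_level(path: Path = DPM_LEVEL_PATH) -> str:
    """Read the current forced performance level."""
    return path.read_text().strip()


def set_dpm_level(level: str, path: Path = DPM_LEVEL_PATH) -> None:
    """Write a performance level (root is needed for the real sysfs file)."""
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    # Equivalent to: echo <level> > power_dpm_force_performance_level
    path.write_text(level + "\n")
```

For example, `set_dpm_level("high")` pins the GPU at its highest DPM state, and `set_dpm_level("auto")` restores dynamic clock selection.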
Comment 13 Martin Peres 2019-11-19 09:57:06 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/929.
