Bug 111555

Summary: [amdgpu/Navi] [powerplay] Failed to send message errors
Product: DRI
Reporter: Shmerl <shtetldik>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
Severity: normal
Priority: not set
CC: kingoipo
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)

Description Shmerl 2019-09-04 04:56:43 UTC
I get periodic errors like these in dmesg, and they coincide with intermittent system stalls:

    [Wed Sep  4 00:43:43 2019] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
    [Wed Sep  4 00:43:43 2019] amdgpu: [powerplay] Failed to export SMU metrics table!
    [Wed Sep  4 00:44:53 2019] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
    [Wed Sep  4 00:44:53 2019] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xa90000
    [Wed Sep  4 00:45:30 2019] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb, param 0x6
    [Wed Sep  4 00:45:35 2019] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
    [Wed Sep  4 00:45:35 2019] amdgpu: [powerplay] Failed to export SMU metrics table!
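
One possible reading of the response field, offered as an assumption rather than something confirmed in this report: if the driver prints a negative kernel return code as an unsigned 32-bit value, then 0xfffffffb is -5, which is EIO on Linux. The helper name below is hypothetical; this is a minimal sketch of that conversion.

```python
import errno


def decode_response(value):
    """Treat a 32-bit response word as a signed integer and look up its errno name.

    Assumption: the logged response is a negative errno printed as unsigned hex.
    """
    # Two's-complement conversion from unsigned 32-bit to signed.
    signed = value - (1 << 32) if value & 0x80000000 else value
    if signed < 0:
        name = errno.errorcode.get(-signed, "unknown")
    else:
        name = "none"
    return signed, name


print(decode_response(0xFFFFFFFB))  # -> (-5, 'EIO')
```

If that reading holds, the failures above would be generic I/O errors reported by the message path, not distinct firmware status codes.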

I'm running kernel 5.3-rc7
GPU: Sapphire Pulse RX 5700XT (Navi 10) with firmware from https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/
Distro: Debian testing / KDE.

I noticed that the errors become frequent while I'm using ksysguard, which queries lm-sensors for the amdgpu temperature and fan speed.
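
To try reproducing this without ksysguard, the same sensors can be polled directly from the hwmon sysfs interface. The sketch below is only an approximation of what a monitoring tool does; the function name is hypothetical, and it assumes the card exposes temp1_input and fan1_input under a hwmon device named amdgpu.

```python
import glob
import os


def read_amdgpu_sensors(hwmon_root="/sys/class/hwmon"):
    """Return temperature and fan readings for every hwmon device named 'amdgpu'."""
    readings = []
    for dev in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(dev, "name")) as f:
                if f.read().strip() != "amdgpu":
                    continue
        except OSError:
            continue
        entry = {"device": dev}
        # temp1_input is in millidegrees Celsius, fan1_input in RPM.
        for key, fname in (("temp_millideg_c", "temp1_input"),
                           ("fan_rpm", "fan1_input")):
            try:
                with open(os.path.join(dev, fname)) as f:
                    entry[key] = int(f.read().strip())
            except (OSError, ValueError):
                # A failed read is recorded as None so the caller can count it.
                entry[key] = None
        readings.append(entry)
    return readings


print(read_amdgpu_sensors())
```

Calling this in a loop (for example once per second) while watching dmesg should show whether plain sensor reads are enough to trigger the messages.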
Comment 1 Shmerl 2019-09-05 23:34:19 UTC
These errors also happen when using radeon-profile to control the fan speed:

[ 3099.422315] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb param 0x80
[ 3099.422318] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3145.423048] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
[ 3145.423051] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3145.423076] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb, param 0x6
[ 3149.423073] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
[ 3149.423076] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3200.422744] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xa90000
[ 3200.422846] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
[ 3200.422850] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3234.422189] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xa90000
Comment 2 Shmerl 2019-09-06 01:48:11 UTC
Related: https://github.com/marazmista/radeon-profile/issues/157
Comment 3 Andrew Sheldon 2019-09-08 02:22:11 UTC
Are you running a monitor at 75hz?

I can only trigger the bug when setting 74-76hz with amd-staging-drm-next, and although I haven't tested in a while, I suspect the same applies with 5.3-rcX (and drm-next-5.4).

Here's the output after setting 75hz, on amd-staging-drm-next:
[ 7937.682003] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)      param: 0x00000006 response 0xffffffc2
[ 7937.682004] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7938.087356] amdgpu: [powerplay] failed send message: NumOfDisplays (64)      param: 0x00000001 response 0xffffffc2
[ 7940.224391] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)      param: 0x00000006 response 0xffffffc2
[ 7940.224392] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7942.362952] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7944.510060] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7944.510061] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7945.269921] amdgpu: [powerplay] failed send message: NumOfDisplays (64)      param: 0x00000001 response 0xffffffc2
[ 7946.652777] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7947.411808] amdgpu: [powerplay] failed send message: NumOfDisplays (64)      param: 0x00000001 response 0xffffffc2
[ 7948.786413] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7948.786414] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7950.918131] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7953.076247] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7953.076250] amdgpu: [powerplay] Failed to export SMU metrics table!
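
The two kernels print the same failures in different formats: the older one gives the message index in hex (0x12, 0xe), while amd-staging-drm-next gives a name plus a decimal index (18, 14). A small parser makes it easier to compare them. This is a sketch based only on the log lines quoted in this report; the regular expressions and function names are illustrative, not taken from the driver.

```python
import re
from collections import Counter

# Older format: "Failed to send message 0x12, response 0xfffffffb param 0x6"
# (the comma after the response value is sometimes missing).
OLD_FMT = re.compile(
    r"Failed to send message (0x[0-9a-f]+), response (0x[0-9a-f]+),? param (0x[0-9a-f]+)"
)
# Newer format: "failed send message: NAME (18)   param: 0x00000006 response 0xffffffc2"
NEW_FMT = re.compile(
    r"failed send message: (\w+) \((\d+)\)\s+param: (0x[0-9a-f]+) response (0x[0-9a-f]+)"
)


def parse_line(line):
    """Return a dict with index, name, param and response, or None if no match."""
    m = OLD_FMT.search(line)
    if m:
        return {"name": None, "index": int(m.group(1), 16),
                "param": int(m.group(3), 16), "response": int(m.group(2), 16)}
    m = NEW_FMT.search(line)
    if m:
        return {"name": m.group(1), "index": int(m.group(2)),
                "param": int(m.group(3), 16), "response": int(m.group(4), 16)}
    return None


def tally_by_index(lines):
    """Count failed messages per numeric message index."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["index"]] += 1
    return counts


sample = [
    "amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6",
    "amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)      param: 0x00000006 response 0xffffffc2",
    "amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2",
]
print(tally_by_index(sample))
```

Read this way, message 0x12 with param 0x6 in the original report lines up with TransferTableSmu2Dram (18) here, and 0xe with param 0x80 lines up with SetDriverDramAddrHigh (14). The response value differs between the two kernels (0xfffffffb versus 0xffffffc2).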
Comment 4 Shmerl 2019-09-08 03:32:53 UTC
(In reply to Andrew Sheldon from comment #3)
> Are you running a monitor at 75hz?

No, 60 Hz which is my monitor's native refresh rate.
Comment 5 Michael de Lang 2019-09-20 16:27:50 UTC
I can confirm this happens with a dual-monitor setup. I have two 1440p 144 Hz screens, and these messages appear when I boot with both screens connected, or when I read the GPU temperature with the sensors command while both screens are active.

With only one screen active, I cannot reproduce the bug.
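
When comparing one-screen and two-screen runs, it can help to record how many connectors the kernel reports as connected at the time of the errors. The sketch below reads the per-connector status files under /sys/class/drm; the function name is hypothetical, and it says nothing about which outputs are actually enabled, only which are plugged in.

```python
import glob
import os


def connected_connectors(drm_root="/sys/class/drm"):
    """List DRM connectors (e.g. 'card0-DP-1') whose status file reads 'connected'."""
    connected = []
    pattern = os.path.join(drm_root, "card*-*", "status")
    for status_path in sorted(glob.glob(pattern)):
        try:
            with open(status_path) as f:
                state = f.read().strip()
        except OSError:
            continue
        if state == "connected":
            connected.append(os.path.basename(os.path.dirname(status_path)))
    return connected


print(connected_connectors())
```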
Comment 6 Shmerl 2019-09-20 16:38:52 UTC
Just for reference, my connection is DisplayPort 1.2.
Comment 7 Tako Marks 2019-10-09 19:14:38 UTC
I ran into this issue while changing my BIOS settings. Not sure if this is helpful, but I saw the same errors when the option Decode Above 4G (64-bit addressing on the PCI bus?) was enabled on my Gigabyte Aorus B450. After turning that option back off, everything works again.
Comment 8 Shmerl 2019-11-07 23:21:11 UTC
I don't get these errors anymore when using radeon-profile with kernel 5.4-rc6.

With ksysguard, however, the "Failed to export SMU metrics table!" message still occurs, though it no longer causes any stalls or hangs.
Comment 9 Martin Peres 2019-11-19 09:51:02 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/900.
