Bug 111555 - [amdgpu/Navi] [powerplay] Failed to send message errors
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set normal
Assignee: Default DRI bug account
 
Reported: 2019-09-04 04:56 UTC by Shmerl
Modified: 2019-09-08 03:32 UTC

Description Shmerl 2019-09-04 04:56:43 UTC
I periodically get errors like these in dmesg, and they coincide with intermittent system stalls:

    [Wed Sep  4 00:43:43 2019] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
    [Wed Sep  4 00:43:43 2019] amdgpu: [powerplay] Failed to export SMU metrics table!
    [Wed Sep  4 00:44:53 2019] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
    [Wed Sep  4 00:44:53 2019] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xa90000
    [Wed Sep  4 00:45:30 2019] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb, param 0x6
    [Wed Sep  4 00:45:35 2019] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
    [Wed Sep  4 00:45:35 2019] amdgpu: [powerplay] Failed to export SMU metrics table!

I'm running kernel 5.3-rc7.
GPU: Sapphire Pulse RX 5700XT (Navi 10) with firmware from https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/
Distro: Debian testing / KDE.

I noticed that the errors become much more frequent when I'm using ksysguard, which queries lm-sensors for the amdgpu temperature and fan speed.
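
For anyone trying to reproduce this without ksysguard, here is a minimal sketch that reads the same kind of values directly from the hwmon sysfs files that lm-sensors relies on. It assumes the card exposes a hwmon device whose `name` file reads `amdgpu` and that `temp1_input` and `fan1_input` are present. The function name `read_amdgpu_sensors` is made up for illustration, and it is not confirmed that polling this way triggers the errors.

```python
from pathlib import Path


def read_amdgpu_sensors(hwmon_root="/sys/class/hwmon"):
    """Return one dict per amdgpu hwmon device with temperature and fan speed.

    Assumes the standard hwmon sysfs layout: temp1_input is reported in
    millidegrees Celsius and fan1_input in RPM. Missing or unreadable
    files yield None instead of raising.
    """
    readings = []
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue
        if name_file.read_text().strip() != "amdgpu":
            continue
        entry = {"device": dev.name}
        for key, filename, scale in (
            ("temp_c", "temp1_input", 1000.0),
            ("fan_rpm", "fan1_input", 1.0),
        ):
            try:
                entry[key] = int((dev / filename).read_text().strip()) / scale
            except (OSError, ValueError):
                entry[key] = None
        readings.append(entry)
    return readings


if __name__ == "__main__":
    for reading in read_amdgpu_sensors():
        print(reading)
```

Calling this in a loop with a short sleep while watching `dmesg -w` in another terminal would show whether plain sensor reads are enough to provoke the failures on a given setup.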
Comment 1 Shmerl 2019-09-05 23:34:19 UTC
These errors also happen when using radeon-profile to control the fan speed:

[ 3099.422315] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb param 0x80
[ 3099.422318] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3145.423048] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
[ 3145.423051] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3145.423076] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb, param 0x6
[ 3149.423073] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
[ 3149.423076] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3200.422744] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xa90000
[ 3200.422846] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
[ 3200.422850] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 3234.422189] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xa90000
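
To see which messages fail most often, the lines above can be tallied by message ID and parameter. This is a rough sketch that assumes the exact wording shown in these logs (with or without a comma before `param`). The helper name `tally_failed_messages` is hypothetical.

```python
import re
from collections import Counter

# Matches both variants seen in the logs above: with and without a
# comma between the response value and "param".
FAILED_MSG = re.compile(
    r"Failed to send message (0x[0-9a-fA-F]+), "
    r"response (0x[0-9a-fA-F]+),? param (0x[0-9a-fA-F]+)"
)


def tally_failed_messages(lines):
    """Count failures keyed by (message id, param), both as integers."""
    counts = Counter()
    for line in lines:
        match = FAILED_MSG.search(line)
        if match:
            msg_id = int(match.group(1), 16)
            param = int(match.group(3), 16)
            counts[(msg_id, param)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 tally.py
    for (msg_id, param), n in tally_failed_messages(sys.stdin).most_common():
        print(f"message {msg_id:#x} param {param:#x}: {n}")
```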
Comment 2 Shmerl 2019-09-06 01:48:11 UTC
Related: https://github.com/marazmista/radeon-profile/issues/157
Comment 3 Andrew Sheldon 2019-09-08 02:22:11 UTC
Are you running a monitor at 75 Hz?

I can only trigger the bug when setting 74-76 Hz with amd-staging-drm-next, and although I haven't tested in a while, I suspect the same applies to 5.3-rcX (and drm-next-5.4).

Here's the output after setting 75 Hz on amd-staging-drm-next:
[ 7937.682003] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)      param: 0x00000006 response 0xffffffc2
[ 7937.682004] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7938.087356] amdgpu: [powerplay] failed send message: NumOfDisplays (64)      param: 0x00000001 response 0xffffffc2
[ 7940.224391] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)      param: 0x00000006 response 0xffffffc2
[ 7940.224392] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7942.362952] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7944.510060] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7944.510061] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7945.269921] amdgpu: [powerplay] failed send message: NumOfDisplays (64)      param: 0x00000001 response 0xffffffc2
[ 7946.652777] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7947.411808] amdgpu: [powerplay] failed send message: NumOfDisplays (64)      param: 0x00000001 response 0xffffffc2
[ 7948.786413] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7948.786414] amdgpu: [powerplay] Failed to export SMU metrics table!
[ 7950.918131] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7953.076247] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)      param: 0x00000080 response 0xffffffc2
[ 7953.076250] amdgpu: [powerplay] Failed to export SMU metrics table!
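
The decimal IDs in this newer log format line up with the hex IDs in the original description: 18 is 0x12 and 14 is 0xe. That suggests, though it is not confirmed here, that both logs refer to the same messages, assuming the numbering is unchanged between the two kernels. The sketch below extracts the ID-to-name pairs from lines in this format and reinterprets the response values as signed 32-bit integers, on the assumption that the driver is printing a negative return code. The helper names are made up for illustration. Message 0xf (15) from the earlier logs does not appear in this output, so it stays unnamed.

```python
import re

# Matches the amd-staging-drm-next wording shown above, e.g.
# "failed send message: NumOfDisplays (64)  param: 0x00000001 response 0xffffffc2"
NAMED_MSG = re.compile(
    r"failed send message:\s+(\w+)\s+\((\d+)\)\s+"
    r"param:\s+(0x[0-9a-fA-F]+)\s+response\s+(0x[0-9a-fA-F]+)"
)


def to_int32(value):
    """Reinterpret an unsigned 32-bit value as signed two's complement."""
    return value - (1 << 32) if value & 0x80000000 else value


def message_names(lines):
    """Build a {message id: message name} map from named log lines."""
    names = {}
    for line in lines:
        match = NAMED_MSG.search(line)
        if match:
            names[int(match.group(2))] = match.group(1)
    return names


if __name__ == "__main__":
    print(to_int32(0xFFFFFFC2))  # response seen in this comment
    print(to_int32(0xFFFFFFFB))  # response seen in the description
```

Read as signed values, the two responses are -62 and -5. Whether these correspond to errno-style codes is an assumption that would need checking against the driver source.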
Comment 4 Shmerl 2019-09-08 03:32:53 UTC
(In reply to Andrew Sheldon from comment #3)
> Are you running a monitor at 75hz?
> 


No, 60 Hz, which is my monitor's native refresh rate.

