Bug 109887

Summary: [Vega10][powerplay] P7 gets reset to max_vddc (1.2V/1.25V) after applying any custom settings via pp_od_clk_voltage and/or pp_table
Product: DRI Reporter: kgkggl+bugs.freedesktop.org
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: bednarczyk.pawel, haro41, nrndda, t.clastres, wslatem
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=205277
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg output none
Xorg.log none
pp_table none

Description kgkggl+bugs.freedesktop.org 2019-03-06 08:06:12 UTC
Created attachment 143547 [details]
dmesg output

I overwrite "pp_od_clk_voltage" to control the voltage, but I have a problem.

By "GPU voltage" I mean the value read from "/sys/class/drm/card0/device/hwmon/hwmon0/in0_input".
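
As a minimal sketch (not part of the original report), the reading described above could be taken as follows. It assumes the hwmon in0_input file holds a plain integer in millivolts, which matches the 1200mv figures quoted in this report. The helper name read_gpu_millivolts is hypothetical, and the hwmon index in the path may differ between systems.

```python
from pathlib import Path

# Path quoted in the report; the hwmonN index may differ on other machines.
DEFAULT_IN0 = "/sys/class/drm/card0/device/hwmon/hwmon0/in0_input"


def read_gpu_millivolts(path=DEFAULT_IN0):
    """Return the integer stored in the hwmon voltage file (assumed to be mV)."""
    return int(Path(path).read_text().strip())
```

Calling read_gpu_millivolts() at idle and again under load would give the two readings the report compares against the configured P-state voltages.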

If I set "pp_od_clk_voltage" before starting Xorg/compton/WM, the GPU voltage is locked at 1200 mV.
If I set "pp_od_clk_voltage" after starting Xorg/compton/WM, the GPU voltage gets locked at 1200 mV after a heavy load.

The idle voltage only returns to 900-950 mV after I run "echo c > /sys/class/drm/card0/device/pp_od_clk_voltage" again, and even then it is still higher than the set value.

There is a second problem: I can't control the "P7" voltage by setting "pp_od_clk_voltage". The value shown in "pp_od_clk_voltage" can be changed, but the reading is always 1200 mV when the GPU jumps to "P7".

I set "P6" and "P7" to the same value to prevent the GPU from jumping to "P7"

My "pp_od_clk_voltage" setting:
OD_SCLK:
0:        852Mhz        800mV
1:        974Mhz        825mV
2:       1096Mhz        850mV
3:       1218Mhz        875mV
4:       1340Mhz        900mV
5:       1462Mhz        925mV
6:       1584Mhz        950mV
7:       1584Mhz        950mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        800Mhz        950mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV

Default "pp_od_clk_voltage" setting:
OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1138Mhz        950mV
3:       1269Mhz       1000mV
4:       1312Mhz       1050mV
5:       1474Mhz       1100mV
6:       1538Mhz       1150mV
7:       1590Mhz       1200mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        800Mhz        950mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV
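
To make the difference between the two listings above easier to see, here is a rough sketch that parses the text format shown and reports which levels have a changed voltage. The function names parse_od_table and voltage_changes are hypothetical, and the parser only assumes the layout exactly as pasted in this report (section headers such as OD_SCLK: followed by "level: clock voltage" rows).

```python
import re

# Matches level rows such as "7:       1590Mhz       1200mV".
LEVEL_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s*MHz\s+(\d+)\s*mV\s*$", re.IGNORECASE)


def parse_od_table(text):
    """Return {"OD_SCLK": {level: (MHz, mV)}, "OD_MCLK": {...}} from a listing."""
    tables = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("OD_") and stripped.endswith(":"):
            section = stripped[:-1]
            if section != "OD_RANGE":  # range rows are limits, not levels
                tables[section] = {}
            continue
        match = LEVEL_ROW.match(line)
        if match and section in tables:
            level, mhz, mv = (int(group) for group in match.groups())
            tables[section][level] = (mhz, mv)
    return tables


def voltage_changes(default_text, custom_text, section="OD_SCLK"):
    """List (level, default_mV, custom_mV) for levels whose voltage differs."""
    default = parse_od_table(default_text).get(section, {})
    custom = parse_od_table(custom_text).get(section, {})
    return [
        (level, default[level][1], custom[level][1])
        for level in sorted(default)
        if level in custom and default[level][1] != custom[level][1]
    ]
```

Fed the default and custom listings from this report, voltage_changes would list levels 1 through 7 of OD_SCLK, since level 0 is 800mV in both.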

RYZEN 1700
MSI B350M MORTAR
PowerColor Radeon RX Vega 56

Linux 5.0.0-arch1-1-ARCH #1 SMP PREEMPT Mon Mar 4 14:11:43 UTC 2019 x86_64 GNU/Linux

Thanks!
Comment 1 kgkggl+bugs.freedesktop.org 2019-03-06 08:08:00 UTC
Created attachment 143548 [details]
Xorg.log
Comment 2 kgkggl+bugs.freedesktop.org 2019-03-06 08:10:10 UTC
Created attachment 143549 [details]
pp_table
Comment 3 fin4478 2019-03-06 14:52:06 UTC
You need to have a kernel command line parameter and c is used to commit changes. See: https://wiki.archlinux.org/index.php/AMDGPU#Overclocking
Comment 4 kgkggl+bugs.freedesktop.org 2019-03-06 15:45:20 UTC
(In reply to fin4478 from comment #3)
> You need to have a kernel command line parameter and c is used to commit
> changes. See: https://wiki.archlinux.org/index.php/AMDGPU#Overclocking

Yes, I use "c" to commit changes. The GPU frequency can always be modified, but the voltage setting does not take effect.

I use these commands:

# echo s 1 974 825 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo s 2 1096 850 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo s 3 1218 875 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo s 4 1340 900 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo s 5 1462 925 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo s 6 1584 950 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo s 7 1584 950 > /sys/class/drm/card0/device/pp_od_clk_voltage
# echo c > /sys/class/drm/card0/device/pp_od_clk_voltage
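
The command sequence above could also be generated from a table, with a sanity check against the OD_RANGE limits before anything is written. This is only a sketch: build_od_commands and apply_commands are hypothetical helper names, the "s <level> <MHz> <mV>" and "c" strings are copied from the commands above, and the default limits are the OD_RANGE values from this report (an assumption for any other card).

```python
def build_od_commands(sclk_levels, sclk_range=(852, 2400), vddc_range=(800, 1200)):
    """Return the strings to write, one per write, ending with the commit "c".

    sclk_levels maps level index -> (MHz, mV). The default limits are the
    OD_RANGE values from this report and are an assumption for other cards.
    """
    commands = []
    for level in sorted(sclk_levels):
        mhz, mv = sclk_levels[level]
        if not sclk_range[0] <= mhz <= sclk_range[1]:
            raise ValueError(f"level {level}: {mhz} MHz outside SCLK range")
        if not vddc_range[0] <= mv <= vddc_range[1]:
            raise ValueError(f"level {level}: {mv} mV outside VDDC range")
        commands.append(f"s {level} {mhz} {mv}")
    commands.append("c")
    return commands


def apply_commands(commands, path="/sys/class/drm/card0/device/pp_od_clk_voltage"):
    """Write each command with its own open/write, as each echo above does."""
    for command in commands:
        with open(path, "w") as handle:
            handle.write(command + "\n")
```

For example, build_od_commands({6: (1584, 950), 7: (1584, 950)}) yields the last two "s" lines above followed by "c". Calling apply_commands requires root, just like the echo commands.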
Comment 5 bednarczyk.pawel 2019-04-07 14:48:38 UTC
Did you manage to get this resolved? I have the same issue, and in my case, when I set P6 and P7 to the same frequency, the memory clock gets stuck at P0 (167 MHz). I tried 5.1-rc3 but no joy either.
Comment 6 Jon Doane 2019-06-07 22:10:22 UTC
I'm also having this particular issue with a Vega 64. It appears that setting a non-stock voltage for any of the power states causes the voltage at any clock to jump to 1.20 V. I haven't been able to find a way around it yet.
Comment 7 hagar-dunor 2019-06-20 13:38:39 UTC
I met the same annoyance and found a rather convoluted way to get around it. It would be better if overclocking/undervolting worked by setting pp_od_clk_voltage only.

https://forum.level1techs.com/t/how-to-overclock-vega-on-linux/132771/65
Comment 8 Andrew Sheldon 2019-07-31 04:05:50 UTC
I also can confirm the problem, and it seems to have gotten worse since 5.3.0-rcX.

In past kernels, you could kind of work around it by setting slightly less conservative undervolts and it would work. If you go past a certain point (a point that works fine if the pp table is overridden), it would wrap around to 1.2V. For reference, this is around 950 mV at state 6/7.

Now with 5.3, even that same relatively conservative undervolt would immediately jump to 1.2V under load.

As hagar-dunor suggested, manually overriding the entire PP table works around the problem.
Comment 9 Andrew Sheldon 2019-07-31 09:16:05 UTC
Here's a Linux pp table editor that also seems to support more options (such as raising the power cap) than OverDriveNTool:
https://github.com/amezin/powerplay-table-editor

I will note that you might still see raised voltages if you do a too aggressive overclock/undervolt with modded PP tables, but it seems to only overvolt as much as is needed (say 975mv -> 1.05V) if you set a too high clock, rather than jumping to the maximum possible voltage that you see by editing pp_od_clk_voltage.
Comment 10 Andrew Sheldon 2019-07-31 10:43:38 UTC
(In reply to Andrew Sheldon from comment #9)
> I will note that you might still see raised voltages if you do a too
> aggressive overclock/undervolt with modded PP tables, but it seems to only
> overvolt as much as is needed (say 975mv -> 1.05V) if you set a too high
> clock, rather than jumping to the maximum possible voltage that you see by
> editing pp_od_clk_voltage.

Replying to myself here. I've found that the raised voltage occurs when forcing GPU state to "high". If I then re-write the PP table again, it settles on the correct voltage, and successfully switches to "high" after. I'm not sure if this is related to some of the other voltage issues, but there you go. This occurs on both 5.2.3 and 5.3-rc2, for the record.

I have to do this every time I force to "high", BTW.
Comment 11 Stefan Springer 2019-10-13 18:26:46 UTC
Loading a powerplay table only temporarily alleviates the issue. It comes back after a while of desktop usage.

I think the culprit might be here: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c#L348

Why set the voltage for a state to max_vddc in the first place?
Comment 12 Stefan Springer 2019-10-13 20:05:34 UTC
Actually, it gets reset to 1200 mV every time the resolution changes, i.e. when launching a fullscreen game or restarting the display manager.
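
One way to catch such a reset while reproducing it is to poll the voltage reading and flag any sample above the highest voltage that was configured. The sketch below is an illustration only: watch_for_reset is a hypothetical helper, read_mv is any caller-supplied function that returns the current reading in millivolts, and the ceiling is a value the user picks (for instance a little above the 950mV set for P6/P7 earlier in this thread).

```python
import time


def watch_for_reset(read_mv, ceiling_mv, interval_s=1.0, max_samples=None,
                    sleep=time.sleep):
    """Poll read_mv() and yield (sample_index, mV) for readings above ceiling_mv.

    Runs until max_samples readings were taken, or forever if it is None.
    """
    index = 0
    while max_samples is None or index < max_samples:
        millivolts = read_mv()
        if millivolts > ceiling_mv:
            yield index, millivolts
        index += 1
        sleep(interval_s)
```

Here read_mv could be a small function that reads the in0_input file mentioned in the description. Running the loop while switching resolution should then show whether the reading jumps to 1200 right after the mode change.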
Comment 13 Stefan Springer 2019-10-23 14:40:41 UTC
*** Bug 110113 has been marked as a duplicate of this bug. ***
Comment 14 Stefan Springer 2019-10-23 14:41:21 UTC
*** Bug 110347 has been marked as a duplicate of this bug. ***
Comment 15 Stefan Springer 2019-10-24 09:53:56 UTC
The patch Pelle van Gils proposed here works flawlessly for me:
https://bugzilla.kernel.org/show_bug.cgi?id=205277
