With Polaris and Vega, setting amdgpu.ppfeaturemask=0xffffffff worked here: it unlocked pp_od_clk_voltage and didn't cause any issues for me.
But with Navi, it doesn't work. I'm still not allowed to open
as root, even with that flag specified.
Also, I can't increase the GPU's power consumption, as
only allows the default 100% Powertune limit, meaning I can't set any higher value in
Apart from not being able to change the aforementioned parameters, setting amdgpu.ppfeaturemask=0xffffffff causes stuttering, even on the desktop, and it also affects the mouse cursor.
This is with kernel drm-next-5.5-wip 73cdff347343504287feae8b36fa7317f04dcc61
and an MSI 5700 XT Gaming X.
Still happens with current 5.5-wip/drm-next kernels.
I don't know if it is supposed to be implemented, but there seems to be some bug apart from that:
Just reading sysfs entries at "/sys/class/drm/card0/device/" makes the program doing the reading freeze, e.g. a file browser (also if started as root).
Anyhow, "# cat /sys/class/drm/card0/device/pp_od_clk_voltage" returns nothing.
Could there be an update on this? Not being able to overclock/undervolt almost 4 months after Navi release is a huge disappointment.
As a workaround, use upp instead (it writes to the powerplay binary directly). See: https://github.com/sibradzic/upp
I suggest using 5.4-rcX, as AMD's wip kernels (amd-staging-drm-next and drm-next) may still have a bug with pptable writing. Or you can try reverting 3abf8d896f8ac72341677a6cd82662b80943f9c8 ("drm/amd/powerplay: do proper cleanups on hw_fini").
Be aware that this method can cause issues with fan control, so you might also need to manually set the fans after that. You can use fanctl to handle this:
I have the same (or at least a similar) bug. /sys/class/drm/card1/device/hwmon/hwmon3/power1_cap_max in my case gives the default 220W (value: 220000000).
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
I don't get any stuttering though, with kernel 5.3.6 or with 5.4rc2.
Dolphin freezes when looking at /sys/class/drm/card1/device/ as well.
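For anyone comparing numbers: the hwmon power files report microwatts, which is why 220 W shows up as 220000000 above. A minimal sketch for printing the cap in watts follows. The helper names uw_to_w and show_power_caps are made up for illustration, and the glob assumes the same card*/device/hwmon/hwmon* layout as the path quoted above.

```shell
# Convert a hwmon power value (microwatts) to whole watts.
uw_to_w() {
    echo $(( $1 / 1000000 ))
}

# Print power1_cap_max in watts for every card that exposes it.
# Hypothetical helper; nodes that are missing or unreadable are skipped.
show_power_caps() {
    for f in /sys/class/drm/card*/device/hwmon/hwmon*/power1_cap_max; do
        [ -r "$f" ] || continue
        printf '%s: %s W\n' "$f" "$(uw_to_w "$(cat "$f")")"
    done
}
```

Calling `uw_to_w 220000000` prints 220. If `show_power_caps` reports the same value as power1_cap, the driver is not exposing any headroom above the default limit.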
Thanks for the hint, Andrew Sheldon. SPPT being possible on Linux totally passed me by. Will test it with my cheap Polaris card first, which made me stick with a custom fan curve anyway.
Regarding the stutter with amdgpu.ppfeaturemask=0xffffffff: I'm not sure anymore if it really was related, as hardware cursor support seems to be still a complete mess for Navi with 5.3/5.4 and 5.5 still being incomplete.
I can also confirm the issue exists. Setting amdgpu.ppfeaturemask=0xffffffff doesn't allow me to access the "States Table" section in radeon-profile, as if the parameter was ignored.
As for the stutter issue, I don't know what exactly it is, as I don't notice any difference with or without the parameter. On the 5.3 kernel, the mouse feels sluggish, as if my monitor were running at 30Hz, but it's fine on the 5.4 (rc) kernel. This is observed on official Manjaro kernels.
Tested custom soft power play table via UPP on Polaris and it generally seems to work well (might be able to test Navi at a later time).
However, there is the issue that the voltage gets reset when there is a modeline switch. So I've written a script which checks the voltage and restarts UPP when it exceeds values which would not occur with my undervolting:
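while true; do
    read -r num < /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon0/in0_input  # GPU voltage in mV
    if [[ "$num" -gt 1030 ]]; then
        # Voltage above the undervolt ceiling: reapply the table and fan settings.
        systemctl restart amdgpu-oc && systemctl restart amdgpu-fancontrol
    fi
    sleep 1  # polling interval is an assumption
done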
This patch should take care of the problem by treating navi10's TDPODLimit the same as vega20 does: https://patchwork.freedesktop.org/series/69090/
(In reply to Matt Coffin from comment #7)
> This patch should take care of the problem by treating navi10's TDPODLimit
> the same as vega20 does: https://patchwork.freedesktop.org/series/69090/
Sorry for the spam. This is in reply to the power1_cap issue, not the whole bug in general.
Thank you, I'll try it out at some point.
I also got an email from fin4478 with the suggestion to try out amdgpu.ppfeaturemask=0xfffd7fff.
I'm already using ppfeaturemask=0xfffd7fff, it doesn't unlock anything - or at least CoreCtrl doesn't show anything.
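To make the difference between the two masks discussed here concrete, this is a rough sketch that tests individual feature bits. The bit values are an assumption based on my reading of the PP_FEATURE_MASK enum in the kernel's amd_shared.h (overdrive = 0x4000, GFXOFF = 0x8000, stutter mode = 0x20000); please verify them against your kernel tree. mask_has and describe_mask are illustrative names only.

```shell
# Assumed bit values from the kernel's PP_FEATURE_MASK enum (verify locally).
PP_OVERDRIVE_MASK=$(( 0x4000 ))   # bit 14
PP_GFXOFF_MASK=$(( 0x8000 ))      # bit 15
PP_STUTTER_MODE=$(( 0x20000 ))    # bit 17

# mask_has <mask> <bit>: succeeds if <bit> is set in <mask>.
mask_has() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Print the state of the three bits above for a given mask.
describe_mask() {
    for pair in "overdrive:$PP_OVERDRIVE_MASK" "gfxoff:$PP_GFXOFF_MASK" "stutter_mode:$PP_STUTTER_MODE"; do
        name=${pair%%:*}
        bit=${pair##*:}
        if mask_has "$1" "$bit"; then
            echo "$name: set"
        else
            echo "$name: clear"
        fi
    done
}
```

Under those assumptions, `describe_mask 0xfffd7fff` reports overdrive as set and both gfxoff and stutter_mode as clear, while 0xffffffff has all three set. So both masks request overdrive, which fits the observation that switching between them unlocks nothing on Navi. The mask the driver actually loaded with can be read from /sys/module/amdgpu/parameters/ppfeaturemask.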
In the journald log I see a lot of these lines, always grouped together:
08.11.19 20:20 kernel amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
08.11.19 20:20 kernel amdgpu: [powerplay] Failed to send message 0x20, response 0xfffffffb param 0x2
That really looks suspicious.
Looks like the issue of the voltage resetting itself to default, which I mentioned earlier when using custom power play tables, might not apply to Navi:
Can anyone share his/her experience with using custom power play tables via upp for Navi?
Now with that fix for Polaris by Alex, it seems to be absolutely flawless for me.
Would be good to know if the same applied to Navi.
It was just a bit inconvenient that for my Polaris card the Vdds were defined as garbage values when parsing the default pp_table. Though specifying custom values in mV worked without issues.
(In reply to tempel.julian from comment #11)
> It was just a bit inconvenient that for my Polaris card the Vdds were
> defined as garbage values when parsing the default pp_table. Though
> specifying custom values in mV worked without issues.
If you are seeing values like 0xff01, those are not garbage. They are virtual voltage ids which tell the driver to look up the proper voltage via a different method.
It might be that, just not in hex. E.g. VddcLookupTable entry 1 returns a Vdd of 65282.
(In reply to tempel.julian from comment #13)
> It might be that, just not in hex. E.g. VddcLookupTable entry 1 returns a
> Vdd of 65282.
Correct. 65282 is 0xff02 which is a virtual voltage id. The driver uses that id to look up the real voltage based on the leakage for the board. Take a look at smu7_get_evv_voltages() or smu7_get_elb_voltages() in smu7_hwmgr.c.
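A quick way to check whether a Vdd value printed in decimal is one of these ids is sketched below. The range 0xff01 to 0xff08 is an assumption based on the ATOM_VIRTUAL_VOLTAGE_ID0 through ID7 defines as I remember them, so treat the upper bound in particular as unverified. Both function names are hypothetical.

```shell
# Print a decimal value as 4-digit hex, e.g. 65282 -> 0xff02.
to_hex() {
    printf '0x%04x\n' "$1"
}

# Succeeds if the value falls in the assumed virtual voltage id range
# (0xff01..0xff08); anything else is treated as a real voltage in mV.
is_virtual_voltage_id() {
    [ "$1" -ge $(( 0xff01 )) ] && [ "$1" -le $(( 0xff08 )) ]
}
```

For example, `to_hex 65282` prints 0xff02 and `is_virtual_voltage_id 65282` succeeds, whereas a plain millivolt value such as 1030 is rejected.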
pp_od_clk_voltage isn't implemented yet for navi. There are patches on the mailing list: