| Summary: | RX 5700 XT Navi - amdgpu.ppfeaturemask=0xffffffff causes stuttering and does not unlock clock/voltage/power controls | | |
|---|---|---|---|
| Product: | DRI | Reporter: | tempel.julian |
| Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> |
| Status: | RESOLVED MOVED | QA Contact: | |
| Severity: | enhancement | | |
| Priority: | not set | CC: | danielkinsman.nospam, ragnaros39216, tempel.julian |
| Version: | DRI git | | |
| Hardware: | x86-64 (AMD64) | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
Description
tempel.julian
2019-09-22 09:54:16 UTC
Still happens with current 5.5-wip/drm-next kernels. I don't know if it is supposed to be implemented, but there seems to be some bug apart from that: just reading sysfs entries at "/sys/class/drm/card0/device/" makes the parsing program freeze, e.g. a file browser (also if started as root). Anyhow, "# cat /sys/class/drm/card0/device/pp_od_clk_voltage" returns nothing. Could there be an update on this? Not being able to overclock/undervolt almost 4 months after Navi release is a huge disappointment.

As a workaround, use upp instead (it writes to the powerplay binary directly). See: https://github.com/sibradzic/upp
I suggest using 5.4-rcX, as AMD's wip kernels (amd-staging-drm-next and drm-next) may still have a bug with pptable writing. Or you can try reverting:
3abf8d896f8ac72341677a6cd82662b80943f9c8 drm/amd/powerplay: do proper cleanups on hw_fini
Be aware that this method can cause issues with fan control, so you might also need to set the fans manually after that. You can use fanctl to handle this: https://gitlab.com/mcoffin/fanctl

I have the same (or at least a similar) bug. /sys/class/drm/card1/device/hwmon/hwmon3/power1_cap_max in my case gives the default 220W (value: 220000000).
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
returns nothing. I don't get any stuttering though, with kernel 5.3.6 or with 5.4-rc2. Dolphin freezes when looking at /sys/class/drm/card1/device/ as well.

Thanks for the hint @ Andrew Sheldon, SPPT being possible on Linux totally passed me by. I will test it with my cheap Polaris card first, which made me stick with a custom fan curve anyway.
Regarding the stutter with amdgpu.ppfeaturemask=0xffffffff: I'm not sure anymore if it really was related, as hardware cursor support still seems to be a complete mess for Navi with 5.3/5.4, and 5.5 is still incomplete.

I can also confirm the issue exists.
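Since plain reads of these sysfs files can hang the reading program, one can probe whether the OD interface is exposed without risking a stuck shell. A minimal sketch, assuming the card sits at card0 (adjust card0/card1 to match your system):

```shell
card=/sys/class/drm/card0/device
# timeout guards against the reported hang when a read on these files blocks;
# grep -q . checks whether cat produced any output at all
if timeout 2 cat "$card/pp_od_clk_voltage" 2>/dev/null | grep -q .; then
    status="OD table exposed"
else
    status="OD table empty or unreadable"
fi
echo "$status"
```

On a Navi card with the bug described here, this reports the table as empty even though the file exists.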
Setting amdgpu.ppfeaturemask=0xffffffff doesn't allow me to access the "States Table" section in radeon-profile, as if the parameter was ignored. As for the stutter issue, I don't know what exactly it is, as I don't notice any difference with or without the parameter. On the 5.3 kernel the mouse feels sluggish, as if my monitor were running at 30Hz, but it's fine on the 5.4 (rc) kernel. This is observed on official Manjaro kernels.

Tested a custom soft power play table via UPP on Polaris and it generally seems to work well (I might be able to test Navi at a later time). However, there is the issue that the voltage gets reset when there is a modeline switch. So I've written a script which checks the voltage and restarts UPP when it exceeds values which would not occur with my undervolting:

#!/bin/bash
while true; do
    sleep 1
    read -r num < /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon0/in0_input
    if [[ "$num" -gt 1030 ]]; then
        systemctl restart amdgpu-oc && systemctl restart amdgpu-fancontrol
    fi
done

This patch should take care of the problem by treating navi10's TDPODLimit the same as vega20 does: https://patchwork.freedesktop.org/series/69090/

(In reply to Matt Coffin from comment #7)
> This patch should take care of the problem by treating navi10's TDPODLimit
> the same as vega20 does: https://patchwork.freedesktop.org/series/69090/
Sorry for the spam. This is in reply to the power1_cap issue, not the whole bug in general.

Thank you, I'll try it out at some point. I also got an email from fin4478 with the suggestion to try out amdgpu.ppfeaturemask=0xfffd7fff.

I'm already using ppfeaturemask=0xfffd7fff, it doesn't unlock anything - or at least CoreCtrl doesn't show anything.
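For reference, the difference between the two masks mentioned above can be computed directly. A quick sketch; interpreting the cleared bits as GFXOFF (0x8000) and stutter mode (0x20000) is my assumption based on the pp_feature_mask bit definitions in the amdgpu driver:

```shell
default_mask=$((0xffffffff))
suggested_mask=$((0xfffd7fff))
# XOR exposes exactly the bits the suggested mask clears
printf 'bits cleared: 0x%08x\n' $(( default_mask ^ suggested_mask ))
```

This prints `bits cleared: 0x00028000`, i.e. bits 15 and 17 are disabled relative to the all-ones mask.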
In the journald log I see a lot of these lines, always grouped together:
08.11.19 20:20 kernel amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
08.11.19 20:20 kernel amdgpu: [powerplay] Failed to send message 0x20, response 0xfffffffb param 0x2
That really looks suspicious.

Looks like the issue of the voltage resetting itself to default I mentioned earlier when using custom power play tables might not apply to Navi: https://bugzilla.kernel.org/show_bug.cgi?id=205393
Can anyone share his/her experience with using custom power play tables via upp for Navi? Now with that fix for Polaris by Alex, it seems to be absolutely flawless for me. Would be good to know if the same applies to Navi.

It was just a bit inconvenient that for my Polaris card the Vdds were defined as garbage values when parsing the default pp_table. Though specifying custom values in mV worked without issues.

(In reply to tempel.julian from comment #11)
> It was just a bit inconvenient that for my Polaris card the Vdds were
> defined as garbage values when parsing the default pp_table. Though
> specifying custom values in mV worked without issues.
If you are seeing values like 0xff01, those are not garbage. They are virtual voltage ids; the driver uses them to look up the proper voltage via a different method.

It might be that, just not in hex. E.g. VddcLookupTable entry 1 returns a Vdd of 65282.

(In reply to tempel.julian from comment #13)
> It might be that, just not in hex. E.g. VddcLookupTable entry 1 returns a
> Vdd of 65282.
Correct. 65282 is 0xff02, which is a virtual voltage id. The driver uses that id to look up the real voltage based on the leakage for the board. Take a look at smu7_get_evv_voltages() or smu7_get_elb_voltages() in smu7_hwmgr.c.

pp_od_clk_voltage isn't implemented yet for navi.
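The 65282 == 0xff02 observation generalizes into a quick sanity check for values parsed out of a pp_table. A minimal sketch, assuming the virtual voltage id range 0xff01-0xff08 (per the ATOM_VIRTUAL_VOLTAGE_ID definitions in the atombios headers; verify the range against your kernel's headers):

```shell
vdd=65282  # value read from VddcLookupTable entry 1, as reported above
# Values inside the assumed virtual-id window are leakage lookup ids,
# not literal millivolts
if [ "$vdd" -ge $((0xff01)) ] && [ "$vdd" -le $((0xff08)) ]; then
    verdict=$(printf '0x%04x is a virtual voltage id' "$vdd")
else
    verdict=$(printf '%d mV is a literal voltage' "$vdd")
fi
echo "$verdict"
```

For the value above this prints `0xff02 is a virtual voltage id`, matching the explanation in the thread.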
There are patches on the mailing list: https://patchwork.freedesktop.org/series/69152/

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/913.