Bug 106177 - overclocking doesn't work with 4.17-rc1
Status: RESOLVED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2018-04-22 16:46 UTC by Christoph Haag
Modified: 2018-10-19 01:47 UTC

Description Christoph Haag 2018-04-22 16:46:15 UTC
RX 480.

I noticed a while ago that overclocking didn't work on drm-next-4.17-wip/drm-next-4.18-wip, but now it's gotten into mainline.

With Linux 4.16.3 overclocking works fine.

With 4.17-rc1 the setting doesn't "stick" and the clocks aren't increased:

root@c-pc ~ # cat /sys/class/drm/card0/device/pp_sclk_od
0
root@c-pc ~ # cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 300Mhz *
1: 608Mhz
2: 910Mhz
3: 1077Mhz
4: 1145Mhz
5: 1191Mhz
6: 1236Mhz
7: 1288Mhz
root@c-pc ~ # echo 2 > /sys/class/drm/card0/device/pp_sclk_od
root@c-pc ~ # cat /sys/class/drm/card0/device/pp_sclk_od
0
root@c-pc ~ # cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 300Mhz *
1: 608Mhz
2: 910Mhz
3: 1077Mhz
4: 1145Mhz
5: 1191Mhz
6: 1236Mhz
7: 1288Mhz
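The manual check above (write a value, read it back) can be sketched as a small shell function. This is only an illustration: the name check_sticks is hypothetical, and the function takes the file path as an argument so it can be tried on any file, not only on pp_sclk_od.

```shell
# Hypothetical helper (not part of any driver tooling): write VALUE to
# FILE, read FILE back, and report whether the value "stuck".
# Usage: check_sticks FILE VALUE
# Returns 0 if the value stuck, 1 if it did not, 2 if the write failed.
check_sticks() {
    file=$1
    want=$2
    # Group the write so that a failed redirection (e.g. not running
    # as root) is silenced and reported through the exit status.
    if ! { echo "$want" > "$file"; } 2>/dev/null; then
        echo "write failed"
        return 2
    fi
    got=$(cat "$file")
    if [ "$got" = "$want" ]; then
        echo "stuck: $got"
    else
        echo "did not stick: wrote $want, read back $got"
        return 1
    fi
}
```

Run as root, `check_sticks /sys/class/drm/card0/device/pp_sclk_od 2` would correspond to the session above, where the value read back is still 0.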
Comment 1 grmat 2018-05-03 17:55:01 UTC
I have the same issue on 4.16.6 and 4.17-rc2 with CIK hardware (Hawaii/R9 290X).
Comment 2 Christoph Haag 2018-05-03 23:42:46 UTC
I just tried 4.16.7, and overclocking still works on the 4.16 series.
And I think on 4.17-rc3 it still doesn't work.

Do you mean on 4.16.6 overclocking does not work for you? Then it's probably not the same as my issue.
Comment 3 grmat 2018-05-03 23:46:24 UTC
At least I had the same behaviour: the value is not "sticky" and has no impact on the actual clock.

I noticed it when I tested 4.17-rc2, where I actually wanted to test the "wattman" functionality. Then I noticed that it doesn't work on 4.16.6 either. I will have to test older kernels, maybe this weekend.
Comment 4 tempel.julian 2018-05-04 16:49:22 UTC
I'm on an RX 560.

I can confirm that with 4.17 the behavior has changed quite a lot. I don't think it's buggy for Polaris, just not documented correctly.
To make values "stick" in pp_sclk_od, you have to boot with the parameter amdgpu.ppfeaturemask=0xffffffff.

Values higher than 1 written to pp_sclk_od are not reported back correctly; cat pp_sclk_od always returns 1.
However, entering values higher than 1 still has an effect, as they allow the use of higher clocks in pp_od_clk_voltage.

And unlike with 4.16, values higher than 0 in pp_sclk_od don't automatically lead to higher clocks. It is additionally required to change the pstates in pp_od_clk_voltage (again unlocked with amdgpu.ppfeaturemask=0xffffffff).
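Since the comment above hinges on the amdgpu.ppfeaturemask boot parameter, a quick way to see whether the running kernel was booted with it is to look at the kernel command line. The following is a minimal sketch; the function name ppfeaturemask_from_cmdline is hypothetical, and it only reports what was passed at boot, not whether the driver honours it.

```shell
# Hypothetical helper: print the amdgpu.ppfeaturemask value found in a
# kernel command line file (default: /proc/cmdline).
# Returns 1 and prints nothing if the parameter is absent.
ppfeaturemask_from_cmdline() {
    cmdline_file=${1:-/proc/cmdline}
    mask=
    # Split the command line on whitespace and keep the last match,
    # on the assumption that a later occurrence overrides an earlier one.
    for arg in $(cat "$cmdline_file"); do
        case $arg in
            amdgpu.ppfeaturemask=*) mask=${arg#amdgpu.ppfeaturemask=} ;;
        esac
    done
    if [ -n "$mask" ]; then
        echo "$mask"
    else
        return 1
    fi
}
```

Called without arguments it reads /proc/cmdline; on a system booted as described above it would print 0xffffffff.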
Comment 5 Alex Deucher 2018-05-08 21:21:13 UTC
Please use debugfs (e.g., /sys/kernel/debug/dri/0/amdgpu_pm_info) to check the current clocks.
Comment 6 Christoph Haag 2018-05-09 07:11:52 UTC
I have not tried amdgpu.ppfeaturemask=0xffffffff etc. yet.
If a behavior change like that is intended, there should be some easily accessible documentation about it somewhere.

1288 MHz (SCLK) is the factory-overclocked default max clock, so debugfs agrees that pp_sclk_od on its own does nothing with 4.17-rc4.

root@c-pc ~ # echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
root@c-pc ~ # echo 5 > /sys/class/drm/card0/device/pp_sclk_od
root@c-pc ~ # cat /sys/kernel/debug/dri/0/amdgpu_pm_info
Clock Gating Flags Mask: 0x37bcf
        Graphics Medium Grain Clock Gating: On
        Graphics Medium Grain memory Light Sleep: On
        Graphics Coarse Grain Clock Gating: On
        Graphics Coarse Grain memory Light Sleep: On
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: On
        Graphics Run List Controller Light Sleep: On
        Graphics 3D Coarse Grain Clock Gating: Off
        Graphics 3D Coarse Grain memory Light Sleep: Off
        Memory Controller Light Sleep: On
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: Off
        System Direct Memory Access Medium Grain Clock Gating: On
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: On
        Unified Video Decoder Medium Grain Clock Gating: On
        Video Compression Engine Medium Grain Clock Gating: On
        Host Data Path Light Sleep: Off
        Host Data Path Medium Grain Clock Gating: On
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: Off
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: Off

GFX Clocks and Power:
        2000 MHz (MCLK)
        1288 MHz (SCLK)
        300 MHz (PSTATE_SCLK)
        601 MHz (PSTATE_MCLK)
        1093 mV (VDDGFX)
        19.166 W (VDDC)
        15.19 W (VDDCI)
        40.200 W (max GPU)
        40.185 W (average GPU)

GPU Temperature: 43 C
GPU Load: 0 %

UVD: Disabled

VCE: Disabled
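To pull just the current shader clock out of a dump like the one above, something along these lines may help. It is a sketch that assumes the "NNNN MHz (SCLK)" line format shown in this output; the function name current_sclk_mhz is hypothetical.

```shell
# Hypothetical helper: print the current shader clock in MHz from an
# amdgpu_pm_info dump (default path as used in this report).
# Only the "(SCLK)" line is matched, not "(PSTATE_SCLK)".
current_sclk_mhz() {
    pm_info=${1:-/sys/kernel/debug/dri/0/amdgpu_pm_info}
    awk '$2 == "MHz" && $3 == "(SCLK)" { print $1; exit }' "$pm_info"
}
```

On the output above this prints 1288, which can then be compared against the highest stock level listed in pp_dpm_sclk.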
Comment 7 alvarex 2018-10-16 04:03:55 UTC
I can confirm this behaviour. With kernel 4.16 (point releases 16 and 18) it works as I would expect; with kernel 4.17, 4.18.14, and 4.19 from the drm fixes git it doesn't work. Setting the feature mask has no effect.
Polaris RX 460
Comment 8 alvarex 2018-10-16 04:07:52 UTC
Same effect as Christoph Haag mentions: after echoing to pp_dpm_mclk and pp_mclk_od, the files report that nothing changed.
Comment 9 alvarex 2018-10-18 22:33:52 UTC
This is not a bug. tempel.julian is right. The ArchWiki also explains this. The behaviour has changed, but it's not a bug.
Comment 10 alvarex 2018-10-18 22:37:07 UTC
Check here:
https://wiki.archlinux.org/index.php/AMDGPU#Overclocking

It's also documented in the kernel source.
Comment 11 Alex Deucher 2018-10-19 01:47:18 UTC
(In reply to alvarex from comment #10)
> check here 
> https://wiki.archlinux.org/index.php/AMDGPU#Overclocking
> 
> and it's documented on the kernel source

The Arch wiki has some errors in it as well. Here's the kernel documentation:
https://dri.freedesktop.org/docs/drm/gpu/amdgpu.html#gpu-power-thermal-controls-and-monitoring

