Linux 4.11.4. I use the iGPU on SKL and an RX 460 as a dedicated card. In Xorg I turned AutoBindGPU and AutoAddGPU off and SingleCard on. With the fbcon=map kernel parameter I map the Linux TTYs to different cards. To switch to the AMD card I do a VT switch and then change the input on my monitor. This works fine and lets me log in via agetty and launch a new X server. One thing I noticed is that until I actually do that, the memory clock of the card stays high (17xx MHz?), which makes it 5-8°C hotter and probably uses more power than it needs.
Same problem with TONGA. When the GPU is idle, mclk goes to its maximum. This is easily reproducible: just turn off the monitor. I noticed this issue after upgrading from 4.9.x kernels to 4.11.7. Maybe I'll check the mainline kernel later and/or bisect.

# cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 150Mhz
1: 300Mhz
2: 700Mhz
3: 1450Mhz *

# sensors amdgpu-\*
amdgpu-pci-0100
Adapter: PCI adapter
fan1:        2025 RPM
temp1:        +47.0°C  (crit = +0.0°C, hyst = +0.0°C)
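For anyone who wants to check the active level from a script or test program rather than by eye, here is a minimal sketch in C. It assumes the pp_dpm_mclk format shown above (one "index: clock" pair per line, with " *" marking the active level); active_dpm_level is a hypothetical helper name, not part of any driver or library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the text of pp_dpm_mclk, return the index of
 * the level marked with '*', or -1 if no line carries the marker.
 */
static int active_dpm_level(const char *text)
{
    const char *line = text;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        int level;

        /* Only lines containing the '*' marker are candidates. */
        if (memchr(line, '*', len) && sscanf(line, "%d:", &level) == 1)
            return level;

        if (!end)
            break;
        line = end + 1;
    }
    return -1;
}
```

The text would typically come from reading /sys/class/drm/card0/device/pp_dpm_mclk into a buffer with fopen and fread; with the monitor off on an affected kernel, the helper should return the highest index.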
I've added printing of some debug info to smu7_hwmgr.c, and here is what I get before the GPU enters that state:

[ 778.701843] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 778.707608] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 778.713379] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[ 778.748777] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 778.754361] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 778.759951] AMDGPU: disable_mclk_switching: 1, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 1, mode_info.refresh_rate: 0

For some reason, if refresh_rate = 0 then vblank_time_us = 0. Shouldn't the latter be 0xffffffff instead? With vblank_time_us = 0, the comparison against switch_limit_us (450) reports the vblank as too short, so mclk switching gets disabled.

So I guess the following commit is the culprit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09be4a5219610a6fae3215d4f51f948d6f5d2609

and the following patch should fix (or work around?) this issue:

- if (vblank_time_us < switch_limit_us)
+ if (vblank_time_us && (vblank_time_us < switch_limit_us))

After applying it:

[ 409.588673] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 409.594427] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 409.600182] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[ 409.639750] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 409.645321] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 409.650917] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 0

$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 150Mhz *
1: 300Mhz
2: 700Mhz
3: 1450Mhz
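To make the effect of the one-line change easier to see outside the kernel, here is a standalone sketch of the two comparisons. The function names vblank_too_short_before and vblank_too_short_after are hypothetical stand-ins for illustration only; the real check lives in smu7_hwmgr.c and has a different signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original comparison: a vblank time of 0 counts as "too short". */
static bool vblank_too_short_before(uint32_t vblank_time_us,
                                    uint32_t switch_limit_us)
{
    return vblank_time_us < switch_limit_us;
}

/*
 * Proposed comparison: a vblank time of 0 (display off, refresh_rate 0)
 * is treated as "no constraint" instead of "too short".
 */
static bool vblank_too_short_after(uint32_t vblank_time_us,
                                   uint32_t switch_limit_us)
{
    return vblank_time_us && (vblank_time_us < switch_limit_us);
}
```

With the values from the logs (switch_limit_us = 450), the first version reports "too short" for vblank_time_us = 0, while the second does not. Both agree for 630, and both would still flag a genuinely short nonzero vblank.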
Created attachment 132358
possible fix
(In reply to Alex Deucher from comment #3)
> Created attachment 132358
> possible fix

This patch fixes this bug for me. Thank you!

[ 359.229187] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 359.234933] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 359.240684] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[ 359.283987] AMDGPU: vblank_time_us: 4294967295, switch_limit_us: 450
[ 359.290342] AMDGPU: vblank_time_us: 4294967295, switch_limit_us: 450
[ 359.296703] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 0
... ...
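Note that with this patch the log reports vblank_time_us as 4294967295 (0xffffffff) when refresh_rate is 0, so the existing comparison against switch_limit_us no longer trips. The attachment itself is not reproduced here, so the following is only a sketch of that sentinel idea as I read it from the log output. Both function names are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: when there is no refresh rate (display off),
 * report the largest possible vblank time instead of the computed value.
 */
static uint32_t effective_vblank_time_us(uint32_t refresh_rate,
                                         uint32_t computed_vblank_us)
{
    return refresh_rate == 0 ? UINT32_MAX : computed_vblank_us;
}

/* The comparison stays as it was; only its input changes. */
static bool vblank_shorter_than_limit(uint32_t vblank_time_us,
                                      uint32_t switch_limit_us)
{
    return vblank_time_us < switch_limit_us;
}
```

Compared with guarding the comparison against 0, this keeps the check itself untouched and fixes the value at its source, which matches the question raised earlier about whether vblank_time_us should be 0xffffffff in this case.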
Works for me on 4.11.10. Display off, MCLK is low and card temperature is 27°C as expected. Thanks.