Bug 111240 - ASUS TUF Gaming laptops get throttled down when the RX560X GPU is being used
Summary: ASUS TUF Gaming laptops get throttled down when the RX560X GPU is being used
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest major
Assignee: Default DRI bug account
 
Reported: 2019-07-28 14:44 UTC by Denys
Modified: 2019-11-19 09:38 UTC

Attachments
Unigine_Heaven-4.0 around 27 fps (1.03 MB, image/png)
2019-07-28 14:44 UTC, Denys
moment of performance drop (272.96 KB, image/jpeg)
2019-08-03 20:30 UTC, Christoph Haag
Windows 10 test - 64 fps (1.73 MB, image/png)
2019-08-04 11:08 UTC, Denys

Description Denys 2019-07-28 14:44:55 UTC
Created attachment 144897
Unigine_Heaven-4.0 around 27 fps

I have an ASUS TUF Gaming laptop running Ubuntu 19.04, but the RX 560X graphics card is very slow on Linux.


DRI_PRIME=1 glxinfo | grep OpenGL                
OpenGL vendor string: X.Org
OpenGL renderer string: Radeon RX 560 Series (POLARIS11, DRM 3.27.0, 5.0.0-21-generic, LLVM 8.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.2.0-devel (git-2f92360 2019-07-26 disco-oibaf-ppa)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.2.0-devel (git-2f92360 2019-07-26 disco-oibaf-ppa)
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.2.0-devel (git-2f92360 2019-07-26 disco-oibaf-ppa)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:


glxinfo | grep OpenGL                                      
OpenGL vendor string: X.Org
OpenGL renderer string: AMD RAVEN (DRM 3.27.0, 5.0.0-21-generic, LLVM 8.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.2.0-devel (git-2f92360 2019-07-26 disco-oibaf-ppa)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.2.0-devel (git-2f92360 2019-07-26 disco-oibaf-ppa)
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.2.0-devel (git-2f92360 2019-07-26 disco-oibaf-ppa)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

I tried running Unigine_Heaven-4.0 and got around 27 fps on the basic preset...
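
The two glxinfo runs above differ only in the renderer string, so a quick way to check which GPU an application would get is to print just that line. A minimal sketch, assuming glxinfo prints the "OpenGL renderer string:" line as shown above; the helper name renderer_string is made up for illustration:

```shell
# Sketch: read glxinfo output on stdin and print only the renderer name.
# renderer_string is a hypothetical helper, not part of glxinfo or Mesa.
renderer_string() {
    sed -n 's/^OpenGL renderer string: //p'
}
```

Possible usage: "glxinfo | renderer_string" for the default GPU and "DRI_PRIME=1 glxinfo | renderer_string" for the discrete one.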
Comment 1 Christoph Haag 2019-08-03 20:30:18 UTC
Created attachment 144944
moment of performance drop

I have the same laptop so I tried it and there is actually an interesting effect, see attached screenshot.

At first performance is quite good, but at some point during the benchmark, performance and GPU usage suddenly drop a lot.

The performance then stays low permanently, even after restarting the benchmark and waiting for the dGPU to be shut down by runpm.

Maybe I'll take a closer look later with more gallium hud graphs and maybe umr --top, but for now I'll just leave a comment here confirming that there is an issue.

Also loading the benchmark seems to take much longer after the performance drop happens, which seems very suspicious.

Tested on Arch with mesa 19.1.3, Linux 5.1, 5.2 and 5.3-rc2. Happens on all three kernel versions.
Comment 2 Denys 2019-08-04 08:36:30 UTC
On Arch you get more fps than on Ubuntu; kernel 5.1 is probably better than 5.0.

I just updated the kernel to 5.0.0-23-generic and nothing seems to have changed: still 26-27 fps on the basic preset.

Christoph Haag, do you know whether the developers are planning an update for the amdgpu driver?
Comment 3 Denys 2019-08-04 11:08:17 UTC
Created attachment 144945
Windows 10 test - 64 fps

Just for comparison: same laptop, same test, but running on Windows 10. I got around 64 fps...
Comment 4 cmdrrdo 2019-09-01 14:06:21 UTC
The performance drop may be related to an incorrect boost frequency (400MHz), compare here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1830522

A workaround is to disable CPU boost:
echo 0 > /sys/devices/system/cpu/cpufreq/boost
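
For anyone trying this, a minimal sketch for checking the boost state before and after the change. It assumes the knob lives at the path above (it may be absent with other cpufreq drivers); the function name boost_state and its optional path argument are made up for illustration:

```shell
# Sketch: report whether CPU boost is currently enabled.
# The default path is the one from the workaround; the optional argument
# exists only so the function can be pointed at a different file.
boost_state() {
    boost_file=${1:-/sys/devices/system/cpu/cpufreq/boost}
    if [ ! -r "$boost_file" ]; then
        echo "unavailable"
        return 0
    fi
    case "$(cat "$boost_file")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}
```

Note that the write itself has to be done as root. From a regular shell, "sudo echo 0 > ..." does not work because the redirection is performed by the unprivileged shell; "echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost" does.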
Comment 5 Sylvain BERTRAND 2019-09-01 17:24:53 UTC
On multi-core CPUs it is better to use the performance cpufreq governor.
Some load profiles are handled badly by the ondemand governor: depending on
the load profile, some cores may not see enough load to be switched to their
highest frequencies (which is critical for real-time game performance).

I noticed significant fps loss with the ondemand governor in some of my
Dota 2 configurations running on an 8-core 4.7 GHz CPU (FX9590).

This is a problem for average users who do not want to know anything about
CPU frequency governors.

Multi-threaded games must be aware of the problem and warn users about the
CPU governor settings (this is true on any OS).
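
To see which governor each core is actually using, something like the following could be run. This is only a sketch: it assumes the usual cpufreq sysfs layout (cpuN/cpufreq/scaling_governor), and the function name list_governors and its optional root argument are made up for illustration:

```shell
# Sketch: print "cpuN <governor>" for every core that exposes cpufreq.
# cpu_root defaults to the standard sysfs location; the argument is only
# there so the function can be tried against a copy of the tree.
list_governors() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for gov_file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip cores without cpufreq (also covers an unmatched glob).
        [ -r "$gov_file" ] || continue
        cpu_dir=${gov_file%/cpufreq/scaling_governor}
        printf '%s %s\n' "${cpu_dir##*/}" "$(cat "$gov_file")"
    done
}
```

Switching all cores can then be done as root, for example with "echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor". The governors a given driver offers are listed in scaling_available_governors in the same directory.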
Comment 6 Jacek Konieczny 2019-10-28 07:13:44 UTC
I have a similar laptop (TUF Gaming FX505DY), and the very same problem, with kernel 5.3.7 and governor set to 'performance'.

When I run a game with DRI_PRIME=1 (to use the dedicated GPU), the system gets throttled down at some point and never recovers. Every CPU core runs at 400 MHz, which makes the system practically unusable until reboot.
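
A rough way to spot the moment this happens is to count how many cores report a frequency at or below 400 MHz. The sketch below assumes the usual cpufreq sysfs layout, where scaling_cur_freq is given in kHz; the function name count_slow_cores and its arguments are made up for illustration:

```shell
# Sketch: print "<slow>/<total>", the number of cores whose current
# frequency is at or below a limit, out of all cores exposing cpufreq.
# scaling_cur_freq is in kHz, so 400 MHz is 400000.
count_slow_cores() {
    limit_khz=${1:-400000}
    cpu_root=${2:-/sys/devices/system/cpu}
    slow=0
    total=0
    for freq_file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$freq_file" ] || continue
        total=$((total + 1))
        if [ "$(cat "$freq_file")" -le "$limit_khz" ]; then
            slow=$((slow + 1))
        fi
    done
    echo "$slow/$total"
}
```

Running it in a loop while the game is up (for example "while sleep 1; do count_slow_cores; done") should show the count jump to all cores once the throttling kicks in, if the behaviour matches what is described here.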
Comment 7 Jacek Konieczny 2019-10-28 21:33:58 UTC
I have tried kernel 5.4-rc5 – that did not help.

Disabling cpufreq boost (echo 0 > /sys/devices/system/cpu/cpufreq/boost) stops the drastic slowdown from happening, but this also makes the system significantly slower than normal (still much better than after the bug appears), so there is no gain from using the discrete GPU (DRI_PRIME=1).

I wonder if this might be the problem:

> $ cat /sys/class/hwmon/hwmon3/{name,temp1_crit,temp1_crit_hyst}
> amdgpu
> 94000
> -273150
> $ cat /sys/class/hwmon/hwmon4/{name,temp1_crit,temp1_crit_hyst}
> amdgpu
> 80000
> 0

Are these hysteresis values being used? If so, they won't work for recovering from thermal throttling. However, I have never seen the temperature reading reach 80 degrees there.

The first one seems to be the discrete GPU (RX560X), the other one integrated (Vega 8).
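
For reference, hwmon values are in millidegrees Celsius, so -273150 is absolute zero, and as far as I understand the hwmon sysfs documentation the hysteresis is meant to be an absolute temperature rather than a delta. A sketch to list these values for every hwmon device and flag the odd ones follows. The "plausible"/"suspicious" rule (hysteresis above 0 and below the critical limit) is my own assumption, not something the driver defines, and the function name hwmon_crit_summary is made up for illustration:

```shell
# Sketch: for each hwmon device exposing temp1_crit and temp1_crit_hyst,
# print the raw values (millidegrees Celsius) and a rough verdict.
# The verdict rule is an assumption: 0 < hyst < crit counts as plausible.
hwmon_crit_summary() {
    hwmon_root=${1:-/sys/class/hwmon}
    for dir in "$hwmon_root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        [ -r "$dir/temp1_crit" ] || continue
        [ -r "$dir/temp1_crit_hyst" ] || continue
        crit=$(cat "$dir/temp1_crit")
        hyst=$(cat "$dir/temp1_crit_hyst")
        if [ "$hyst" -gt 0 ] && [ "$hyst" -lt "$crit" ]; then
            verdict="plausible"
        else
            verdict="suspicious"
        fi
        printf '%s %s crit=%s hyst=%s %s\n' \
            "${dir##*/}" "$(cat "$dir/name")" "$crit" "$hyst" "$verdict"
    done
}
```

With the values quoted above, both amdgpu devices would be reported as suspicious. Whether the driver or firmware actually uses these values is a separate question.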
Comment 8 Jacek Konieczny 2019-11-04 11:52:06 UTC
It looks like this is not an amdgpu bug, but a faulty cooling algorithm implemented in the BIOS/hardware (reportedly the same happens on Windows before the dedicated ASUS software is installed).

I was able to make things much better with a custom thermald config, but Linux still lacks proper tools to do it right.

It is quite tricky, probably mostly due to interaction between the GPU heating and the CPU boost feature not accounting for the extra heat.
Comment 9 Martin Peres 2019-11-19 09:38:01 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/882.

