Bug 98905 - XFX Radeon RX 480 XXX OC GPU hangs in games with "auto" power_dpm_force_performance_level
Status: RESOLVED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Duplicates: 98162
Reported: 2016-11-29 18:58 UTC by Christoph Haag
Modified: 2016-12-25 00:13 UTC

Description Christoph Haag 2016-11-29 18:58:06 UTC
I've seen this across several applications and several kernel/Mesa/LLVM versions. I'm not sure I have everything right here, but here goes.

I can reproduce the GPU hang most easily in SOMA when starting a new game: it hangs while the intro video plays or shortly afterwards, in the game itself. Other, very demanding games work without hangs.

It would hang like this:

[Di Nov 29 19:46:21 2016] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=3374996, last emitted seq=3374998
[Di Nov 29 19:46:21 2016] [drm] IP block:5 is hang!

This is the first time it has actually recovered; usually I need to hard-reset the PC.

Now this GPU is factory overclocked a little bit:
/sys/class/drm/card0/device/pp_dpm_sclk
0: 300Mhz
1: 608Mhz
2: 910Mhz
3: 1077Mhz
4: 1145Mhz
5: 1191Mhz
6: 1236Mhz
7: 1288Mhz *
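
The asterisk in this listing marks the sclk level that is currently active. As a minimal sketch for reading that level from a script, assuming the file keeps the format shown above: `active_sclk_level` is a hypothetical helper name, not part of any amdgpu tooling, and the optional path argument allows running it against a saved copy of the file.

```shell
# Print the index of the active sclk level (the line marked with "*").
# active_sclk_level is a hypothetical helper, not part of any amdgpu tooling.
active_sclk_level() {
    # $1: path to pp_dpm_sclk; defaults to card0 as in this report
    file="${1:-/sys/class/drm/card0/device/pp_dpm_sclk}"
    # Lines look like "7: 1288Mhz *"; keep only the leading index.
    sed -n 's/^\([0-9][0-9]*\):.*\*[[:space:]]*$/\1/p' "$file"
}
```

Logging the result of this function periodically while a game runs would show which level was active shortly before a hang.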

I am reasonably confident that when I set a power level with these commands:

echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk

the GPU will not hang while the power level is fixed.
Therefore I think that the GPU hang is related to switching between power levels.
It's possible that this problem is specific to this factory overclocked model.
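
The workaround above can be wrapped in two small shell functions so the level is easy to pin before a test run and to release afterwards. This is only a sketch: `pin_sclk`, `unpin_sclk` and the `DEV` variable are hypothetical names, the sysfs files are the ones used in this report, and writing to them on real hardware requires root.

```shell
# Hypothetical helpers; DEV defaults to card0 as in this report.
DEV="${DEV:-/sys/class/drm/card0/device}"

# Pin sclk to the given level index, e.g. 7 (needs root on real hardware).
pin_sclk() {
    echo manual > "$DEV/power_dpm_force_performance_level" &&
    echo "$1" > "$DEV/pp_dpm_sclk"
}

# Hand control back to the driver's automatic level switching.
unpin_sclk() {
    echo auto > "$DEV/power_dpm_force_performance_level"
}
```

For example, run `pin_sclk 7` before starting the game and `unpin_sclk` afterwards to return to the "auto" setting under which the hangs occur.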
Comment 1 Christoph Haag 2016-12-25 00:06:23 UTC
As far as I can tell after a couple of days, this was "fixed" by RMA'ing the GPU and getting a new one (same model).

The entire issue was probably caused by bad hardware.
Comment 2 Christoph Haag 2016-12-25 00:13:04 UTC
*** Bug 98162 has been marked as a duplicate of this bug. ***

