Bug 99710

Summary: [amdgpu R9 390] GPU hang when playing Hearthstone in Wine
Product: Mesa
Reporter: Garth Theisen <garththeisen>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: mike, sandy.8925
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: DDEBUG_DUMP_#1
Xorg.log
lspci output

Description Garth Theisen 2017-02-08 05:50:11 UTC
Created attachment 129407 [details]
DDEBUG_DUMP_#1

I can repeatedly hang the system by running Hearthstone via Wine. The hang is never predictable, but it often occurs right as the game enters a multiplayer match or a short while after. The screen goes blank and the system becomes unresponsive; in most cases the Magic SysRq key does not work either.


System Profile ...

GPU: R9 390X
Distro: Gentoo
Kernel: Linux 4.9.8 
KMD: amdgpu
UMD: Mesa (git)
Comment 1 Garth Theisen 2017-02-08 05:51:20 UTC
Created attachment 129408 [details]
Xorg.log
Comment 2 Garth Theisen 2017-02-08 05:53:13 UTC
GPU: XFX R9 390
Comment 3 Garth Theisen 2017-02-08 05:55:40 UTC
Created attachment 129409 [details]
lspci output
Comment 4 Mike Lothian 2017-02-08 13:11:36 UTC
Which graphics mode are you using? Default, CSMT or Gallium Nine?
Comment 5 Garth Theisen 2017-02-08 15:17:22 UTC
I can reproduce this on Wine-git in both the default and Gallium Nine modes. It is also a problem in CrossOver with Performance Enhanced Graphics enabled.
Comment 6 Garth Theisen 2017-02-12 03:51:11 UTC
Interesting discovery. I loaded an Android VirtualBox guest using Genymotion (Google Nexus 7 image with ARM translation installed) and tested the Google Play version of Hearthstone.

I am able to reproduce the same behaviour, hard locking my host (the machine profiled above). The screen goes black after an indeterminate amount of time running a Standard match. My host machine is unresponsive to SSH access after most trials. Any suggestions for capturing diagnostics would be appreciated.
Comment 7 Sandeep 2017-09-04 05:32:44 UTC
I have the same GPU, and have also started experiencing system hangs over the past 1-2 months. I believe it may be related to this issue, since it only occurs when using 3D graphics in some form, either while playing Left 4 Dead 2 or when using the Chromium browser with GPU acceleration enabled. In the case of Left 4 Dead 2, the system always hangs at some unpredictable point.

I am using the AMDGPU driver, with AMDGPU CIK support enabled.

I tried running 4.13 stable today, and the crash still occurred. I will try older kernels to see if I am still able to reproduce.
Comment 8 Sandeep 2017-09-04 06:12:00 UTC
Well, I tried running 4.11 RC3, and that also had the same problem. Will see if I can go further back (4.10, 4.9 etc.) and see if I can get it to work without problems. Otherwise, the problem lies somewhere else, but is definitely related to the GPU drivers, since it doesn't get triggered by anything else.
Comment 9 Sandeep 2017-09-05 03:47:30 UTC
I've tried 4.11.9 and 4.10.13 kernels, and the hang occurs on both of them.
Comment 10 Sandeep 2017-09-08 16:27:38 UTC
I tried the 4.11.0 kernel since I suspected that the buggy change might also be present in the point releases, but that also caused a hang when playing Left 4 Dead 2. Will try older kernels and see if they work correctly (which they should since this hang definitely wasn't present 2 months ago).
Comment 11 Garth Theisen 2017-09-12 03:34:29 UTC
(In reply to Sandeep from comment #10)
> I tried the 4.11.0 kernel since I suspected that the buggy change might also
> be present in the point releases, but that also caused a hang when playing
> Left 4 Dead 2. Will try older kernels and see if they work correctly (which
> they should since this hang definitely wasn't present 2 months ago).

Yep, Sandeep, I think this behaviour is tied to the DPM issues highlighted in Bug 91880: 'Radeonsi on Grenada cards (r9 390) exceptionally unstable and poorly performing'. I suggest following, reading, and commenting on that issue.
Comment 12 Sandeep 2017-09-21 02:27:00 UTC
Well, to clarify, I have been using the AMDGPU driver for the past year, not the Radeon driver. I've only faced this issue for the past 2 months and never had the problem earlier, so I don't think the other bug applies. Also, the devs say old firmware is the culprit there, but I run Arch Linux, and its linux-firmware package is current as of 7th September of this year, so I doubt the firmware is out of date.

I did test 4.10.6, but Left 4 Dead 2 crashed less than a second after loading a level - this is weird, since I did run 4.10.x kernels without any problems. Makes me think the problem lies elsewhere. 

Will see if I can figure out what's causing this.
Comment 13 Sandeep 2017-09-23 18:40:28 UTC
Ok fine - I tested with amdgpu.dpm=0 and had no hangs after 2 whole levels of Left 4 Dead 2.

Looks like DPM is the problem - seems to be a regression, since it was working fine with DPM 2-3 months ago.
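
For anyone repeating this test, here is a minimal sketch for confirming that the running kernel was actually booted with amdgpu.dpm=0. It assumes the parameter was passed on the kernel command line and simply reads /proc/cmdline. The helper name amdgpu_dpm_param and the CMDLINE_FILE override are hypothetical, added only for illustration.

```shell
#!/bin/sh
# Sketch: report the amdgpu.dpm value from the kernel command line.
# CMDLINE_FILE is a hypothetical override so the helper can be tried
# against a sample file; by default it reads the real /proc/cmdline.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

amdgpu_dpm_param() {
    # Split the command line into one argument per line, keep only the
    # value of amdgpu.dpm=..., and take the last occurrence if repeated.
    value=$(tr ' ' '\n' < "$CMDLINE_FILE" | sed -n 's/^amdgpu\.dpm=//p' | tail -n 1)
    # Print "unset" when the parameter is not on the command line.
    echo "${value:-unset}"
}
```

Calling amdgpu_dpm_param after a reboot prints 0 if the parameter was picked up from the boot loader configuration, and unset if it was not passed at all.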
Comment 14 Sandeep 2017-09-24 23:33:06 UTC
And I just found out that suspend/resume is totally broken if I disable DPM.

This is a critical bug for us Linux users. Given that this is a $300 card, it doesn't feel like I got my money's worth. I've never had so many problems with Intel and NVIDIA GPUs as I've had with AMD GPU drivers.
Comment 15 Sandeep 2018-03-03 01:49:30 UTC
Trying to get an apitrace trace so that I can reproduce consistently.

Found out that I can reproduce reliably with OpenArena. Will try to create a trace now.
Comment 16 Sandeep 2018-03-03 02:11:06 UTC
Ok, I was able to successfully reproduce using a trace from OpenArena.

I'll try uploading the file and paste a link here.
Comment 17 Sandeep 2018-03-03 03:18:34 UTC
This definitely seems to be DPM related. I used the following command to force the DPM performance level, and Left 4 Dead 2 and OpenArena worked fine with no hangs.

"echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"

Works fine if I set "low" as well.

The system hangs if I set "auto".

Well, at least I have a trace that reproduces it.
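
The sysfs workaround above can be wrapped in a small helper. This is only a sketch: the function names dpm_get and dpm_set and the DPM_FILE override are hypothetical, card0 is assumed to be the affected GPU (it may differ on multi-GPU systems), and only the three levels tried in this report are accepted.

```shell
#!/bin/sh
# Sketch: read and set the amdgpu DPM performance level via sysfs.
# DPM_FILE defaults to the path used in the command above; the override
# is a hypothetical convenience for trying the helper on a scratch file.
DPM_FILE="${DPM_FILE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"

# Print the current performance level.
dpm_get() {
    cat "$DPM_FILE"
}

# Set the performance level to one of the values tried in this report.
dpm_set() {
    case "$1" in
        auto|low|high) ;;
        *) echo "usage: dpm_set auto|low|high" >&2; return 2 ;;
    esac
    # Writing to the real sysfs file requires root privileges.
    printf '%s\n' "$1" > "$DPM_FILE"
}
```

Run as root, dpm_set high followed by dpm_get should confirm that the level was applied. The setting is not persistent, so it has to be reapplied after each boot.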
Comment 18 GitLab Migration User 2019-09-25 17:56:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1251.