Bug 109461

Summary: [amdgpu/radeonsi,HAWAII] Hand of Fate 2 leads to GPU lock up (display powered off, SSH works, keyboard dead): "flip_done timed out"
Product: DRI
Component: DRM/AMDgpu
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Kai <kai>
Assignee: Default DRI bug account <dri-devel>
QA Contact:
CC: rverschelde
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:
Bug Blocks: 77449
Attachments:
dmesg excerpt showing the backtraces and other DRM related entries (flags: none)

Description Kai 2019-01-26 16:21:11 UTC
Created attachment 143232: dmesg excerpt showing the backtraces and other DRM related entries

Playing Hand of Fate 2 leads to reproducible lock-ups of my Hawaii PRO GPU, sometimes directly on initial load, sometimes after playing for a while. The system can still be reached over SSH, but the attached input devices are dead (not even Num Lock toggles). In addition, the display gets powered off: it turns black and behaves as if it were looking for an input signal, i.e. the connector identifier is shown.

In dmesg I can see "flip_done timed out" errors and two backtraces (see attached dmesg excerpt for all the details):
> [15465.441663] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:44:crtc-0] flip_done timed out
> [15465.451746] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1164561, emitted seq=1164563
> [15465.451751] [drm] GPU recovery disabled.
> [15467.233739] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=171220, emitted seq=171221
> [15467.233746] [drm] GPU recovery disabled.
> [15475.681643] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:44:crtc-0] flip_done timed out
> [15485.921664] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:42:plane-5] flip_done timed out
> [15485.921779] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy
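
A minimal sketch for pulling the ring timeouts out of such an excerpt. It assumes the line format shown above; the function name `parse_ring_timeouts` and the `pending` field are hypothetical names used only for illustration. The difference between the emitted and the signaled sequence number is taken here as the number of fences still outstanding on that ring.

```python
import re

# Matches amdgpu ring timeout lines in the format seen in the excerpt above.
RING_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(text):
    """Return one dict per ring timeout line found in a dmesg excerpt."""
    results = []
    for line in text.splitlines():
        match = RING_TIMEOUT.search(line)
        if match is None:
            continue
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        results.append({
            "timestamp": float(match["ts"]),
            "ring": match["ring"],
            "signaled": signaled,
            "emitted": emitted,
            # Fences emitted but not yet signaled (assumed interpretation).
            "pending": emitted - signaled,
        })
    return results


# The two ring timeout lines from the excerpt above.
SAMPLE = """\
[15465.451746] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1164561, emitted seq=1164563
[15467.233739] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=171220, emitted seq=171221
"""

for entry in parse_ring_timeouts(SAMPLE):
    print(entry["ring"], entry["pending"])
```

Feeding it the full output of `dmesg` instead of `SAMPLE` would list every ring that timed out in the current boot.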

If you need data logged with umr, please provide me with the exact command I should run.

The bug is reproducible with the following stack (Debian testing as a base):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/8e9ad592c3
libdrm: 2.4.97
LLVM: SVN:trunk/r351739 (9.0 devel)
X.Org: 2:1.20.3-1
Linux: 4.20.4
Firmware (firmware-amd-graphics): 20190114-1
libclc: Git:master/428e821c1e
DDX (xserver-xorg-video-amdgpu): 18.1.0-1

Let me know if you need anything else.
Comment 1 Michel Dänzer 2019-01-28 10:07:52 UTC
Is this a regression, and if so, can you bisect? Note that it could be a Mesa/LLVM issue rather than a kernel one.
Comment 2 Kai 2019-01-28 10:22:53 UTC
(In reply to Michel Dänzer from comment #1)
> Is this a regression, and if so, can you bisect? Note that it could be a
> Mesa/LLVM issue rather than a kernel one.

Technically yes, but I don't know a (reasonably recent) good version, because the last time I played this game was in 2017. Between then and now there were many updates for the game (the last one is from 2019-01-11) and an incredible number of commits for the kernel, Mesa and LLVM.

I'm not even sure if I still used the radeon module the last time or if I was already on amdgpu.
Comment 3 Hubert Kario 2019-04-12 10:33:46 UTC
Did you try to monitor the temperature while running the game?
Did you try to reproduce it with something like glmark2 or FurMark?

See also bug 109466, comment 9.
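
One way to log the temperature while the game runs, as a rough sketch. It assumes the amdgpu driver exposes the GPU sensor through the standard hwmon sysfs file `temp1_input` (millidegrees Celsius) under `/sys/class/drm/card*/device/hwmon/`; the helper names are hypothetical and the path may differ on other setups.

```python
import glob
import time

# Assumed location of the GPU temperature sensor; adjust if your system differs.
HWMON_PATTERN = "/sys/class/drm/card*/device/hwmon/hwmon*/temp1_input"


def millideg_to_celsius(raw):
    """Convert a raw hwmon reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_gpu_temps(pattern=HWMON_PATTERN):
    """Return {sensor_path: temperature in deg C} for every readable sensor."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as sensor:
                temps[path] = millideg_to_celsius(sensor.read())
        except (OSError, ValueError):
            # Sensor vanished or returned garbage (e.g. after a GPU hang); skip it.
            continue
    return temps


def monitor(samples=10, interval=1.0, pattern=HWMON_PATTERN):
    """Print one timestamped line per sensor, `samples` times."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for path, temp in read_gpu_temps(pattern).items():
            print(f"{stamp} {path} {temp:.1f} C")
        time.sleep(interval)
```

Calling `monitor(samples=600, interval=1.0)` from an SSH session and redirecting the output to a file keeps the log available even after the display powers off.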
Comment 4 Martin Peres 2019-11-19 09:11:39 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/684.
