Bug 108049

Summary: [vega10] amdgpu fails to either wake up the GPU or while putting it to sleep
Product: DRI
Reporter: Gediminas Jakutis <gediminas>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: gediminas, harry.wentland, nicholas.kazlauskas, sunpeng.li
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                              Flags
log for dce_mi_free_dmif line:636        none
log for dce_mi_allocate_dmif line:599    none

Description Gediminas Jakutis 2018-09-24 19:46:47 UTC
Created attachment 141725 [details]
log for dce_mi_free_dmif line:636

If my machine with a Vega 64 idles long enough for the GPU to be "put to sleep" to save power, it won't wake up. When given input, the monitors wake up but show only a black screen, which indicates that something is happening.

Looking at kernel logs, this always starts with:
[drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:636
(full log attached separately)

While trying to "wake up" and/or if I try to switch to a different TTY, another error pops up for every attempt, but the opening line is now:
[drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 6000 tries - dce_mi_allocate_dmif line:599
(full log also attached separately)
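
For anyone comparing their own logs against the two errors above, here is a minimal Python sketch that picks out REG_WAIT timeout lines and works out the total wait each one reports. The regular expression is derived only from the two lines quoted in this report, and the function names and dictionary keys are my own illustrative choices, not anything from the amdgpu source.

```python
import re

# Pattern built from the two error lines quoted in this report.
# Group names are illustrative, not taken from the amdgpu driver.
REG_WAIT_RE = re.compile(
    r"REG_WAIT timeout (?P<delay_us>\d+)us \* (?P<tries>\d+) tries"
    r" - (?P<func>\w+) line:(?P<line>\d+)"
)


def parse_reg_wait(log_line):
    """Return details of a REG_WAIT timeout line, or None if it does not match."""
    match = REG_WAIT_RE.search(log_line)
    if match is None:
        return None
    delay_us = int(match.group("delay_us"))
    tries = int(match.group("tries"))
    return {
        "func": match.group("func"),
        "line": int(match.group("line")),
        # Per-try delay times number of tries, converted from us to ms.
        "total_wait_ms": delay_us * tries / 1000,
    }


def find_reg_wait_timeouts(lines):
    """Collect every REG_WAIT timeout found in an iterable of log lines."""
    return [hit for hit in map(parse_reg_wait, lines) if hit is not None]


# The two opening lines from this report, used as sample input.
sample = [
    "[drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:636",
    "[drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 6000 tries - dce_mi_allocate_dmif line:599",
]
for hit in find_reg_wait_timeouts(sample):
    print(f"{hit['func']} (line {hit['line']}): waited {hit['total_wait_ms']} ms")
```

To scan a real log, the same `find_reg_wait_timeouts` function can be given the lines of a saved `dmesg` output instead of the sample list.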

I experience this problem with every kernel version I have tried, from 4.16.x up to and including 4.18.7.
I haven't tried earlier versions, and I'm not sure which kernel to build in order to test the *latest* amdgpu code. I could test it if someone pointed me to it.

As a workaround, I keep a tiny video playing in mpv in an infinite loop in the background, using an OpenGL output, to prevent the GPU from trying to sleep.
Comment 1 Gediminas Jakutis 2018-09-24 19:47:20 UTC
Created attachment 141726 [details]
log for dce_mi_allocate_dmif line:599
Comment 2 Gediminas Jakutis 2018-09-24 19:53:41 UTC
Bug 107947 might be related, but the backtrace appears to be rather different.
Comment 3 Nicholas Kazlauskas 2018-09-27 13:14:43 UTC
The kernel you want to try for the latest amdgpu code is the amd-staging-drm-next branch from the repository linked:

https://cgit.freedesktop.org/~agd5f/linux/
Comment 4 Gediminas Jakutis 2018-10-20 16:32:38 UTC
I've been running amd-staging-drm-next at de67947d1595c17df82ba0de02c318cd6b28926b for some two weeks now and I haven't encountered this problem once.
I guess it's already fixed in amd-staging-drm-next, then.
Comment 5 Matthew Miller 2018-11-08 16:31:54 UTC
Can confirm: kernel-4.20.0-0.rc1.git1.2.fc30.x86_64 fixes this.
