The issue started on Linux 4.19, which failed to enter suspend. I managed to work around that by disabling the AMD display code at boot.
On 4.20, I was able to use suspend with the AMD display code enabled.
On 5.0, I first started to get screen freezes after resume, although the cursor
can still move on screen.
I will boot into 5.1-rc2 to post lshw soon.
Created attachment 143789
I am experiencing the exact same issue on my Dell Inspiron 3185 with a Stoney Ridge A9-9420e. I have tried multiple kernels and distributions and I get the same freeze on resume from suspend as described above. (I haven't found a setup that can resume from suspend at all.) On X11, the cursor still works but everything else on the display has hung. On Wayland the whole display freezes on resume. I can still get these lines from the kernel log through ssh after this happens:
[ 60.081396] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=379, emitted seq=382
[ 60.081550] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2030 thread Xorg:cs0 pid 2031
[ 60.081560] [drm] IP block:gfx_v8_0 is hung!
[ 60.081637] [drm] GPU recovery disabled.
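For anyone collecting these messages from several machines, here is a minimal Python sketch that pulls the ring name and the two sequence numbers out of a "ring ... timeout" line. The regular expression is based only on the lines quoted in this report, and the function name is my own. Reading "emitted minus signaled" as the number of submissions still outstanding is an assumption, not something the driver states.

```python
import re

# Pattern derived from the "ring gfx timeout" lines quoted in this report.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, outstanding), or None if no match."""
    m = RING_TIMEOUT.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # Assumption: the difference is the count of submissions not yet signaled.
    return m.group("ring"), signaled, emitted, emitted - signaled


line = ("[ 60.081396] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
        "ring gfx timeout, signaled seq=379, emitted seq=382")
print(parse_ring_timeout(line))
```

Feeding it the output of dmesg line by line and skipping the None results should be enough to compare reports.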
Passing amdgpu.gpu_recovery=1 to the kernel produces a longer kernel error message, but does not fix the issue.
I am able to get suspend working on 4.19 LTS with amdgpu.dc=0.
I hope this gets fixed in the upcoming 5.1 release.
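When comparing results, it helps to confirm which amdgpu options were really on the kernel command line for a given boot. A minimal Python sketch follows, assuming the options were passed on the command line in the amdgpu.name=value form used in this thread. The function name is made up for illustration, and options set some other way (for example through a modprobe configuration file) would not show up here.

```python
def amdgpu_params(cmdline):
    """Collect amdgpu.* options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running Linux kernel.
    try:
        with open("/proc/cmdline") as f:
            print(amdgpu_params(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

An empty result means neither amdgpu.dc nor amdgpu.gpu_recovery was given at boot.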
Same problem here with an HP 255 G6 laptop with the following GPU details:
AMD Radeon R2
GPU name: Beema
Architecture: GCN 2.0 or GCN 1.2
Device ID: 98E4
I was able to use suspend before upgrading to the 5.0 kernel.
An additional note (I don't know if it's useful): I see the same behaviour even when suspending before an X server has been started. That is, I boot the computer to a tty and do a suspend/resume cycle. At this point everything still works, but if I then start an X server with startx, the screen freezes.
Nothing special in my log different from what has already been posted.
Mine experiences the same behavior with a suspend-resume cycle before X/Wayland is loaded. I'm bisecting the kernel between 4.20 and 5.0 to see which commit introduced the bug.
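For readers unfamiliar with bisecting, the idea can be sketched in a few lines of Python. This is only an illustration of the search that git bisect automates, not a replacement for it. The commit labels are placeholders, and the is_bad callback stands in for "build this kernel, boot it, and try a suspend/resume cycle". It assumes the first commit is good, the last is bad, and everything after the first bad commit is also bad.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit by repeated halving.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


# Hypothetical history: placeholder labels, not real kernel commits.
history = [f"c{i}" for i in range(16)]
culprit = 11  # pretend the regression landed in c11
print(history[first_bad(history, lambda c: int(c[1:]) >= culprit)])
```

Each step halves the remaining range, which is why even the thousands of commits between two kernel releases need only a dozen or so test boots.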
A me-too. Although I have something to add: if I switch to a VT, I can suspend and resume. However, Xorg after suspend is still f*cked - i.e. I can switch to a text VT, suspend, resume in the VT and still have an operating VT; but if I try to switch back to Xorg, the machine hangs at this point.
Hardware is (lspci -v)
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] (rev c1) (prog-if 00 [VGA controller])
Subsystem: Acer Incorporated [ALI] Device 1099
Flags: bus master, fast devsel, latency 0, IRQ 38
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f0000000 (64-bit, prefetchable) [size=8M]
I/O ports at 3000 [size=256]
Memory at f0d00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities:  Vendor Specific Information: Len=08 <?>
Capabilities:  Power Management version 3
Capabilities:  Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities:  Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities:  Secondary PCI Express <?>
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Kernel driver in use: amdgpu
Kernel modules: amdgpu
part of /proc/cpuinfo:
vendor_id : AuthenticAMD
cpu family : 21
model : 112
model name : AMD A9-9410 RADEON R5, 5 COMPUTE CORES 2C+3G
Lastly, it appears that the last kernel on which I could suspend and resume okay was 4.20.16-200.fc29.x86_64; 5.0.6-200.fc29.x86_64 and 5.0.7-200.fc29.x86_64 both hang on waking. I have no data for the versions in between. As noted, though, suspending and resuming in a VT and staying in the VT works; it is only X that is no longer functional after resume.
We have Aspire A315-21G and TravelMate B114-21 laptops and hit this problem, too.
106c7d6148e5aadd394e6701f7e498df49b869d1 is the first bad commit
Author: Likun Gao <Likun.Gao@amd.com>
Date: Thu Nov 8 20:19:54 2018 +0800
drm/amdgpu: abstract the function of enter/exit safe mode for RLC
Abstract the function of amdgpu_gfx_rlc_enter/exit_safe_mode and some part of
rlc_init to improve the reusability of RLC.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Christian König <email@example.com>
Reviewed-by: Alex Deucher <firstname.lastname@example.org>
Signed-off-by: Alex Deucher <email@example.com>
:040000 040000 8f3b365496f3bbd380a62032f20642ace51c8fef e14ec968011019e3f601df3f15682bb9ae0bafc6 M drivers
This was run on my HP 15-bw0xx
CPU: AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G
with integrated graphics:
Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98E4]
I get the same symptoms as above.
A more involved scenario that may shed light: switch to a tty and stop xdm (and hence sddm) so that no graphics sessions are running.
pm-suspend followed by resume works and brings me back to the tty, but when I then start xdm, I get a broken screen, usually garbage or grey, and syslog shows something like the following:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=49, emitted seq=51
[drm] IP block:gfx_v8_0 is hung!
[drm] GPU recovery disabled.
If I enable amdgpu.gpu_recovery=1, I get:
kernel: [ 279.726475] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=57, emitted seq=59
kernel: [ 279.726536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid 2860 thread X:cs0 pid 2861
kernel: [ 279.726542] [drm] IP block:gfx_v8_0 is hung!
kernel: [ 279.726609] amdgpu 0000:00:01.0: GPU reset begin!
kernel: [ 279.726992] amdgpu 0000:00:01.0: GRBM_SOFT_RESET=0x000F0001
kernel: [ 279.727047] amdgpu 0000:00:01.0: SRBM_SOFT_RESET=0x00000100
kernel: [ 279.863162] [drm] recover vram bo from shadow start
kernel: [ 279.863164] [drm] recover vram bo from shadow done
kernel: [ 279.863166] [drm] Skip scheduling IBs!
kernel: [ 279.863191] amdgpu 0000:00:01.0: GPU reset(2) succeeded!
kernel: [ 280.015794] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
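The -125 in the last line looks like a negated errno value. A small Python sketch to translate such codes is below; the helper name is my own. On Linux, 125 is ECANCELED, but that this is what the driver means here is my reading of the log rather than something confirmed in this thread.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negative) kernel-style error code to its name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# -125 is the value from the "Failed to initialize parser" line.
# The mapping is platform dependent, so run this on a Linux machine.
print(describe_errno(-125))
```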
I can probably run diagnostics or collect a trace if someone tells me what and how.
The problem persists - I still get it running kernel 5.2.1.
I've changed the "Hardware" from Other to AMD64 in the hope someone might actually look at this bug. It's been open for almost 4 months, and so far nothing's happened.
For anyone wanting a workaround, the only one that works for me is to use the last kernel release (4.20.17) before 5.0 came out. The latest LTS (long-term support) kernel before 5.0 is 4.19.60, but that exhibited strange lockups and performance issues when I tried it.
This leaves a choice of (a) running kernel 4.20.17, which is out of support, and therefore missing security fixes or (b) going without suspend, which is a severe limitation on a laptop; hibernate doesn't work on any kernel I've tried, so the only alternative to a flat battery is shutdown/reboot.
Is anyone working on a fix for this?
Should this issue be reported on kernel.org?
I got in touch with the developer. He made a fix, I've tested it, so I presume it will be included in the next kernel (for certain values of "next").
I could ask him if I could put the patch here, if people want a fix sooner.
The fix is on its way upstream:
The patch works!
I've been able to apply it to gentoo-sources-5.2.7.
Thank you very much for the reply!
*** Bug 110457 has been marked as a duplicate of this bug. ***
*** Bug 111399 has been marked as a duplicate of this bug. ***