110258 – Lenovo V110-15AST AMD A9-9410 AMD R5 Stoney hangs after waking after suspend. 5.0 onwards

Summary: Lenovo V110-15AST AMD A9-9410 AMD R5 Stoney hangs after waking after suspend...

Status:	RESOLVED MOVED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/AMDgpu
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium major
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Duplicates (2):	110457 111399
Depends on:
Blocks:

Reported:	2019-03-27 11:27 UTC by ukbeast89
Modified:	2019-11-19 09:18 UTC
CC List:	5 users

See Also:
i915 platform:
i915 features:

Attachments
lshw (23.13 KB, text/plain) 2019-03-27 11:35 UTC, ukbeast89	no flags

Description ukbeast89 2019-03-27 11:27:01 UTC

The issue started on Linux 4.19: the machine failed to enter suspend. I worked around that by disabling the AMD display code (amdgpu.dc=0) at boot.

On 4.20, I was able to use suspend with the AMD display code enabled.

On 5.0, I first started getting screen freezes after resume, although the cursor can still move on screen.

I will boot into 5.1-rc2 to post lshw soon.

Comment 1 ukbeast89 2019-03-27 11:35:01 UTC

Created attachment 143789
lshw

Comment 2 Ethan 2019-04-03 01:31:12 UTC

I am experiencing the exact same issue on my Dell Inspiron 3185 with a Stoney Ridge A9-9420e. I have tried multiple kernels and distributions, and I get the same freeze on resume from suspend as described. (I haven't found a setup that can resume from suspend at all.) On X11, the cursor still works but everything else on the display hangs. On Wayland, the whole display freezes on resume. I can get these lines from the kernel log through ssh after this happens:

[   60.081396] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=379, emitted seq=382
[   60.081550] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2030 thread Xorg:cs0 pid 2031
[   60.081560] [drm] IP block:gfx_v8_0 is hung!
[   60.081637] [drm] GPU recovery disabled.

Passing amdgpu.gpu_recovery=1 to the kernel gets a larger kernel error message, but does not fix the issue.
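A minimal Python sketch for pulling the two sequence numbers out of a `ring gfx timeout` line like the ones above. It assumes `signaled seq` is the last fence sequence number the GPU completed and `emitted seq` the last one the driver submitted, so their difference is the number of jobs still stuck on the ring (3 in the log above). The regex and the names `parse_ring_timeout` and `RingTimeout` are illustrative only, not part of any amdgpu tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "*ERROR* ring gfx timeout, signaled seq=379, emitted seq=382"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Submissions whose fences never signaled
        # (assumes emitted >= signaled, i.e. no 32-bit wraparound).
        return self.emitted - self.signaled


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return ring name and fence sequence numbers, or None if the line doesn't match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))


if __name__ == "__main__":
    sample = ("[   60.081396] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
              "ring gfx timeout, signaled seq=379, emitted seq=382")
    t = parse_ring_timeout(sample)
    print(t.ring, t.pending)  # prints "gfx 3"
```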

Comment 3 ukbeast89 2019-04-03 18:19:01 UTC

I can use 4.19 LTS with amdgpu.dc=0, and suspend works there.
I hope this gets fixed in 5.1.
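The workarounds in this thread rely on amdgpu kernel command-line parameters (amdgpu.dc=0, amdgpu.gpu_recovery=1). A minimal Python sketch for checking which `amdgpu.*` options the running kernel was actually booted with, by parsing `/proc/cmdline`. The helper name `amdgpu_params` is hypothetical, and the sketch splits on whitespace, so it does not handle quoted values containing spaces.

```python
from pathlib import Path
from typing import Dict


def amdgpu_params(cmdline: str) -> Dict[str, str]:
    """Collect amdgpu.<name>=<value> options from a kernel command line.

    Simplified: splits on whitespace, so quoted values with spaces are not handled.
    A bare flag such as "amdgpu.foo" is recorded with an empty value.
    """
    params: Dict[str, str] = {}
    for token in cmdline.split():
        if not token.startswith("amdgpu."):
            continue
        key, _, value = token[len("amdgpu."):].partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        params[key.replace("-", "_")] = value
    return params


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")  # Linux-only
    if cmdline_file.exists():
        params = amdgpu_params(cmdline_file.read_text())
        print("amdgpu.dc =", params.get("dc", "(not set)"))
        print("amdgpu.gpu_recovery =", params.get("gpu_recovery", "(not set)"))
```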

Comment 4 tigerjack89 2019-04-06 13:37:46 UTC

Same problem here with an HP 255 G6 laptop with the following GPU details:

AMD Radeon R2
GPU name: Beema
Architecture: GCN 2.0 or GCN 1.2
Bus: IGP
UVD 6
Device ID: 98E4

I was able to use suspend before upgrading to 5.0 kernel.

An additional note, in case it is useful: I see the same behaviour even before starting the X server. That is, if I boot to a tty and do a suspend/resume cycle, everything still works at that point; but if I then try to start an X server with startx, the screen freezes.

Nothing in my log differs from what has already been posted.

Comment 5 Ethan 2019-04-06 16:05:05 UTC

Mine shows the same behavior with a suspend/resume cycle before X/Wayland is loaded. I'm bisecting the kernel between 4.20 and 5.0 to find which commit introduces the bug.
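Bisecting here means a binary search over the ordered commits between a known-good release (4.20) and a known-bad one (5.0), so each test build halves the remaining range. `git bisect` also copes with merge history and untestable commits (`git bisect skip`); the Python sketch below assumes a simplified linear history, a single regression, and a hypothetical test predicate `is_bad` (in practice: build the kernel, suspend, resume and check for the hang).

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once a
    commit is bad every later commit is bad too (a single regression).
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(1000)]
    tested: List[str] = []

    def is_bad(c: str) -> bool:
        tested.append(c)
        return int(c[1:]) >= 637  # pretend the regression landed in c637

    print(first_bad(history, is_bad), "found after", len(tested), "test builds")
```

With 1000 commits, at most about log2(1000), i.e. 10, kernels need to be built and tested.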

Comment 6 Hin-Tak Leung 2019-04-13 22:15:07 UTC

Me too, although I have something to add: if I switch to a VT, I can suspend and resume. However, Xorg after suspend is still f*cked: I can switch to a text VT, suspend, resume in the VT and still have a working VT, but if I try to switch back to Xorg, the machine hangs at that point.

Hardware (lspci -v):

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] (rev c1) (prog-if 00 [VGA controller])
	Subsystem: Acer Incorporated [ALI] Device 1099
	Flags: bus master, fast devsel, latency 0, IRQ 38
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=8M]
	I/O ports at 3000 [size=256]
	Memory at f0d00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [270] Secondary PCI Express <?>
	Capabilities: [2b0] Address Translation Service (ATS)
	Capabilities: [2c0] Page Request Interface (PRI)
	Capabilities: [2d0] Process Address Space ID (PASID)
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

Part of /proc/cpuinfo:

vendor_id	: AuthenticAMD
cpu family	: 21
model		: 112
model name	: AMD A9-9410 RADEON R5, 5 COMPUTE CORES 2C+3G

Lastly, the last kernel with which I could suspend and resume successfully appears to be 4.20.16-200.fc29.x86_64; 5.0.6-200.fc29.x86_64 and 5.0.7-200.fc29.x86_64 both hang on waking. I have no data for the versions in between. As noted, suspending and resuming in a VT and staying in the VT works; it is just that X is no longer functional after resume.

Comment 7 jian-hong 2019-04-23 02:48:39 UTC

We have Aspire A315-21G and TravelMate B114-21 laptops and see a related problem, too:
https://bugzilla.freedesktop.org/show_bug.cgi?id=110457

Comment 8 Paul Gover 2019-07-18 10:50:25 UTC

Git bisect:

106c7d6148e5aadd394e6701f7e498df49b869d1 is the first bad commit
commit 106c7d6148e5aadd394e6701f7e498df49b869d1
Author: Likun Gao <Likun.Gao@amd.com>
Date:   Thu Nov 8 20:19:54 2018 +0800

    drm/amdgpu: abstract the function of enter/exit safe mode for RLC
    
    Abstract the function of amdgpu_gfx_rlc_enter/exit_safe_mode and some part of
    rlc_init to improve the reusability of RLC.
    
    Signed-off-by: Likun Gao <Likun.Gao@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 8f3b365496f3bbd380a62032f20642ace51c8fef e14ec968011019e3f601df3f15682bb9ae0bafc6 M      drivers

This was run on my HP 15-bw0xx
CPU: AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G
with integrated graphics:
Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98E4]

I get the same symptoms as above.
A more involved scenario that may shed light: switch to a tty and stop xdm (and hence sddm) so that no graphics sessions are running.
pm-suspend followed by resume works and brings me back to the tty, but when I then start xdm, I get a broken screen, usually garbage or grey, and syslog shows something like the following:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=49, emitted seq=51
[drm] IP block:gfx_v8_0 is hung!
[drm] GPU recovery disabled.

If I enable amdgpu.gpu_recovery=1:
kernel: [  279.726475] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=57, emitted seq=59
kernel: [  279.726536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid 2860 thread X:cs0 pid 2861
kernel: [  279.726542] [drm] IP block:gfx_v8_0 is hung!
kernel: [  279.726609] amdgpu 0000:00:01.0: GPU reset begin!
kernel: [  279.726992] amdgpu 0000:00:01.0: GRBM_SOFT_RESET=0x000F0001
kernel: [  279.727047] amdgpu 0000:00:01.0: SRBM_SOFT_RESET=0x00000100
kernel: [  279.863162] [drm] recover vram bo from shadow start
kernel: [  279.863164] [drm] recover vram bo from shadow done
kernel: [  279.863166] [drm] Skip scheduling IBs!
kernel: [  279.863191] amdgpu 0000:00:01.0: GPU reset(2) succeeded!
kernel: [  280.015794] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

I can probably run diagnostics or collect a trace if someone tells me what and how.

The problem persists: I still get it running kernel 5.2.1.

Comment 9 Paul Gover 2019-07-22 09:46:41 UTC

I've changed the "Hardware" field from Other to AMD64 in the hope that someone might actually look at this bug. It's been open for almost 4 months, and so far nothing's happened.

For anyone wanting a workaround, the only one that works for me is to use the last kernel release before 5.0 came out (4.20.17). The latest LTS (long-term stable) kernel before 5.0 is 4.19.60, but that showed strange lockups and performance issues when I tried it.

This leaves a choice between (a) running kernel 4.20.17, which is out of support and therefore missing security fixes, or (b) going without suspend, which is a severe limitation on a laptop. Hibernate doesn't work on any kernel I've tried, so the only alternative to a flat battery is shutdown/reboot.

Comment 10 Eugene Bright 2019-08-09 01:27:24 UTC

Is anyone working on a fix?
Should this issue be reported to kernel.org?

Comment 11 Paul Gover 2019-08-09 08:51:54 UTC

I got in touch with the developer. He made a fix and I've tested it, so I presume it will be included in the next kernel (for certain values of "next").
I could ask him whether I can post the patch here, if people want a fix sooner.

Comment 12 Alex Deucher 2019-08-09 15:15:08 UTC

The fix is on its way upstream:
https://cgit.freedesktop.org/drm/drm/commit/?h=drm-fixes&id=72cda9bb5e219aea0f2f62f56ae05198c59022a7

Comment 13 Eugene Bright 2019-08-09 16:49:29 UTC

The patch works!
I've been able to apply it to the gentoo-sources-5.2.7.

Thank you very much for the reply!

Comment 14 Alex Deucher 2019-08-13 02:57:43 UTC

*** Bug 110457 has been marked as a duplicate of this bug. ***

Comment 15 Vladimir 2019-08-14 11:31:42 UTC

*** Bug 111399 has been marked as a duplicate of this bug. ***

Comment 16 Martin Peres 2019-11-19 09:18:12 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/734.
