Summary: | System resumes failed and hits [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout on Acer Aspire A315-21G | ||
---|---|---|---|
Product: | DRI | Reporter: | jian-hong |
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | critical | ||
Priority: | high | ||
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Description
jian-hong
2019-04-17 05:53:27 UTC
Created attachment 144007 [details]
dmesg with amdgpu.dc=1 drm.debug=7 amdgpu.runpm=0 in boot command
Also tried with amdgpu.runpm=0 in the boot command. However, it still hits the same error.
[ 78.078762] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=290, emitted seq=294
[ 78.078897] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 572 thread Xorg:cs0 pid 588
[ 78.078908] [drm] IP block:gfx_v8_0 is hung!
[ 78.079079] [drm] GPU recovery disabled.
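For triage, the timeout line above can be picked apart mechanically. This is only a minimal sketch, assuming the dmesg line format shown in this report; the helper name parse_ring_timeout is hypothetical and not part of any driver tooling. The gap between the two sequence numbers is read here as the number of fences that were emitted on the ring but had not signaled when the timeout fired.

```python
import re

# Matches the "ring <name> timeout" line format seen in this report.
RING_TIMEOUT = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return the ring name, both sequence numbers and their gap, or None."""
    m = RING_TIMEOUT.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return {
        "ring": m.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Fences emitted but not yet signaled at the time of the timeout.
        "pending": emitted - signaled,
    }
```

Fed the first log line of this comment, the sketch reports ring gfx with 4 pending fences (294 emitted, 290 signaled).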
Created attachment 144008 [details]
lspci -nnv on Acer Squirtle_SR
Created attachment 144030 [details]
dmesg with amdgpu.dc=1 drm.debug=7 in boot command on Acer TravelMate B114-21
We have another laptop, an Acer TravelMate B114-21, which hits the same issue. It is equipped with an AMD A4-9120C RADEON R4, 5 COMPUTE CORES 2C+3G.
[ 60.011965] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=206, emitted seq=208
[ 60.012215] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1388 thread gnome-shel:cs0 pid 1409
[ 60.012226] [drm] IP block:gfx_v8_0 is hung!
[ 60.012320] [drm] GPU recovery disabled.
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98e4] (rev eb) (prog-if 00 [VGA controller])
Subsystem: Acer Incorporated [ALI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1025:132a]
Flags: bus master, fast devsel, latency 0, IRQ 36
Memory at e8000000 (64-bit, prefetchable) [size=128M]
Memory at f0000000 (64-bit, prefetchable) [size=8M]
I/O ports at f000 [size=256]
Memory at fea00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Also tried with amdgpu.runpm=0 in the boot command, but the issue can still be reproduced.
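When comparing runs with different boot options, it helps to confirm which options the kernel actually received. The following is a minimal sketch, assuming the command line is available as a string (on a running system it would typically be read from /proc/cmdline); the function name driver_options is hypothetical.

```python
def driver_options(cmdline, prefixes=("amdgpu.", "drm.")):
    """Collect module options such as amdgpu.runpm=0 from a kernel command line."""
    options = {}
    for token in cmdline.split():
        # Keep only key=value tokens that belong to the drivers of interest.
        if token.startswith(prefixes) and "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
    return options
```

For the options used in this report, the result would map amdgpu.dc to "1", drm.debug to "7" and amdgpu.runpm to "0", which makes it easy to attach the effective settings to each dmesg capture.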
Created attachment 144031 [details]
lspci -nnv on Acer TravelMate B114-21
Created attachment 144042 [details]
journal log on Acer TravelMate B114-21
Got more information after waiting longer for the resume on the Acer TravelMate B114-21.
Apr 19 15:06:38 endless kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2841, emitted seq=2845
Apr 19 15:06:38 endless kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 695 thread Xorg:cs0 pid 698
Apr 19 15:06:38 endless kernel: [drm] IP block:gfx_v8_0 is hung!
Apr 19 15:06:38 endless kernel: [drm] GPU recovery disabled.
Apr 19 15:06:40 endless kernel: INFO: task Xorg:695 blocked for more than 604 seconds.
Apr 19 15:06:40 endless kernel: Tainted: G W 5.1.0-rc5+ #1
Apr 19 15:06:40 endless kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 15:06:40 endless kernel: Xorg D 0 695 683 0x00400004
Apr 19 15:06:40 endless kernel: Call Trace:
Apr 19 15:06:40 endless kernel: __schedule+0x2d4/0x840
Apr 19 15:06:40 endless kernel: schedule+0x2c/0x70
Apr 19 15:06:40 endless kernel: schedule_timeout+0x258/0x360
Apr 19 15:06:40 endless kernel: ? amdgpu_atom_execute_table_locked+0x136/0x210 [amdgpu]
Apr 19 15:06:40 endless kernel: dma_fence_default_wait+0x20a/0x280
Apr 19 15:06:40 endless kernel: ? dma_fence_release+0xa0/0xa0
Apr 19 15:06:40 endless kernel: dma_fence_wait_timeout+0xe7/0x110
Apr 19 15:06:40 endless kernel: amdgpu_fence_wait_empty+0x61/0xc0 [amdgpu]
Apr 19 15:06:40 endless kernel: amdgpu_pm_compute_clocks+0x70/0x590 [amdgpu]
Apr 19 15:06:40 endless kernel: dm_pp_apply_display_requirements+0x19a/0x1b0 [amdgpu]
Apr 19 15:06:40 endless kernel: dce11_pplib_apply_display_requirements+0x1f4/0x210 [amdgpu]
Apr 19 15:06:40 endless kernel: dce11_update_clocks+0xa0/0x100 [amdgpu]
Apr 19 15:06:40 endless kernel: dce110_prepare_bandwidth+0x3e/0x50 [amdgpu]
Apr 19 15:06:40 endless kernel: dc_commit_state+0x22d/0x5a0 [amdgpu]
Apr 19 15:06:40 endless kernel: ? drm_calc_timestamping_constants+0x106/0x150 [drm]
Apr 19 15:06:40 endless kernel: amdgpu_dm_atomic_commit_tail+0x1fb/0x1930 [amdgpu]
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_xtra+0x3b8/0x5b0
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? ttm_bo_mem_compat+0x28/0x60 [ttm]
Apr 19 15:06:40 endless kernel: ? ttm_bo_validate+0x3d/0x130 [ttm]
Apr 19 15:06:40 endless kernel: ? __switch_to+0x48b/0x4f0
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __schedule+0x2dc/0x840
Apr 19 15:06:40 endless kernel: ? amdgpu_bo_pin_restricted+0x1a2/0x270 [amdgpu]
Apr 19 15:06:40 endless kernel: ? _cond_resched+0x19/0x30
Apr 19 15:06:40 endless kernel: ? wait_for_completion_timeout+0x38/0x140
Apr 19 15:06:40 endless kernel: ? _cond_resched+0x19/0x30
Apr 19 15:06:40 endless kernel: ? wait_for_completion_interruptible+0x35/0x1a0
Apr 19 15:06:40 endless kernel: commit_tail+0x42/0x70 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: ? commit_tail+0x42/0x70 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: amdgpu_dm_atomic_commit+0x9b/0xe0 [amdgpu]
Apr 19 15:06:40 endless kernel: drm_atomic_commit+0x4a/0x50 [drm]
Apr 19 15:06:40 endless kernel: drm_atomic_helper_set_config+0x87/0x90 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: drm_mode_setcrtc+0x1bb/0x740 [drm]
Apr 19 15:06:40 endless kernel: ? drm_is_current_master+0x1f/0x40 [drm]
Apr 19 15:06:40 endless kernel: ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
Apr 19 15:06:40 endless kernel: drm_ioctl_kernel+0xb0/0x100 [drm]
Apr 19 15:06:40 endless kernel: drm_ioctl+0x233/0x410 [drm]
Apr 19 15:06:40 endless kernel: ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
Apr 19 15:06:40 endless kernel: amdgpu_drm_ioctl+0x4f/0x80 [amdgpu]
Apr 19 15:06:40 endless kernel: do_vfs_ioctl+0xa9/0x640
Apr 19 15:06:40 endless kernel: ? tomoyo_file_ioctl+0x19/0x20
Apr 19 15:06:40 endless kernel: ksys_ioctl+0x67/0x90
Apr 19 15:06:40 endless kernel: __x64_sys_ioctl+0x1a/0x20
Apr 19 15:06:40 endless kernel: do_syscall_64+0x5a/0x110
Apr 19 15:06:40 endless kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 19 15:06:40 endless kernel: RIP: 0033:0x7f36f7126777
Apr 19 15:06:40 endless kernel: Code: Bad RIP value.
Apr 19 15:06:40 endless kernel: RSP: 002b:00007ffeb62a80d8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Apr 19 15:06:40 endless kernel: RAX: ffffffffffffffda RBX: 00007ffeb62a8110 RCX: 00007f36f7126777
Apr 19 15:06:40 endless kernel: RDX: 00007ffeb62a8110 RSI: 00000000c06864a2 RDI: 000000000000000d
Apr 19 15:06:40 endless kernel: RBP: 00007ffeb62a8110 R08: 0000000000000000 R09: 00005652f3eb9510
Apr 19 15:06:40 endless kernel: R10: 00007ffeb62a81d0 R11: 0000000000003246 R12: 00000000c06864a2
Apr 19 15:06:40 endless kernel: R13: 000000000000000d R14: 0000000000000000 R15: 00005652f3eb9510
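The call trace above is easier to read once the entries prefixed with "?" are dropped, since the kernel uses that marker for addresses found on the stack that the unwinder could not confirm as part of the call chain. Below is a minimal sketch, assuming the journal line format shown in this report; the helper name reliable_frames is hypothetical.

```python
import re

# One stack frame in a journal line: optional "? " marker, symbol+offset/size,
# and an optional [module] suffix.
FRAME = re.compile(
    r"kernel: (?P<guess>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(lines):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in lines:
        m = FRAME.search(line)
        if m is None or m.group("guess"):
            continue  # not a frame line, or an unverified stack entry
        frames.append((m.group("symbol"), m.group("module")))
    return frames
```

Applied to the trace in this comment, the remaining chain runs from the drm_mode_setcrtc ioctl through amdgpu_dm_atomic_commit_tail and amdgpu_pm_compute_clocks into amdgpu_fence_wait_empty, which is where Xorg is blocked.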
Vega56
Ryzen 2700x
Kernel 5.0.3
Mesa latest master git
libdrm latest master git
llvm 8
I have the same problem when I use DXVK for the free version of Assassin's Creed.
[ 3137.670744] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=191619, emitted seq=191621
[ 3137.670765] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ACU.exe pid 8085 thread ACU.exe:cs0 pid 8118
[ 3137.670767] amdgpu 0000:1f:00.0: GPU reset begin!
[ 3147.900752] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
I am having very similar issues and see similar errors in logs. The most recent error was:
kernel: amdgpu 0000:06:00.0: [gfxhub] no-retry page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 1301 thread Xorg:cs0 pid 1362)
kernel: amdgpu 0000:06:00.0: in page starting at address 0x0000800108a18000 from 27
kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
The laptop is then unusable and requires a hard reboot.
Linux Mint 19.1
Kernel 5.1.0
AMD Ryzen PRO 2700U with Vega 10 graphics
Trying to load Cities: Skylines is a guaranteed crash.
This is probably related to bug 102322, yes?
Created attachment 144900 [details]
Thinkpad E585 log file with amdgpu errors
I'm running into an issue that I think is related to this. Attached a journal file containing the traces from the last boot where it occurred. For some reason, it doesn't happen every time I try to resume from suspend, but when it does I have no choice but to hard reboot. This is a Thinkpad E585, uname -a "Linux thonkpad 5.2.3-arch1-1-ARCH #1 SMP PREEMPT Fri Jul 26 08:13:47 UTC 2019 x86_64 GNU/Linux"
The patch is on its way
https://bugs.freedesktop.org/show_bug.cgi?id=110258#c12
(In reply to Eugene Bright from comment #10)
> The patch is on its way
> https://bugs.freedesktop.org/show_bug.cgi?id=110258#c12
I tried the patch on top of Linux stable 5.2.8. It fixed this issue. Thank you so much!
*** This bug has been marked as a duplicate of bug 110258 ***
Hello. Please explain: why did everything work fine with an FX-8320 CPU, but after upgrading to a Ryzen R5 1600 I see these OS freezes and this bug? Is the PCIe generation a cause? Planned obsolescence? Or a coincidence with an amdgpu driver update?
Part of my log:
[49266.138534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=5660155, emitted seq=5660157
[49266.138578] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Civ6Sub pid 1778 thread Civ6Sub:cs0 pid 1781
[49266.138580] [drm] GPU recovery disabled.
[49275.866518] INFO: task Xorg:sh1:1789 blocked for more than 122 seconds.
[49275.866521] Tainted: G R O 5.2.10 #2
Radeon 7970, mesa utils (8.4.0-1), Linux 5.2.10, amdgpu Version: 18.1.99+git20190207-1