Bug 112174 - AMD Radeon 5700 / Navi: amdgpu.gpu_recovery not working
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Default DRI bug account
Reported: 2019-10-30 07:57 UTC by KLingel
Modified: 2019-11-19 09:59 UTC

Description KLingel 2019-10-30 07:57:41 UTC
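I have set "amdgpu.gpu_recovery=1" in my kernel boot parameters. When the GPU hangs, recovery fails: the reset itself is reported as succeeded, but resuming the gfx_v10_0 IP block then fails with -22 and the reset is attempted again.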
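
To rule out a typo in the boot entry, it can help to confirm that the parameter actually reached the kernel command line. A minimal Python sketch, assuming the command line is available as a string (on Linux it can be read from /proc/cmdline); `amdgpu_params` and the sample command line are hypothetical and only for illustration:

```python
def amdgpu_params(cmdline: str) -> dict:
    """Return the amdgpu.* parameters found on a kernel command line."""
    params = {}
    for token in cmdline.split():
        # Module parameters on the command line look like "module.name=value".
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


# Hypothetical command line; on a real system read /proc/cmdline instead.
SAMPLE_CMDLINE = "root=/dev/sda1 ro quiet amdgpu.gpu_recovery=1"

if __name__ == "__main__":
    print(amdgpu_params(SAMPLE_CMDLINE))
```

If `gpu_recovery` is missing from the result, the option was not passed at boot and the driver falls back to its own default.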

Syslog:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1935, emitted seq=1937
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1861 thread Xorg:cs0 pid 1864
amdgpu 0000:45:00.0: GPU reset begin!
[drm] ring test on 10 succeeded in 22 usecs
[drm] ring test on 10 succeeded in 29 usecs
amdgpu 0000:45:00.0: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x00000080001E8000).
[drm] PSP is resuming...
[drm] reserve 0x7200000 from 0x81f7c00000 for PSP TMR
amdgpu: [powerplay] SMU is resuming...
amdgpu: [powerplay] SMU is resumed successfully!
[drm] kiq ring mec 2 pipe 1 q 0
[drm] ring test on 10 succeeded in 33 usecs
[drm] ring test on 10 succeeded in 8 usecs
[drm] gfx 0 ring me 0 pipe 0 q 0
[drm:gfx_v10_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v10_0> failed -22
amdgpu 0000:45:00.0: GPU reset(1) failed
amdgpu 0000:45:00.0: GPU reset end with ret = -22
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1937, emitted seq=1937
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1861 thread Xorg:cs0 pid 1864
amdgpu 0000:45:00.0: GPU reset begin!
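
For triage, the reset attempts in an excerpt like this can be summarized with a short script. A minimal Python sketch, assuming the message formats shown above stay the same; `summarize_resets` is a hypothetical helper, not part of any amdgpu tooling, and the sample lines are copied from this log:

```python
import re

# Patterns taken from the messages in the excerpt above.
RESET_END = re.compile(r"GPU reset end with ret = (-?\d+)")
IP_FAIL = re.compile(r"resume of IP block <(\w+)> failed (-?\d+)")


def summarize_resets(lines):
    """Return one dict per 'GPU reset begin!' marker found in the log."""
    attempts = []
    current = None
    for line in lines:
        if "GPU reset begin!" in line:
            # Start tracking a new reset attempt.
            current = {"failed_block": None, "ret": None}
            attempts.append(current)
            continue
        if current is None:
            continue
        match = IP_FAIL.search(line)
        if match:
            current["failed_block"] = match.group(1)
        match = RESET_END.search(line)
        if match:
            # The attempt is finished once its return code is logged.
            current["ret"] = int(match.group(1))
            current = None
    return attempts


SAMPLE_LOG = [
    "amdgpu 0000:45:00.0: GPU reset begin!",
    "amdgpu 0000:45:00.0: GPU reset succeeded, trying to resume",
    "[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v10_0> failed -22",
    "amdgpu 0000:45:00.0: GPU reset(1) failed",
    "amdgpu 0000:45:00.0: GPU reset end with ret = -22",
    "amdgpu 0000:45:00.0: GPU reset begin!",
]

if __name__ == "__main__":
    for attempt in summarize_resets(SAMPLE_LOG):
        print(attempt)
```

On the sample lines this reports one finished attempt (block gfx_v10_0, return code -22) and a second attempt that has begun but has no recorded result yet.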


GPU recovery is especially important right now, given the current stability issues on Navi.
Please fix recovery and enable it by default.

Comment 1 Martin Peres 2019-11-19 09:59:25 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the migrated issue on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/948

