Bug 110928 - wx5100 gpu crash
Summary: wx5100 gpu crash
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: All Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2019-06-17 06:56 UTC by baopeng
Modified: 2019-11-19 09:31 UTC

See Also:
i915 platform:
i915 features:


Attachments
gstack info (20.58 KB, text/plain)
2019-06-17 06:58 UTC, baopeng
situation_1_dmesg (58.21 KB, text/plain)
2019-06-17 07:01 UTC, baopeng

Description baopeng 2019-06-17 06:56:34 UTC
When we used a WX5100 for rendering and video encoding, we encountered several GPU hangs. They are severe: the only way to recover is to reboot. The log output is below. Please help us analyze it; thank you very much.
Situation 1:
2019-06-16T14:39:24.708544+08:00|err|kernel[-]|[398383.549799] amdgpu 0005:01:00.0: GPU fault detected: 146 0x04203d0c for process a.babycard.ssvs pid 330210 thread RenderThread pid 330511
2019-06-16T14:39:24.708703+08:00|err|kernel[-]|[398383.549803] amdgpu 0005:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102184
2019-06-16T14:39:24.708812+08:00|err|kernel[-]|[398383.549805] amdgpu 0005:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D03D014
2019-06-16T14:39:24.708908+08:00|err|kernel[-]|[398383.549809] amdgpu 0005:01:00.0: VM fault (0x14, vmid 6, pasid 33627) at page 1057156, write from 'SDM1' (0x53444d31) (61)
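The raw register values above can be cross-checked against the kernel's own decoded summary line. The sketch below is only an illustration and rests on assumptions: it assumes the VM_CONTEXT1_PROTECTION_FAULT_STATUS bit layout from amdgpu's GMC v8 register headers (presumably the ones that apply to a Polaris-class WX5100) and 4 KiB GPU pages. The helper names are hypothetical.

```python
# Decode the VM protection-fault registers reported in situation 1.
# ASSUMPTION: STATUS bit layout as in amdgpu's GMC v8 headers:
#   PROTECTIONS [7:0], MEMORY_CLIENT_ID [19:12], MEMORY_CLIENT_RW [24], VMID [28:25]
# ASSUMPTION: the FAULT_ADDR register holds a GPU page number, with 4 KiB pages.

GPU_PAGE_SIZE = 4096


def decode_fault_status(status: int) -> dict:
    """Split the fault status word into its bitfields (hypothetical helper)."""
    return {
        "protections": status & 0xFF,
        "client_id": (status >> 12) & 0xFF,
        "write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
    }


def decode_client_name(mc_client: int) -> str:
    """The memory-client value packs a 4-character ASCII tag, most significant byte first."""
    return mc_client.to_bytes(4, "big").decode("ascii")


def fault_address(page: int) -> int:
    """Byte address of the faulting GPU page, assuming 4 KiB pages."""
    return page * GPU_PAGE_SIZE


if __name__ == "__main__":
    print(decode_fault_status(0x0D03D014))   # VM_CONTEXT1_PROTECTION_FAULT_STATUS
    print(decode_client_name(0x53444D31))    # client tag from the "VM fault" line
    page = 0x00102184                        # VM_CONTEXT1_PROTECTION_FAULT_ADDR
    print(page, hex(fault_address(page)))
```

Under these assumptions every field agrees with the kernel's summary: protections `0x14`, `vmid 6`, a write, client ID `61`, client tag `SDM1` (presumably the second SDMA engine), and FAULT_ADDR `0x00102184` equals the reported page `1057156`.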

About 17 seconds after the GPU fault:

2019-06-16T14:39:41.924400+08:00|err|kernel[-]|[398400.765123] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vce0 timeout, signaled seq=3868950, emitted seq=3868954
2019-06-16T14:39:41.924463+08:00|info|kernel[-]|[398400.765132] [drm] GPU recovery disabled.
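The "about 17 seconds" gap can be measured from the bracketed kernel timestamps (seconds since boot) instead of from the wall-clock prefix. This is a minimal sketch. The two log lines are trimmed copies of the ones above, and the helper name is hypothetical.

```python
import re

# The bracketed field after "kernel[-]|" is the kernel timestamp in seconds
# since boot. It is monotonic, unlike the wall-clock prefix.
KTIME_RE = re.compile(r"\[\s*(\d+\.\d+)\]")

# Trimmed copies of the situation 1 fault and timeout lines.
FAULT_LINE = (
    "2019-06-16T14:39:24.708544+08:00|err|kernel[-]|[398383.549799] amdgpu "
    "0005:01:00.0: GPU fault detected: 146 0x04203d0c"
)
TIMEOUT_LINE = (
    "2019-06-16T14:39:41.924400+08:00|err|kernel[-]|[398400.765123] "
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vce0 timeout"
)


def kernel_time(line: str) -> float:
    """Return the first [seconds.micros] kernel timestamp found in a log line."""
    match = KTIME_RE.search(line)
    if match is None:
        raise ValueError(f"no kernel timestamp in: {line!r}")
    return float(match.group(1))


if __name__ == "__main__":
    gap = kernel_time(TIMEOUT_LINE) - kernel_time(FAULT_LINE)
    print(f"vce0 timeout followed the VM fault by {gap:.3f} s")
```

The regex skips `kernel[-]` and `[amdgpu]` because it requires a decimal number inside the brackets.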

Situation 2:
[Thu Jun  6 22:00:14 2019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=919191055, emitted seq=919191057
[Thu Jun  6 22:00:14 2019] [drm] GPU recovery disabled.
[Thu Jun  6 22:00:16 2019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=101603699, emitted seq=101603701
[Thu Jun  6 22:00:16 2019] [drm] GPU recovery disabled.

Situation 3:
2019-06-16T14:59:05.248325+08:00|err|kernel[-]|[399194.411704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=230984670, emitted seq=230984673
2019-06-16T14:59:05.248404+08:00|info|kernel[-]|[399194.411708] [drm] GPU recovery disabled.
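Every timeout message reports two fence sequence numbers for the ring. Roughly speaking, the gap between `emitted` and `signaled` counts the submissions that were still pending on that ring when the timeout fired. The numbers are only comparable within a single ring. The sketch below tabulates that gap for all four timeouts above. The parser and record type are hypothetical helpers, and the log text is trimmed to the relevant part of each line.

```python
import re
from typing import List, NamedTuple

# Matches the amdgpu_job_timedout message format seen in all three situations.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

# Trimmed copies of the four timeout lines (situations 1, 2, 2, 3).
LOG = """\
*ERROR* ring vce0 timeout, signaled seq=3868950, emitted seq=3868954
*ERROR* ring gfx timeout, signaled seq=919191055, emitted seq=919191057
*ERROR* ring sdma1 timeout, signaled seq=101603699, emitted seq=101603701
*ERROR* ring gfx timeout, signaled seq=230984670, emitted seq=230984673
"""


class RingTimeout(NamedTuple):
    ring: str
    signaled: int  # last fence sequence number the ring reported as completed
    emitted: int   # last fence sequence number submitted to the ring

    @property
    def outstanding(self) -> int:
        """Fences emitted but not yet signaled when the timeout fired."""
        return self.emitted - self.signaled


def parse_timeouts(log: str) -> List[RingTimeout]:
    """Collect every ring-timeout record from a chunk of kernel log text."""
    return [
        RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))
        for m in TIMEOUT_RE.finditer(log)
    ]


if __name__ == "__main__":
    for t in parse_timeouts(LOG):
        print(f"{t.ring:>6}: {t.outstanding} fence(s) pending")
```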

Can you help us analyze these situations and resolve these problems? Thank you.
Comment 1 baopeng 2019-06-17 06:58:02 UTC
Created attachment 144567
gstack info
Comment 2 baopeng 2019-06-17 07:01:13 UTC
Created attachment 144568
situation_1_dmesg

situation 1 dmesg
Comment 3 Martin Peres 2019-11-19 09:31:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/829.

