Bug 111807 - [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout causes process to enter disk sleep (D) state
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: ARM Linux (All)
Importance: not set major
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 111808
Depends on:
Blocks:
 
Reported: 2019-09-25 02:38 UTC by shouzhe
Modified: 2019-11-19 09:55 UTC (History)
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
timeoutlog (30.15 MB, application/x-zip-compressed)
2019-09-25 02:38 UTC, shouzhe

Description shouzhe 2019-09-25 02:38:57 UTC
Created attachment 145506
timeoutlog

We are running into gfx ring timeouts with amdgpu.
We currently use kernel 4.19.36, with some GPU-related patches merged from the community. Each server has multiple GPUs, and each GPU runs rendering programs. We see two different failure cases.
In the first case, one graphics card on a server fails, but the rendering program does not enter D state. A test through /sys/kernel/debug/dri/1/amdgpu_test_ib reports error code 110, and a second run of the same test then reports pass. See tmp-618-2.zip for details.
In the second case, one graphics card on a server fails, and the entire rendering program running on the server fails and enters D state. It fails at drm_release. See tmp-619.zip for details.
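For reference, the error code 110 mentioned above can be decoded as follows. This is a minimal sketch, assuming the value is a standard Linux errno (on Linux systems using the generic errno table, 110 is ETIMEDOUT, which would be consistent with a ring timeout). The helper name `describe_errno` is hypothetical and is not part of amdgpu or its debugfs interface.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    # abs() tolerates the kernel convention of reporting negative errno values.
    code = abs(code)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Assumption: the "110" in the report is an errno value.
    print(describe_errno(110))
```

The mapping from number to name is platform-dependent, so the script should be run on the affected machine rather than on a workstation with a different errno table.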
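To narrow down the second case, it can help to list which processes are in D state and which kernel function they are sleeping in. The following is a rough sketch, assuming a Linux /proc filesystem that exposes `/proc/<pid>/stat` and `/proc/<pid>/wchan`. The function names `parse_stat_line` and `find_d_state` are hypothetical helpers written for this illustration.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """Return a list of (pid, comm, wchan) for processes in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan may be missing or restricted
        blocked.append((pid, comm, wchan))
    return blocked


if __name__ == "__main__":
    for pid, comm, wchan in find_d_state():
        print(pid, comm, wchan)
```

This only inspects the main thread of each process. Individual threads would need a similar walk over `/proc/<pid>/task/<tid>/stat`, and the wchan value may be empty or `0` depending on kernel configuration and permissions.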
Could you please help us out?
Comment 1 Andre Klapper 2019-09-25 08:26:22 UTC
*** Bug 111808 has been marked as a duplicate of this bug. ***
Comment 2 Martin Peres 2019-11-19 09:55:37 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/918.

