Summary: | [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
---|---
Product: | DRI
Component: | DRM/AMDgpu
Status: | RESOLVED INVALID
Severity: | major
Priority: | not set
Version: | XOrg git
Hardware: | ARM
OS: | Linux (All)
Reporter: | yanhua <78666679>
Assignee: | Default DRI bug account <dri-devel>
CC: | 78666679, christian.koenig
Description
yanhua
2019-09-03 13:40:26 UTC
Created attachment 145253 [details]
dmesg output
Output of grep drm dmesg.txt. There are sdma1 ring timeouts.
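The grep step above can also be done with a small script that counts timeouts per ring. This is only a minimal sketch: the regular expression is based on the message format in this report's summary line, and the sample log lines (timestamps, the gfx entry, the unrelated message) are made up for illustration, not taken from the attached dmesg.

```python
import re
from collections import Counter

# Matches the timeout message format seen in this report's summary line,
# e.g. "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout".
RING_TIMEOUT = re.compile(r"\*ERROR\* ring (\S+) timeout")


def count_ring_timeouts(lines):
    """Return a Counter mapping ring name -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input only: the message text follows the summary line of this
# report; the timestamps, the gfx line and the last line are hypothetical.
sample = [
    "[  100.000000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout",
    "[  250.000000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout",
    "[  300.000000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout",
    "[  301.000000] [drm] some unrelated message",
]

if __name__ == "__main__":
    for ring, n in count_ring_timeouts(sample).most_common():
        print(f"{ring}: {n}")
```

To run it against a real log, pass an open file instead of the sample list, for example count_ring_timeouts(open("dmesg-full.txt")). A per-ring count makes it easier to see whether the timeouts cluster on one engine or are spread across gfx, sdma, and vce.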
Created attachment 145260 [details]
In the previous dmesg.txt some messages had been overwritten. dmesg-full.txt contains more information.
As far as I can see this is a really large box with multiple GPUs installed. The SDMA rarely locks up, especially not while executing page table updates, so there is most likely something wrong with the hardware here. Are you sure that the power supply is large enough for that system? What system/platform is that? Could this be a coherency problem?

I have asked the hardware team; they have tested and are sure there is no power supply problem. The system is arm64 with 64 cores, and there are three amdgpu cards on the board. gfx, sdma, and vce timeouts occur, but rarely. When a ring timeout occurs, we can use umr, the tool supplied by AMD, to read the chip registers. Can we determine the real cause from the register values? As for the coherency problem you mentioned, I think that if it were the cause, the problem would occur more frequently, but I'm not sure.

amdgpu was known not to work on arm64 until very recently, so it is not a surprise that this isn't working. Please switch to a newer kernel and re-test. Apart from that there isn't much we can do about it.

As far as I know, arm64 does not support WC memory, and we have already set the WC flag the way newer kernel versions do.