Summary: | GPU Recovery + DC deadlock
---|---
Product: | DRI
Reporter: | Bas Nieuwenhuizen <bas>
Component: | DRM/AMDgpu
Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED MOVED
QA Contact: |
Severity: | normal
Priority: | medium
CC: | andrey.grodzovsky, bas, harry.wentland
Version: | unspecified
Hardware: | Other
OS: | All
Whiteboard: |
i915 platform: |
i915 features: |
Description
Bas Nieuwenhuizen
2018-05-13 11:38:43 UTC
Created attachment 139562 [details]
Quick try to avoid deadlock
Can you give this a quick try and see if it helps?
Created attachment 139568 [details]
dmesg after trying 139562
I tried the patch and as expected we do not deadlock at the original places since we don't call those anymore. But I get garbage on my display (possibly expected due to loss of VRAM), can't switch VT and stopping X hangs X.
Furthermore I eventually still get stuck fence waits in dmesg (attached).
Furthermore, it seems the UVDF ringtest fails.
(In reply to Bas Nieuwenhuizen from comment #2)
> Created attachment 139568 [details]
> dmesg after trying 139562
>
> I tried the patch and as expected we do not deadlock at the original places
> since we don't call those anymore. But I get garbage on my display (possibly
> expected due to loss of VRAM), can't switch VT and stopping X hangs X.
>
> Furthermore I eventually still get stuck fence waits in dmesg (attached).
>
> Furthermore, it seems the UVDF ringtest fails.

I think the garbage is indeed due to the VRAM loss; maybe we don't create a shadow BO for the display's BO. GPU reset fails because UVD fails to resume and because of an SMU failure, so I believe that's why any further fence submission hangs. The pipe never recovers.

Harry, check the patch I attached. There is no reason to call drm_atomic_helper_resume/suspend explicitly from amdgpu_device_gpu_recover. First of all, it's already being called from the display code through the amd_ip_funcs.suspend/resume hooks. Second, the place where it's called in amdgpu_device_gpu_recover is wrong for GPU stalls, since it is called BEFORE we cancel and force completion of all in-flight jobs that are stuck on the GPU. So, as Bas explained, it will try to wait for a fence in amdgpu_pm_compute_clocks, but the pipe is hung, so we end up in a deadlock. If we call the mode set AFTER forceful completion (as the patch makes happen), no deadlock will happen.

UVD/SMU failures require further debugging, but I am on a different task at the moment, so maybe someone can pick this up...

Do you remember why that code is there? I think it's a remnant of old code. If you're OK with this patch I will send it for review.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/385.
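For readers following the ordering argument about amdgpu_device_gpu_recover in this thread, here is a minimal toy model in C of why the order matters. It is only a sketch under stated assumptions: every name in it (toy_gpu, toy_display_suspend, toy_force_complete_jobs and the two toy_recover_* functions) is hypothetical and none of it is amdgpu or DRM code. The fence wait is modelled as a check that returns false where the real path would block forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, not the amdgpu API. */
struct toy_gpu {
    bool ring_hung;      /* the GPU has stopped making progress */
    int  pending_fences; /* jobs submitted but never signalled */
};

/*
 * Stand-in for the display suspend path, which (per the report) ends up
 * waiting on fences. Returns false where a real wait would never return:
 * the ring is hung and unsignalled fences are still outstanding.
 */
bool toy_display_suspend(const struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    return !(gpu->ring_hung && gpu->pending_fences > 0);
}

/* Stand-in for cancelling and force-completing all in-flight jobs. */
void toy_force_complete_jobs(struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    gpu->pending_fences = 0;
}

/* Ordering described as wrong for GPU stalls: display suspend first. */
bool toy_recover_suspend_first(struct toy_gpu *gpu)
{
    if (!toy_display_suspend(gpu))
        return false; /* the real code would be stuck here */
    toy_force_complete_jobs(gpu);
    return true;
}

/* Ordering the patch is described as producing: force completion first. */
bool toy_recover_complete_first(struct toy_gpu *gpu)
{
    toy_force_complete_jobs(gpu);
    return toy_display_suspend(gpu);
}
```

With a hung ring and outstanding fences, toy_recover_suspend_first reports the would-be deadlock, while toy_recover_complete_first gets through because nothing is left to wait on. The model says nothing about the separate UVD/SMU resume failures mentioned in the thread.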