Bug 107689 - System freezes on shutdown. [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Default DRI bug account
 
Reported: 2018-08-26 08:52 UTC by john-s-84
Modified: 2019-11-19 08:49 UTC

Attachments
full dmesg log (includes closing lit) (80.67 KB, text/plain)
2018-08-27 19:35 UTC, john-s-84
0001-drm-amdgpu-Only-retrieve-GPU-address-of-GART-table-a.patch (3.09 KB, patch)
2018-08-28 15:50 UTC, Andrey Grodzovsky
dmesg_applied_patch_0001-drm-amdgpu-Only-retrieve-GPU-address-of-GART-table (18.02 KB, text/x-log)
2018-08-29 20:59 UTC, john-s-84

Description john-s-84 2018-08-26 08:52:15 UTC
A normal shutdown seems to work. After any suspend (pressing the stand-by button or closing the lid), I am not able to shut down successfully: the system hangs on shutdown.

Sorry for the double posting. I do not know the right place for this issue.

https://lists.freedesktop.org/archives/amd-gfx/2018-August/025818.html


Error-Log:


Bad output on shutdown after pressing the stand-by button:

[  294.651066] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)


Bad output after closing the lid:

[   71.696123] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[   71.696378] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -22
[   71.696421] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
[   87.431032] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[   87.521991] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
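
For anyone triaging a similar log, here is a minimal Python sketch that pulls the amdgpu *ERROR* lines out of dmesg text and counts them per kernel function. It is not part of the driver or of any existing tool; the names parse_amdgpu_errors and count_by_function are made up for illustration, and the regular expression only assumes the line layout seen in the excerpt above.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in this report, e.g.
# [   71.696123] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* ...
ERROR_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:(\w+) \[amdgpu\]\] \*ERROR\* (.*)$",
    re.MULTILINE,
)


def parse_amdgpu_errors(dmesg_text):
    """Return (timestamp, function, message) for each amdgpu DRM error line."""
    return [
        (float(m.group(1)), m.group(2), m.group(3).strip())
        for m in ERROR_RE.finditer(dmesg_text)
    ]


def count_by_function(errors):
    """Count how many error lines each kernel function emitted."""
    return Counter(func for _, func, _ in errors)


# The "closing the lid" excerpt from this report, copied verbatim.
SAMPLE = """\
[   71.696123] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[   71.696378] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -22
[   71.696421] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
[   87.431032] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[   87.521991] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
"""

for func, n in count_by_function(parse_amdgpu_errors(SAMPLE)).most_common():
    print(f"{n:3d}  {func}")
```

Sorting the parsed tuples by timestamp shows the order of events: the ring test fails during resume first, and the KCQ errors from gfx_v8_0_hw_fini only appear about 16 seconds later, at shutdown.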
Comment 1 john-s-84 2018-08-26 08:59:27 UTC
This log is after the suspend:

[ 3091.111509] amdgpu: [powerplay] 
[ 3091.600889] amdgpu: [powerplay] 
[ 3092.085174] amdgpu: [powerplay] 
[ 3092.547237] amdgpu: [powerplay] 
[ 3093.005634] amdgpu: [powerplay] 
[ 3093.464712] amdgpu: [powerplay] 
[ 3093.928337] amdgpu: [powerplay] 
[ 3094.391342] amdgpu: [powerplay] 
[ 3094.850455] amdgpu: [powerplay] 
[ 3095.774119] amdgpu: [powerplay] 
[ 3096.239871] amdgpu: [powerplay] 
[ 3096.711250] amdgpu: [powerplay] 
[ 3097.172032] amdgpu: [powerplay] 
[ 3097.631737] amdgpu: [powerplay] 
[ 3098.090704] amdgpu: [powerplay] 
[ 3098.550846] amdgpu: [powerplay] 
[ 3099.013701] amdgpu: [powerplay] 
[ 3099.476970] amdgpu: [powerplay] 
[ 3099.941099] amdgpu: [powerplay] 
[ 3100.404389] amdgpu: [powerplay] 
[ 3100.867675] amdgpu: [powerplay] 
[ 3101.326540] amdgpu: [powerplay] 
[ 3101.785426] amdgpu: [powerplay] 
[ 3102.709452] amdgpu: [powerplay] 
[ 3103.167897] amdgpu: [powerplay] 
[ 3104.091096] amdgpu: [powerplay] 
[ 3104.554677] amdgpu: [powerplay] 
[ 3105.018251] amdgpu: [powerplay] 
[ 3105.481543] amdgpu: [powerplay] 
[ 3106.397859] amdgpu: [powerplay] 
[ 3106.859070] amdgpu: [powerplay] 
[ 3107.319476] amdgpu: [powerplay] 
[ 3107.778301] amdgpu: [powerplay] 
[ 3108.696209] amdgpu: [powerplay] 
[ 3109.155393] amdgpu: [powerplay] 
[ 3110.071528] amdgpu: [powerplay] 
[ 3110.529575] amdgpu: [powerplay] 
[ 3110.989401] amdgpu: [powerplay] 
[ 3111.447587] amdgpu: [powerplay] 
[ 3112.363547] amdgpu: [powerplay] 
[ 3112.824429] amdgpu: [powerplay] 
[ 3113.741639] amdgpu: [powerplay] 
[ 3114.202019] amdgpu: [powerplay] 
[ 3114.665248] amdgpu: [powerplay] 
[ 3115.127638] amdgpu: [powerplay] 
[ 3116.045559] amdgpu: [powerplay] 
[ 3116.503855] amdgpu: [powerplay] 
[ 3116.966372] amdgpu: [powerplay] 
[ 3117.424757] amdgpu: [powerplay] 
[ 3118.351174] amdgpu: [powerplay] 
[ 3118.814380] amdgpu: [powerplay] 
[ 3119.740415] amdgpu: [powerplay] 
[ 3120.204136] amdgpu: [powerplay] 
[ 3120.414945] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 3120.414962] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -22
[ 3120.414978] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
[ 3125.907682] amdgpu: [powerplay] 
[ 3126.371235] amdgpu: [powerplay] 
[ 3127.636961] amdgpu: [powerplay] 
[ 3128.109298] amdgpu: [powerplay] 
[ 3129.044901] amdgpu: [powerplay] 
[ 3129.504295] amdgpu: [powerplay] 
[ 3130.429692] amdgpu: [powerplay] 
[ 3130.893790] amdgpu: [powerplay] 
[ 3131.815757] amdgpu: [powerplay] 
[ 3132.274641] amdgpu: [powerplay] 
[ 3133.193550] amdgpu: [powerplay] 
[ 3133.651888] amdgpu: [powerplay] 
[ 3134.568326] amdgpu: [powerplay] 
[ 3135.028265] amdgpu: [powerplay] 
[ 3135.957009] amdgpu: [powerplay] 
[ 3136.437691] amdgpu: [powerplay] 
[ 3136.658150] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3136.875305] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3137.092574] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3137.314115] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3137.540010] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3137.765326] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3137.982629] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3138.199617] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
[ 3138.678639] amdgpu: [powerplay] 
[ 3139.171428] amdgpu: [powerplay] 
[ 3139.668258] amdgpu: [powerplay] 
[ 3140.164675] amdgpu: [powerplay] 
[ 3140.657299] amdgpu: [powerplay] 
[ 3141.151370] amdgpu: [powerplay] 
[ 3142.138768] amdgpu: [powerplay] 
[ 3142.602942] amdgpu: [powerplay] 
[ 3143.062798] amdgpu: [powerplay] 
[ 3143.521840] amdgpu: [powerplay] 
[ 3144.442769] amdgpu: [powerplay] 
[ 3144.906261] amdgpu: [powerplay] 
[ 3145.858408] amdgpu: [powerplay] 
[ 3146.320076] amdgpu: [powerplay]
Comment 2 john-s-84 2018-08-27 19:35:34 UTC
Created attachment 141300 [details]
full dmesg log (includes closing lit)
Comment 3 Andrey Grodzovsky 2018-08-27 20:54:52 UTC
(In reply to john-s-84 from comment #2)
> Created attachment 141300 [details]
> full dmesg log (includes closing lit)

I tried to reproduce the issues you report using a Lexa card but encountered other, different bugs. Using a Baffin ASIC, however, I was able to reproduce something that looks like what you are experiencing. Will investigate.
Comment 4 Andrey Grodzovsky 2018-08-28 15:50:08 UTC
Created attachment 141310 [details] [review]
0001-drm-amdgpu-Only-retrieve-GPU-address-of-GART-table-a.patch

Please try our latest kernel from here
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next with the attached patch applied on top.
Also, just to be sure, try the latest amdgpu firmware from here
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu

P.S. Don't forget to update your initramfs after you copy the firmware files to your /lib/firmware/XXX location.
Comment 5 john-s-84 2018-08-29 20:59:18 UTC
Created attachment 141365 [details]
dmesg_applied_patch_0001-drm-amdgpu-Only-retrieve-GPU-address-of-GART-table

I tried the amd-staging-drm-next kernel with the latest firmware and applied your patch. The patch does not have any effect: shutdown still does not work. Please see the dmesg log.
Comment 6 Andrey Grodzovsky 2018-08-30 16:05:55 UTC
I noticed the message "amdgpu 0000:01:00.0: GPU pci config reset" printed long before the suspend. Did you manually trigger a device reset before the suspend?
Comment 7 john-s-84 2018-08-31 18:52:55 UTC
What I have done:
- Applied the patches
- Pressed the power button to suspend
- Pressed the power button again to resume
- Copied and pasted the dmesg log


How critical are these errors, and which of them causes the failure state?

1. amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched

2. amdgpu: [powerplay] failed to send message 254 ret is 0

3. [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -22
[drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
[drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 9 test failed (scratch(0xC040)=0xCAFEDEAD)
[drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed

4. amdgpu 0000:01:00.0: kfd not supported on this ASIC

5. amdgpu: [powerplay] Failed to retrieve minimum clocks.
amdgpu: [powerplay] Error in phm_get_clock_info 

6. [drm:dc_create [amdgpu]] *ERROR* DC: Number of connectors is zero!
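
Note on error 3: as far as I understand the gfx_v8_0 ring test, the driver seeds a scratch register with 0xCAFEDEAD, asks the ring to overwrite it with a different value (0xDEADBEEF in the sources I have seen), and then polls the register. Reading back 0xCAFEDEAD would therefore mean the ring never executed the write. The Python below is only a toy model of that handshake, written under those assumptions; FakeGpu, submit_write and ring_test are hypothetical names and not driver code.

```python
import errno

SENTINEL = 0xCAFEDEAD  # assumed seed value written before the test
EXPECTED = 0xDEADBEEF  # assumed value the ring packet should write


class FakeGpu:
    """Toy stand-in for the hardware: one scratch register and one ring."""

    def __init__(self, ring_alive):
        self.ring_alive = ring_alive
        self.scratch = 0

    def submit_write(self, value):
        # A working ring executes the packet; a hung ring silently drops it.
        if self.ring_alive:
            self.scratch = value


def ring_test(gpu, polls=3):
    """Return 0 if the ring executed the write, -EINVAL (-22) otherwise."""
    gpu.scratch = SENTINEL
    gpu.submit_write(EXPECTED)
    for _ in range(polls):
        if gpu.scratch == EXPECTED:
            return 0
    return -errno.EINVAL


for alive in (True, False):
    gpu = FakeGpu(ring_alive=alive)
    result = ring_test(gpu)
    print(f"ring_alive={alive}: result={result}, scratch=0x{gpu.scratch:08X}")
```

In this model the failing case ends with the sentinel still in the register and a return code of -22, which is the same pair of symptoms as in the log: scratch(0xC040)=0xCAFEDEAD followed by "failed -22". If the model is right, the KCQ errors at shutdown are likely a consequence of the failed resume rather than an independent fault.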
Comment 8 Martin Peres 2019-11-19 08:49:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/491.

