Bug 110886 - After S3 resume, kernel: [drm] psp command failed and response status is (-65529) at 27th time of S3. 28th time of S3 freeze the system.
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2019-06-11 06:17 UTC by Kai-Heng Feng
Modified: 2019-10-06 18:38 UTC



Attachments
Full kernel log (210.62 KB, text/plain)
2019-06-11 06:18 UTC, Kai-Heng Feng
Another kind of fail (370.40 KB, text/plain)
2019-06-11 09:06 UTC, Kai-Heng Feng
failed log when iommu is disabled. (357.82 KB, text/plain)
2019-08-13 08:22 UTC, Kai-Heng Feng
amd-staging-drm-net dmesg log (102.74 KB, text/plain)
2019-08-17 22:25 UTC, Samantha McVey
amd-staging-drm-next xorg log (48.71 KB, text/plain)
2019-08-17 22:25 UTC, Samantha McVey
journalctl last boot kernel message (433.98 KB, text/plain)
2019-09-28 18:05 UTC, Kai-Heng Feng
PSP failed with drm.debug=1 (2.27 MB, text/plain)
2019-10-06 18:37 UTC, Kai-Heng Feng
ring test failed with drm.debug=1 (2.21 MB, text/plain)
2019-10-06 18:38 UTC, Kai-Heng Feng

Description Kai-Heng Feng 2019-06-11 06:17:40 UTC
System: HP ProBook 645 G4
APU: Ryzen 3 PRO 2300U

After system S3 resume, the system may freeze:

Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:65:eDP-1] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:50:plane-3] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: WARNING: CPU: 1 PID: 1058 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5580 amdgpu_dm_atomic_commit_tail+0x19f4/0x1a80 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Modules linked in: ccm nls_iso8859_1 amdgpu snd_hda_codec_conexant arc4 iwlmvm snd_hda_codec_generic amd_iommu_v2 ledtrig_audio snd_hda_codec_hdmi gpu_sched kvm_amd snd_hda_intel i2c_algo_bit snd_hda_codec ccp ttm snd_hwdep kvm snd_hda_core drm_kms_helper mac80211 snd_pcm irqbypass syscopyarea snd_seq sysfillrect iwlwifi snd_timer sysimgblt snd_seq_device snd fb_sys_fops drm crct10dif_pclmul crc32_pclmul soundcore cfg80211 ghash_clmulni_intel rtsx_pci_ms aesni_intel hp_wmi sparse_keymap k10temp wmi_bmof memstick aes_x86_64 ucsi_acpi glue_helper hp_accel typec_ucsi typec crypto_simd cryptd video hp_wireless wmi joydev input_leds lis3lv02d mac_hid input_polldev serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 rtsx_pci_sdmmc psmouse i2c_piix4 ahci rtsx_pci libahci r8169 realtek i2c_hid hid
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: CPU: 1 PID: 1058 Comm: kworker/u32:6 Not tainted 5.2.0-rc1+ #2
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Hardware name: HP HP ProBook 645 G4/8401, BIOS Q82 Ver. 01.07.01 05/06/2019
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Workqueue: events_unbound async_run_entry_fn
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x19f4/0x1a80 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Code: ff ff 8b b0 90 04 00 00 48 c7 c7 61 bc bf c0 e8 c2 0a b5 ff 0f b6 85 06 fe ff ff 88 85 08 fe ff ff 49 8b 45 08 e9 f9 f1 ff ff <0f> 0b e9 1d f3 ff ff 0f 0b 48 8b 06 0f b6 8e e0 01 00 00 bf 04 00
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RSP: 0018:ffffb1e4c243b8e0 EFLAGS: 00010002
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RAX: 0000000000000002 RBX: 0000000000000202 RCX: ffff9a8fd18b6970
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff9a8fd02a5958
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RBP: ffffb1e4c243bb20 R08: ffffb1e4c243b7f4 R09: 0000000000000000
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: R10: 0000000000000000 R11: ffffb1e4c243b838 R12: ffff9a8fe2ba0400
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: R13: ffff9a8fe1495f80 R14: ffff9a8fd18b6800 R15: ffff9a8fd2280000
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: FS:  0000000000000000(0000) GS:ffff9a8fe7c40000(0000) knlGS:0000000000000000
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: CR2: 0000000000000000 CR3: 000000020f434000 CR4: 00000000003406e0
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Call Trace:
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  commit_tail+0x42/0x70 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? commit_tail+0x42/0x70 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  amdgpu_dm_atomic_commit+0xb1/0xf0 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_atomic_commit+0x4a/0x50 [drm]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  restore_fbdev_mode_atomic+0x1bf/0x1d0 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  restore_fbdev_mode+0x4e/0x160 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? _cond_resched+0x19/0x30
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_restore_fbdev_mode_unlocked+0x4e/0xa0 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_hotplug_event.part.41+0x97/0xc0 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_kms_helper_hotplug_event+0x2a/0x40 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  amdgpu_device_resume+0x319/0x3a0 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  amdgpu_pmops_resume+0x31/0x60 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  pci_pm_resume+0x6d/0xa0
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? pci_pm_suspend_late+0x40/0x40
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  dpm_run_callback+0x5b/0x150
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  device_resume+0xb8/0x200
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  async_resume+0x1d/0x30
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  async_run_entry_fn+0x3c/0x150
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  process_one_work+0x20f/0x410
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  worker_thread+0x34/0x400
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  kthread+0x120/0x140
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? process_one_work+0x410/0x410
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? __kthread_parkme+0x70/0x70
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ret_from_fork+0x22/0x40
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: ---[ end trace 55daf5798b2f5f1a ]---

Tested on the latest amdgpu/amd-staging-drm-next, which is at commit 40cc64619a2580b26f924bcabdefd555e7831a14 as of now.
Comment 1 Kai-Heng Feng 2019-06-11 06:18:11 UTC
Created attachment 144498 [details]
Full kernel log
Comment 2 Kai-Heng Feng 2019-06-11 09:06:27 UTC
Created attachment 144502 [details]
Another kind of fail

Jun 11 03:02:41 u-HP-ProBook-645-G4 kernel: [drm] psp command failed and response status is (-65529)
Comment 3 Alex Deucher 2019-07-05 16:01:08 UTC
Is this a regression?  If so, can you bisect?
Comment 4 Kai-Heng Feng 2019-07-05 16:19:28 UTC
(In reply to Alex Deucher from comment #3)
> Is this a regression?  If so, can you bisect?
No, this is not a regression.

This issue (S3 resume failure) also happens on previous kernel versions, but without any stack trace logged.
On amd-staging-drm-next we can observe the same issue along with a stack trace.
Comment 5 Alex Deucher 2019-08-08 05:54:12 UTC
Does disabling the IOMMU help?
Comment 6 Kai-Heng Feng 2019-08-13 08:22:28 UTC
Created attachment 145044 [details]
failed log when iommu is disabled.
Comment 7 Kai-Heng Feng 2019-08-13 08:26:33 UTC
I also tried disabling GFXOFF but the same issue still happens:
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index a24beaa4fb01..62a8394b1f5f 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -173,6 +173,7 @@ int hwmgr_early_init(struct pp_hwmgr *hwmgr)
        case AMDGPU_FAMILY_RV:
                switch (hwmgr->chip_id) {
                case CHIP_RAVEN:
+                       hwmgr->feature_mask &= ~PP_GFXOFF_MASK;
                        hwmgr->od_enabled = false;
                        hwmgr->smumgr_funcs = &smu10_smu_funcs;
                        smu10_init_function_pointers(hwmgr);
Comment 8 Andrey Grodzovsky 2019-08-13 18:41:31 UTC
(In reply to Kai-Heng Feng from comment #6)
> Created attachment 145044 [details]
> failed log when iommu is disabled.

What was the failure with IOMMU disabled? Is it the same as with IOMMU enabled?
In the log I only see PSP errors on resume. Can you confirm that is the only failure/error you observed in the log in that use case?

Can you please provide your FW versions by 
cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
Comment 9 Kai-Heng Feng 2019-08-14 04:10:45 UTC
(In reply to Andrey Grodzovsky from comment #8)
> (In reply to Kai-Heng Feng from comment #6)
> > Created attachment 145044 [details]
> > failed log when iommu is disabled.
> 
> What was the failure with IOMMU disabled?
Blanked screen. Graphics no longer works.

>Is it the same as with IOMMU enabled ?
Yes.

> In the log I only see PSP errors on resume. Can you confirm that is the only
> failure/error you observed in the log in that use case?
Yes. I haven't seen 
"[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out"
for a while.

Now it always shows PSP fail.

> 
> Can you please provide your FW versions by 
> cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 40, firmware version: 0x00000099
PFP feature version: 40, firmware version: 0x000000ae
CE feature version: 40, firmware version: 0x0000004d
RLC feature version: 1, firmware version: 0x00000213
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 40, firmware version: 0x0000018b
MEC2 feature version: 40, firmware version: 0x0000018b
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x001ad4d4
TA XGMI feature version: 0, firmware version: 0x00000000
TA RAS feature version: 0, firmware version: 0x00000000
SMC feature version: 0, firmware version: 0x00001e4f
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x0110901c
DMCU feature version: 0, firmware version: 0x00000000
VBIOS version: SWBRT32481.001
Comment 10 Samantha McVey 2019-08-14 06:59:12 UTC
I am getting this same issue (at least I believe the same). It is in the 5.2 series but not in the 5.1 series of the kernel. If needed I can post my logs. I have Lenovo A485 w/ 2700U
Comment 11 Kai-Heng Feng 2019-08-15 14:17:53 UTC
(In reply to Samantha McVey from comment #10)
> I am getting this same issue (at least I believe the same). It is in the 5.2
> series but not in the 5.1 series of the kernel. If needed I can post my
> logs. I have Lenovo A485 w/ 2700U

Can you please build a kernel from branch [1], reproduce the issue, and attach the output of `journalctl -b -1 -k` so we can check whether it is really the same issue?

[1] https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
Comment 12 Kai-Heng Feng 2019-08-15 14:26:57 UTC
> Now it always shows PSP fail.
I've dug up more info about this issue. It always times out in psp_cmd_submit_buf(). Particularly, this code section:

	while (*((unsigned int *)psp->fence_buf) != index) {
		if (--timeout == 0)
			break;
		msleep(1);
	}

psp->fence_buf is stuck at 406 while index is stuck at 407, so the wait eventually times out.
This _always_ happens on the 27th S3 cycle, and the 28th S3 attempt freezes the whole system.
Comment 13 Samantha McVey 2019-08-17 22:25:21 UTC
Created attachment 145085 [details]
amd-staging-drm-net dmesg log
Comment 14 Samantha McVey 2019-08-17 22:25:53 UTC
Created attachment 145086 [details]
amd-staging-drm-next xorg log
Comment 15 Samantha McVey 2019-08-17 22:27:04 UTC
I have uploaded my dmesg log and xorg log from amd-staging-drm-next
Comment 16 Kai-Heng Feng 2019-08-19 08:52:30 UTC
(In reply to Samantha McVey from comment #13)
> Created attachment 145085 [details]
> amd-staging-drm-net dmesg log

Doesn't look like the same one.
Comment 17 Alex Deucher 2019-09-26 13:59:38 UTC
Does this system support conventional S3 or is it a reduced ACPI platform that only supports suspend to idle?
Comment 18 Kai-Heng Feng 2019-09-26 15:02:53 UTC
(In reply to Alex Deucher from comment #17)
> Does this system support conventional S3 or is it a reduced ACPI platform
> that only supports suspend to idle?

This system defaults to S3, and the issue happens under S3. Are there any first-gen Raven Ridge systems that support s2idle?
Comment 19 Andrey Grodzovsky 2019-09-26 18:11:45 UTC
(In reply to Kai-Heng Feng from comment #9)
> (In reply to Andrey Grodzovsky from comment #8)
> > (In reply to Kai-Heng Feng from comment #6)
> > > Created attachment 145044 [details]
> > > failed log when iommu is disabled.
> > 
> > What was the failure with IOMMU disabled?
> Blanked screen. Graphics no longer works.
> 
> >Is it the same as with IOMMU enabled ?
> Yes.
> 
> > In the log I only see PSP errors on resume. Can you confirm that is the only
> > failure/error you observed in the log in that use case?
> Yes. I haven't seen 
> "[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
> [CRTC:57:crtc-0] flip_done timed out"
> for a while.
> 
> Now it always shows PSP fail.
> 
> > 
> > Can you please provide your FW versions by 
> > cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
> VCE feature version: 0, firmware version: 0x00000000
> UVD feature version: 0, firmware version: 0x00000000
> MC feature version: 0, firmware version: 0x00000000
> ME feature version: 40, firmware version: 0x00000099
> PFP feature version: 40, firmware version: 0x000000ae
> CE feature version: 40, firmware version: 0x0000004d
> RLC feature version: 1, firmware version: 0x00000213
> RLC SRLC feature version: 1, firmware version: 0x00000001
> RLC SRLG feature version: 1, firmware version: 0x00000001
> RLC SRLS feature version: 1, firmware version: 0x00000001
> MEC feature version: 40, firmware version: 0x0000018b
> MEC2 feature version: 40, firmware version: 0x0000018b
> SOS feature version: 0, firmware version: 0x00000000
> ASD feature version: 0, firmware version: 0x001ad4d4
> TA XGMI feature version: 0, firmware version: 0x00000000
> TA RAS feature version: 0, firmware version: 0x00000000
> SMC feature version: 0, firmware version: 0x00001e4f
> SDMA0 feature version: 41, firmware version: 0x000000a9
> VCN feature version: 0, firmware version: 0x0110901c
> DMCU feature version: 0, firmware version: 0x00000000
> VBIOS version: SWBRT32481.001

Can you please confirm the issue happens regardless of whether graphics is enabled? Boot the system in console mode and verify you still observe the problem.

(In reply to Kai-Heng Feng from comment #12)
> > Now it always shows PSP fail.
> I've dug up more info about this issue. It always times out in
> psp_cmd_submit_buf(). Particularly, this code section:
> 
> 	while (*((unsigned int *)psp->fence_buf) != index) {
> 		if (--timeout == 0)
> 			break;
> 		msleep(1);
> 	}
> 
> psp->fence_buf is stuck at 406 while index is stuck at 407, so the wait
> eventually times out.
> This _always_ happens on the 27th S3 cycle, and the 28th S3 attempt freezes
> the whole system.

Does it also happen when there is no acceleration in the system? I mean, if you do S3 cycles from console mode?
Comment 20 Kai-Heng Feng 2019-09-26 18:29:39 UTC
(In reply to Andrey Grodzovsky from comment #19) 
> Can you please confirm the issue happens regardless of whether graphics is
> enabled? Boot the system in console mode and verify you still observe the
> problem.

I guess you mean without a graphical session? Yes, I already tested that.
1. If amdgpu.ko is loaded, the issue happens in both console and graphical sessions.
2. If amdgpu.ko is not loaded, the issue doesn't happen, regardless of console or graphical session.

> Does it also happen when there is no acceleration in the system? I mean, if
> you do S3 cycles from console mode?

Please refer to point 2 above.
Comment 21 Andrey Grodzovsky 2019-09-26 18:32:34 UTC
In fact, please rebase onto the latest drm-next from here - https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next, as there are 2 changes by Alex in the communication with the PSP which might help

drm/amdgpu/psp: invalidate the hdp read cache before reading the psp response   
drm/amdgpu/psp: flush HDP write fifo after submitting cmds to the psp  

See if the PSP errors go away with that.
Comment 22 Kai-Heng Feng 2019-09-28 18:04:24 UTC
(In reply to Andrey Grodzovsky from comment #21)
> In fact, please rebase onto the latest drm-next from here -
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next, as
> there are 2 changes by Alex in the communication with the PSP which might help
> 
> drm/amdgpu/psp: invalidate the hdp read cache before reading the psp
> response   
> drm/amdgpu/psp: flush HDP write fifo after submitting cmds to the psp  
> 
> See if the PSP errors go away with that.

A slightly different error message still appeared after the 27th S3, and the 28th S3 attempt froze the system:
Sep 28 05:38:44 u-HP-ProBook-645-G4 kernel: [drm:psp_hw_start.cold [amdgpu]] *ERROR* PSP load asd failed!
Sep 28 05:38:44 u-HP-ProBook-645-G4 kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Sep 28 05:38:44 u-HP-ProBook-645-G4 kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
Sep 28 05:38:44 u-HP-ProBook-645-G4 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
Sep 28 05:38:44 u-HP-ProBook-645-G4 kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -22
Sep 28 05:38:44 u-HP-ProBook-645-G4 kernel: PM: Device 0000:04:00.0 failed to resume async: error -22

$ journalctl -b -1 -k | grep "suspend entry (deep)" | wc -l
28
Comment 23 Kai-Heng Feng 2019-09-28 18:05:24 UTC
Created attachment 145576 [details]
journalctl last boot kernel message
Comment 24 Andrey Grodzovsky 2019-10-04 18:53:16 UTC
(In reply to Kai-Heng Feng from comment #23)
> Created attachment 145576 [details]
> journalctl last boot kernel message

Can you retry with the latest FW from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

and also load the kernel with drm.debug=1, as there seems to be a failure in PSP command submission during FW loading, but the actual failure code is now logged only at debug level.
Comment 25 Kai-Heng Feng 2019-10-06 18:36:59 UTC
(In reply to Andrey Grodzovsky from comment #24)
> (In reply to Kai-Heng Feng from comment #23)
> > Created attachment 145576 [details]
> > journalctl last boot kernel message
> 
> Can you retry with the latest FW from
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

Still same issue.

> 
> and also load the kernel with drm.debug=1, as there seems to be a failure in
> PSP command submission during FW loading, but the actual failure code is now
> logged only at debug level.

I can reproduce the issue on the latest firmware ("amdgpu: update vega20 ucode for 19.30") and the latest amd-staging-drm-next ("drm/amdgpu: remove redundant variable r and redundant return statement").

I don't see that repeatedly trying the latest kernel/firmware is getting us anywhere. If you need physical hardware to debug, please just let me know.
Comment 26 Kai-Heng Feng 2019-10-06 18:37:38 UTC
Created attachment 145666 [details]
PSP failed with drm.debug=1
Comment 27 Kai-Heng Feng 2019-10-06 18:38:09 UTC
Created attachment 145667 [details]
ring test failed with drm.debug=1

