Bug 110886 - After S3 resume, kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2019-06-11 06:17 UTC by Kai-Heng Feng
Modified: 2019-08-19 08:52 UTC

Attachments
Full kernel log (210.62 KB, text/plain)
2019-06-11 06:18 UTC, Kai-Heng Feng
Another kind of fail (370.40 KB, text/plain)
2019-06-11 09:06 UTC, Kai-Heng Feng
failed log when iommu is disabled. (357.82 KB, text/plain)
2019-08-13 08:22 UTC, Kai-Heng Feng
amd-staging-drm-net dmesg log (102.74 KB, text/plain)
2019-08-17 22:25 UTC, Samantha McVey
amd-staging-drm-next xorg log (48.71 KB, text/plain)
2019-08-17 22:25 UTC, Samantha McVey

Description Kai-Heng Feng 2019-06-11 06:17:40 UTC
System: HP ProBook 645 G4
APU: Ryzen 3 PRO 2300U

After system S3 resume, the system may freeze:

Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:65:eDP-1] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:50:plane-3] flip_done timed out
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: WARNING: CPU: 1 PID: 1058 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5580 amdgpu_dm_atomic_commit_tail+0x19f4/0x1a80 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Modules linked in: ccm nls_iso8859_1 amdgpu snd_hda_codec_conexant arc4 iwlmvm snd_hda_codec_generic amd_iommu_v2 ledtrig_audio snd_hda_codec_hdmi gpu_sched kvm_amd snd_hda_intel i2c_algo_bit snd_hda_codec ccp ttm snd_hwdep kvm snd_hda_core drm_kms_helper mac80211 snd_pcm irqbypass syscopyarea snd_seq sysfillrect iwlwifi snd_timer sysimgblt snd_seq_device snd fb_sys_fops drm crct10dif_pclmul crc32_pclmul soundcore cfg80211 ghash_clmulni_intel rtsx_pci_ms aesni_intel hp_wmi sparse_keymap k10temp wmi_bmof memstick aes_x86_64 ucsi_acpi glue_helper hp_accel typec_ucsi typec crypto_simd cryptd video hp_wireless wmi joydev input_leds lis3lv02d mac_hid input_polldev serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 rtsx_pci_sdmmc psmouse i2c_piix4 ahci rtsx_pci libahci r8169 realtek i2c_hid hid
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: CPU: 1 PID: 1058 Comm: kworker/u32:6 Not tainted 5.2.0-rc1+ #2
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Hardware name: HP HP ProBook 645 G4/8401, BIOS Q82 Ver. 01.07.01 05/06/2019
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Workqueue: events_unbound async_run_entry_fn
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x19f4/0x1a80 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Code: ff ff 8b b0 90 04 00 00 48 c7 c7 61 bc bf c0 e8 c2 0a b5 ff 0f b6 85 06 fe ff ff 88 85 08 fe ff ff 49 8b 45 08 e9 f9 f1 ff ff <0f> 0b e9 1d f3 ff ff 0f 0b 48 8b 06 0f b6 8e e0 01 00 00 bf 04 00
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RSP: 0018:ffffb1e4c243b8e0 EFLAGS: 00010002
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RAX: 0000000000000002 RBX: 0000000000000202 RCX: ffff9a8fd18b6970
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff9a8fd02a5958
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: RBP: ffffb1e4c243bb20 R08: ffffb1e4c243b7f4 R09: 0000000000000000
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: R10: 0000000000000000 R11: ffffb1e4c243b838 R12: ffff9a8fe2ba0400
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: R13: ffff9a8fe1495f80 R14: ffff9a8fd18b6800 R15: ffff9a8fd2280000
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: FS:  0000000000000000(0000) GS:ffff9a8fe7c40000(0000) knlGS:0000000000000000
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: CR2: 0000000000000000 CR3: 000000020f434000 CR4: 00000000003406e0
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: Call Trace:
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  commit_tail+0x42/0x70 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? commit_tail+0x42/0x70 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  amdgpu_dm_atomic_commit+0xb1/0xf0 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_atomic_commit+0x4a/0x50 [drm]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  restore_fbdev_mode_atomic+0x1bf/0x1d0 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  restore_fbdev_mode+0x4e/0x160 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? _cond_resched+0x19/0x30
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_restore_fbdev_mode_unlocked+0x4e/0xa0 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_hotplug_event.part.41+0x97/0xc0 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  drm_kms_helper_hotplug_event+0x2a/0x40 [drm_kms_helper]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  amdgpu_device_resume+0x319/0x3a0 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  amdgpu_pmops_resume+0x31/0x60 [amdgpu]
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  pci_pm_resume+0x6d/0xa0
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? pci_pm_suspend_late+0x40/0x40
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  dpm_run_callback+0x5b/0x150
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  device_resume+0xb8/0x200
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  async_resume+0x1d/0x30
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  async_run_entry_fn+0x3c/0x150
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  process_one_work+0x20f/0x410
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  worker_thread+0x34/0x400
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  kthread+0x120/0x140
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? process_one_work+0x410/0x410
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ? __kthread_parkme+0x70/0x70
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel:  ret_from_fork+0x22/0x40
Jun 11 01:40:21 u-HP-ProBook-645-G4 kernel: ---[ end trace 55daf5798b2f5f1a ]---

Tested on the latest amdgpu/amd-staging-drm-next, which is currently at commit 40cc64619a2580b26f924bcabdefd555e7831a14.
Comment 1 Kai-Heng Feng 2019-06-11 06:18:11 UTC
Created attachment 144498 [details]
Full kernel log
Comment 2 Kai-Heng Feng 2019-06-11 09:06:27 UTC
Created attachment 144502 [details]
Another kind of fail

Jun 11 03:02:41 u-HP-ProBook-645-G4 kernel: [drm] psp command failed and response status is (-65529)
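
A note on reading that status: the log prints it as a signed decimal, which hides the bit pattern. Reinterpreted as an unsigned 32-bit value, -65529 is 0xFFFF0007. The sketch below only shows that conversion; the helper names are hypothetical and not part of amdgpu, and it makes no claim about what the status means to the PSP firmware.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: view a status that was logged with %d as the
 * unsigned 32-bit value it most likely was before printing. */
static uint32_t psp_status_as_u32(int32_t logged)
{
    /* Signed-to-unsigned conversion is modulo 2^32 in C. */
    return (uint32_t)logged;
}

/* Hypothetical helper: write the status as 0xXXXXXXXX into buf.
 * Returns the number of characters snprintf would have written. */
static int format_psp_status(int32_t logged, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%08" PRIX32, psp_status_as_u32(logged));
}
```

Calling format_psp_status(-65529, buf, sizeof buf) with a 16-byte buffer leaves the string 0xFFFF0007 in buf.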
Comment 3 Alex Deucher 2019-07-05 16:01:08 UTC
Is this a regression?  If so, can you bisect?
Comment 4 Kai-Heng Feng 2019-07-05 16:19:28 UTC
(In reply to Alex Deucher from comment #3)
> Is this a regression?  If so, can you bisect?
No, this is not a regression.

This issue (S3 resume failure) also happens on previous kernel versions, but without any stack trace being logged.
On amd-staging-drm-next we can observe the same issue along with a stack trace.
Comment 5 Alex Deucher 2019-08-08 05:54:12 UTC
Does disabling the IOMMU help?
Comment 6 Kai-Heng Feng 2019-08-13 08:22:28 UTC
Created attachment 145044 [details]
failed log when iommu is disabled.
Comment 7 Kai-Heng Feng 2019-08-13 08:26:33 UTC
I also tried disabling GFXOFF but the same issue still happens:
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index a24beaa4fb01..62a8394b1f5f 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -173,6 +173,7 @@ int hwmgr_early_init(struct pp_hwmgr *hwmgr)
        case AMDGPU_FAMILY_RV:
                switch (hwmgr->chip_id) {
                case CHIP_RAVEN:
+                       hwmgr->feature_mask &= ~PP_GFXOFF_MASK;
                        hwmgr->od_enabled = false;
                        hwmgr->smumgr_funcs = &smu10_smu_funcs;
                        smu10_init_function_pointers(hwmgr);
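
For readers unfamiliar with the pattern, the one-line change above clears a single bit in the feature mask before the rest of the Raven setup runs. A minimal standalone sketch of that bit operation follows. The constant EXAMPLE_GFXOFF_MASK and both helper names are assumptions for illustration; the real value of PP_GFXOFF_MASK should be taken from the kernel tree being built.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: illustrative bit position only. Check the definition of
 * PP_GFXOFF_MASK in your kernel source before relying on this value. */
#define EXAMPLE_GFXOFF_MASK 0x8000u

/* Return feature_mask with one feature bit cleared, mirroring
 * "hwmgr->feature_mask &= ~PP_GFXOFF_MASK" from the diff. */
static uint32_t clear_feature(uint32_t feature_mask, uint32_t bit)
{
    /* Expect exactly one bit set in 'bit'. */
    assert(bit != 0 && (bit & (bit - 1)) == 0);
    return feature_mask & ~bit;
}

/* Return nonzero if the given feature bit is still set. */
static int feature_enabled(uint32_t feature_mask, uint32_t bit)
{
    return (feature_mask & bit) != 0;
}
```

With all bits initially set, clear_feature(0xffffffff, EXAMPLE_GFXOFF_MASK) leaves every other feature enabled and only the one bit cleared.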
Comment 8 Andrey Grodzovsky 2019-08-13 18:41:31 UTC
(In reply to Kai-Heng Feng from comment #6)
> Created attachment 145044 [details]
> failed log when iommu is disabled.

What was the failure with IOMMU disabled? Is it the same as with IOMMU enabled?
In the log I only see PSP errors on resume. Can you confirm that this is the only failure/error you observed in the log in that case?

Can you please provide your FW versions by running
cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
Comment 9 Kai-Heng Feng 2019-08-14 04:10:45 UTC
(In reply to Andrey Grodzovsky from comment #8)
> (In reply to Kai-Heng Feng from comment #6)
> > Created attachment 145044 [details]
> > failed log when iommu is disabled.
> 
> What was the failure with IOMMU disabled?
Blank screen. Graphics no longer work.

> Is it the same as with IOMMU enabled?
Yes.

> In the log I only see PSP errors on resume. Can you confirm that this is the
> only failure/error you observed in the log in that case?
Yes. I haven't seen 
"[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:57:crtc-0] flip_done timed out"
for a while.

Now it always shows PSP fail.

> 
> Can you please provide your FW versions by 
> cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 40, firmware version: 0x00000099
PFP feature version: 40, firmware version: 0x000000ae
CE feature version: 40, firmware version: 0x0000004d
RLC feature version: 1, firmware version: 0x00000213
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 40, firmware version: 0x0000018b
MEC2 feature version: 40, firmware version: 0x0000018b
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x001ad4d4
TA XGMI feature version: 0, firmware version: 0x00000000
TA RAS feature version: 0, firmware version: 0x00000000
SMC feature version: 0, firmware version: 0x00001e4f
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x0110901c
DMCU feature version: 0, firmware version: 0x00000000
VBIOS version: SWBRT32481.001
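
When comparing dumps like the one above between machines, it can help to parse each line into a name and two numbers. The following is a rough sketch assuming the line layout shown above ("NAME feature version: N, firmware version: 0xHEX"). The struct and function names are hypothetical and not part of any amdgpu tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of amdgpu_firmware_info. */
struct fw_entry {
    char name[32];
    unsigned int feature_version;
    unsigned int firmware_version;
};

/* Parse one line; returns 1 on success, 0 if the line does not match
 * the assumed layout (for example the "VBIOS version:" line). */
static int parse_fw_line(const char *line, struct fw_entry *out)
{
    static const char marker[] = " feature version: ";
    const char *m;
    size_t name_len;

    assert(line != NULL && out != NULL);

    /* The block name may contain spaces ("RLC SRLC"), so locate the
     * fixed marker text instead of splitting on whitespace. */
    m = strstr(line, marker);
    if (m == NULL)
        return 0;

    name_len = (size_t)(m - line);
    if (name_len == 0 || name_len >= sizeof(out->name))
        return 0;
    memcpy(out->name, line, name_len);
    out->name[name_len] = '\0';

    /* Skip past the marker and read the decimal and hex fields. */
    if (sscanf(m + sizeof(marker) - 1, "%u, firmware version: 0x%x",
               &out->feature_version, &out->firmware_version) != 2)
        return 0;
    return 1;
}
```

Feeding it the lines from this comment yields, for instance, name "VCN", feature version 0 and firmware version 0x0110901c; the VBIOS line is rejected because it does not follow the same layout.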
Comment 10 Samantha McVey 2019-08-14 06:59:12 UTC
I am getting this same issue (at least I believe the same). It is in the 5.2 series but not in the 5.1 series of the kernel. If needed I can post my logs. I have Lenovo A485 w/ 2700U
Comment 11 Kai-Heng Feng 2019-08-15 14:17:53 UTC
(In reply to Samantha McVey from comment #10)
> I am getting this same issue (at least I believe the same). It is in the 5.2
> series but not in the 5.1 series of the kernel. If needed I can post my
> logs. I have Lenovo A485 w/ 2700U

Can you please build a kernel from branch [1], reproduce the issue, and attach the output of `journalctl -b -1 -k` so we can check whether it is really the same issue?

[1] https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
Comment 12 Kai-Heng Feng 2019-08-15 14:26:57 UTC
> Now it always shows PSP fail.
I've dug up more info about this issue. It always times out in psp_cmd_submit_buf(), specifically in this section:

	while (*((unsigned int *)psp->fence_buf) != index) {
		if (--timeout == 0)
			break;
		msleep(1);
	}

The value in psp->fence_buf stays stuck at 406 while index is 407, so the loop eventually times out.
This _always_ happens on the 27th S3 cycle, and the whole system freezes on the 28th S3 attempt.
Comment 13 Samantha McVey 2019-08-17 22:25:21 UTC
Created attachment 145085 [details]
amd-staging-drm-net dmesg log
Comment 14 Samantha McVey 2019-08-17 22:25:53 UTC
Created attachment 145086 [details]
amd-staging-drm-next xorg log
Comment 15 Samantha McVey 2019-08-17 22:27:04 UTC
I have uploaded my dmesg log and xorg log from amd-staging-drm-next
Comment 16 Kai-Heng Feng 2019-08-19 08:52:30 UTC
(In reply to Samantha McVey from comment #13)
> Created attachment 145085 [details]
> amd-staging-drm-net dmesg log

Doesn't look like the same one.

