Bug 103987 - [DC] drm:drm_atomic_helper_wait_for_dependencies - flip_done timed out
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2017-11-30 03:15 UTC by Barry G
Modified: 2018-04-24 18:56 UTC

Attachments
full dmesg (188.18 KB, text/plain)
2017-11-30 03:15 UTC, Barry G
Add locking to front-end programming (2.21 KB, text/plain)
2017-12-01 00:34 UTC, Harry Wentland

Description Barry G 2017-11-30 03:15:36 UTC
Created attachment 135826
full dmesg

I am having an intermittent issue with the RX Vega 64 in my box on Linux 4.15-rc1.  Sometimes everything boots up and starts fine.  Other times, it boots to monitors with the backlight lit and nothing on the display (blank startups).

During these blank startups, dmesg shows a series of flip_done timeout errors followed by a kernel WARNING.  The first batch looks like this:
[   37.600184] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:45:crtc-1] flip_done timed out
[   41.706361] FS-Cache: Loaded
[   41.716678] RPC: Registered named UNIX socket transport module.
[   41.717859] RPC: Registered udp transport module.
[   41.718895] RPC: Registered tcp transport module.
[   41.718895] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   41.729550] FS-Cache: Netfs 'nfs' registered for caching
[   41.733295] Key type dns_resolver registered
[   41.744370] NFS: Registering the id_resolver key type
[   41.744994] Key type id_resolver registered
[   41.745581] Key type id_legacy registered
[   47.626795] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-2] flip_done timed out
[   57.653462] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:37:plane-1] flip_done timed out
[   67.680126] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:38:plane-2] flip_done timed out
[   67.680177] [drm] {1280x1024, 1688x1066@108000Khz}
[   67.680213] [drm] {1280x1024, 1688x1066@108000Khz}
[   67.680249] [drm] {1280x1024, 1688x1066@108000Khz}
[   67.680285] [drm] {1280x1024, 1688x1066@108000Khz}
[   67.695986] [drm] HBR2x4 pass VS=0, PE=0
[   68.174842] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 1, already busy
[   68.174885] WARNING: CPU: 2 PID: 290 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:3936 amdgpu_dm_atomic_commit_tail+0x8f3/0x9a0 [amdgpu]
[   68.174886] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache amdkfd snd_hda_codec_realtek amd_iommu_v2 snd_hda_codec_generic snd_hda_codec_hdmi nls_iso8859_1 amdgpu wmi_bmof mxm_wmi edac_mce_amd snd_usb_audio snd_usbmidi_lib snd_hda_intel nls_cp437 kvm snd_hda_codec vfat chash snd_hda_core snd_rawmidi irqbypass ttm snd_seq_device snd_hwdep xpad fat mousedev ixgbe snd_pcm evdev ff_memless input_leds drm_kms_helper joydev mac_hid pcspkr drm led_class snd_timer syscopyarea igb mdio sysfillrect sysimgblt ptp sp5100_tco fb_sys_fops snd cdc_acm pps_core k10temp i2c_algo_bit soundcore i2c_piix4 dca shpchp tpm_tis tpm_tis_core tpm wmi 8250_dw button acpi_cpufreq sch_fq_codel ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg sd_mod dm_crypt
[   68.174922]  dm_mod dax uas usb_storage hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ahci libahci ccp xhci_pci nvme rng_core sha256_generic libata xhci_hcd nvme_core sha1_generic usbcore scsi_mod usb_common serio
[   68.174937] CPU: 2 PID: 290 Comm: kworker/2:4 Tainted: G        W        4.15.0-rc1-g4fbd8d194f06 #1
[   68.174938] Hardware name: Micro-Star International Co., Ltd. MS-7B09/X399 GAMING PRO CARBON AC (MS-7B09), BIOS 1.60 11/14/2017
[   68.174943] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[   68.174945] task: ffff880fefc88000 task.stack: ffffc90007ba4000
[   68.174970] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x8f3/0x9a0 [amdgpu]
[   68.174971] RSP: 0018:ffffc90007ba7ac0 EFLAGS: 00010086
[   68.174972] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000062
[   68.174973] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000082
[   68.174973] RBP: ffff880ff27ea000 R08: 0000006264b60830 R09: 0000000000000062
[   68.174974] R10: ffff880fe4104000 R11: 0000000000000000 R12: ffff880fe4104000
[   68.174975] R13: ffff880ff44e8b40 R14: ffff880fefc74480 R15: ffff880fe4104000
[   68.174976] FS:  0000000000000000(0000) GS:ffff880ffc880000(0000) knlGS:0000000000000000
[   68.174977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   68.174977] CR2: 00007f254b5d3010 CR3: 0000000fefe4e000 CR4: 00000000003406e0
[   68.174978] Call Trace:
[   68.174987]  commit_tail+0x3a/0x70 [drm_kms_helper]
[   68.174992]  drm_atomic_helper_commit+0xfc/0x110 [drm_kms_helper]
[   68.174995]  restore_fbdev_mode_atomic+0x181/0x1f0 [drm_kms_helper]
[   68.175000]  drm_fb_helper_restore_fbdev_mode_unlocked.part.25+0x23/0x70 [drm_kms_helper]
[   68.175003]  drm_fb_helper_set_par+0x3e/0x70 [drm_kms_helper]
[   68.175006]  drm_fb_helper_hotplug_event.part.24+0x9e/0xb0 [drm_kms_helper]
[   68.175010]  drm_dp_send_link_address+0x17f/0x1f0 [drm_kms_helper]
[   68.175014]  drm_dp_add_port+0x28f/0x3c0 [drm_kms_helper]
[   68.175019]  ? try_to_del_timer_sync+0x4d/0x80
[   68.175020]  ? del_timer_sync+0x35/0x40
[   68.175023]  ? schedule_timeout+0x9e/0x440
[   68.175025]  ? collect_expired_timers+0xa0/0xa0
[   68.175027]  ? finish_wait+0x2f/0x60
[   68.175031]  drm_dp_send_link_address+0x168/0x1f0 [drm_kms_helper]
[   68.175035]  drm_dp_check_and_send_link_address+0x87/0xc0 [drm_kms_helper]
[   68.175038]  drm_dp_mst_link_probe_work+0x4b/0x70 [drm_kms_helper]
[   68.175042]  process_one_work+0x1da/0x410
[   68.175044]  worker_thread+0x2b/0x3d0
[   68.175046]  ? process_one_work+0x410/0x410
[   68.175047]  kthread+0x111/0x130
[   68.175049]  ? kthread_create_on_node+0x70/0x70
[   68.175050]  ret_from_fork+0x1f/0x30
[   68.175052] Code: fb ff ff 45 8b 87 b0 04 00 00 48 c7 c1 10 68 c6 a0 48 c7 c2 54 bf c9 a0 31 f6 48 c7 c7 e6 bd c9 a0 48 89 44 24 40 e8 ed 04 96 ff <0f> ff 48 8b 44 24 40 4c 8b 1c 24 e9 a0 fc ff ff 49 8b 8f 18 02 
[   68.175073] ---[ end trace 898e866af31ab98b ]---
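
One way to read the log above is to pull out only the flip_done timeouts and look at how far apart they are. The following is a minimal sketch, not part of the original report: the function names and the regular expression are made up for illustration, and the sample input is just the four timeout lines (plus one unrelated line) copied from the dmesg excerpt.

```python
import re

# Matches lines such as:
# [   37.600184] [drm:...] *ERROR* [CRTC:45:crtc-1] flip_done timed out
FLIP_TIMEOUT = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s.*\*ERROR\*\s"
    r"\[(?P<kind>CRTC|PLANE):(?P<id>\d+):(?P<name>[^\]]+)\]\s+flip_done timed out"
)

def flip_timeouts(dmesg_text):
    """Return (timestamp, kind, object id, name) for each flip_done timeout."""
    events = []
    for line in dmesg_text.splitlines():
        m = FLIP_TIMEOUT.match(line.strip())
        if m:
            events.append((float(m["ts"]), m["kind"], int(m["id"]), m["name"]))
    return events

def gaps(events):
    """Seconds between consecutive timeouts, rounded to milliseconds."""
    times = [e[0] for e in events]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]

# Lines copied from the dmesg excerpt in this report.
SAMPLE = """\
[   37.600184] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:45:crtc-1] flip_done timed out
[   41.706361] FS-Cache: Loaded
[   47.626795] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-2] flip_done timed out
[   57.653462] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:37:plane-1] flip_done timed out
[   67.680126] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:38:plane-2] flip_done timed out
"""

events = flip_timeouts(SAMPLE)
for ts, kind, obj_id, name in events:
    print(f"{ts:10.3f}s  {kind}:{obj_id} ({name})")
print("gaps between timeouts (s):", gaps(events))
```

On this excerpt the four timeouts (two CRTCs, then two planes) come out roughly 10 seconds apart, which is consistent with each wait expiring in turn before the commit finally proceeds and hits the "already busy" warning.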

Sometimes a power-cycle will boot to a good display, sometimes it won't.  I am currently on 8 bad boots in a row.  At lunch my 4th boot worked.

Xorg is currently disabled on this machine, so this appears to be purely a kernel issue.
Comment 1 Barry G 2017-11-30 15:51:54 UTC
Last night I updated the kernel to

1086abf3e8ab8eaf7355c6362bad28283b4fb021 (drm/amd/display: USB-C / thunderbolt dock specific workaround)

from https://cgit.freedesktop.org/~hwentland/linux/commit/?h=4.15-rc1-fixes

per the amd-gfx pull request.  It did not affect this bug.  It took 4 boots for my displays to come up with content on 1086abf.

Also note that this system has no reset button (case manufacturers appear to be dropping them), so these are all cold boots.  I could wire up a hacky reset button if warm-reset behavior would be helpful.
Comment 2 Harry Wentland 2017-12-01 00:34:49 UTC
Created attachment 135850
Add locking to front-end programming

Do you have a chance to give this patch a spin? It helped us with similar issues on an internal branch.
Comment 3 Barry G 2017-12-01 02:25:02 UTC
Yes!  The pipe locking patch made a huge difference.  Just cold booted 10 times and the displays came up great 10 times.

I applied it on top of 1086abf3e8ab for this test.

Thanks!
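
As a rough sanity check on the 10-for-10 result, here is a small sketch of how unlikely such a run would be by chance. It is not from the report: it assumes boots are independent and, purely for illustration, that the unpatched kernel came up with a working display about one boot in four (Comment 1 mentions needing 4 boots on 1086abf). The real failure rate was never measured here, and the function name is hypothetical.

```python
def chance_of_clean_run(p_good, boots):
    """Probability that `boots` independent boots all succeed."""
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must be a probability between 0 and 1")
    if boots < 0:
        raise ValueError("boots must be non-negative")
    return p_good ** boots

# ASSUMPTION: per-boot success rate without the patch; not measured in this report.
ASSUMED_P_GOOD_UNPATCHED = 0.25

p = chance_of_clean_run(ASSUMED_P_GOOD_UNPATCHED, 10)
print(f"chance of 10 good boots in a row without the fix: {p:.2e}")
```

Under that assumed rate, ten good cold boots in a row would happen by luck only about once in a million tries, so the run is strong evidence that the patch changed the behavior. A different assumed rate changes the number, not the conclusion, unless the unpatched success rate was already high.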
Comment 4 Harry Wentland 2017-12-01 14:27:10 UTC
Thanks for testing. I'll queue it up for the next set of 4.15 fixes.
Comment 5 Barry G 2017-12-11 02:38:04 UTC
Looks like 4.15-rc3 doesn't have this patch in it.  Is it still staged?

Thanks!
Comment 6 Harry Wentland 2017-12-11 15:01:23 UTC
Been busy and haven't gotten it in yet. It's still staged. Hope to queue it up for next RC this week.
Comment 7 Harry Wentland 2018-04-24 18:56:15 UTC
Marking as resolved, since the fix should have been in mainline for a while now. If this is still an issue, feel free to reopen.