Bug 107947

Summary: REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:563 during boot
Product: DRI Reporter: Tobias Theisselmann <mail>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED QA Contact:
Severity: major    
Priority: medium CC: asmith, harry.wentland, root.main, sunpeng.li
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments:
dmesg logs for failed boot to black screen
dmesg logs for successful boot to login screen (flags: none)

Description Tobias Theisselmann 2018-09-16 06:34:40 UTC
During boot, the following stack trace appears on kernels 4.17.19 and 4.18.7. My GPU is a Vega 64. The system works otherwise.

[    9.354820] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:563
[    9.354887] WARNING: CPU: 13 PID: 192 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:195 generic_reg_wait+0xe7/0x160 [amdgpu]
[    9.354888] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev snd_usbmidi_lib media snd_rawmidi snd_seq_device squashfs nls_iso8859_1 nls_cp437 vfat fat loop btrfs zstd_compress libcrc32c zstd_decompress xxhash xor amdkfd amd_iommu_v2 edac_mce_amd amdgpu kvm_amd snd_hda_codec_realtek chash snd_hda_codec_generic kvm snd_hda_codec_ca0132 gpu_sched snd_hda_codec_hdmi ttm snd_hda_intel irqbypass drm_kms_helper snd_hda_codec crct10dif_pclmul crc32_pclmul eeepc_wmi ghash_clmulni_intel asus_wmi sparse_keymap pcbc rfkill wmi_bmof mxm_wmi snd_hda_core drm ccp snd_hwdep igb snd_pcm snd_timer aesni_intel snd agpgart sp5100_tco aes_x86_64 syscopyarea crypto_simd sysfillrect i2c_algo_bit sysimgblt cryptd raid6_pq fb_sys_fops glue_helper pcspkr
[    9.354910]  rng_core soundcore i2c_piix4 k10temp dca shpchp rtc_cmos pinctrl_amd gpio_amdpt evdev pcc_cpufreq wmi acpi_cpufreq mac_hid usbip_host(+) usbip_core uinput sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod hid_corsair led_class hid_generic usbhid hid ahci libahci xhci_pci crc32c_intel xhci_hcd libata usbcore scsi_mod usb_common
[    9.354926] CPU: 13 PID: 192 Comm: kworker/13:1 Not tainted 4.17.19-1-MANJARO #1
[    9.354926] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 3008 11/29/2017
[    9.354954] Workqueue: events dm_irq_work_func [amdgpu]
[    9.354979] RIP: 0010:generic_reg_wait+0xe7/0x160 [amdgpu]
[    9.354979] RSP: 0018:ffffa832c22a7a18 EFLAGS: 00010297
[    9.354980] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000001
[    9.354981] RDX: 0000000000000000 RSI: ffffffff82e89d1e RDI: 00000000ffffffff
[    9.354981] RBP: ffff8856cbdee580 R08: 0000000000000000 R09: 0000000000000002
[    9.354982] R10: ffffa832d102a220 R11: 0000000000000001 R12: 0000000000000dad
[    9.354982] R13: 00000000000035b0 R14: 0000000000000010 R15: 0000000000000001
[    9.354983] FS:  0000000000000000(0000) GS:ffff8856ce940000(0000) knlGS:0000000000000000
[    9.354984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.354984] CR2: 00007f33eed94e38 CR3: 000000037f00a000 CR4: 00000000003406e0
[    9.354985] Call Trace:
[    9.355015]  dce_mi_free_dmif+0xf8/0x180 [amdgpu]
[    9.355041]  dce110_reset_hw_ctx_wrap+0x141/0x1b0 [amdgpu]
[    9.355067]  dce110_apply_ctx_to_hw+0x52/0xa30 [amdgpu]
[    9.355092]  ? hwmgr_handle_task+0x6b/0xc0 [amdgpu]
[    9.355117]  ? pp_dpm_dispatch_tasks+0x5a/0x70 [amdgpu]
[    9.355133]  ? amdgpu_pm_compute_clocks+0x98/0x4d0 [amdgpu]
[    9.355158]  dc_commit_state+0x2df/0x560 [amdgpu]
[    9.355182]  ? set_freesync_on_streams.part.6+0x4d/0x250 [amdgpu]
[    9.355205]  ? mod_freesync_set_user_enable+0x11f/0x150 [amdgpu]
[    9.355230]  amdgpu_dm_atomic_commit_tail+0x36b/0xd80 [amdgpu]
[    9.355247]  ? amdgpu_bo_pin_restricted+0x1da/0x2c0 [amdgpu]
[    9.355250]  ? _raw_spin_lock_irq+0x1a/0x40
[    9.355250]  ? _raw_spin_unlock_irq+0x1d/0x30
[    9.355252]  ? wait_for_common+0x113/0x190
[    9.355253]  ? _raw_spin_unlock_irq+0x1d/0x30
[    9.355254]  ? wait_for_common+0x113/0x190
[    9.355278]  ? dm_plane_helper_cleanup_fb+0x120/0x120 [amdgpu]
[    9.355283]  commit_tail+0x3d/0x70 [drm_kms_helper]
[    9.355287]  drm_atomic_helper_commit+0x103/0x110 [drm_kms_helper]
[    9.355291]  restore_fbdev_mode_atomic+0x1a8/0x210 [drm_kms_helper]
[    9.355295]  drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
[    9.355299]  drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
[    9.355302]  drm_fb_helper_hotplug_event.part.27+0x8d/0xb0 [drm_kms_helper]
[    9.355326]  handle_hpd_irq+0x84/0x90 [amdgpu]
[    9.355350]  dm_irq_work_func+0x4e/0x60 [amdgpu]
[    9.355353]  process_one_work+0x1d1/0x3b0
[    9.355354]  worker_thread+0x2b/0x3d0
[    9.355355]  ? process_one_work+0x3b0/0x3b0
[    9.355356]  kthread+0x112/0x130
[    9.355357]  ? kthread_flush_work_fn+0x10/0x10
[    9.355359]  ret_from_fork+0x22/0x40
[    9.355360] Code: 8b 44 24 58 8b 54 24 48 89 de 44 89 4c 24 08 48 8b 4c 24 50 48 c7 c7 b0 5f 17 c1 e8 c4 38 b6 ff 83 7d 20 01 44 8b 4c 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 
[    9.355377] ---[ end trace 93f20a7f7b645de3 ]---
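The "10us * 3500 tries" in the error line implies that the driver polled for roughly 35 ms before giving up. The following is a minimal sketch for extracting that figure from dmesg output. It assumes the message format stays exactly as shown in the trace above. The names `REG_WAIT_RE` and `parse_reg_wait` are hypothetical helpers for illustration and are not part of amdgpu or any existing tool.

```python
import re

# Pattern for the REG_WAIT timeout message as it appears in the log above.
# Assumption: the format "<delay>us * <tries> tries - <func> line:<n>" is stable.
REG_WAIT_RE = re.compile(
    r"REG_WAIT timeout (?P<delay_us>\d+)us \* (?P<tries>\d+) tries - "
    r"(?P<func>\w+) line:(?P<line>\d+)"
)


def parse_reg_wait(log_line):
    """Return a dict describing a REG_WAIT timeout, or None if there is no match."""
    match = REG_WAIT_RE.search(log_line)
    if match is None:
        return None
    delay_us = int(match.group("delay_us"))
    tries = int(match.group("tries"))
    return {
        "func": match.group("func"),
        "line": int(match.group("line")),
        "delay_us": delay_us,
        "tries": tries,
        # Sum of the per-try delays only; register read overhead is not included.
        "total_ms": delay_us * tries / 1000,
    }


sample = (
    "[    9.354820] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout "
    "10us * 3500 tries - dce_mi_free_dmif line:563"
)
info = parse_reg_wait(sample)
print(info["func"], info["total_ms"])  # dce_mi_free_dmif 35.0
```

Feeding each line of a saved dmesg log through `parse_reg_wait` gives a quick way to compare which function and source line timed out across boots or kernel versions.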
Comment 1 Gediminas Jakutis 2018-09-24 19:52:16 UTC
Bug 107947 might be related, but the backtrace appears to be rather different.
Comment 2 Gediminas Jakutis 2018-09-24 19:53:18 UTC
(In reply to Gediminas Jakutis from comment #1)
> Bug 107947 might be related, but the backtrace appears to be rather
> different.

Please ignore this comment; I accidentally replied to the wrong bug report in the wrong browser tab.
Comment 3 Kent Ross 2018-10-30 01:24:06 UTC
I have this issue, or one very like it. I have a system with both a Vega64 and a GTX 980ti. The 980ti is controlled by the vfio-pci driver (preempting nouveau, nvidiafb, etc. on boot) and is therefore not in use by this host system.

If a display is plugged into the 980ti at boot time, booting will always fail with a black screen and more failures from drm. If a display is not plugged in, booting will almost always work without a problem, with fewer errors from drm.

Even with no display plugged into the 980ti, when the system's displays (plugged into the vega64) resume when they have been sleeping (such as on the lock screen) there is a substantial chance all displays will be completely black and drm failures can be seen in dmesg. If a display is plugged into the 980ti when the main displays wake, the outcome is much less likely to be good.

When the system is returning from sleeping screens, a complete reboot is fortunately not necessary: the displays will sleep again in a few seconds, and the wake can be retried until it succeeds. Nonetheless, as this system is intended to be used as a VM host with a passed-through monitor, this bug is a showstopper.

I have full dmesg logs from boot to an unusable black screen, and from boot to a login screen.
Comment 4 Kent Ross 2018-10-30 01:26:22 UTC
Created attachment 142268
dmesg logs for failed boot to black screen
Comment 5 Kent Ross 2018-10-30 01:26:48 UTC
Created attachment 142269
dmesg logs for successful boot to login screen
Comment 6 Kent Ross 2018-10-31 07:53:40 UTC
Good-citizen update for you: I have resolved my immediate problem. These apparent crashes are still happening pretty often when display modes change, but do not seem to be the root cause of my issue.

(In summary: my AMD and Nvidia GPUs were in slots 1 and 4 of my motherboard. I moved them to slots 3 and 1 respectively, which, if the documentation is accurate, gives both of them full x16 lanes instead of x8 each. The system still shows the BIOS and GRUB on the Nvidia card, but booting and resuming in Ubuntu now seem to work fine. I am quite unsure why, but there you have it.)

Apologies for any unhelpful noise, good luck with whatever this bug actually is, and thank you for your work.
Comment 7 Alex Smith 2018-10-31 15:04:52 UTC
This is occurring for me on a Vega 64. When it occurs, the machine boots to a black screen. It started happening after upgrading to Fedora 27's 4.18.16; previously I was on 4.18.7, where it does not happen. I've since updated to Fedora 29, 4.18.16-300.fc29.x86_64, and it still happens there.

[drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:636
WARNING: CPU: 6 PID: 122 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe8/0x160 [amdgpu]
Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c tun bridge stp llc ebtable_filter ebtables >
 ttm alx drm crc32c_intel mdio video
CPU: 6 PID: 122 Comm: kworker/6:1 Not tainted 4.18.16-100.fc27.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 3/Z170X-Gaming 3, BIOS F7 06/03/2016
Workqueue: events drm_mode_rmfb_work_fn [drm]
RIP: 0010:generic_reg_wait+0xe8/0x160 [amdgpu]
Code: 58 48 8b 4c 24 50 89 ee 8b 54 24 48 48 c7 c7 e0 93 75 c0 44 89 4c 24 08 e8 b5 4f c1 ff 41 83 7c 24 20 01 44 8b 4c 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f af 
RSP: 0018:ffffb846441e7a78 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000000dad RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9365c1d96938 RDI: ffff9365c1d96938
RBP: 000000000000000a R08: 0000000000000473 R09: 0000000000000002
R10: 000000000000beee R11: ffffffff889a21ed R12: ffff93659551d900
R13: 00000000000035af R14: 0000000000000010 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff9365c1d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000250e080 CR3: 00000005e720a005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 dce_mi_free_dmif+0xf7/0x180 [amdgpu]
 dce110_reset_hw_ctx_wrap+0x13f/0x1e0 [amdgpu]
 dce110_apply_ctx_to_hw+0x58/0x9e0 [amdgpu]
 ? _cond_resched+0x15/0x40
 ? pp_dpm_dispatch_tasks+0x41/0x60 [amdgpu]
 ? amdgpu_pm_compute_clocks.part.9+0xb7/0x590 [amdgpu]
 dc_commit_state+0x30a/0x590 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0x385/0xd70 [amdgpu]
 ? _cond_resched+0x15/0x40
 ? wait_for_completion_timeout+0x3a/0x190
 ? wait_for_completion_interruptible+0x35/0x1c0
 commit_tail+0x3d/0x70 [drm_kms_helper]
 drm_atomic_helper_commit+0xfc/0x110 [drm_kms_helper]
 drm_framebuffer_remove+0x30d/0x400 [drm]
 drm_mode_rmfb_work_fn+0x4f/0x60 [drm]
 ? process_one_work+0x370/0x370
 ? kthread_create_worker_on_cpu+0x70/0x70
---[ end trace 9529270edb28c719 ]---
Comment 8 Alex Smith 2018-11-06 09:20:12 UTC
I think this is related to display power management - it occurs when the monitor is woken from standby. The blank screen I get at startup occurs between the login screen (gdm) and the desktop. After logging in, gdm briefly puts the display to sleep, and then when it wakes up again the screen is blank and this error appears in dmesg.

I've found I'm usually able to get things working again by doing a few VT switches between gdm and the desktop, but this is still quite annoying, as it happens on almost every boot.
Comment 9 Martin Peres 2019-11-19 08:56:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/528.
