Bug 107978

Summary: [amdgpu] Switching to tty fails with DisplayPort 1.2 monitor going to sleep (REG_WAIT timeout / dce110_stream_encoder_dp_blank)
Product: DRI Reporter: Shmerl <shtetldik>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: b.bellec, ben.r.xiao, ddstreet, dominik, freedesktop, harry.wentland, me, nicholas.kazlauskas, sunpeng.li
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                                      Flags
dmesg with drm.debug=4 and Xorg.0.log            none
Patch for reg_wait timeout for dce               none
Patch for dp_blank timeout                       none
Patch for fixing MST reboot/poweroff sequence    none

Description Shmerl 2018-09-18 13:01:09 UTC
After upgrading to Linux 4.19-rc3 (from 4.18.x), I can't switch to tty anymore, the monitor connected over DisplayPort goes into sleep mode.

I see this in dmesg when it happens:

[37342.777399] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:922
[37342.777477] WARNING: CPU: 4 PID: 14403 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe7/0x160 [amdgpu]
[37342.777478] Modules linked in: uas usb_storage rfcomm ebtable_filter ebtables devlink ip6table_filter ip6_tables iptable_filter cmac bnep arc4 nls_ascii nls_cp437 vfat amdkfd fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd btusb btrtl amdgpu btbcm snd_hda_codec_hdmi btintel iwlmvm snd_usb_audio snd_hda_intel bluetooth kvm_amd snd_hda_codec snd_usbmidi_lib wmi_bmof mxm_wmi mac80211 snd_hda_core kvm uvcvideo videobuf2_vmalloc snd_hwdep videobuf2_memops snd_rawmidi jitterentropy_rng videobuf2_v4l2 videobuf2_common snd_seq_device chash iwlwifi irqbypass gpu_sched videodev snd_pcm crct10dif_pclmul ttm crc32_pclmul media drbg snd_timer evdev drm_kms_helper cfg80211 ansi_cprng ghash_clmulni_intel efi_pstore pcspkr drm snd k10temp ecdh_generic soundcore efivars rfkill crc16 sp5100_tco sg ccp
[37342.777514]  rng_core wmi pcc_cpufreq button acpi_cpufreq nct6775 hwmon_vid parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 xfs btrfs xor zstd_decompress zstd_compress xxhash raid6_pq libcrc32c crc32c_generic hid_generic usbhid hid sd_mod crc32c_intel ahci xhci_pci libahci aesni_intel xhci_hcd aes_x86_64 crypto_simd libata igb cryptd glue_helper nvme usbcore scsi_mod i2c_piix4 i2c_algo_bit nvme_core dca usb_common gpio_amdpt gpio_generic
[37342.777542] CPU: 4 PID: 14403 Comm: kworker/4:1 Tainted: G        W         4.19.0-rc3-amd64 #1 Debian 4.19~rc3-1~exp1
[37342.777542] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS L4.64 04/03/2018
[37342.777558] Workqueue: events drm_mode_rmfb_work_fn [drm]
[37342.777615] RIP: 0010:generic_reg_wait+0xe7/0x160 [amdgpu]
[37342.777617] Code: 44 24 58 8b 54 24 48 89 de 44 89 4c 24 08 48 8b 4c 24 50 48 c7 c7 20 9d b5 c1 e8 64 e6 f0 fe 83 7d 18 01 44 8b 4c 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 0f
[37342.777618] RSP: 0018:ffffa2ae81b9ba20 EFLAGS: 00010297
[37342.777620] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000000
[37342.777621] RDX: 0000000000000000 RSI: ffff93d74eb166a8 RDI: ffff93d74eb166a8
[37342.777622] RBP: ffff93d746439180 R08: 0000000000000000 R09: 0000000000010200
[37342.777623] R10: 0720072007200720 R11: 0720073207320739 R12: 0000000000000bb9
[37342.777624] R13: 00000000000051e2 R14: 0000000000010000 R15: 0000000000000000
[37342.777625] FS:  0000000000000000(0000) GS:ffff93d74eb00000(0000) knlGS:0000000000000000
[37342.777626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37342.777627] CR2: 0000558f62bfa9d0 CR3: 00000003f2c38000 CR4: 00000000003406e0
[37342.777628] Call Trace:
[37342.777695]  dce110_stream_encoder_dp_blank+0x12c/0x1a0 [amdgpu]
[37342.777754]  core_link_disable_stream+0x54/0x220 [amdgpu]
[37342.777813]  dce110_reset_hw_ctx_wrap+0xc1/0x1e0 [amdgpu]
[37342.777872]  dce110_apply_ctx_to_hw+0x45/0x650 [amdgpu]
[37342.777928]  ? dc_remove_plane_from_context+0x1fc/0x240 [amdgpu]
[37342.777985]  dc_commit_state+0x2c6/0x520 [amdgpu]
[37342.778047]  amdgpu_dm_atomic_commit_tail+0x37a/0xd80 [amdgpu]
[37342.778052]  ? __wake_up_common_lock+0x89/0xc0
[37342.778054]  ? _cond_resched+0x15/0x30
[37342.778056]  ? wait_for_completion_timeout+0x3b/0x1a0
[37342.778117]  ? amdgpu_dm_atomic_commit_tail+0xd80/0xd80 [amdgpu]
[37342.778126]  commit_tail+0x3d/0x70 [drm_kms_helper]
[37342.778133]  drm_atomic_helper_commit+0xb4/0x120 [drm_kms_helper]
[37342.778148]  drm_framebuffer_remove+0x361/0x410 [drm]
[37342.778164]  drm_mode_rmfb_work_fn+0x4f/0x60 [drm]
[37342.778167]  process_one_work+0x1a7/0x360
[37342.778169]  worker_thread+0x30/0x390
[37342.778171]  ? pwq_unbound_release_workfn+0xd0/0xd0
[37342.778173]  kthread+0x112/0x130
[37342.778175]  ? kthread_bind+0x30/0x30
[37342.778177]  ret_from_fork+0x22/0x40
[37342.778179] ---[ end trace 3d987dd66a59ffb4 ]---

OS: Debian testing, kernel 4.19~rc3-1~exp1
GPU: Sapphire Pulse Vega 56.
amdgpu firmware: 20180825+dfsg-1
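
The error line above packs several fields into one message. The following is a minimal Python sketch for pulling them out of a dmesg line, assuming the two numbers are the per-poll delay and the number of attempts, as the wording suggests. `parse_reg_wait` and the regular expression are hypothetical helpers written for this illustration; they are not part of amdgpu or any kernel tooling.

```python
import re

# Pattern matching the REG_WAIT error text as it appears in this report.
REG_WAIT_RE = re.compile(
    r"REG_WAIT timeout (?P<delay_us>\d+)us \* (?P<tries>\d+) tries - "
    r"(?P<func>\w+) line:(?P<line>\d+)"
)


def parse_reg_wait(log_line):
    """Return the fields of a REG_WAIT timeout message, or None if absent."""
    m = REG_WAIT_RE.search(log_line)
    if m is None:
        return None
    delay_us = int(m.group("delay_us"))
    tries = int(m.group("tries"))
    return {
        "func": m.group("func"),
        "line": int(m.group("line")),
        "delay_us": delay_us,
        "tries": tries,
        # Assumed polling budget: delay per try times number of tries.
        "total_wait_ms": delay_us * tries / 1000,
    }


sample = (
    "[37342.777399] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout "
    "10us * 3000 tries - dce110_stream_encoder_dp_blank line:922"
)
info = parse_reg_wait(sample)
print(info["func"], info["total_wait_ms"], "ms")
```

Under that assumption, the driver polled for roughly 30 ms in dce110_stream_encoder_dp_blank before giving up.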
Comment 1 Shmerl 2018-09-20 03:10:09 UTC
Same thing happens with kernel 4.19-rc4.
Comment 2 Shmerl 2018-10-05 00:18:50 UTC
Still broken with 4.19.0-rc6-amd64.
Comment 3 Shmerl 2018-10-05 15:59:01 UTC
Is this issue Debian-specific, or has anyone observed it on other distros? If something is wrong with Debian's kernel build, I should probably file a Debian bug about it.
Comment 4 george 2018-10-07 19:20:08 UTC
I have the same problem on Fedora 28 with the 4.18 kernel. I have since switched back to the DVI interface, which does not exhibit this bug.
Comment 5 Shmerl 2018-10-07 19:23:14 UTC
My card doesn't have DVI, and HDMI produces horrible colors, so DisplayPort is the only sane option and it's broken :(
Comment 6 Nicholas Kazlauskas 2018-10-09 13:18:32 UTC
I haven't observed this behavior occurring under Ubuntu 18.04 or Arch on 4.18 or 4.19 kernels with a Vega 56.

Do you mind posting a full dmesg log with drm.debug=4 and an xorg log? It would also help to know what desktop environment you're using.
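
For context, `drm.debug` is a bitmask, so `drm.debug=4` should enable only the KMS (modesetting) messages. The following is a minimal Python sketch that decodes such a value. The bit names are taken from the description of the drm module's `debug` parameter as best understood here, and only the six lowest bits are listed (newer kernels define more). `decode_drm_debug` is a hypothetical helper, not an existing tool.

```python
# Lower bits of the drm "debug" module parameter (assumed from the
# parameter description; higher bits are intentionally omitted).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(value):
    """Return the names of the known debug categories set in value."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if value & bit]


print(decode_drm_debug(4))
print(decode_drm_debug(0x06))
```

A value such as 0x06 would combine driver and KMS messages, at the cost of a much noisier log.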
Comment 7 Shmerl 2018-10-09 13:57:25 UTC
I'm using KDE Plasma 5.13.5, current Debian testing x86_64. Can you try it with Debian? Ubuntu stack is not that close anymore.

I also noticed one detail. When the screen mode is changing (like before sddm comes up), the monitor goes to sleep briefly and then wakes up. That wasn't happening before and it's not correct behavior. In 4.18 it wasn't waking up at all, and that was fixed in 4.19. A similar thing happens during the switch to tty (the screen mode change is indicated by the DP monitor icon appearing), except the monitor doesn't wake up anymore.
Comment 8 Shmerl 2018-10-09 13:58:16 UTC
Created attachment 141960 [details]
dmesg with drm.debug=4 and Xorg.0.log

Attaching logs.
Comment 9 Shmerl 2018-10-11 14:17:27 UTC
By the way, I tested it with this kernel:

https://github.com/M-Bab/linux-kernel-amdgpu-binaries/blob/master/linux-image-4.19.0-rc6_18.09.30.amdgpu_amd64.deb

Which is built from AMD's tree. This issue is also present in it.
Comment 10 Shmerl 2018-10-11 16:26:54 UTC
I just found something. My monitor - Dell U2413 has a setting for toggling DisplayPort 1.2. It was enabled until now. When I disable that setting, tty starts working!

My cable is supposed to support DP 1.2 and be Vesa compliant. It's Accel UltraAV DP 1.2 cable.

I hope this can help narrow down the problem.
Comment 11 george 2018-10-11 16:55:18 UTC
(In reply to Shmerl from comment #10)
> I just found something. My monitor - Dell U2413 has a setting for toggling
> DisplayPort 1.2. It was enabled until now. When I disable that setting, tty
> starts working!
> 
> My cable is supposed to support DP 1.2 and be Vesa compliant. It's Accel
> UltraAV DP 1.2 cable.
> 
> I hope this can help narrow down the problem.

Interesting, I have the exact same monitor, Dell U2413, I'll give this a try!
Comment 12 Shmerl 2018-10-16 22:42:45 UTC
@Nicholas Kazlauskas: does it help to identify the source of the problem?
Comment 13 Nicholas Kazlauskas 2018-10-17 13:07:56 UTC
(In reply to Shmerl from comment #12)
> @Nicholas Kazlauskas: does it help to identify the source of the problem?

If I had to guess I would say this is an issue specific to that monitor - DP 1.2 should generally work without issue.

This does help narrow down the problem, thanks.
Comment 14 Shmerl 2018-10-17 13:10:30 UTC
(In reply to Nicholas Kazlauskas from comment #13)
> 
> If I had to guess I would say this is an issue specific to that monitor - DP
> 1.2 should generally work without issue.

It did work fine before in DP 1.2 mode (like with kernel 4.17.x), so there must have been some regression which caused this behavior.
Comment 15 Sibren Vasse 2018-10-21 23:30:45 UTC
I own a Dell U2414H and a U2913WM, and they are daisy-chained via DisplayPort. I'm currently running into this issue.

A quick bisect gave this result:
 # first bad commit: [0d99889109892396a8164bf6dd178e36d3fe3166] drm/fb-helper: Eliminate the .best_encoder() usage
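
For readers unfamiliar with bisecting, the idea is a binary search between a known-good and a known-bad kernel. The following is a minimal Python sketch of that search over a linear list of commits. It is a simplification: real kernel history is not linear, and `git bisect` handles that. The commit names and the `is_bad` check are hypothetical stand-ins for "build, boot, try switching to tty".

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is assumed known good
    and commits[-1] is assumed known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad]


# Hypothetical history c0..c9 in which the regression was introduced in c6.
history = [f"c{i}" for i in range(10)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 6)
print(culprit)
```

Each test halves the remaining range, which is why even a full release cycle of commits needs only a dozen or so builds.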
Comment 16 Daniel Exner 2018-11-13 16:54:39 UTC
I own a Dell U3415W also connected via DisplayPort and I have the same issue.

Worked like a charm in the past.
Comment 17 Shmerl 2018-11-22 18:55:47 UTC
(In reply to Nicholas Kazlauskas from comment #13)
> 
> This does help narrow down the problem, thanks.

Is there any chance of fixing this in 4.20?
Comment 18 Robin 2018-12-03 19:16:46 UTC
Same problem for me.
Using 2 identical DELL U2415 and a Radeon RX 580.
On Arch with 4.18.x, daisy-chaining both monitors worked like a charm.
After upgrading to 4.19, the monitors only work when I disable DP 1.2. But then I obviously can't daisy-chain.
Is there any way I can help troubleshoot this?
Comment 19 Shmerl 2018-12-03 19:30:12 UTC
This seems to commonly affect Dell monitors. Do they not follow the DisplayPort 1.2 spec, or is amdgpu doing something incorrectly after all?
Comment 20 Jerry Zuo 2018-12-06 20:35:26 UTC
The fix is showing up since 4.20-rc5. Please give a try.
Comment 21 Sibren Vasse 2018-12-06 21:47:29 UTC
(In reply to Jerry Zuo from comment #20)
> The fix is showing up since 4.20-rc5. Please give a try.

4.20-rc5 solves the problem for me!
Comment 22 Shmerl 2018-12-06 21:48:24 UTC
Great news! Debian didn't get 4.20 rc kernels yet, but I can try building one from source to test.
Comment 23 Sibren Vasse 2018-12-06 21:51:46 UTC
Switching to tty now works for me, though I still get the [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:944 message.
Comment 24 Shmerl 2018-12-06 21:56:37 UTC
Does KWin Wayland session work for you now? It was crashing also because of this.
Comment 25 Shmerl 2018-12-07 06:30:32 UTC
Just built 4.20-rc5+. This is indeed fixed, and KWin Wayland session is finally working with DisplayPort 1.2 enabled!
Comment 26 Shmerl 2018-12-07 06:32:02 UTC
And I don't see ERROR* REG_WAIT anymore. I'm using latest Vega firmware.
Comment 27 Jerry Zuo 2018-12-07 14:58:18 UTC
The dce110_stream_encoder_dp_blank timeout is something I am working on. It doesn't break anything, but it is pretty annoying.
Comment 28 Shmerl 2018-12-17 03:46:02 UTC
Not sure if it's related, but with 4.20.0-rc6 the regression where the monitor goes to sleep before reaching sddm is back. It's erratic, i.e. it doesn't happen on every boot. And I do see this REG_WAIT in dmesg when it does.

Turning monitor off and on, then going to tty and restarting sddm works around it.

Similar bug was fixed in the previous kernels, so I wonder if the fix for this one somehow brought that issue back.

GPU: Sapphire Pulse Vega 56.
Comment 29 Jerry Zuo 2018-12-17 15:46:55 UTC
Created attachment 142834 [details] [review]
Patch for reg_wait timeout for dce
Comment 30 Jerry Zuo 2018-12-17 15:49:12 UTC
Please try the patch, and see if you can see reg_wait timeout dce110_stream_encoder_dp_blank()
Comment 31 Alex Deucher 2018-12-17 15:54:14 UTC
(In reply to Jerry Zuo from comment #30)
> Please try the patch, and see if you can see reg_wait timeout
> dce110_stream_encoder_dp_blank()

Can you attach a non-zipped version?
Comment 32 Jerry Zuo 2018-12-17 16:09:24 UTC
Created attachment 142835 [details] [review]
Patch for dp_blank timeout
Comment 33 Sibren Vasse 2018-12-18 01:37:04 UTC
Just tried the patch, so far no reg_wait warnings. Looking good!
Comment 34 Shmerl 2018-12-18 02:00:51 UTC
I tested the patch. I don't see the error message now, but the startup monitor sleep regression is still present. Is it a separate issue? I can open another bug if needed.
Comment 35 Jerry Zuo 2018-12-18 15:02:09 UTC
"startup monitor sleep regression". Please give more details on that. Thanks.
Comment 36 Shmerl 2018-12-18 16:22:26 UTC
(In reply to Jerry Zuo from comment #35)
> "startup monitor sleep regression". Please give more details on that. Thanks.

This behavior used to happen in previous kernels but got fixed at one point. Now it's back.

Basically, after GRUB, the system boots (I usually enable loglevel=4 to see system output), but right before reaching the graphical login (sddm in my case), the monitor simply switches to sleep mode. To work around it, I have to:

1. Turn the monitor off and on.
2. Switch to tty1, and restart sddm from there.

This enables graphical login.
Comment 37 Shmerl 2018-12-25 02:21:00 UTC
I wonder if it's some kind of distro specific race condition that happens during boot. Happens to me in Debian testing (you can try reproducing it there).
Comment 38 Shmerl 2018-12-27 08:14:13 UTC
(In reply to Jerry Zuo from comment #30)
> Please try the patch, and see if you can see reg_wait timeout
> dce110_stream_encoder_dp_blank()

Do you plan to backport at least this patch to 4.20.x?
Comment 40 Shmerl 2019-01-07 04:25:54 UTC
Just tested kernel 5.0-rc1, the boot problem with monitor going to sleep is still happening. Same workaround (turn the monitor off / on, switch to tty1, restart sddm) helps.
Comment 41 Shmerl 2019-01-14 02:01:40 UTC
Same thing still happens with 5.0-rc2. Does anyone else experience this race condition where the monitor goes to sleep right after boot? It is related to the same setup (DisplayPort 1.2), since it's not happening with it disabled.
Comment 42 Jerry Zuo 2019-01-15 15:56:20 UTC
We are currently looking at the startup issue.
Comment 43 Jerry Zuo 2019-01-24 16:03:42 UTC
We observed MST cannot light up in every reboot since 4.20, and we've already got the reboot/poweroff sequence fixed. The fix will show up soon ...
Comment 44 Jerry Zuo 2019-01-24 17:59:15 UTC
Created attachment 143225 [details] [review]
Patch for fixing MST reboot/poweroff sequence
Comment 45 Jerry Zuo 2019-01-24 21:49:22 UTC
Please try the patch and see if that fixes your issue. We will come up with proper solution later.
Comment 46 Shmerl 2019-01-25 02:08:14 UTC
(In reply to Jerry Zuo from comment #45)
> Please try the patch and see if that fixes your issue. We will come up with
> proper solution later.

I tested the patch, and it didn't fix the issue for me.
Comment 47 Jerry Zuo 2019-01-25 16:14:18 UTC
You observed the monitor goes to sleep right after reboot, did you? Hotplug will have the display back on. Is that what you observed?
Comment 48 Shmerl 2019-01-25 16:17:09 UTC
(In reply to Jerry Zuo from comment #47)
> You observed the monitor goes to sleep right after reboot, did you? Hotplug
> will have the display back on. Is that what you observed?

After any boot, not necessarily a reboot. The system boots to the point where sddm should start, and the monitor goes into sleep mode. Toggling the monitor off and on (and then restarting sddm) works around it.
Comment 49 Shmerl 2019-01-25 16:18:20 UTC
Though my tests mostly focused on reboots. Let me double check what happens after actual cold off and boot.
Comment 50 Shmerl 2019-01-25 16:24:59 UTC
I think I know what happened. When I rebooted to test the new kernel, I was still running the older kernel (without the fix), so the bug probably still kicked in on that reboot.

I now rebooted a few more times already from the patched kernel, and I don't see the issue anymore. It also doesn't happen with boot from complete off state.

Thanks!
Comment 51 Jerry Zuo 2019-01-28 15:16:27 UTC
Good to hear that the patch fixes your issue. The official patch will be merged to the next kernel release. Thank you so much for your dedicated testing and prompt feedback ^_^
Comment 52 Shmerl 2019-01-28 16:36:05 UTC
(In reply to Jerry Zuo from comment #51)
> Good to hear that the patch fixes your issue. The official patch will be
> merged to the next kernel release. Thank you so much for your dedicated
> testing and prompt feedback ^_^

Thanks! Do you mean the fix will land in 5.0 or the one release after that?
Comment 53 Jerry Zuo 2019-01-28 17:59:12 UTC
The fix should show up in the coming release.
Comment 54 Shmerl 2019-01-28 18:01:46 UTC
(In reply to Jerry Zuo from comment #53)
> The fix should show up in the coming release.

Great, thanks!
Comment 56 Shmerl 2019-02-18 17:52:29 UTC
I see the fix in this submission though: https://lists.freedesktop.org/archives/amd-gfx/2019-February/031437.html

So it's coming in 5.1 only?
Comment 57 Alex Deucher 2019-02-18 21:00:13 UTC
(In reply to Shmerl from comment #56)
> I see the fix in this submission though:
> https://lists.freedesktop.org/archives/amd-gfx/2019-February/031437.html

This patch missed the window for last week's -fixes pull.

> 
> So it's coming in 5.1 only?

It can still land in 5.0 and prior kernels via stable if it doesn't make it in for 5.0 final.
Comment 59 Shmerl 2019-05-07 23:46:21 UTC
Closing, since it's already fixed in the released kernels.
