I use an HP EliteBook 745 G5 with a Ryzen 5 PRO 2500U and an external BenQ GW2260 monitor.
Today, after a break of about 10 minutes, amdgpu had trouble re-enabling my external monitor (the monitor had gone into sleep mode or something similar). It took about half a minute, I think.
I checked dmesg immediately and found a WARNING there:
[65984.999696] WARNING: CPU: 6 PID: 2081 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:868 dcn10_verify_allow_pstate_change_high+0x25/0x260 [amdgpu]
I suspect it may be related to the issue I'm reporting.
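For anyone comparing logs, the WARNING header quoted above can be split into its fields with a short script. This is a minimal sketch, assuming the header keeps the layout shown in this report; the `parse_warning` helper and its regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Pattern for a kernel WARNING header line, based on the layout quoted
# in this report (assumption: other kernels may format it differently).
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)


def parse_warning(line):
    """Return a dict of fields from a WARNING header line, or None."""
    m = WARNING_RE.search(line)
    if m is None:
        return None
    return {
        "timestamp": float(m.group("ts")),
        "cpu": int(m.group("cpu")),
        "pid": int(m.group("pid")),
        "file": m.group("file"),
        "line": int(m.group("line")),
        "function": m.group("func"),
    }


sample = (
    "[65984.999696] WARNING: CPU: 6 PID: 2081 at "
    "drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:868 "
    "dcn10_verify_allow_pstate_change_high+0x25/0x260 [amdgpu]"
)
info = parse_warning(sample)
print(info["function"], info["line"])
```

Extracting the source line number this way makes it easier to see whether two reports hit the same check in dcn10_hw_sequencer.c, since the line shifts between kernel versions.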
Until now I had been using kernels 4.19 and 4.20 (for the last 2 months) and never saw it. A day ago I switched to kernel 5.0.0-rc6. So it may be either:
1. A regression
2. A very rare bug
Created attachment 143373
I'm attaching my pretty big dmesg. There are many MCE errors reported which should be harmless:
Created attachment 143374
Created attachment 143375
dcn10_verify_allow_pstate_change_high() source with line numbers
There were some recent patches for Raven that fixed programming sequence issues during DPMS / suspend changes. I wonder if they would help with what you're reporting. The pstate warnings usually indicate that something went wrong in the programming sequence and that the hardware has potentially hung, though it won't always happen consistently.
I think there was another patch related to this as well that might be in amd-staging-drm-next.
If you can still reproduce the issue consistently, I'd give that tree a try and see whether the issue persists.
Created attachment 143531
dmesg from 5.0.0-rc7 with WARNINGs
Thanks, Nicholas, for looking at this.
I kept running 5.0.0-rc7 for a few more days, and the issue seems to be reproducible. I'm going to switch to amd-staging-drm-next now. I'll provide an update in a week or so.
I'm seeing something similar on a Dell Latitude 5495 with an AMD Ryzen 5 PRO 2500U.
The kernel is 5.2.10:
[ 1795.534761] ------------[ cut here ]------------
[ 1795.534791] WARNING: CPU: 7 PID: 765 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534793] Modules linked in: uas usb_storage algif_aead ecb algif_skcipher cmac sha512_ssse3 sha512_generic md4 algif_hash af_alg btusb btrtl btbcm btintel bluetooth ecdh_generic ecc hid_logitech_hidpp uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_generic snd_hda_codec_hdmi videobuf2_v4l2 videodev videobuf2_common snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core kvm_amd snd_pcm ccp snd_timer snd kvm soundcore irqbypass crc32_pclmul rtsx_pci_sdmmc mmc_core wmi_bmof dell_wmi joydev dell_laptop aesni_intel ledtrig_audio dell_smbios ath10k_pci dell_wmi_descriptor aes_x86_64 crypto_simd dcdbas ath10k_core cryptd glue_helper ath mac80211 psmouse i2c_piix4 k10temp cfg80211 tg3 ucsi_acpi typec_ucsi libphy rtsx_pci typec wmi dell_rbtn dell_smo8800 rfkill i2c_amd_mp2_plat i2c_amd_mp2_pci hid_logitech_dj pkcs8_key_parser xhci_pci xhci_hcd pinctrl_amd i2c_hid efivarfs autofs4
[ 1795.534838] CPU: 7 PID: 765 Comm: Xorg Not tainted 5.2.10 #2
[ 1795.534841] Hardware name: Dell Inc. Latitude 5495/0G9F45, BIOS 1.2.14 05/29/2019
[ 1795.534844] RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534847] Code: 83 c8 ff e9 9e b6 ff ff 48 c7 c7 30 8a 72 af e8 61 8a 95 ff 0f 0b 83 c8 ff e9 88 b6 ff ff 48 c7 c7 30 8a 72 af e8 4b 8a 95 ff <0f> 0b 80 bb 93 01 00 00 00 75 05 e9 f2 db ff ff 48 8b 83 80 02 00
[ 1795.534849] RSP: 0018:ffffac7ec25578c8 EFLAGS: 00010246
[ 1795.534851] RAX: 0000000000000024 RBX: ffff8be1235ef000 RCX: 0000000000000000
[ 1795.534852] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 00000000ffffffff
[ 1795.534855] RBP: ffff8be1235ef000 R08: 000000000000043a R09: 0000000000000033
[ 1795.534856] R10: ffffac7ec2557788 R11: ffffac7ec255778d R12: ffff8be1260d7c00
[ 1795.534858] R13: 0000000000000002 R14: ffff8be1235ef000 R15: ffff8bdf02588000
[ 1795.534860] FS: 00007f06ae67fd80(0000) GS:ffff8be127fc0000(0000) knlGS:0000000000000000
[ 1795.534862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1795.534864] CR2: 00007f6fb5fe8b08 CR3: 00000002180a0000 CR4: 00000000003406e0
[ 1795.534868] Call Trace:
[ 1795.534880] dcn10_prepare_bandwidth+0xff/0x120
[ 1795.534884] dc_commit_updates_for_stream+0xb02/0xc00
[ 1795.534888] amdgpu_dm_atomic_commit_tail+0xa9b/0x1970
[ 1795.534897] ? commit_tail+0x37/0x60
[ 1795.534903] commit_tail+0x37/0x60
[ 1795.534911] drm_atomic_helper_commit+0x103/0x110
[ 1795.534919] drm_mode_obj_set_property_ioctl+0x121/0x2b1
[ 1795.534922] ? drm_mode_obj_find_prop_id+0x40/0x40
[ 1795.534925] drm_ioctl_kernel+0xad/0xf0
[ 1795.534928] drm_ioctl+0x1e6/0x33f
[ 1795.534930] ? drm_mode_obj_find_prop_id+0x40/0x40
[ 1795.534934] amdgpu_drm_ioctl+0x44/0x80
[ 1795.534938] do_vfs_ioctl+0x428/0x6b0
[ 1795.534941] ? __fget+0x6c/0xa0
[ 1795.534944] ksys_ioctl+0x59/0x90
[ 1795.534946] __x64_sys_ioctl+0x11/0x20
[ 1795.534949] do_syscall_64+0x54/0x1c0
[ 1795.534952] ? page_fault+0x8/0x30
[ 1795.534954] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1795.534957] RIP: 0033:0x7f06aed32dc7
[ 1795.534960] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 7d d9 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 00 0d 00 f7 d8 64 89 01 48
[ 1795.534962] RSP: 002b:00007ffcf32eaf68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1795.534964] RAX: ffffffffffffffda RBX: 00007ffcf32eafa0 RCX: 00007f06aed32dc7
[ 1795.534965] RDX: 00007ffcf32eafa0 RSI: 00000000c01864ba RDI: 000000000000000c
[ 1795.534967] RBP: 00000000c01864ba R08: 0000000000000052 R09: 00000000cccccccc
[ 1795.534968] R10: 00005598b65824c4 R11: 0000000000000246 R12: 00005598b5438ed0
[ 1795.534969] R13: 000000000000000c R14: 0000000000000003 R15: 0000000000000fff
[ 1795.534972] ---[ end trace 2954f837eadb53a4 ]---
Do you need more information?
Some more information: I see the mentioned warning in the logs during normal work, with no other problems. When resuming from S3 suspend, however, the display stays black, and after rebooting I find dozens of those dcn10_verify_allow_pstate_change_high.cold warnings in the log. This still happens with kernel 5.3-rc6.
For me it's a regression introduced during 5.2 development. Testing with the 5.1 series shows no warnings, and resume from S3 suspend works without problems.
git bisect points me to
commit df8368be1382b442384507a5147c89978cd60702 (refs/bisect/bad)
Author: Nicholas Kazlauskas <email@example.com>
Date: Wed Feb 27 12:56:36 2019 -0500
drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
To help xf86-video-amdgpu and mesa know DC supports updating the
tiling attributes for a framebuffer per-flip.
Cc: Michel Dänzer <firstname.lastname@example.org>
Signed-off-by: Nicholas Kazlauskas <email@example.com>
Acked-by: Alex Deucher <firstname.lastname@example.org>
Reviewed-by: Marek Olšák <email@example.com>
Signed-off-by: Alex Deucher <firstname.lastname@example.org>
Does this make any sense?
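For readers unfamiliar with how a bisect arrives at a single commit, the search can be sketched as a binary search between a known-good and a known-bad revision. This is a simplified illustration only: it assumes a linear history and a symptom that stays present once introduced, whereas git bisect works on the real commit graph. The commit names and the `is_bad` check below are hypothetical placeholders.

```python
def bisect_first_bad(commits, is_bad):
    """Return the first bad commit in `commits` (ordered oldest first).

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi]


# Hypothetical history of 16 placeholder commits, with breakage from "c11" on.
history = [f"c{i}" for i in range(16)]
first_bad = bisect_first_bad(history, lambda c: int(c[1:]) >= 11)
print(first_bad)
```

The result only identifies the first revision at which the symptom becomes visible in testing, which is not necessarily the commit containing the faulty code.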
(In reply to Johannes Hirte from comment #9)
> git bisect points me to
> commit df8368be1382b442384507a5147c89978cd60702 (refs/bisect/bad)
> Author: Nicholas Kazlauskas <email@example.com>
> Date: Wed Feb 27 12:56:36 2019 -0500
> drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
> To help xf86-video-amdgpu and mesa know DC supports updating the
> tiling attributes for a framebuffer per-flip.
> Cc: Michel Dänzer <firstname.lastname@example.org>
> Signed-off-by: Nicholas Kazlauskas <email@example.com>
> Acked-by: Alex Deucher <firstname.lastname@example.org>
> Reviewed-by: Marek Olšák <email@example.com>
> Signed-off-by: Alex Deucher <firstname.lastname@example.org>
> Does this make any sense?
Yes, this is the commit that enabled mesa and xf86-video-amdgpu to use DCC for scanout.
I recently fixed a bug where these warnings could be generated in some use sequences (notably immediate flipping).
Please try amd-staging-drm-next or apply the following series to your kernel:
With those two patches on top of v5.3-rc6-129-g265381004994, resume from S3 suspend still hangs with a black screen. I had to hard-reset the system, so I can't say for sure whether this is the same bug.
On top of 5.2.11 it doesn't work either; it gets even worse. Without the two patches, I can still shut down the system. With both patches applied, the system hangs completely after resume and I have to force it off.
It seems DCC is broken on Raven Ridge. So how about disabling it here until the problems are solved?
Thread with similar problem
(In reply to peter m from comment #14)
> Thread with similar problem
In my case, the screen went black after I entered my password on the welcome screen.
This seems to be fixed now. Tested with v5.3-rc8-7-g3120b9a6a3f7, and resume from S3 works without problems. Interestingly, even v5.3-rc6-129-g265381004994 now works without additional patches.
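The version strings used in this thread, such as v5.3-rc8-7-g3120b9a6a3f7, appear to follow the `git describe` layout: the nearest tag, the number of commits on top of it, and an abbreviated commit hash prefixed with `g`. A minimal sketch for splitting them, assuming that layout (the `parse_describe` helper is illustrative):

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>, as printed by `git describe`.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a git-describe style string into (tag, commits_since_tag, sha)."""
    m = DESCRIBE_RE.match(version)
    if m is None:
        return None
    return m.group("tag"), int(m.group("count")), m.group("sha")


print(parse_describe("v5.3-rc8-7-g3120b9a6a3f7"))
print(parse_describe("v5.3-rc6-129-g265381004994"))
```

Read this way, the first string is 7 commits past the v5.3-rc8 tag and the second is 129 commits past v5.3-rc6, which pins down exactly which trees were tested.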
Updated to kernel 5.2.13-200.fc30.x86_64.
dmesg no longer prints WARNING messages, but the screen still goes black after the login screen.
Created attachment 145376
5.2.14 kernel messages
Messages similar to these flooded my systemd journal within a couple of minutes, causing high resource use by journald. An external monitor was attached; I had not noticed any problems apart from the sudden resource use. I have attached a (hopefully) representative portion of the log.
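To put a number on a flood like this, the WARNING headers can be counted per minute of kernel uptime. This is a rough sketch, assuming dmesg-style lines that start with a bracketed timestamp; journalctl output uses a different prefix and would need a different pattern. The sample lines are made up for illustration and are not taken from the attached log.

```python
import re
from collections import Counter

# Matches a dmesg-style timestamp directly followed by a WARNING header.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+WARNING:")


def warnings_per_minute(lines):
    """Count WARNING header lines per 60-second bucket of kernel uptime."""
    buckets = Counter()
    for line in lines:
        m = TS_RE.match(line)
        if m:
            buckets[int(float(m.group(1)) // 60)] += 1
    return dict(buckets)


# Made-up sample lines, for illustration only.
sample = [
    "[  120.100000] WARNING: CPU: 0 PID: 1 at example.c:1 example_fn+0x0/0x1",
    "[  130.200000] WARNING: CPU: 0 PID: 1 at example.c:1 example_fn+0x0/0x1",
    "[  130.300000] some other kernel message",
    "[  185.000000] WARNING: CPU: 0 PID: 1 at example.c:1 example_fn+0x0/0x1",
]
counts = warnings_per_minute(sample)
print(counts)
```

A per-minute count makes it easier to tell a one-off warning during a mode change from a sustained flood of the kind described here.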
Created attachment 145377
kernel 5.2.14-200 dmesg output