Bug 109628 - WARNING at dcn10_hw_sequencer.c:868 dcn10_verify_allow_pstate_change_high()
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2019-02-14 09:56 UTC by Rafał Miłecki
Modified: 2019-09-16 15:06 UTC (History)


Attachments
dmesg (495.88 KB, text/plain)
2019-02-14 09:58 UTC, Rafał Miłecki
Xorg.0.log (45.82 KB, text/x-log)
2019-02-14 09:59 UTC, Rafał Miłecki
dcn10_verify_allow_pstate_change_high() source with line numbers (711 bytes, text/plain)
2019-02-14 10:08 UTC, Rafał Miłecki
dmesg from 5.0.0-rc7 with WARNINGs (191.88 KB, text/plain)
2019-03-05 09:49 UTC, Rafał Miłecki
5.2.14 kernel messages (4.96 KB, text/plain)
2019-09-16 14:17 UTC, Rohan Lean
kernel 5.2.14-200 dmesg output (3.33 KB, text/plain)
2019-09-16 15:06 UTC, peter m

Description Rafał Miłecki 2019-02-14 09:56:49 UTC
I use HP EliteBook 745 G5 with Ryzen 5 PRO 2500U and external monitor BenQ GW2260.

Today, after a break of about 10 minutes, amdgpu had trouble re-enabling my external monitor (after it had gone into sleep mode or something similar). It took about half a minute, I think.

I checked dmesg immediately and found a WARNING there:
[65984.999696] WARNING: CPU: 6 PID: 2081 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:868 dcn10_verify_allow_pstate_change_high+0x25/0x260 [amdgpu]

I suspect it may be related to the issue I'm reporting.

Until now I had been using kernels 4.19 and 4.20 (for the last 2 months) and never saw it. A day ago I switched to kernel 5.0.0-rc6. It may be either:
1. A regression
2. A very rare bug
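
For anyone triaging a large dmesg like the one attached here, the WARNING header quoted above can be picked out mechanically. The following is a minimal sketch, not an official tool: the names `WARNING_RE` and `find_warnings` are hypothetical, and the pattern assumes the header layout seen in this report (`WARNING: CPU: N PID: N at <file>:<line> <function>+0x.../0x...`).

```python
import re

# Assumed header layout, taken from the WARNING line quoted in this report.
# The function name may carry a compiler suffix such as ".cold".
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<path>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def find_warnings(dmesg_text):
    """Return one dict per kernel WARNING header found in dmesg_text."""
    hits = []
    for raw in dmesg_text.splitlines():
        m = WARNING_RE.search(raw)
        if m:
            hits.append({
                "cpu": int(m["cpu"]),
                "pid": int(m["pid"]),
                # Keep only the file name, dropping the directory part.
                "file": m["path"].rsplit("/", 1)[-1],
                "line": int(m["line"]),
                "func": m["func"],
            })
    return hits


# The WARNING line from the description above, used as sample input.
sample = (
    "[65984.999696] WARNING: CPU: 6 PID: 2081 at "
    "drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:868 "
    "dcn10_verify_allow_pstate_change_high+0x25/0x260 [amdgpu]"
)
for hit in find_warnings(sample):
    print(hit["file"], hit["line"], hit["func"])
```

Grouping the results by `func` and `line` would show at a glance whether every warning in a log comes from the same check.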
Comment 1 Rafał Miłecki 2019-02-14 09:58:53 UTC
Created attachment 143373
dmesg

I'm attaching my pretty big dmesg. There are many MCE errors reported which should be harmless:
https://bugzilla.kernel.org/show_bug.cgi?id=202005
Comment 2 Rafał Miłecki 2019-02-14 09:59:33 UTC
Created attachment 143374
Xorg.0.log
Comment 3 Rafał Miłecki 2019-02-14 10:08:57 UTC
Created attachment 143375
dcn10_verify_allow_pstate_change_high() source with line numbers
Comment 4 Nicholas Kazlauskas 2019-02-14 14:29:33 UTC
There were some recent patches for Raven that fixed programming sequence issues during dpms / suspend changes. I wonder if these would help with what you're reporting. The pstate warnings usually indicate that something went wrong in the programming sequence and that the hardware has potentially hung, but this won't always happen consistently.

https://patchwork.freedesktop.org/patch/282418/

I think there was another related patch as well that might be in amd-staging-drm-next:

https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

If you can still reproduce the issue consistently, I'd give this tree a try and see whether the issue persists.
Comment 5 Rafał Miłecki 2019-03-05 09:49:16 UTC
Created attachment 143531
dmesg from 5.0.0-rc7 with WARNINGs

Thanks Nicholas for looking at this.

I kept running 5.0.0-rc7 for a few more days, and the issue seems to be reproducible. I'm going to switch to amd-staging-drm-next now. I'll provide an update in a week or so.
Comment 6 Johannes Hirte 2019-08-29 21:54:41 UTC
I'm seeing something similar on a Dell Latitude 5495 with an AMD Ryzen 5 PRO 2500U.

The kernel is 5.2.10:

[ 1795.534761] ------------[ cut here ]------------
[ 1795.534791] WARNING: CPU: 7 PID: 765 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534793] Modules linked in: uas usb_storage algif_aead ecb algif_skcipher cmac sha512_ssse3 sha512_generic md4 algif_hash af_alg btusb btrtl btbcm btintel bluetooth ecdh_generic ecc hid_logitech_hidpp uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_generic snd_hda_codec_hdmi videobuf2_v4l2 videodev videobuf2_common snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core kvm_amd snd_pcm ccp snd_timer snd kvm soundcore irqbypass crc32_pclmul rtsx_pci_sdmmc mmc_core wmi_bmof dell_wmi joydev dell_laptop aesni_intel ledtrig_audio dell_smbios ath10k_pci dell_wmi_descriptor aes_x86_64 crypto_simd dcdbas ath10k_core cryptd glue_helper ath mac80211 psmouse i2c_piix4 k10temp cfg80211 tg3 ucsi_acpi typec_ucsi libphy rtsx_pci typec wmi dell_rbtn dell_smo8800 rfkill i2c_amd_mp2_plat i2c_amd_mp2_pci hid_logitech_dj pkcs8_key_parser xhci_pci xhci_hcd pinctrl_amd i2c_hid efivarfs autofs4
[ 1795.534838] CPU: 7 PID: 765 Comm: Xorg Not tainted 5.2.10 #2
[ 1795.534841] Hardware name: Dell Inc. Latitude 5495/0G9F45, BIOS 1.2.14 05/29/2019
[ 1795.534844] RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534847] Code: 83 c8 ff e9 9e b6 ff ff 48 c7 c7 30 8a 72 af e8 61 8a 95 ff 0f 0b 83 c8 ff e9 88 b6 ff ff 48 c7 c7 30 8a 72 af e8 4b 8a 95 ff <0f> 0b 80 bb 93 01 00 00 00 75 05 e9 f2 db ff ff 48 8b 83 80 02 00
[ 1795.534849] RSP: 0018:ffffac7ec25578c8 EFLAGS: 00010246
[ 1795.534851] RAX: 0000000000000024 RBX: ffff8be1235ef000 RCX: 0000000000000000
[ 1795.534852] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 00000000ffffffff
[ 1795.534855] RBP: ffff8be1235ef000 R08: 000000000000043a R09: 0000000000000033
[ 1795.534856] R10: ffffac7ec2557788 R11: ffffac7ec255778d R12: ffff8be1260d7c00
[ 1795.534858] R13: 0000000000000002 R14: ffff8be1235ef000 R15: ffff8bdf02588000
[ 1795.534860] FS:  00007f06ae67fd80(0000) GS:ffff8be127fc0000(0000) knlGS:0000000000000000
[ 1795.534862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1795.534864] CR2: 00007f6fb5fe8b08 CR3: 00000002180a0000 CR4: 00000000003406e0
[ 1795.534868] Call Trace:
[ 1795.534880]  dcn10_prepare_bandwidth+0xff/0x120
[ 1795.534884]  dc_commit_updates_for_stream+0xb02/0xc00
[ 1795.534888]  amdgpu_dm_atomic_commit_tail+0xa9b/0x1970
[ 1795.534897]  ? commit_tail+0x37/0x60
[ 1795.534903]  commit_tail+0x37/0x60
[ 1795.534911]  drm_atomic_helper_commit+0x103/0x110
[ 1795.534919]  drm_mode_obj_set_property_ioctl+0x121/0x2b1
[ 1795.534922]  ? drm_mode_obj_find_prop_id+0x40/0x40
[ 1795.534925]  drm_ioctl_kernel+0xad/0xf0
[ 1795.534928]  drm_ioctl+0x1e6/0x33f
[ 1795.534930]  ? drm_mode_obj_find_prop_id+0x40/0x40
[ 1795.534934]  amdgpu_drm_ioctl+0x44/0x80
[ 1795.534938]  do_vfs_ioctl+0x428/0x6b0
[ 1795.534941]  ? __fget+0x6c/0xa0
[ 1795.534944]  ksys_ioctl+0x59/0x90
[ 1795.534946]  __x64_sys_ioctl+0x11/0x20
[ 1795.534949]  do_syscall_64+0x54/0x1c0
[ 1795.534952]  ? page_fault+0x8/0x30
[ 1795.534954]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1795.534957] RIP: 0033:0x7f06aed32dc7
[ 1795.534960] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 7d d9 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 00 0d 00 f7 d8 64 89 01 48
[ 1795.534962] RSP: 002b:00007ffcf32eaf68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1795.534964] RAX: ffffffffffffffda RBX: 00007ffcf32eafa0 RCX: 00007f06aed32dc7
[ 1795.534965] RDX: 00007ffcf32eafa0 RSI: 00000000c01864ba RDI: 000000000000000c
[ 1795.534967] RBP: 00000000c01864ba R08: 0000000000000052 R09: 00000000cccccccc
[ 1795.534968] R10: 00005598b65824c4 R11: 0000000000000246 R12: 00005598b5438ed0
[ 1795.534969] R13: 000000000000000c R14: 0000000000000003 R15: 0000000000000fff
[ 1795.534972] ---[ end trace 2954f837eadb53a4 ]---


Do you need more information?
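
When reading a trace like the one above, it can help to separate the frames the unwinder is confident about from those prefixed with `?`, which the kernel prints for addresses found on the stack that it could not confirm as part of the call chain. The sketch below is only an illustration: `FRAME_RE` and `reliable_frames` are hypothetical names, and the pattern assumes the `[timestamp]  symbol+0xoff/0xsize` layout seen in this log.

```python
import re

# Assumed frame layout: "[ ts]  [? ]symbol+0xoffset/0xsize".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return the symbols after 'Call Trace:' that are not marked with '?'."""
    frames = []
    in_trace = False
    for raw in log_text.splitlines():
        if "Call Trace:" in raw:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME_RE.match(raw)
        if m is None:
            # First line that is not a frame (e.g. "RIP: ...") ends the trace.
            in_trace = False
            continue
        if not m["guess"]:
            frames.append(m["sym"])
    return frames


# A shortened excerpt of the trace quoted above.
excerpt = """\
[ 1795.534868] Call Trace:
[ 1795.534880]  dcn10_prepare_bandwidth+0xff/0x120
[ 1795.534884]  dc_commit_updates_for_stream+0xb02/0xc00
[ 1795.534888]  amdgpu_dm_atomic_commit_tail+0xa9b/0x1970
[ 1795.534897]  ? commit_tail+0x37/0x60
[ 1795.534903]  commit_tail+0x37/0x60
[ 1795.534957] RIP: 0033:0x7f06aed32dc7
"""
print(" <- ".join(reliable_frames(excerpt)))
```

For this excerpt the innermost confirmed caller is `dcn10_prepare_bandwidth`, reached through the atomic commit path.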
Comment 7 Johannes Hirte 2019-08-30 10:34:00 UTC
Some more information: I see the error mentioned above in the logs during normal work, but no other problems. When resuming from S3 suspend, the display stays black, and I find dozens of those dcn10_verify_allow_pstate_change_high.cold warnings in the log after reboot. This still happens with kernel 5.3-rc6.
Comment 8 Johannes Hirte 2019-08-30 11:46:15 UTC
For me it's a regression from the 5.2 development cycle. Testing with the 5.1 series shows no errors, and resume after S3 suspend works without problems.
Comment 9 Johannes Hirte 2019-08-30 15:01:10 UTC
git bisect points me to 

commit df8368be1382b442384507a5147c89978cd60702 (refs/bisect/bad)
Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Date:   Wed Feb 27 12:56:36 2019 -0500

    drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
    
    To help xf86-video-amdgpu and mesa know DC supports updating the
    tiling attributes for a framebuffer per-flip.
    
    Cc: Michel Dänzer <michel@daenzer.net>
    Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Does this make any sense?
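
For readers unfamiliar with how a bisect arrives at a single commit: conceptually it is a binary search between a known-good and a known-bad revision. The sketch below is a simplification under stated assumptions, not git's actual implementation: it treats history as a linear list (git handles a full commit graph), and `bisect_first_bad`, the `commits` list, and the `is_bad` callback are all hypothetical. In this bug, each `is_bad` call would correspond to building and booting a kernel and testing resume from S3.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    Assumes commits is ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]


# Hypothetical linear history c0..c9 where the behaviour changes at c6.
commits = [f"c{i}" for i in range(10)]
tested = []


def is_bad(commit):
    tested.append(commit)
    return int(commit[1:]) >= 6


print(bisect_first_bad(commits, is_bad), "after testing", tested)
```

Note that a bisect only identifies the commit where the observed behaviour changes. That commit may merely expose a problem that lives elsewhere, for example by enabling a feature for userspace.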
Comment 10 Nicholas Kazlauskas 2019-08-30 15:13:11 UTC
(In reply to Johannes Hirte from comment #9)
> git bisect points me to 
> 
> commit df8368be1382b442384507a5147c89978cd60702 (refs/bisect/bad)
> Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Date:   Wed Feb 27 12:56:36 2019 -0500
> 
>     drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
>     
>     To help xf86-video-amdgpu and mesa know DC supports updating the
>     tiling attributes for a framebuffer per-flip.
>     
>     Cc: Michel Dänzer <michel@daenzer.net>
>     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>     Acked-by: Alex Deucher <alexander.deucher@amd.com>
>     Reviewed-by: Marek Olšák <marek.olsak@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> 
> Does this make any sense?

Yes, this is the commit that enabled mesa and xf86-video-amdgpu to use DCC for scanout.

I recently fixed a bug where these warnings could be generated in some use sequences (notably immediate flipping).

Please try amd-staging-drm-next or apply the following series to your kernel:

https://patchwork.freedesktop.org/series/64614/
Comment 11 Johannes Hirte 2019-08-30 15:43:35 UTC
With those two patches on top of v5.3-rc6-129-g265381004994, resume from S3 suspend still hangs with a black screen. I had to hard reset the system, so I can't say for sure whether this is the same bug.
Comment 12 Johannes Hirte 2019-08-30 17:13:39 UTC
On top of 5.2.11 it doesn't work either. It gets even worse: without the two patches, I can shut down the system. With both patches applied, the system hangs completely after resume, and I have to force it off.
Comment 13 Johannes Hirte 2019-08-30 23:31:00 UTC
It seems DCC is broken on Raven Ridge. So how about disabling it here, until the problems are solved?
Comment 14 peter m 2019-09-04 18:07:28 UTC
Thread with similar problem

https://bugs.freedesktop.org/show_bug.cgi?id=111459
Comment 15 peter m 2019-09-04 18:10:19 UTC
(In reply to peter m from comment #14)
> Thread with similar problem
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=111459

In my case, the screen went black after I entered my password on the welcome screen.
Comment 16 Johannes Hirte 2019-09-10 15:34:25 UTC
Seems to be fixed now. Tested with v5.3-rc8-7-g3120b9a6a3f7, and resume from S3 works without problems. Interestingly, even v5.3-rc6-129-g265381004994 now works without additional patches.
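
The version strings in this thread follow the `git describe` layout `<tag>-<commits since tag>-g<abbreviated hash>`. As a small illustration of how to read them, here is a sketch with hypothetical names (`DESCRIBE_RE`, `parse_describe`); it assumes that layout and falls back to treating the whole string as a tag when the suffix is absent.

```python
import re

# "<tag>-<N>-g<abbrev sha>", e.g. v5.3-rc8-7-g3120b9a6a3f7
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a git-describe style string into tag, commit count, and hash."""
    m = DESCRIBE_RE.match(version)
    if m is None:
        # Plain tag such as "v5.3-rc6": nothing on top of the tag.
        return {"tag": version, "commits_after_tag": 0, "abbrev_sha": None}
    return {
        "tag": m["tag"],
        "commits_after_tag": int(m["count"]),
        "abbrev_sha": m["sha"],
    }


for v in ("v5.3-rc8-7-g3120b9a6a3f7", "v5.3-rc6-129-g265381004994"):
    print(v, "->", parse_describe(v))
```

Read this way, the two kernels tested in this thread are 7 commits past v5.3-rc8 and 129 commits past v5.3-rc6, respectively.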
Comment 17 peter m 2019-09-11 17:25:53 UTC
I updated to kernel 5.2.13-200.fc30.x86_64.

dmesg no longer prints WARNING messages, but the screen still goes black after the login screen.
Comment 18 Rohan Lean 2019-09-16 14:17:10 UTC
Created attachment 145376
5.2.14 kernel messages

Messages similar to this just overflowed my systemd-journal in a couple of minutes, causing high resource use by journald.  An external monitor was attached; I had not noticed any problems apart from the sudden resource use.  I have attached a representative (I hope) portion of the log.
Comment 19 peter m 2019-09-16 15:06:08 UTC
Created attachment 145377
kernel 5.2.14-200 dmesg output

