109628 – WARNING at dcn10_hw_sequencer.c:868 dcn10_verify_allow_pstate_change_high()

Status:	RESOLVED MOVED

Product:	DRI
Classification:	Unclassified
Component:	DRM/AMDgpu
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Default DRI bug account

Duplicates (1):	111487

Reported:	2019-02-14 09:56 UTC by Rafał Miłecki
Modified:	2019-11-19 09:13 UTC
CC List:	6 users

Attachments
dmesg (495.88 KB, text/plain) 2019-02-14 09:58 UTC, Rafał Miłecki
Xorg.0.log (45.82 KB, text/x-log) 2019-02-14 09:59 UTC, Rafał Miłecki
dcn10_verify_allow_pstate_change_high() source with line numbers (711 bytes, text/plain) 2019-02-14 10:08 UTC, Rafał Miłecki
dmesg from 5.0.0-rc7 with WARNINGs (191.88 KB, text/plain) 2019-03-05 09:49 UTC, Rafał Miłecki
5.2.14 kernel messages (4.96 KB, text/plain) 2019-09-16 14:17 UTC, Rohan Lean
kernel 5.2.14-200 dmesg output (3.33 KB, text/plain) 2019-09-16 15:06 UTC, peter m

Description Rafał Miłecki 2019-02-14 09:56:49 UTC

I use an HP EliteBook 745 G5 with a Ryzen 5 PRO 2500U and an external BenQ GW2260 monitor.

Today, after a break of about 10 minutes, amdgpu had trouble re-enabling my external monitor (which had gone into sleep mode or similar). It took about half a minute, I think.

I checked dmesg immediately and found a WARNING:
[65984.999696] WARNING: CPU: 6 PID: 2081 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:868 dcn10_verify_allow_pstate_change_high+0x25/0x260 [amdgpu]

I suspect it may be related to the issue I'm reporting.

Until now I had been using kernels 4.19 and 4.20 (for the last 2 months) and never saw it. A day ago I switched to kernel 5.0.0-rc6. It may be either:
1. A regression
2. A very rare bug

Comment 1 Rafał Miłecki 2019-02-14 09:58:53 UTC

Created attachment 143373
dmesg

I'm attaching my fairly large dmesg. It contains many MCE errors, which should be harmless:
https://bugzilla.kernel.org/show_bug.cgi?id=202005

Comment 2 Rafał Miłecki 2019-02-14 09:59:33 UTC

Created attachment 143374
Xorg.0.log

Comment 3 Rafał Miłecki 2019-02-14 10:08:57 UTC

Created attachment 143375
dcn10_verify_allow_pstate_change_high() source with line numbers

Comment 4 Nicholas Kazlauskas 2019-02-14 14:29:33 UTC

There were some recent patches for Raven that fixed programming sequence issues during DPMS / suspend changes. I wonder if these would help with what you're reporting. The pstate warnings usually indicate that something went wrong in the programming sequence and that the hardware has potentially hung, although this won't always happen consistently.

https://patchwork.freedesktop.org/patch/282418/

I think there was another related patch as well, which might be in amd-staging-drm-next:

https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

If you can still reproduce the issue consistently, I'd give this tree a try and see whether the issue persists.

Comment 5 Rafał Miłecki 2019-03-05 09:49:16 UTC

Created attachment 143531
dmesg from 5.0.0-rc7 with WARNINGs

Thanks Nicholas for looking at this.

I kept running 5.0.0-rc7 for a few more days, and the issue seems to be reproducible. I'm going to switch to amd-staging-drm-next now. I'll provide an update in a week or so.

Comment 6 Johannes Hirte 2019-08-29 21:54:41 UTC

I'm seeing something similar on a Dell Latitude 5495 with an AMD Ryzen 5 PRO 2500U.

The kernel is 5.2.10:

[ 1795.534761] ------------[ cut here ]------------
[ 1795.534791] WARNING: CPU: 7 PID: 765 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534793] Modules linked in: uas usb_storage algif_aead ecb algif_skcipher cmac sha512_ssse3 sha512_generic md4 algif_hash af_alg btusb btrtl btbcm btintel bluetooth ecdh_generic ecc hid_logitech_hidpp uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_generic snd_hda_codec_hdmi videobuf2_v4l2 videodev videobuf2_common snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core kvm_amd snd_pcm ccp snd_timer snd kvm soundcore irqbypass crc32_pclmul rtsx_pci_sdmmc mmc_core wmi_bmof dell_wmi joydev dell_laptop aesni_intel ledtrig_audio dell_smbios ath10k_pci dell_wmi_descriptor aes_x86_64 crypto_simd dcdbas ath10k_core cryptd glue_helper ath mac80211 psmouse i2c_piix4 k10temp cfg80211 tg3 ucsi_acpi typec_ucsi libphy rtsx_pci typec wmi dell_rbtn dell_smo8800 rfkill i2c_amd_mp2_plat i2c_amd_mp2_pci hid_logitech_dj pkcs8_key_parser xhci_pci xhci_hcd pinctrl_amd i2c_hid efivarfs autofs4
[ 1795.534838] CPU: 7 PID: 765 Comm: Xorg Not tainted 5.2.10 #2
[ 1795.534841] Hardware name: Dell Inc. Latitude 5495/0G9F45, BIOS 1.2.14 05/29/2019
[ 1795.534844] RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534847] Code: 83 c8 ff e9 9e b6 ff ff 48 c7 c7 30 8a 72 af e8 61 8a 95 ff 0f 0b 83 c8 ff e9 88 b6 ff ff 48 c7 c7 30 8a 72 af e8 4b 8a 95 ff <0f> 0b 80 bb 93 01 00 00 00 75 05 e9 f2 db ff ff 48 8b 83 80 02 00
[ 1795.534849] RSP: 0018:ffffac7ec25578c8 EFLAGS: 00010246
[ 1795.534851] RAX: 0000000000000024 RBX: ffff8be1235ef000 RCX: 0000000000000000
[ 1795.534852] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 00000000ffffffff
[ 1795.534855] RBP: ffff8be1235ef000 R08: 000000000000043a R09: 0000000000000033
[ 1795.534856] R10: ffffac7ec2557788 R11: ffffac7ec255778d R12: ffff8be1260d7c00
[ 1795.534858] R13: 0000000000000002 R14: ffff8be1235ef000 R15: ffff8bdf02588000
[ 1795.534860] FS:  00007f06ae67fd80(0000) GS:ffff8be127fc0000(0000) knlGS:0000000000000000
[ 1795.534862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1795.534864] CR2: 00007f6fb5fe8b08 CR3: 00000002180a0000 CR4: 00000000003406e0
[ 1795.534868] Call Trace:
[ 1795.534880]  dcn10_prepare_bandwidth+0xff/0x120
[ 1795.534884]  dc_commit_updates_for_stream+0xb02/0xc00
[ 1795.534888]  amdgpu_dm_atomic_commit_tail+0xa9b/0x1970
[ 1795.534897]  ? commit_tail+0x37/0x60
[ 1795.534903]  commit_tail+0x37/0x60
[ 1795.534911]  drm_atomic_helper_commit+0x103/0x110
[ 1795.534919]  drm_mode_obj_set_property_ioctl+0x121/0x2b1
[ 1795.534922]  ? drm_mode_obj_find_prop_id+0x40/0x40
[ 1795.534925]  drm_ioctl_kernel+0xad/0xf0
[ 1795.534928]  drm_ioctl+0x1e6/0x33f
[ 1795.534930]  ? drm_mode_obj_find_prop_id+0x40/0x40
[ 1795.534934]  amdgpu_drm_ioctl+0x44/0x80
[ 1795.534938]  do_vfs_ioctl+0x428/0x6b0
[ 1795.534941]  ? __fget+0x6c/0xa0
[ 1795.534944]  ksys_ioctl+0x59/0x90
[ 1795.534946]  __x64_sys_ioctl+0x11/0x20
[ 1795.534949]  do_syscall_64+0x54/0x1c0
[ 1795.534952]  ? page_fault+0x8/0x30
[ 1795.534954]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1795.534957] RIP: 0033:0x7f06aed32dc7
[ 1795.534960] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 7d d9 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 00 0d 00 f7 d8 64 89 01 48
[ 1795.534962] RSP: 002b:00007ffcf32eaf68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1795.534964] RAX: ffffffffffffffda RBX: 00007ffcf32eafa0 RCX: 00007f06aed32dc7
[ 1795.534965] RDX: 00007ffcf32eafa0 RSI: 00000000c01864ba RDI: 000000000000000c
[ 1795.534967] RBP: 00000000c01864ba R08: 0000000000000052 R09: 00000000cccccccc
[ 1795.534968] R10: 00005598b65824c4 R11: 0000000000000246 R12: 00005598b5438ed0
[ 1795.534969] R13: 000000000000000c R14: 0000000000000003 R15: 0000000000000fff
[ 1795.534972] ---[ end trace 2954f837eadb53a4 ]---


Do you need more information?

Comment 7 Johannes Hirte 2019-08-30 10:34:00 UTC

Some more information: I see the error mentioned above in the logs during normal use, but notice no other problems. When resuming from S3 suspend, the display stays black, and after a reboot I find dozens of those dcn10_verify_allow_pstate_change_high.cold warnings in the log. This still happens with kernel 5.3-rc6.

Comment 8 Johannes Hirte 2019-08-30 11:46:15 UTC

For me it's a regression from the 5.2 development cycle. Testing with the 5.1 series shows no errors, and resume after S3 suspend works without problems.

Comment 9 Johannes Hirte 2019-08-30 15:01:10 UTC

git bisect points me to 

commit df8368be1382b442384507a5147c89978cd60702 (refs/bisect/bad)
Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Date:   Wed Feb 27 12:56:36 2019 -0500

    drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
    
    To help xf86-video-amdgpu and mesa know DC supports updating the
    tiling attributes for a framebuffer per-flip.
    
    Cc: Michel Dänzer <michel@daenzer.net>
    Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Does this make any sense?
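
For context, the search that git bisect performs can be sketched as a binary search over an ordered history. This is only a minimal illustration under stated assumptions, not git's actual implementation: the commit labels and the `is_bad` predicate are hypothetical stand-ins for building and testing each kernel, and the history is assumed to be linear, with every commit after the first bad one also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical linear history: "c0" is known good, "c7" is known bad.
history = [f"c{i}" for i in range(8)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

Each step halves the remaining range, so only a handful of builds are needed even for a long history. As in this report, the commit such a search identifies may only be the one that exposes a problem (for example by enabling a feature), not necessarily the one containing the faulty code.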

Comment 10 Nicholas Kazlauskas 2019-08-30 15:13:11 UTC

(In reply to Johannes Hirte from comment #9)
> git bisect points me to 
> 
> commit df8368be1382b442384507a5147c89978cd60702 (refs/bisect/bad)
> Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Date:   Wed Feb 27 12:56:36 2019 -0500
> 
>     drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
>     
>     To help xf86-video-amdgpu and mesa know DC supports updating the
>     tiling attributes for a framebuffer per-flip.
>     
>     Cc: Michel Dänzer <michel@daenzer.net>
>     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>     Acked-by: Alex Deucher <alexander.deucher@amd.com>
>     Reviewed-by: Marek Olšák <marek.olsak@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> 
> Does this make any sense?

Yes, this is the commit that enabled mesa and xf86-video-amdgpu to use DCC for scanout.

I recently fixed a bug where these warnings could be generated in some use sequences (notably immediate flipping).

Please try amd-staging-drm-next or apply the following series to your kernel:

https://patchwork.freedesktop.org/series/64614/

Comment 11 Johannes Hirte 2019-08-30 15:43:35 UTC

With those two patches on top of v5.3-rc6-129-g265381004994, resume from S3 suspend still hangs with a black screen. I had to hard-reset the system, so I can't say for sure whether this is the same bug.

Comment 12 Johannes Hirte 2019-08-30 17:13:39 UTC

On top of 5.2.11 it doesn't work either. It gets even worse: without the two patches, I can shut down the system; with both patches applied, the system hangs completely after resume and I have to force it off.

Comment 13 Johannes Hirte 2019-08-30 23:31:00 UTC

It seems DCC is broken on Raven Ridge. So how about disabling it here, until the problems are solved?

Comment 14 peter m 2019-09-04 18:07:28 UTC

Thread with similar problem

https://bugs.freedesktop.org/show_bug.cgi?id=111459

Comment 15 peter m 2019-09-04 18:10:19 UTC

(In reply to peter m from comment #14)
> Thread with similar problem
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=111459

In my case, the screen went black after entering the password on the welcome screen.

Comment 16 Johannes Hirte 2019-09-10 15:34:25 UTC

Seems to be fixed now. Tested with v5.3-rc8-7-g3120b9a6a3f7, and resume from S3 works without problems. Interestingly, even v5.3-rc6-129-g265381004994 now works without additional patches.

Comment 17 peter m 2019-09-11 17:25:53 UTC

Updated to kernel 5.2.13-200.fc30.x86_64.

dmesg no longer prints WARNING messages, but the screen is still black after the login screen.

Comment 18 Rohan Lean 2019-09-16 14:17:10 UTC

Created attachment 145376
5.2.14 kernel messages

Messages similar to this just overflowed my systemd-journal in a couple of minutes, causing high resource use by journald.  An external monitor was attached; I had not noticed any problems apart from the sudden resource use.  I have attached a representative (I hope) portion of the log.
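
To gauge how often the warning fires in a saved log, a minimal sketch such as the following could be used. It assumes the log is plain dmesg or journal text. The regular expression is derived from the WARNING lines quoted in this report, and the names `WARN_RE` and `count_warnings` are hypothetical.

```python
import re
from collections import Counter

# Matches the "file.c:LINE function+0x..." part of a kernel WARNING line,
# based on the format of the lines quoted in this report.
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at \S*?([\w.-]+\.c):(\d+) ([\w.]+)\+"
)


def count_warnings(log_text: str) -> Counter:
    """Count kernel WARNING lines per (source file, line, function)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = WARN_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)), m.group(3))] += 1
    return counts


# Sample lines taken from the traces quoted in this report.
sample = """\
[ 1795.534791] WARNING: CPU: 7 PID: 765 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229
[ 1795.534838] CPU: 7 PID: 765 Comm: Xorg Not tainted 5.2.10 #2
WARNING: CPU: 3 PID: 50 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:932 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
"""
result = count_warnings(sample)
for key, n in result.items():
    print(key, n)
```

Grouping by file, line, and function keeps warnings from different kernel versions apart, since the reported line number (854, 868, 932) varies between the kernels in this report.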

Comment 19 peter m 2019-09-16 15:06:08 UTC

Created attachment 145377
kernel 5.2.14-200 dmesg output

Comment 20 John Smith 2019-09-19 17:49:01 UTC

Still seeing the warning with 5.4.0-0.rc0.git2.2.fc32.x86_64; waking up doesn't work. This is fedora kernel though and there's a possibility those patches aren't integrated there yet; is there a way to check?

Comment 21 Johannes Hirte 2019-09-20 15:57:09 UTC

(In reply to John Smith from comment #20)
> Still seeing the warning with 5.4.0-0.rc0.git2.2.fc32.x86_64; waking up
> doesn't work. This is fedora kernel though and there's a possibility those
> patches aren't integrated there yet; is there a way to check?

Any possibility to test with 5.3 kernel? It seems it's fixed but not backported.

Comment 22 John Smith 2019-09-22 09:02:27 UTC

(In reply to Johannes Hirte from comment #21)
> Any possibility to test with 5.3 kernel? It seems it's fixed but not
> backported.

If I'm understanding it correctly, backported means it already should be in 5.4, no?

Comment 23 vlad 2019-09-22 09:05:41 UTC

I had the same problem with a Ryzen 2400G on kernels 5.2, 5.3 and 5.4, but it only reproduced when X was running. If I stopped X before suspending, wakeup worked. I managed to fix it by reverting the following commit in the X driver: https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/a2b32e72fdaff3007a79b84929997d8176c2d512

Comment 24 towo 2019-09-22 20:13:17 UTC

Same problem here on my Ideapad 330

Machine:   Type: Laptop System: LENOVO product: 81D2 v: Lenovo ideapad 330-15ARR serial: <root required> 
           Mobo: LENOVO model: LNVNB161216 v: SDK0J40709 WIN serial: <root required> UEFI: LENOVO v: 7VCN47WW date: 04/25/2019 
Graphics:  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] driver: amdgpu v: kernel 
           Display: server: X.org 1.20.4 driver: amdgpu,ati unloaded: fbdev,modesetting,vesa tty: 211x40

I have not used suspend. Lightdm starts without problems.
Cinnamon works fine, but XFCE4 ends up with that kernel oops and a black screen.
I have found that the compositor in xfwm4 is the culprit: if I disable that compositor, xfce runs fine. xfce also starts fine if I use compton as the compositor.

Then I also reverted

https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/a2b32e72fdaff3007a79b84929997d8176c2d512

and now xfce runs without problems or a black screen, with the internal compositor activated.

Comment 25 devbazilio 2019-09-26 08:37:04 UTC

I have the same problem with kernel 5.3.1: after opening the lid following a suspend, the screen is black, although I can still ssh into the laptop.

CPU/GPU: AMD Ryzen 3 2300U with Radeon Vega 6 Mobile Gfx

Please let me know if you need more help or debugging.


[drm] pstate TEST_DEBUG_DATA: 0xB7F60000
------------[ cut here ]------------
WARNING: CPU: 3 PID: 50 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:932 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Modules linked in: rfcomm mousedev edac_mce_amd kvm_amd ccp rng_core kvm irqbypass cmac algif_hash algif_skcipher af_alg bnep crct10dif_pclmul crc32_pclmul ghash_clmulni_intel hp_wmi wmi_b>
 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
CPU: 3 PID: 50 Comm: kworker/u32:1 Tainted: G           OE     5.3.1-arch1-1-ARCH #1
Hardware name: HP HP Pavilion Laptop 15-cw0xxx/84E7, BIOS F.34 07/31/2019
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Code: 83 c8 ff e9 c1 ee f7 ff 48 c7 c7 18 fa 6c c0 e8 62 d4 4e f9 0f 0b 83 c8 ff e9 ab ee f7 ff 48 c7 c7 18 fa 6c c0 e8 4c d4 4e f9 <0f> 0b 80 bb 9f 01 00 00 00 75 05 e9 93 17 f8 ff 48 8b >
RSP: 0018:ffffb30b002d3a80 EFLAGS: 00010246
RAX: 0000000000000024 RBX: ffff9b7d149d0000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000082 RDI: 00000000ffffffff
RBP: ffff9b7d149d0000 R08: 0000000000000706 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b7d15d101b8
R13: ffff9b7d15d113f8 R14: ffff9b7d15d101b8 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffff9b7d1b2c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcc82adca8 CR3: 0000000076132000 CR4: 00000000003406e0
Call Trace:
 dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
 dc_commit_updates_for_stream+0xec8/0x1390 [amdgpu]
 ? _raw_spin_lock+0x13/0x30
 amdgpu_dm_atomic_commit_tail+0x12a6/0x1d00 [amdgpu]
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 process_one_work+0x1d1/0x3a0
 worker_thread+0x4a/0x3d0
 kthread+0xfb/0x130
 ? process_one_work+0x3a0/0x3a0
 ? kthread_park+0x80/0x80
 ret_from_fork+0x22/0x40
---[ end trace 3a22aec33a206936 ]---
[drm] pstate TEST_DEBUG_DATA: 0x37F60000
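
The `pstate TEST_DEBUG_DATA` values printed around the warning differ between log lines. The following is a minimal sketch for extracting them from a log and listing which bit positions differ between two values. It makes no claim about what the individual bits mean in hardware, and the helper names are hypothetical.

```python
import re
from typing import List

# Matches the "[drm] pstate TEST_DEBUG_DATA: 0x..." lines quoted in this report.
DEBUG_RE = re.compile(r"pstate TEST_DEBUG_DATA: (0x[0-9A-Fa-f]+)")


def debug_values(log_text: str) -> List[int]:
    """Return all TEST_DEBUG_DATA values found in the log, in order."""
    return [int(value, 16) for value in DEBUG_RE.findall(log_text)]


def differing_bits(a: int, b: int) -> List[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# The two values quoted in the log above.
log = """\
[drm] pstate TEST_DEBUG_DATA: 0xB7F60000
[drm] pstate TEST_DEBUG_DATA: 0x37F60000
"""
values = debug_values(log)
print([hex(v) for v in values])
print(differing_bits(values[0], values[1]))
```

Comparing values this way only shows where two register dumps differ; interpreting the bits would require the hardware documentation or the driver source.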

Comment 26 Mirek Kratochvil 2019-10-06 19:38:26 UTC

Hello everyone,

Reporting the same on a ThinkPad E585; the black-screen issue is triggered by starting X with a compositor (lightdm works okay, though).

Attaching my warning + backtrace + TEST_DEBUG_DATA from the Debian kernel, just for completeness. (I also tried a 5.3 kernel, with basically the same result.)

[   12.766637] [drm] pstate TEST_DEBUG_DATA: 0x37F60000
[   12.766640] ------------[ cut here ]------------
[   12.766844] WARNING: CPU: 5 PID: 1474 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high+0x30/0x40 [amdgpu]
[   12.766845] Modules linked in: psnap llc overlay snd_hrtimer snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative arc4 uinput edac_mce_amd kvm_amd ccp rng_core kvm irqbypass crct10dif_pclmul crc32_pclmul binfmt_misc ghash_clmulni_intel nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel aesni_intel bluetooth efi_pstore aes_x86_64 crypto_simd cryptd glue_helper ath10k_pci ath10k_core uvcvideo videobuf2_vmalloc ath videobuf2_memops joydev drbg videobuf2_v4l2 efivars videobuf2_common serio_raw mac80211 videodev snd_hda_codec_conexant snd_hda_codec_generic snd_hda_codec_hdmi media wmi_bmof ansi_cprng sp5100_tco ecdh_generic snd_hda_intel k10temp ecc crc16 sg snd_hda_codec watchdog cfg80211 snd_hda_core snd_hwdep snd_pcm snd_timer thinkpad_acpi nvram ledtrig_audio snd ucsi_acpi typec_ucsi soundcore typec rfkill battery ac evdev pcc_cpufreq acpi_cpufreq loop parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic sd_mod amdgpu
[   12.766899]  gpu_sched i2c_algo_bit ttm ahci libahci drm_kms_helper xhci_pci libata xhci_hcd crc32c_intel psmouse drm sdhci_pci scsi_mod usbcore cqhci sdhci r8169 i2c_piix4 mmc_core nvme usb_common realtek mfd_core libphy nvme_core wmi video i2c_scmi button
[   12.766925] CPU: 5 PID: 1474 Comm: Xorg Not tainted 5.2.0-3-amd64 #1 Debian 5.2.17-1
[   12.766926] Hardware name: LENOVO 20KV000DMC/20KV000DMC, BIOS R0UET74W (1.54 ) 07/23/2019
[   12.767051] RIP: 0010:dcn10_verify_allow_pstate_change_high+0x30/0x40 [amdgpu]
[   12.767054] Code: 53 48 8b 87 80 02 00 00 48 89 fb 48 8b b8 b0 01 00 00 e8 63 21 01 00 84 c0 74 03 5b 5d c3 48 c7 c7 38 f8 81 c0 e8 ce 9c 1c d6 <0f> 0b 80 bb 93 01 00 00 00 74 e6 e9 1c 23 06 00 0f 1f 44 00 00 41
[   12.767056] RSP: 0018:ffffb890425bb7d0 EFLAGS: 00010246
[   12.767059] RAX: 0000000000000024 RBX: ffff9cd1aee06000 RCX: 0000000000000006
[   12.767061] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9cd1bef57680
[   12.767062] RBP: 0000000000000001 R08: 00000000000003e0 R09: 0000000000000004
[   12.767064] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9cd1b54081b8
[   12.767065] R13: ffff9cd1b5409bc8 R14: ffff9cd1b54081b8 R15: ffff9cd1afa01000
[   12.767068] FS:  00007fe2f471cf00(0000) GS:ffff9cd1bef40000(0000) knlGS:0000000000000000
[   12.767070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.767072] CR2: 00007fe2dc096000 CR3: 0000000430576000 CR4: 00000000003406e0
[   12.767073] Call Trace:
[   12.767203]  dcn10_pipe_control_lock.part.20+0x6a/0x70 [amdgpu]
[   12.767320]  dc_stream_set_cursor_attributes+0x11f/0x170 [amdgpu]
[   12.767450]  handle_cursor_update.isra.49+0x1b2/0x310 [amdgpu]
[   12.767580]  amdgpu_dm_commit_cursors.isra.50+0x5b/0x70 [amdgpu]
[   12.767712]  amdgpu_dm_atomic_commit_tail+0x146f/0x1960 [amdgpu]
[   12.767719]  ? _cond_resched+0x15/0x30
[   12.767725]  ? kmem_cache_alloc_trace+0x146/0x1c0
[   12.767736]  ? ttm_bo_validate+0x37/0x130 [ttm]
[   12.767838]  ? amdgpu_bo_pin_restricted+0x23d/0x270 [amdgpu]
[   12.767842]  ? _cond_resched+0x15/0x30
[   12.767847]  ? wait_for_completion_timeout+0x3b/0x1a0
[   12.767851]  ? refcount_inc_checked+0x5/0x30
[   12.767947]  ? amdgpu_bo_ref+0x17/0x20 [amdgpu]
[   12.768075]  ? dm_plane_helper_prepare_fb+0x126/0x300 [amdgpu]
[   12.768091]  ? commit_tail+0x3d/0x70 [drm_kms_helper]
[   12.768103]  commit_tail+0x3d/0x70 [drm_kms_helper]
[   12.768117]  drm_atomic_helper_commit+0xb4/0x120 [drm_kms_helper]
[   12.768129]  drm_atomic_helper_update_plane+0xf1/0x110 [drm_kms_helper]
[   12.768157]  drm_mode_cursor_universal+0x143/0x260 [drm]
[   12.768163]  ? __switch_to+0x147/0x3e0
[   12.768189]  drm_mode_cursor_common+0xc9/0x220 [drm]
[   12.768214]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[   12.768234]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   12.768257]  drm_ioctl+0x201/0x3a0 [drm]
[   12.768282]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[   12.768369]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[   12.768375]  do_vfs_ioctl+0xa4/0x630
[   12.768379]  ksys_ioctl+0x60/0x90
[   12.768383]  ? ksys_read+0x99/0xd0
[   12.768385]  __x64_sys_ioctl+0x16/0x20
[   12.768390]  do_syscall_64+0x53/0x130
[   12.768394]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   12.768398] RIP: 0033:0x7fe2f4c5f5d7
[   12.768401] Code: 00 00 90 48 8b 05 b9 78 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 78 0c 00 f7 d8 64 89 01 48
[   12.768402] RSP: 002b:00007ffceac65e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   12.768406] RAX: ffffffffffffffda RBX: 0000560127c7ce80 RCX: 00007fe2f4c5f5d7
[   12.768407] RDX: 00007ffceac65ec0 RSI: 00000000c02464bb RDI: 000000000000000d
[   12.768408] RBP: 00007ffceac65ec0 R08: 0000000000000001 R09: 0000000000003fff
[   12.768410] R10: 000000000000007f R11: 0000000000000246 R12: 00000000c02464bb
[   12.768411] R13: 000000000000000d R14: 0000000000000004 R15: 00005601280f7db0
[   12.768415] ---[ end trace d4348d0b513dc5d0 ]---

I will try the workaround in the X driver and see.

Is there any preliminary estimate of, or chance for, an official fix in the kernel or in Xorg?

Thanks,
-mk

Comment 27 Mirek Kratochvil 2019-10-06 20:13:37 UTC

Confirming that the posted X driver workaround fixes it on a 2700U. Debian 5.2.* kernels and vanilla 5.3.1 work perfectly now.

Anyway, the latest X driver from git is broken as well. Should the issue be reported there, or is it better to fix it in kernel layer?

Thanks again,
-mk

Comment 28 Johannes Hirte 2019-10-07 05:59:58 UTC

(In reply to John Smith from comment #22)
> (In reply to Johannes Hirte from comment #21)
> > Any possibility to test with 5.3 kernel? It seems it's fixed but not
> > backported.
> 
> If I'm understanding it correctly, backported means it already should be in
> 5.4, no?

You're right, I read it wrong.

From the reports, it seems to be compositor related. For me, kwin with OpenGL 3.1 backend works fine. xfwm4 seems to trigger the bug, maybe other compositors too.

Comment 29 Michel Dänzer 2019-10-07 08:16:37 UTC

(In reply to Mirek Kratochvil from comment #27)
> Anyway, the latest X driver from git is broken as well. Should the issue be
> reported there, or is it better to fix it in kernel layer?

It should be fixed in the kernel, since the xf86-video-amdgpu change in question is already in the 19.0 releases.

Comment 30 peter m 2019-10-16 17:31:42 UTC

kernel 5.3.5-200.fc30.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org)
xfce - 4.14

kernel still crashing

------------[ cut here ]------------
WARNING: CPU: 2 PID: 184 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:932 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_hda_codec snd_hda_core ccp kvm snd_hwdep snd_seq snd_seq_device irqbypass snd_pcm snd_timer joydev snd soundcore crct10dif_pclmul crc32_pclmul ghash_clmulni_intel wmi_bmof sp5100_tco k10temp i2c_piix4 gpio_amdpt gpio_generic acpi_cpufreq amdgpu amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crc32c_intel drm r8169 wmi video pinctrl_amd
CPU: 2 PID: 184 Comm: kworker/u32:8 Not tainted 5.3.5-200.fc30.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. A320M-S2H V2/A320M-S2H V2-CF, BIOS F2 12/25/2018
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Code: 83 c8 ff e9 00 1c f8 ff 48 c7 c7 70 6b 5a c0 e8 71 09 c6 de 0f 0b 83 c8 ff e9 ea 1b f8 ff 48 c7 c7 70 6b 5a c0 e8 5b 09 c6 de <0f> 0b 80 bb 9f 01 00 00 00 75 05 e9 d2 42 f8 ff 48 8b 83 f8 02 00
RSP: 0018:ffffbc69803c7af0 EFLAGS: 00010246
RAX: 0000000000000024 RBX: ffff96890d220000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff968918a97900
RBP: ffff96890d220000 R08: 0000000000000001 R09: 0000000000000401
R10: 00000000000169fc R11: 0000000000000003 R12: ffff9688f09501b8
R13: ffff96890b32f800 R14: 0000000000000001 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffff968918a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9eba01c000 CR3: 00000001e4092000 CR4: 00000000003406e0
Call Trace:
 dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
 dc_commit_updates_for_stream+0xfa5/0x1460 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xb61/0x1c40 [amdgpu]
 ? cpumask_next_and+0x1a/0x20
 ? load_balance+0x1a4/0xb50
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_timeout+0x38/0x170
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to+0x152/0x440
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 process_one_work+0x19d/0x340
 worker_thread+0x50/0x3b0
 kthread+0xfb/0x130
 ? process_one_work+0x340/0x340
 ? kthread_park+0x80/0x80
 ret_from_fork+0x22/0x40
---[ end trace 76057b23d3d7f433 ]---
[drm] pstate TEST_DEBUG_DATA: 0x36F60000
------------[ cut here ]------------

Comment 31 Jiri Slaby 2019-11-04 10:31:15 UTC

*** Bug 111487 has been marked as a duplicate of this bug. ***

Comment 32 peter m 2019-11-05 17:42:17 UTC

updated kernel to 5.3.8-200.fc30, problem still exists

Comment 33 Chris Snook 2019-11-10 05:34:55 UTC

(In reply to Johannes Hirte from comment #28)
> From the reports, it seems to be compositor related. For me, kwin with
> OpenGL 3.1 backend works fine. xfwm4 seems to trigger the bug, maybe other
> compositors too.

Confirmed that the workaround of switching to the kwin OpenGL 3.1 compositor works for me.

kernel: 5.3.0-19-generic (Ubuntu 19.10)
window manager: kwin-x11 4:5.16.5-0ubuntu1
CPU: Ryzen Pro 2500U
machine: Lenovo Thinkpad A485
X server: xserver-xorg 1:7.7+19ubuntu12
userspace driver: xserver-xorg-video-amdgpu 19.0.1-1ubuntu1

I'm happy to test patches or reproduce.

Comment 34 Chris Snook 2019-11-10 06:20:26 UTC

(In reply to Chris Snook from comment #33)
> (In reply to Johannes Hirte from comment #28)
> > From the reports, it seems to be compositor related. For me, kwin with
> > OpenGL 3.1 backend works fine. xfwm4 seems to trigger the bug, maybe other
> > compositors too.
> 
> Confirmed that the workaround of switching to the kwin OpenGL 3.1 compositor
> works for me.
> 
> kernel: 5.3.0-19-generic (Ubuntu 19.10)
> window manager: kwin-x11 4:5.16.5-0ubuntu1
> CPU: Ryzen Pro 2500U
> machine: Lenovo Thinkpad A485
> X server: xserver-xorg 1:7.7+19ubuntu12
> userspace driver: xserver-xorg-video-amdgpu 19.0.1-1ubuntu1
> 
> I'm happy to test patches or reproduce.

I may have spoken too soon. I'm non-deterministically experiencing the basic symptom of hang on resume with a blank screen, sometimes with the backlight on and sometimes without, but I no longer get the traceback in logs, so I can't tell if it's mostly the same bug but without tripping the failure mode that causes it to log, or if there's an unrelated suspend/resume bug. Switching to the OpenGL 3.1 compositor has definitely made that error message stop appearing in my logs though.

Comment 35 peter m 2019-11-14 16:27:48 UTC

updated kernel to 5.3.11-200.fc30, problem still exists

Comment 36 Martin Peres 2019-11-19 09:13:24 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/695.