Bug 105018 - Kernel panic when waking up after screen goes blank.
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2018-02-09 01:13 UTC by L.S.S.
Modified: 2018-12-10 18:13 UTC



Attachments
dmesg output with amdgpu.dc_log=1 and drm.debug=6, right after login. (355.62 KB, text/plain)
2018-02-13 03:30 UTC, L.S.S.
Patch 1 Use crtc enable/disable_vblank hooks (1.78 KB, patch)
2018-02-13 16:13 UTC, Harry Wentland
Patch 2 Return success when enabling interrupt (2.54 KB, patch)
2018-02-13 16:13 UTC, Harry Wentland
Patch 3 Clean up formatting in irq_service_dce110.c (4.54 KB, patch)
2018-02-13 16:13 UTC, Harry Wentland
Patch 4 Don't blow up if TG is NULL in dce110_vblank_set (1.05 KB, patch)
2018-02-13 16:14 UTC, Harry Wentland
Patch 2 Return success when enabling interrupt (2.54 KB, patch)
2018-02-13 18:40 UTC, Harry Wentland
stacktrace even with patches (4.94 KB, text/plain)
2018-02-15 20:59 UTC, Ainola

Description L.S.S. 2018-02-09 01:13:44 UTC
I'm currently running the latest Manjaro XFCE with the just-released 4.15 kernel, and I found that the system crashes when I try to wake it up after the screen has gone blank.

The system is an AMD Laptop (ASUS ROG STRIX GL702ZC), and the problem is 100% reproducible with the following steps:

- Lock the screen, leave the screen blank for at least 3-5 minutes.
- Try to wake the screen up, for example by moving the mouse cursor.

At first I could not find the cause, but after looking through the journalctl output I found something that appears to be a kernel panic. It has existed since the beginning, with the 4.14 kernel, and remained unsolved even after upgrading to the 4.15 kernel.

Feb 07 11:48:59 linuxsys kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
Feb 07 11:48:59 linuxsys kernel: IP: dce110_vblank_set+0x4f/0xb0 [amdgpu]
Feb 07 11:48:59 linuxsys kernel: PGD 7e2ac2067 P4D 7e2ac2067 PUD 7e2a7e067 PMD 0 
Feb 07 11:48:59 linuxsys kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Feb 07 11:48:59 linuxsys kernel: Modules linked in: vmw_vsock_vmci_transport vsock rfcomm fuse bnep vmnet(O) arc4 amdkfd nls_iso8859_1 amd_iommu_v2 nls_cp437 vfat fat amdgpu iwlmvm uvcvideo mac80211 videobuf2_vmalloc edac_mce_amd btusb vide
Feb 07 11:48:59 linuxsys kernel:  rng_core cryptd pcspkr k10temp i2c_piix4 shpchp battery wmi thermal ac tpm_crb tpm_tis tpm_tis_core video tpm asus_wireless i2c_hid button acpi_cpufreq sch_fq_codel vmmon(O) vmw_vmci vboxnetflt(O) vboxnetad
Feb 07 11:48:59 linuxsys kernel: CPU: 15 PID: 1467 Comm: xfwm4 Tainted: G        W  O     4.15.0-1-MANJARO #1
Feb 07 11:48:59 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Feb 07 11:48:59 linuxsys kernel: RIP: 0010:dce110_vblank_set+0x4f/0xb0 [amdgpu]
Feb 07 11:48:59 linuxsys kernel: RSP: 0018:ffffb4e388c7bbe0 EFLAGS: 00010002
Feb 07 11:48:59 linuxsys kernel: RAX: ffff9b458850c000 RBX: 0000000000000001 RCX: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: RBP: ffff9b4b2f4168e0 R08: 0000000000000000 R09: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: R10: 00007fff89afe9f0 R11: ffff9b4b2b86ac40 R12: ffff9b4b38511a80
Feb 07 11:48:59 linuxsys kernel: R13: ffffffffc12bbba0 R14: ffff9b4b281f0000 R15: ffff9b4b3ab4cb68
Feb 07 11:48:59 linuxsys kernel: FS:  00007f0bdae66980(0000) GS:ffff9b4b3e9c0000(0000) knlGS:0000000000000000
Feb 07 11:48:59 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 07 11:48:59 linuxsys kernel: CR2: 0000000000000000 CR3: 00000007d96c8000 CR4: 00000000003406e0
Feb 07 11:48:59 linuxsys kernel: Call Trace:
Feb 07 11:48:59 linuxsys kernel:  amdgpu_dm_set_crtc_irq_state+0x31/0x60 [amdgpu]
Feb 07 11:48:59 linuxsys kernel:  amdgpu_irq_update+0x55/0x90 [amdgpu]
Feb 07 11:48:59 linuxsys kernel:  drm_vblank_enable+0x84/0x100 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x690 [drm]
Feb 07 11:48:59 linuxsys kernel:  ? unix_stream_recvmsg+0x53/0x70
Feb 07 11:48:59 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_ioctl+0x2d5/0x370 [drm]
Feb 07 11:48:59 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 07 11:48:59 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Feb 07 11:48:59 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Feb 07 11:48:59 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 07 11:48:59 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Feb 07 11:48:59 linuxsys kernel:  ? __sys_recvmsg+0x4e/0x90
Feb 07 11:48:59 linuxsys kernel:  ? __sys_recvmsg+0x7d/0x90
Feb 07 11:48:59 linuxsys kernel:  SyS_ioctl+0x74/0x80
Feb 07 11:48:59 linuxsys kernel:  entry_SYSCALL_64_fastpath+0x20/0x83
Feb 07 11:48:59 linuxsys kernel: RIP: 0033:0x7f0bd74b3d87
Feb 07 11:48:59 linuxsys kernel: RSP: 002b:00007fff89afea38 EFLAGS: 00000246
Feb 07 11:48:59 linuxsys kernel: Code: e8 17 20 04 00 83 e8 4e 0f b6 d0 48 89 d0 48 c1 e0 05 48 01 d0 48 c1 e0 05 49 03 86 60 01 00 00 84 db 48 8b b8 78 02 00 00 74 18 <48> 8b 07 be 02 00 00 00 48 8b 80 d8 00 00 00 e8 6d 43 7e ee 84 
Feb 07 11:48:59 linuxsys kernel: RIP: dce110_vblank_set+0x4f/0xb0 [amdgpu] RSP: ffffb4e388c7bbe0
Feb 07 11:48:59 linuxsys kernel: CR2: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: ---[ end trace 36522610c84ff0f3 ]---

The faulting instruction is at dce110_vblank_set+0x4f/0xb0 [amdgpu], and the topmost entry in the call trace is amdgpu_dm_set_crtc_irq_state+0x31/0x60 [amdgpu].
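
For readers less used to oops output: the location strings in the trace above follow the shape symbol+offset/size [module], where the offset is the position of the instruction inside the function and the size is the function's total length. Below is a minimal sketch, in plain userspace C, of splitting such a string into its parts. The struct and the function name parse_oops_location are made up for illustration and are not part of any kernel or tool API.

```c
#include <stdio.h>

/* One parsed "symbol+offset/size [module]" location from an oops line. */
struct oops_location {
    char symbol[64];
    unsigned int offset;   /* byte offset of the instruction in the function */
    unsigned int size;     /* total size of the function in bytes */
    char module[32];
};

/* Hypothetical helper (not a kernel API).
 * Returns 1 if all four fields were parsed, 0 otherwise. */
int parse_oops_location(const char *text, struct oops_location *out)
{
    /* %x accepts an optional 0x prefix; %31[^]] reads up to the closing ']' */
    int fields = sscanf(text, "%63[^+]+%x/%x [%31[^]]]",
                        out->symbol, &out->offset, &out->size, out->module);
    return fields == 4;
}
```

For "dce110_vblank_set+0x4f/0xb0 [amdgpu]" this yields the symbol dce110_vblank_set, offset 0x4f, size 0xb0 and module amdgpu. Entries without a module suffix, such as do_vfs_ioctl+0xa4/0x630, are deliberately rejected by this sketch.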

The bug report here, which was closed last December, resembled my current issue:
https://lists.freedesktop.org/archives/amd-gfx/2017-November/016236.html

I considered the possibility of it being DC-related, as I saw similar bug reports, but that seems not to be the case: at one point I was able to reproduce it even after passing amdgpu.dc=0 during boot. The loaded modules don't seem to be related either, as it also happened on fresh installs, where I left the screen blank (before I had adjusted the power management options) while the packages I wanted were downloading and installing in the background.

Additionally, I found some further errors logged before the crash, which might have been produced when the screen went blank. They can be triggered by simply locking the screen and leaving it as is. (NOTE: When I locked the screen and then immediately moved the mouse cursor to wake it up, the crash did not occur. It only occurs if the screen has been blank for at least 3-5 minutes.)

Feb 07 11:38:04 linuxsys kernel: [drm] {1920x1080, 2080x1111@138700Khz}
Feb 07 11:38:12 linuxsys kernel: [drm] RBRx2 pass VS=1, PE=0
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: WARNING: CPU: 12 PID: 1467 at drivers/gpu/drm/drm_vblank.c:612 drm_calc_vbltimestamp_from_scanoutpos+0x2c5/0x340 [drm]
Feb 07 11:38:12 linuxsys kernel: Modules linked in: vmw_vsock_vmci_transport vsock rfcomm fuse bnep vmnet(O) arc4 amdkfd nls_iso8859_1 amd_iommu_v2 nls_cp437 vfat fat amdgpu iwlmvm uvcvideo mac80211 videobuf2_vmalloc edac_mce_amd btusb vide
Feb 07 11:38:12 linuxsys kernel:  rng_core cryptd pcspkr k10temp i2c_piix4 shpchp battery wmi thermal ac tpm_crb tpm_tis tpm_tis_core video tpm asus_wireless i2c_hid button acpi_cpufreq sch_fq_codel vmmon(O) vmw_vmci vboxnetflt(O) vboxnetad
Feb 07 11:38:12 linuxsys kernel: CPU: 12 PID: 1467 Comm: xfwm4 Tainted: G           O     4.15.0-1-MANJARO #1
Feb 07 11:38:12 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Feb 07 11:38:12 linuxsys kernel: RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x2c5/0x340 [drm]
Feb 07 11:38:12 linuxsys kernel: RSP: 0018:ffffb4e388c7bb50 EFLAGS: 00010086
Feb 07 11:38:12 linuxsys kernel: RAX: ffffffffc12b04c0 RBX: ffff9b4b3ab4c800 RCX: 0000000000000001
Feb 07 11:38:12 linuxsys kernel: RDX: ffffffffc0941068 RSI: 0000000000000001 RDI: ffffffffc093f0d8
Feb 07 11:38:12 linuxsys kernel: RBP: ffffb4e388c7bbb8 R08: 0000000000000000 R09: ffffffffc09214a0
Feb 07 11:38:12 linuxsys kernel: R10: ffffffffc10d6320 R11: ffffffffb056c36d R12: 0000000000000001
Feb 07 11:38:12 linuxsys kernel: R13: ffffb4e388c7bbcc R14: ffffb4e388c7bc00 R15: ffff9b4b2ba84000
Feb 07 11:38:12 linuxsys kernel: FS:  00007f0bdae66980(0000) GS:ffff9b4b3e900000(0000) knlGS:0000000000000000
Feb 07 11:38:12 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 07 11:38:12 linuxsys kernel: CR2: 00007f9ee41080b0 CR3: 00000007d96c8000 CR4: 00000000003406e0
Feb 07 11:38:12 linuxsys kernel: Call Trace:
Feb 07 11:38:12 linuxsys kernel:  drm_get_last_vbltimestamp+0x54/0x90 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_update_vblank_count+0x77/0x250 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_vblank_enable+0xbd/0x100 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x690 [drm]
Feb 07 11:38:12 linuxsys kernel:  ? unix_stream_recvmsg+0x53/0x70
Feb 07 11:38:12 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_ioctl+0x2d5/0x370 [drm]
Feb 07 11:38:12 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 07 11:38:12 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Feb 07 11:38:12 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Feb 07 11:38:12 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 07 11:38:12 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Feb 07 11:38:12 linuxsys kernel:  ? __sys_recvmsg+0x4e/0x90
Feb 07 11:38:12 linuxsys kernel:  ? __sys_recvmsg+0x7d/0x90
Feb 07 11:38:12 linuxsys kernel:  SyS_ioctl+0x74/0x80
Feb 07 11:38:12 linuxsys kernel:  entry_SYSCALL_64_fastpath+0x20/0x83
Feb 07 11:38:12 linuxsys kernel: RIP: 0033:0x7f0bd74b3d87
Feb 07 11:38:12 linuxsys kernel: RSP: 002b:00007fff89afea38 EFLAGS: 00000246
Feb 07 11:38:12 linuxsys kernel: Code: e1 48 c7 c2 68 10 94 c0 be 01 00 00 00 48 c7 c7 d8 f0 93 c0 e8 1d 66 fe ff 48 8b 83 98 03 00 00 48 83 78 20 00 0f 84 6f fd ff ff <0f> ff e9 68 fd ff ff 48 c7 c2 30 10 94 c0 31 f6 48 c7 c7 d5 f0 
Feb 07 11:38:12 linuxsys kernel: ---[ end trace 36522610c84ff0f2 ]---
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!

For now, I can only prevent the panic by disabling all power saving functions, especially anything that would turn off the screen. I also can't lock the screen, since that blanks the screen as well, and the GTK+ greeter can blank the screen in its own way.

However, running the system on battery for an extended period without any power saving feature is not feasible, given the laptop's high total TDP, and leaving the system unlocked is a bad idea in terms of security and privacy.

By the way, given there were a few similar closed bug reports in the past, I believe the problem might be a regression.
Comment 1 Michel Dänzer 2018-02-09 09:21:31 UTC
(In reply to L.S.S. from comment #0)
> I've thought about the possibility of it being DC-related as I saw similar
> bug reports, but I was wrong, as at one time I was able to reproduce it even
> after passing amdgpu.dc=0 during boot.

The rest of your report mostly points towards a DC specific issue. If you can still reproduce an issue without DC, it would be best to file a separate report about that.
Comment 2 bp8b7me 2018-02-10 17:25:51 UTC
I have the same problem too.
Comment 3 bp8b7me 2018-02-10 17:31:34 UTC
Sorry, I forgot to mention:
I have the same problem too, but on a desktop.
Comment 4 L.S.S. 2018-02-11 00:32:21 UTC
Just now I tried reproducing it without DC (passing amdgpu.dc=0), but I was not able to. The system successfully got back to the lock screen after the screen had been blank for an extended period.

As for the one time I did manage to reproduce it without DC: maybe I passed the parameter incorrectly, or there was some other reason. For now, I will treat the issue as DC-specific, as it's always reproducible with DC enabled (Arch/Manjaro enables DC by default, including on pre-Vega hardware).
Comment 5 Harry Wentland 2018-02-12 15:55:22 UTC
Can you attach a full dmesg log with amdgpu.dc_log=1 and drm.debug=6 passed as kernel options?
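
As background on the requested options: drm.debug is a bitmask, so the value 6 selects two message categories at once. The sketch below decodes such a value in userspace C. The bit assignments (0x01 CORE, 0x02 DRIVER, 0x04 KMS, 0x08 PRIME, 0x10 ATOMIC, 0x20 VBL) are taken from the drm module's parameter description and are assumed to match the kernel in use; later kernels define further bits. The names drm_debug_bits and describe_drm_debug are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* One drm.debug category: its bit and a short name. */
struct drm_debug_bit {
    unsigned int mask;
    const char *name;
};

/* Assumed bit layout of the drm.debug module parameter (see lead-in). */
static const struct drm_debug_bit drm_debug_bits[] = {
    { 0x01, "CORE" },
    { 0x02, "DRIVER" },
    { 0x04, "KMS" },
    { 0x08, "PRIME" },
    { 0x10, "ATOMIC" },
    { 0x20, "VBL" },
};

/* Hypothetical helper: writes the space-separated names of the categories
 * enabled in `value` into buf and returns how many were written. */
int describe_drm_debug(unsigned int value, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;
    size_t i;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(drm_debug_bits) / sizeof(drm_debug_bits[0]); i++) {
        int n;

        if (!(value & drm_debug_bits[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", drm_debug_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;   /* buffer too small: stop rather than overrun */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With the value 6 from the request above, this reports the DRIVER and KMS categories, which is consistent with the driver-level and modeset messages that show up in the later logs.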
Comment 6 L.S.S. 2018-02-13 03:30:19 UTC
Created attachment 137308 [details]
dmesg output with amdgpu.dc_log=1 and drm.debug=6, right after login.

I'm not sure what counts as a "full" dmesg. Attached is the dmesg I exported right after startup, with the above parameters passed.
Comment 7 L.S.S. 2018-02-13 03:38:54 UTC
And I just crashed my system in the usual way. With those parameters set, there is some additional output besides the usual messages.

Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:handle_cursor_update.isra.22 [amdgpu]] handle_cursor_update: crtc_id=0 with size 128 to 128
Feb 13 11:31:55 linuxsys kernel: [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:handle_cursor_update.isra.22 [amdgpu]] handle_cursor_update: crtc_id=0 with size 0 to 0
Feb 13 11:31:55 linuxsys kernel: [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:handle_cursor_update.isra.22 [amdgpu]] handle_cursor_update: crtc_id=0 with size 128 to 128
Feb 13 11:31:55 linuxsys kernel: [drm:best_encoder [amdgpu]] Finding the best encoder
Feb 13 11:31:55 linuxsys kernel: [drm:best_encoder [amdgpu]] Finding the best encoder
Feb 13 11:31:55 linuxsys kernel: [drm:update_stream_scaling_settings [amdgpu]] Destination Rectangle x:0  y:0  width:1920  height:1080
Feb 13 11:31:55 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Mode change not required, setting mode_changed to 0
Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:0, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:best_encoder [amdgpu]] Finding the best encoder
Feb 13 11:31:55 linuxsys kernel: [drm:best_encoder [amdgpu]] Finding the best encoder
Feb 13 11:31:55 linuxsys kernel: [drm:dm_update_planes_state.part.28 [amdgpu]] Disabling DRM plane: 36 on DRM crtc 43
Feb 13 11:31:55 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:0, mode_changed:0,active_changed:1,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Disabling DRM crtc: 43
Feb 13 11:31:55 linuxsys kernel: [drm:update_stream_scaling_settings [amdgpu]] Destination Rectangle x:0  y:0  width:1920  height:1080
Feb 13 11:31:55 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Mode change not required, setting mode_changed to 0
Feb 13 11:31:55 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:0, mode_changed:0,active_changed:1,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:1, mode_changed:0,active_changed:1,connectors_changed:0
Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] Atomic commit: RESET. crtc id 0:[000000000ce1e17c]
Feb 13 11:31:55 linuxsys kernel: [drm] dc_commit_state: 0 streams
Feb 13 11:31:55 linuxsys kernel: [drm] hwss_edp_backlight_control: backlight action: Off
Feb 13 11:31:55 linuxsys kernel: [drm] hwss_edp_backlight_control: backlight action: Off
Feb 13 11:31:55 linuxsys kernel: [drm:amdgpu_vm_init [amdgpu]] VM update mode is SDMA
Feb 13 11:31:55 linuxsys kernel: [drm] hwss_edp_backlight_control: backlight action: Off
Feb 13 11:31:55 linuxsys kernel: [drm] hwss_edp_power_control: Panel Power action: Off
Feb 13 11:31:56 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Mode change not required, setting mode_changed to 0
Feb 13 11:31:56 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:56 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Mode change not required, setting mode_changed to 0
Feb 13 11:31:56 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:56 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Mode change not required, setting mode_changed to 0
Feb 13 11:31:56 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:31:56 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Mode change not required, setting mode_changed to 0
Feb 13 11:31:56 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:0, planes_changed:1, mode_changed:0,active_changed:0,connectors_changed:0
Feb 13 11:32:06 linuxsys kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
Feb 13 11:32:06 linuxsys kernel: IP: dce110_vblank_set+0x4f/0xb0 [amdgpu]
Feb 13 11:32:06 linuxsys kernel: PGD 7d98ee067 P4D 7d98ee067 PUD 7d98ef067 PMD 0 
Feb 13 11:32:06 linuxsys kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Feb 13 11:32:06 linuxsys kernel: Modules linked in: cmac rfcomm fuse bnep vmnet(O) arc4 nls_iso8859_1 nls_cp437 vfat fat amdkfd amd_iommu_v2 iwlmvm amdgpu mac80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core vi
Feb 13 11:32:06 linuxsys kernel:  k10temp i2c_piix4 shpchp thermal wmi battery ac tpm_crb tpm_tis tpm_tis_core tpm video i2c_hid asus_wireless button acpi_cpufreq sch_fq_codel vmmon(O) vmw_vmci vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O
Feb 13 11:32:06 linuxsys kernel: CPU: 10 PID: 1451 Comm: xfwm4 Tainted: G           O     4.15.0-1-MANJARO #1
Feb 13 11:32:06 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Feb 13 11:32:06 linuxsys kernel: RIP: 0010:dce110_vblank_set+0x4f/0xb0 [amdgpu]
Feb 13 11:32:06 linuxsys kernel: RSP: 0018:ffff994148b27be0 EFLAGS: 00010002
Feb 13 11:32:06 linuxsys kernel: RAX: ffff8c4273cd0000 RBX: 0000000000000001 RCX: 0000000000000000
Feb 13 11:32:06 linuxsys kernel: RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
Feb 13 11:32:06 linuxsys kernel: RBP: ffff8c42b7fcaba0 R08: 0000000000000000 R09: 0000000000000000
Feb 13 11:32:06 linuxsys kernel: R10: 00007ffef0182bf0 R11: ffff8c42b9079d80 R12: ffff8c42b6c96b80
Feb 13 11:32:06 linuxsys kernel: R13: ffffffffc1178ba0 R14: ffff8c42a84d0000 R15: ffff8c42acb3c368
Feb 13 11:32:06 linuxsys kernel: FS:  00007f87374b6980(0000) GS:ffff8c42dee80000(0000) knlGS:0000000000000000
Feb 13 11:32:06 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 11:32:06 linuxsys kernel: CR2: 0000000000000000 CR3: 00000007d990c000 CR4: 00000000003406e0
Feb 13 11:32:06 linuxsys kernel: Call Trace:
Feb 13 11:32:06 linuxsys kernel:  amdgpu_dm_set_crtc_irq_state+0x31/0x60 [amdgpu]
Feb 13 11:32:06 linuxsys kernel:  amdgpu_irq_update+0x55/0x90 [amdgpu]
Feb 13 11:32:06 linuxsys kernel:  drm_vblank_enable+0x84/0x100 [drm]
Feb 13 11:32:06 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Feb 13 11:32:06 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x690 [drm]
Feb 13 11:32:06 linuxsys kernel:  ? unix_stream_recvmsg+0x53/0x70
Feb 13 11:32:06 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 13 11:32:06 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Feb 13 11:32:06 linuxsys kernel:  drm_ioctl+0x2d5/0x370 [drm]
Feb 13 11:32:06 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 13 11:32:06 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Feb 13 11:32:06 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Feb 13 11:32:06 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 13 11:32:06 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Feb 13 11:32:06 linuxsys kernel:  ? __sys_recvmsg+0x4e/0x90
Feb 13 11:32:06 linuxsys kernel:  ? __sys_recvmsg+0x7d/0x90
Feb 13 11:32:06 linuxsys kernel:  SyS_ioctl+0x74/0x80
Feb 13 11:32:06 linuxsys kernel:  entry_SYSCALL_64_fastpath+0x20/0x83
Feb 13 11:32:06 linuxsys kernel: RIP: 0033:0x7f8733b03d87
Feb 13 11:32:06 linuxsys kernel: RSP: 002b:00007ffef0182c38 EFLAGS: 00000246
Feb 13 11:32:06 linuxsys kernel: Code: e8 17 20 04 00 83 e8 4e 0f b6 d0 48 89 d0 48 c1 e0 05 48 01 d0 48 c1 e0 05 49 03 86 60 01 00 00 84 db 48 8b b8 78 02 00 00 74 18 <48> 8b 07 be 02 00 00 00 48 8b 80 d8 00 00 00 e8 6d 73 92 cb 84 
Feb 13 11:32:06 linuxsys kernel: RIP: dce110_vblank_set+0x4f/0xb0 [amdgpu] RSP: ffff994148b27be0
Feb 13 11:32:06 linuxsys kernel: CR2: 0000000000000000
Feb 13 11:32:06 linuxsys kernel: ---[ end trace de1630a0c4489cb7 ]---
Feb 13 11:32:06 linuxsys kernel: note: xfwm4[1451] exited with preempt_count 3
Feb 13 11:32:32 linuxsys kernel: [drm:best_encoder [amdgpu]] Finding the best encoder
Feb 13 11:32:32 linuxsys kernel: [drm:best_encoder [amdgpu]] Finding the best encoder
Feb 13 11:32:32 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:0, mode_changed:0,active_changed:1,connectors_changed:0
Feb 13 11:32:32 linuxsys kernel: [drm:update_stream_scaling_settings [amdgpu]] Destination Rectangle x:0  y:0  width:1920  height:1080
Feb 13 11:32:32 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:0, mode_changed:0,active_changed:1,connectors_changed:0
Feb 13 11:32:32 linuxsys kernel: [drm:dm_update_crtcs_state [amdgpu]] Enabling DRM crtc: 43
Feb 13 11:32:32 linuxsys kernel: [drm:dm_update_planes_state.part.28 [amdgpu]] Enabling DRM plane: 36 on DRM crtc 43
Feb 13 11:32:32 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:0 crtc_state_flags: enable:1, active:1, planes_changed:1, mode_changed:0,active_changed:1,connectors_changed:0
Feb 13 11:32:32 linuxsys kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] Atomic commit: SET crtc id 0: [000000000ce1e17c]
Feb 13 11:32:32 linuxsys kernel: [drm] dc_commit_state: 1 streams
Feb 13 11:32:32 linuxsys kernel: [drm] core_stream 0x866b8000: src: 0, 0, 1920, 1080; dst: 0, 0, 1920, 1080, colorSpace:1
Feb 13 11:32:32 linuxsys kernel: [drm]         pix_clk_khz: 138700, h_total: 2080, v_total: 1111, pixelencoder:1, displaycolorDepth:1
Feb 13 11:32:32 linuxsys kernel: [drm]         sink name: , serial: 0
Feb 13 11:32:32 linuxsys kernel: [drm]         link: 0
Feb 13 11:32:32 linuxsys kernel: [drm] [Mode]        [eDP][ConnIdx:0] {1920x1080, 2080x1111@138700Khz}^
Feb 13 11:32:32 linuxsys kernel: [drm] hwss_edp_power_control: Panel Power action: On
Feb 13 11:32:32 linuxsys kernel: [drm] hwss_edp_backlight_control: backlight action: On
Feb 13 11:32:32 linuxsys kernel: [drm] Link: 0 eDP panel mode supported: 1 eDP panel mode enabled: 1 
Feb 13 11:32:32 linuxsys kernel: [drm] [LKTN]        [eDP][ConnIdx:0] RBRx2 pass VS=1, PE=0^
Feb 13 11:32:32 linuxsys kernel: [drm] hwss_edp_backlight_control: backlight action: On
Comment 8 Harry Wentland 2018-02-13 16:13:02 UTC
Created attachment 137322 [details] [review]
Patch 1 Use crtc enable/disable_vblank hooks
Comment 9 Harry Wentland 2018-02-13 16:13:31 UTC
Created attachment 137323 [details] [review]
Patch 2 Return success when enabling interrupt
Comment 10 Harry Wentland 2018-02-13 16:13:58 UTC
Created attachment 137324 [details] [review]
Patch 3 Clean up formatting in irq_service_dce110.c
Comment 11 Harry Wentland 2018-02-13 16:14:28 UTC
Created attachment 137325 [details] [review]
Patch 4 Don't blow up if TG is NULL in dce110_vblank_set
Comment 12 Harry Wentland 2018-02-13 16:14:56 UTC
Are you able to rebuild the kernel with the attached patches and see if that fixes things?
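
For context on what Patch 4 is aiming at, judging only by its title: the oops shows RDI: 0000000000000000, and the marked faulting bytes <48> 8b 07 decode to a load through RDI, so the function read through a NULL pointer. The sketch below shows the general shape of a guard against that, in standalone C. It is not the contents of the attached patch. The struct layouts, the field names and the function vblank_set_sketch are hypothetical stand-ins, and reading "TG" as a timing generator that can be NULL while the CRTC is off is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures; the real amdgpu
 * types and field names are not shown in this report. */
struct timing_generator {
    bool vblank_irq_armed;
};

struct pipe {
    struct timing_generator *tg;   /* assumed to be NULL while the CRTC is off */
};

/* Sketch of a guarded enable/disable path: returns false instead of
 * dereferencing a NULL timing generator. */
bool vblank_set_sketch(struct pipe *pipe, bool enable)
{
    if (pipe == NULL || pipe->tg == NULL)
        return false;   /* nothing to program; let the caller handle it */

    pipe->tg->vblank_irq_armed = enable;
    return true;
}
```

A guard of this kind only turns the crash into an error return. The caller still has to cope with the failure, which would fit a vblank request made while no stream is active failing cleanly rather than taking the system down.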
Comment 13 Harry Wentland 2018-02-13 18:40:03 UTC
Created attachment 137327 [details] [review]
Patch 2 Return success when enabling interrupt

Goofed up my original patch 2. This should work.
Comment 14 L.S.S. 2018-02-14 05:55:42 UTC
The first patch got rejected with the most recent 4.15 kernel source pulled using the PKGBUILD file (Feb 14, 2018). The reject file contains:

--- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2523,6 +2545,8 @@ static const struct drm_crtc_funcs amdgpu_dm_crtc_funcs = {
 	.atomic_duplicate_state = dm_crtc_duplicate_state,
 	.atomic_destroy_state = dm_crtc_destroy_state,
 	.set_crc_source = amdgpu_dm_crtc_set_crc_source,
+	.enable_vblank = dm_enable_vblank,
+	.disable_vblank = dm_disable_vblank,
 };
 
 static enum drm_connector_status

In the original amdgpu_dm.c:

/* Implemented only the options currently availible for the driver */
static const struct drm_crtc_funcs amdgpu_dm_crtc_funcs = {
	.reset = dm_crtc_reset_state,
	.destroy = amdgpu_dm_crtc_destroy,
	.gamma_set = drm_atomic_helper_legacy_gamma_set,
	.set_config = drm_atomic_helper_set_config,
	.page_flip = drm_atomic_helper_page_flip,
	.atomic_duplicate_state = dm_crtc_duplicate_state,
	.atomic_destroy_state = dm_crtc_destroy_state,
};

This line:

.set_crc_source = amdgpu_dm_crtc_set_crc_source,

is not there.
Comment 15 L.S.S. 2018-02-14 06:14:18 UTC
Never mind, I just figured out how to adjust the patch file to match the kernel source I have, so that it applies properly. The rest of the patches went through without complaints, and the kernel is now building.

However, I still need to check whether this one-line change in the patch will lead to any side effects during the build or at runtime.
Comment 16 L.S.S. 2018-02-14 06:52:04 UTC
Just installed and booted the new kernel. It seems to have fixed the issue at least to the extent that it would not totally crash the system like it used to.

However, I still see these messages in journalctl. This is after I locked the screen and then woke it up 3 times.

1st time:

Feb 14 14:35:17 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:35:37 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:35:57 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:36:10 linuxsys kernel: [drm] {1920x1080, 2080x1111@138700Khz}
...
Feb 14 14:36:17 linuxsys kernel: [drm] RBRx2 pass VS=1, PE=0
Feb 14 14:36:17 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:17 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:17 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:17 linuxsys kernel: WARNING: CPU: 10 PID: 1485 at drivers/gpu/drm/drm_vblank.c:612 drm_calc_vbltimestamp_from_scanoutpos+0x2c5/0x340 [drm]
Feb 14 14:36:17 linuxsys kernel: Modules linked in: cmac rfcomm fuse bnep vmnet(O) arc4 nls_iso8859_1 nls_cp437 vfat fat amdkfd amd_iommu_v2 amdgpu iwlmvm ax88179_178a usbnet mac80211 mii uvcvideo btusb videobuf2_vmalloc btrtl videobuf2_mem
Feb 14 14:36:17 linuxsys kernel:  rng_core tpm_tis tpm_tis_core k10temp shpchp battery ac rtc_cmos wmi i2c_piix4 tpm asus_wireless i2c_hid pinctrl_amd gpio_amdpt evdev mac_hid acpi_cpufreq sch_fq_codel vmmon(O) vmw_vmci vboxnetflt(O) vboxne
Feb 14 14:36:17 linuxsys kernel: CPU: 10 PID: 1485 Comm: xfwm4 Tainted: G           O     4.15.3-1-MANJARO #1
Feb 14 14:36:17 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Feb 14 14:36:17 linuxsys kernel: RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x2c5/0x340 [drm]
Feb 14 14:36:17 linuxsys kernel: RSP: 0018:ffffa97548adfb30 EFLAGS: 00010082
Feb 14 14:36:17 linuxsys kernel: RAX: ffffffffc15e54c0 RBX: ffff8a1375ae6800 RCX: 0000000000000001
Feb 14 14:36:17 linuxsys kernel: RDX: ffffffffc0d4d380 RSI: 0000000000000001 RDI: ffffffffc0d4b24e
Feb 14 14:36:17 linuxsys kernel: RBP: ffffa97548adfb98 R08: 0000000000000000 R09: ffffffffc0d2c870
Feb 14 14:36:17 linuxsys kernel: R10: ffffffffc140b320 R11: ffffffffb15c7f2d R12: 0000000000000001
Feb 14 14:36:17 linuxsys kernel: R13: ffffa97548adfbac R14: ffffa97548adfbe0 R15: ffff8a1377b6a000
Feb 14 14:36:17 linuxsys kernel: FS:  00007f2e273eb980(0000) GS:ffff8a137e880000(0000) knlGS:0000000000000000
Feb 14 14:36:17 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 14:36:17 linuxsys kernel: CR2: 00007feda6f5a000 CR3: 00000007da154000 CR4: 00000000003406e0
Feb 14 14:36:17 linuxsys kernel: Call Trace:
Feb 14 14:36:17 linuxsys kernel:  ? set_cursor+0x80/0x80
Feb 14 14:36:17 linuxsys kernel:  ? set_cursor+0x80/0x80
Feb 14 14:36:17 linuxsys kernel:  drm_get_last_vbltimestamp+0x54/0x90 [drm]
Feb 14 14:36:17 linuxsys kernel:  drm_update_vblank_count+0x77/0x250 [drm]
Feb 14 14:36:17 linuxsys kernel:  drm_vblank_enable+0xbd/0x100 [drm]
Feb 14 14:36:17 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Feb 14 14:36:17 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x6a0 [drm]
Feb 14 14:36:17 linuxsys kernel:  ? unix_stream_recvmsg+0x53/0x70
Feb 14 14:36:17 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 14 14:36:17 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Feb 14 14:36:17 linuxsys kernel:  drm_ioctl+0x2d5/0x370 [drm]
Feb 14 14:36:17 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Feb 14 14:36:17 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Feb 14 14:36:17 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Feb 14 14:36:17 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 14 14:36:17 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Feb 14 14:36:17 linuxsys kernel:  ? __sys_recvmsg+0x4e/0x90
Feb 14 14:36:17 linuxsys kernel:  ? __sys_recvmsg+0x7d/0x90
Feb 14 14:36:17 linuxsys kernel:  SyS_ioctl+0x74/0x80
Feb 14 14:36:17 linuxsys kernel:  do_syscall_64+0x75/0x190
Feb 14 14:36:17 linuxsys kernel:  entry_SYSCALL_64_after_hwframe+0x21/0x86
Feb 14 14:36:17 linuxsys kernel: RIP: 0033:0x7f2e23a38d87
Feb 14 14:36:17 linuxsys kernel: RSP: 002b:00007ffd8b7da1c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 14 14:36:17 linuxsys kernel: RAX: ffffffffffffffda RBX: 00007ffd8b7da1f0 RCX: 00007f2e23a38d87
Feb 14 14:36:17 linuxsys kernel: RDX: 00007ffd8b7da1f0 RSI: 00000000c018643a RDI: 000000000000000c
Feb 14 14:36:17 linuxsys kernel: RBP: 0000000001006d10 R08: 0000000000800109 R09: 0000000000000000
Feb 14 14:36:17 linuxsys kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c018643a
Feb 14 14:36:17 linuxsys kernel: R13: 000000000080619a R14: 00000000010cd380 R15: 0000000000000000
Feb 14 14:36:17 linuxsys kernel: Code: e1 48 c7 c2 80 d3 d4 c0 be 01 00 00 00 48 c7 c7 4e b2 d4 c0 e8 6d 62 fe ff 48 8b 83 98 03 00 00 48 83 78 20 00 0f 84 6f fd ff ff <0f> ff e9 68 fd ff ff 48 c7 c2 48 d3 d4 c0 31 f6 48 c7 c7 4b b2 
Feb 14 14:36:17 linuxsys kernel: ---[ end trace e345f4b7c52fbc5c ]---
Feb 14 14:36:17 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:17 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:17 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:25 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:25 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:25 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:25 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:25 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 14 14:36:25 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!

2nd time:

Feb 14 14:40:37 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:40:57 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:41:17 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:41:37 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:41:57 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:42:05 linuxsys kernel: [drm] {1920x1080, 2080x1111@138700Khz}

3rd time:

Feb 14 14:44:26 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 14 14:44:46 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!

The "Failed to get VBLANK!" errors still appear. The first wake-up seems to have crashed something (see the trace above), but the system kept working, and the second and third wake-ups left no crash traces like the first one.
Comment 17 Harry Wentland 2018-02-14 14:35:58 UTC
Thanks for fixing the patch conflict. I based them on https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next but should have based them on the 4.15 kernel.

Thanks as well for testing. The patches don't fix the root cause of the issue but make sure you don't crash in this case.

Catching the root cause is a bit more difficult and would require more debugging. I haven't seen the "Failed to get VBLANK" on platforms available to me but will keep an eye out for it.
Comment 18 Ainola 2018-02-14 16:13:25 UTC
I applied the patch to 4.15.3 on archlinux and have tested with xset dpms force {standby,suspend} with success.
Comment 19 Ainola 2018-02-15 20:59:00 UTC
Created attachment 137383 [details]
stacktrace even with patches

I just got another freeze despite using the patches. I'm not sure if this is the same bug since it mentions slub.c but I see amdgpu/drm stuff in the trace. After this trace the journal was flooded with items like "amdgpu_dm_irq_schedule_work FAILED src 4" (it alternates between 2 and 4)
Comment 20 Ainola 2018-02-23 23:23:01 UTC
I've been using this patchset on linux 4.15.3 and 4.14.4 and haven't had a problem since.
Comment 21 L.S.S. 2018-02-27 05:23:02 UTC
Another problem: sometimes, after I wake the screen up, the system suffers intermittent soft lockups that make it nearly unusable. This is with the patches applied to the latest 4.15 kernel, 4.15.5 (4.15.5-1-MANJARO).

In journalctl I see the following. While the screen is blanked, the error "Failed to get VBLANK!" is logged every 20 seconds.

Feb 27 11:57:29 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:57:49 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:58:09 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:58:29 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:58:49 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:59:09 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:59:29 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:59:49 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
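
The 20-second spacing can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original report: it assumes journalctl's default short timestamp format (which omits the year, so 2018 is supplied as a parameter) and an English locale for the month abbreviation. The helper name vblank_gaps is hypothetical.

```python
from datetime import datetime

# A few lines copied from the journal excerpt above.
LOG = """\
Feb 27 11:57:29 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:57:49 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:58:09 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Feb 27 11:58:29 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
"""


def vblank_gaps(text, year=2018):
    """Return the seconds between consecutive 'Failed to get VBLANK!' lines."""
    stamps = []
    for line in text.splitlines():
        if "Failed to get VBLANK!" not in line:
            continue
        # The first three fields are month, day and time; the year is assumed.
        stamp = " ".join(line.split()[:3])
        stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(vblank_gaps(LOG))  # [20, 20, 20]
```

The same function can be fed the output of journalctl -k saved to a file, as long as the timestamps stay in the short format.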

After I wake the screen up and the intermittent soft lockups begin, the error "dc_stream_state is NULL for crtc '1'" is logged in bursts from dm_vblank_get_counter and dm_crtc_get_scanoutpos, with the bursts 8 or 18 seconds apart, which seems to correspond to the lockup interval. I did not test further, as the system was almost unusable due to the lockups and I had to reboot.

Feb 27 12:17:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:08 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:08 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:00 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:00 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:00 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:00 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:00 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:00 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:08 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:08 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
...
Feb 27 12:18:26 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:26 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:26 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:26 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:26 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:26 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:35 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:35 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:35 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:35 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:35 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:18:35 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
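
To make the burst pattern explicit, the lines can be grouped by timestamp and the gaps between bursts computed. This is only a sketch under the same assumptions as any journal parsing here (short timestamp format, year supplied by hand, English month names); burst_gaps is a hypothetical helper, and the sample is a shortened subset of the excerpt above.

```python
from collections import Counter
from datetime import datetime

# Shortened subset of the excerpt above (fewer lines per burst).
LOG = """\
Feb 27 12:17:08 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:08 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:16 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:34 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Feb 27 12:17:42 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
"""


def burst_gaps(text, year=2018):
    """Group NULL-stream errors by timestamp; return (counts, gaps in seconds)."""
    bursts = Counter()
    for line in text.splitlines():
        if "dc_stream_state is NULL" in line:
            bursts[" ".join(line.split()[:3])] += 1
    # One datetime per distinct timestamp, i.e. per burst.
    times = sorted(
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S") for stamp in bursts
    )
    gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return dict(bursts), gaps


counts, gaps = burst_gaps(LOG)
print(gaps)  # [8, 18, 8]
```

On the full excerpt each burst holds six lines, and the gaps alternate between 8 and 18 seconds until the final pair, which is 9 seconds apart.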

Unfortunately, this issue is random and I could not always reproduce it. It just happens from time to time.
Comment 22 L.S.S. 2018-03-16 00:40:04 UTC
It seems patches 1, 2, and 4 have entered the 4.16 kernel. When I built the kernel for Manjaro, these three patches were rejected, and the exact code changes were already present in the kernel source files.

However, after some testing I found that patch 3 (which is not included) is still needed on 4.16 to fix the problem, as I can still crash the system without it.
Comment 23 L.S.S. 2018-03-20 00:55:00 UTC
EDIT: It seems I'm experiencing some intermittent screen flicker with current 4.16 kernel (on the same system, with only Patch 3 applied as it's the only patch needed for 4.16), although it doesn't really affect normal system usage.

I'm not sure if this flicker is related to this problem, but I'm putting it up here as it's still a continuation of my watching this issue's condition.
Comment 24 Ainola 2018-03-22 00:09:22 UTC
My stability has been fine since I last commented. I'm now on 4.15.10+these patches. However, my monitors won't turn off: When the screen turns off it'll come right back on after a second.

Just now I also had my second panic just like in #19, sadly (really bad time to have that happen, too) :(
Comment 25 Mez 2018-03-24 12:49:05 UTC
(In reply to L.S.S. from comment #23)
> EDIT: It seems I'm experiencing some intermittent screen flicker with
> current 4.16 kernel (on the same system, with only Patch 3 applied as it's
> the only patch needed for 4.16), although it doesn't really affect normal
> system usage.
> 
> I'm not sure if this flicker is related to this problem, but I'm putting it
> up here as it's still a continuation of my watching this issue's condition.

Do you have TearFree on?

https://bugs.freedesktop.org/show_bug.cgi?id=105530
Comment 26 Mez 2018-03-24 12:58:17 UTC
Also probably related:
https://bugs.freedesktop.org/show_bug.cgi?id=101580
Comment 27 L.S.S. 2018-03-26 01:25:38 UTC
(In reply to Mez from comment #25)
> (In reply to L.S.S. from comment #23)
> > EDIT: It seems I'm experiencing some intermittent screen flicker with
> > current 4.16 kernel (on the same system, with only Patch 3 applied as it's
> > the only patch needed for 4.16), although it doesn't really affect normal
> > system usage.
> > 
> > I'm not sure if this flicker is related to this problem, but I'm putting it
> > up here as it's still a continuation of my watching this issue's condition.
> 
> Do you have TearFree on?
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=105530

I don't know about TearFree as I haven't actually configured it, so it should be Manjaro's default setting. And I'm only getting the flicker on 4.16 kernel, on 4.15 it is and has always been fine.
Comment 28 L.S.S. 2018-04-12 00:41:14 UTC
Just updated to the 4.17-rc0 kernel and it seems the problem has been mostly fixed there. The patches are no longer needed (already in there) and trying to reproduce the issue only resulted in a couple of "Failed to get VBLANK!" errors that aren't fatal.

I'm not certain about the flickering issue I mentioned earlier... It looked like one but might actually be some kind of sudden color palette distortion. The problem only appeared since 4.16. I don't recall having the issue in 4.15.

I can partially reproduce it on the Firefox new tab page by quickly hovering over the links under "Top Sites" and "Highlights". The whole screen turns a bit darker for a very short instant and then returns to normal. It happens randomly and doesn't seem to produce any errors in the log. Not a major issue, but it can be annoying sometimes.
Comment 29 L.S.S. 2018-04-12 04:19:24 UTC
EDIT: Maybe not really fixed in 4.17 (regression again?!)... just now after the screen went blank, I got another panic and had to reboot... :-(

When the panic occurred, it spawned two errors.

Apr 12 11:32:03 linuxsys systemd[5491]: Started Virtual filesystem service.
Apr 12 11:32:03 linuxsys udisksd[1970]: udisks_mount_get_mount_path: assertion 'mount->type == UDISKS_MOUNT_TYPE_FILESYSTEM' failed
Apr 12 11:32:13 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:13 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:13 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:13 linuxsys kernel: WARNING: CPU: 12 PID: 1761 at drivers/gpu/drm/drm_vblank.c:620 drm_calc_vbltimestamp_from_scanoutpos+0x2a8/0x2f0 [drm]
Apr 12 11:32:13 linuxsys kernel: Modules linked in: btrfs zstd_compress zstd_decompress xxhash xor raid6_pq ufs hfsplus hfs minix ntfs msdos jfs ext4 mbcache jbd2 fscrypto dm_mod vmw_vsock_vmci_transport vsock cmac rfcomm fuse bnep vmnet(O>
Apr 12 11:32:13 linuxsys kernel:  agpgart snd_timer syscopyarea rfkill sysfillrect sysimgblt fb_sys_fops aesni_intel snd tpm_crb aes_x86_64 tpm_tis crypto_simd ccp cryptd soundcore tpm_tis_core sp5100_tco glue_helper pcspkr k10temp i2c_pii>
Apr 12 11:32:13 linuxsys kernel: CPU: 12 PID: 1761 Comm: xfwm4 Tainted: G           O     4.17.0-1-MANJARO #1
Apr 12 11:32:13 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Apr 12 11:32:13 linuxsys kernel: RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x2a8/0x2f0 [drm]
Apr 12 11:32:13 linuxsys kernel: RSP: 0018:ffffa96189f63b28 EFLAGS: 00010082
Apr 12 11:32:13 linuxsys kernel: RAX: ffffffffc131d9e0 RBX: ffff9d76fa423000 RCX: 0000000000000000
Apr 12 11:32:13 linuxsys kernel: RDX: 0000000000000001 RSI: ffffffffc09b98d0 RDI: 0000000000000001
Apr 12 11:32:13 linuxsys kernel: RBP: ffffa96189f63b90 R08: 0000000000000000 R09: ffffffffc0998ab0
Apr 12 11:32:13 linuxsys kernel: R10: ffff9d76f74131d8 R11: ffffffffc114e500 R12: 0000000000000001
Apr 12 11:32:13 linuxsys kernel: R13: ffff9d76f7413000 R14: ffffa96189f63ba4 R15: ffffa96189f63bd8
Apr 12 11:32:13 linuxsys kernel: FS:  00007fefa4d35980(0000) GS:ffff9d76fe900000(0000) knlGS:0000000000000000
Apr 12 11:32:13 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 12 11:32:13 linuxsys kernel: CR2: 00007f5f09dfa000 CR3: 00000007cc786000 CR4: 00000000003406e0
Apr 12 11:32:13 linuxsys kernel: Call Trace:
Apr 12 11:32:13 linuxsys kernel:  drm_get_last_vbltimestamp+0x54/0x90 [drm]
Apr 12 11:32:13 linuxsys kernel:  drm_update_vblank_count+0x79/0x240 [drm]
Apr 12 11:32:13 linuxsys kernel:  drm_vblank_enable+0xce/0x120 [drm]
Apr 12 11:32:13 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Apr 12 11:32:13 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x620 [drm]
Apr 12 11:32:13 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Apr 12 11:32:13 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Apr 12 11:32:13 linuxsys kernel:  drm_ioctl+0x2c3/0x360 [drm]
Apr 12 11:32:13 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Apr 12 11:32:13 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Apr 12 11:32:13 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Apr 12 11:32:13 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Apr 12 11:32:13 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Apr 12 11:32:13 linuxsys kernel:  ? __sys_recvmsg+0x5b/0xa0
Apr 12 11:32:13 linuxsys kernel:  ? __sys_recvmsg+0x8a/0xa0
Apr 12 11:32:13 linuxsys kernel:  ksys_ioctl+0x70/0x80
Apr 12 11:32:13 linuxsys kernel:  SyS_ioctl+0xa/0x10
Apr 12 11:32:13 linuxsys kernel:  do_syscall_64+0x74/0x190
Apr 12 11:32:13 linuxsys kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Apr 12 11:32:13 linuxsys kernel: RIP: 0033:0x7fefa1389d87
Apr 12 11:32:13 linuxsys kernel: RSP: 002b:00007ffedb5d0ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 12 11:32:13 linuxsys kernel: RAX: ffffffffffffffda RBX: 00007ffedb5d0f10 RCX: 00007fefa1389d87
Apr 12 11:32:13 linuxsys kernel: RDX: 00007ffedb5d0f10 RSI: 00000000c018643a RDI: 000000000000000c
Apr 12 11:32:13 linuxsys kernel: RBP: 0000000000ffad10 R08: 0000000000e00109 R09: 0000000000000000
Apr 12 11:32:13 linuxsys kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c018643a
Apr 12 11:32:13 linuxsys kernel: R13: 0000000000f67cbb R14: 00000000010bf7a0 R15: 0000000000000000
Apr 12 11:32:13 linuxsys kernel: Code: e9 b5 fd ff ff 44 89 e2 48 c7 c6 d0 98 9b c0 bf 01 00 00 00 e8 fa e9 ff ff 48 8b 83 98 03 00 00 48 83 78 28 00 0f 84 8c fd ff ff <0f> 0b 45 31 ed e9 85 fd ff ff 48 c7 c7 98 98 9b c0 45 31 ed e8 
Apr 12 11:32:13 linuxsys kernel: ---[ end trace bc02c50ede9b0814 ]---
Apr 12 11:32:13 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:13 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:13 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:22 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:22 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Apr 12 11:32:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!


Apr 12 11:42:05 linuxsys kernel: ------------[ cut here ]------------
Apr 12 11:42:05 linuxsys kernel: kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4692!
Apr 12 11:42:05 linuxsys kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Apr 12 11:42:05 linuxsys kernel: Modules linked in: btrfs zstd_compress zstd_decompress xxhash xor raid6_pq ufs hfsplus hfs minix ntfs msdos jfs ext4 mbcache jbd2 fscrypto dm_mod vmw_vsock_vmci_transport vsock cmac rfcomm fuse bnep vmnet(O>
Apr 12 11:42:05 linuxsys kernel:  agpgart snd_timer syscopyarea rfkill sysfillrect sysimgblt fb_sys_fops aesni_intel snd tpm_crb aes_x86_64 tpm_tis crypto_simd ccp cryptd soundcore tpm_tis_core sp5100_tco glue_helper pcspkr k10temp i2c_pii>
Apr 12 11:42:05 linuxsys kernel: CPU: 6 PID: 5473 Comm: Xorg Tainted: G        W  O     4.17.0-1-MANJARO #1
Apr 12 11:42:05 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Apr 12 11:42:05 linuxsys kernel: RIP: 0010:dm_update_crtcs_state+0x347/0x3c0 [amdgpu]
Apr 12 11:42:05 linuxsys kernel: RSP: 0018:ffffa9618c3b3b10 EFLAGS: 00010246
Apr 12 11:42:05 linuxsys kernel: RAX: 0000000000000000 RBX: ffff9d76f7bf2000 RCX: 0000000025a00806
Apr 12 11:42:05 linuxsys kernel: RDX: 0000000025a00606 RSI: ffff9d76fe7a7160 RDI: ffff9d76fe006e80
Apr 12 11:42:05 linuxsys kernel: RBP: ffff9d76eec29800 R08: 0000000000027160 R09: ffffffffc125a16d
Apr 12 11:42:05 linuxsys kernel: R10: ffffedaf06a59400 R11: 00000000ffffffff R12: 0000000000000000
Apr 12 11:42:05 linuxsys kernel: R13: ffff9d70a9657400 R14: ffff9d70a9652400 R15: ffff9d76af8b3980
Apr 12 11:42:05 linuxsys kernel: FS:  00007f9cdd552940(0000) GS:ffff9d76fe780000(0000) knlGS:0000000000000000
Apr 12 11:42:05 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 12 11:42:05 linuxsys kernel: CR2: 0000562ed2083fc0 CR3: 00000001a09ae000 CR4: 00000000003406e0
Apr 12 11:42:05 linuxsys kernel: Call Trace:
Apr 12 11:42:05 linuxsys kernel:  amdgpu_dm_atomic_check+0x2a0/0x4d0 [amdgpu]
Apr 12 11:42:05 linuxsys kernel:  drm_atomic_check_only+0x33a/0x4f0 [drm]
Apr 12 11:42:05 linuxsys kernel:  drm_atomic_commit+0x13/0x50 [drm]
Apr 12 11:42:05 linuxsys kernel:  drm_atomic_connector_commit_dpms+0xe5/0xf0 [drm]
Apr 12 11:42:05 linuxsys kernel:  drm_mode_obj_set_property_ioctl+0x170/0x290 [drm]
Apr 12 11:42:05 linuxsys kernel:  ? drm_mode_connector_set_obj_prop+0x70/0x70 [drm]
Apr 12 11:42:05 linuxsys kernel:  drm_mode_connector_property_set_ioctl+0x3e/0x60 [drm]
Apr 12 11:42:05 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Apr 12 11:42:05 linuxsys kernel:  drm_ioctl+0x2c3/0x360 [drm]
Apr 12 11:42:05 linuxsys kernel:  ? drm_mode_connector_set_obj_prop+0x70/0x70 [drm]
Apr 12 11:42:05 linuxsys kernel:  ? __handle_mm_fault+0xbff/0x14d0
Apr 12 11:42:05 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Apr 12 11:42:05 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Apr 12 11:42:05 linuxsys kernel:  ? handle_mm_fault+0x10b/0x260
Apr 12 11:42:05 linuxsys kernel:  ? __do_page_fault+0x317/0x5a0
Apr 12 11:42:05 linuxsys kernel:  ksys_ioctl+0x70/0x80
Apr 12 11:42:05 linuxsys kernel:  SyS_ioctl+0xa/0x10
Apr 12 11:42:05 linuxsys kernel:  do_syscall_64+0x74/0x190
Apr 12 11:42:05 linuxsys kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Apr 12 11:42:05 linuxsys kernel: RIP: 0033:0x7f9cdae0cd87
Apr 12 11:42:05 linuxsys kernel: RSP: 002b:00007ffcd39d6308 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 12 11:42:05 linuxsys kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f9cdae0cd87
Apr 12 11:42:05 linuxsys kernel: RDX: 00007ffcd39d6340 RSI: 00000000c01064ab RDI: 000000000000000e
Apr 12 11:42:05 linuxsys kernel: RBP: 00007ffcd39d6340 R08: 0000562ed21dc3e0 R09: 0000000000000000
Apr 12 11:42:05 linuxsys kernel: R10: 00007f9cdae85220 R11: 0000000000000246 R12: 00000000c01064ab
Apr 12 11:42:05 linuxsys kernel: R13: 000000000000000e R14: 0000562ed0451470 R15: 0000562ed0456040
Apr 12 11:42:05 linuxsys kernel: Code: 18 c6 00 01 0f 84 f7 fd ff ff e9 e9 fd ff ff 45 0f b6 4d 0a 41 f6 c1 0e 0f 84 5c fd ff ff 48 c7 04 24 00 00 00 00 e9 16 fe ff ff <0f> 0b 48 83 bb 08 0d 00 00 00 0f 84 13 ff ff ff 48 83 3c 24 00 
Apr 12 11:42:05 linuxsys kernel: RIP: dm_update_crtcs_state+0x347/0x3c0 [amdgpu] RSP: ffffa9618c3b3b10
Apr 12 11:42:05 linuxsys kernel: ---[ end trace bc02c50ede9b0815 ]---
Apr 12 11:42:16 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Apr 12 11:42:36 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Apr 12 11:42:56 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Apr 12 11:43:16 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Comment 30 L.S.S. 2018-04-13 00:25:02 UTC
EDIT 2: I couldn't reproduce the issue on 4.17 this time even after 2 wakeups. However, the issue I encountered was similar (system apparently froze when trying to wake up the screen). 

After looking into it, I found that along with errors like those in Comment 16 (which did not crash the system), there was an additional "kernel BUG" in dm_update_crtcs_state, which was called by amdgpu_dm_atomic_check. The log appeared to have been cut: the entries between the two errors were apparently unrelated, so I did not include them, and the new error began with a "--[ cut here ]--". This additional error (not 100% reproducible) might be what actually crashed the system that time.
Comment 31 Öyvind Saether 2018-04-30 14:10:19 UTC
Kernel 4.17.0-rc3-linus.git-keumjo on a 2400G with an RX 560 GPU:

> login to xfce desktop
> type "xset dpms force standby" in a terminal
screens go blank and there is no more response from the box, looks dead. But it's possible to ssh into it and find the following information in dmesg:

[12743.692027] ------------[ cut here ]------------
[12743.692030] kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4708!
[12743.692039] invalid opcode: 0000 [#1] SMP NOPTI
[12743.692041] Modules linked in: twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common loop serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic rfcomm fuse rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle lz4 lz4_compress ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic
[12743.692069]  snd_hda_codec_hdmi snd_hda_intel btusb snd_hda_codec btrtl btbcm edac_mce_amd btintel bluetooth snd_hda_core snd_hwdep snd_seq kvm_amd ccp wmi_bmof kvm snd_seq_device snd_pcm ecdh_generic snd_timer irqbypass rfkill joydev pcspkr snd soundcore shpchp i2c_piix4 k10temp wmi video acpi_cpufreq binfmt_misc dm_crypt amdkfd raid1 amd_iommu_v2 amdgpu chash i2c_algo_bit gpu_sched drm_kms_helper ttm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel r8169 mii
[12743.692091] CPU: 6 PID: 3708 Comm: Xorg Not tainted 4.17.0-rc3-linus.git-keumjo #3
[12743.692093] Hardware name: System manufacturer System Product Name/TUF B350M-PLUS GAMING, BIOS 4009 04/14/2018
[12743.692154] RIP: 0010:dm_update_crtcs_state+0x419/0x480 [amdgpu]
[12743.692156] RSP: 0018:ffffb4e688f07b30 EFLAGS: 00010246
[12743.692158] RAX: ffff8b720090d001 RBX: ffff8b720527d000 RCX: 000000000008dccc
[12743.692160] RDX: 000000000008dccb RSI: ffff8b723eda6160 RDI: ffff8b723e806e80
[12743.692162] RBP: ffff8b720275c000 R08: 0000000000026160 R09: 0000000000000000
[12743.692163] R10: ffffdb50a0024200 R11: 0000000000000a00 R12: 0000000000000000
[12743.692165] R13: ffff8b7205189800 R14: ffff8b720090ec00 R15: ffff8b71f16c8a00
[12743.692167] FS:  00007f88668e2ac0(0000) GS:ffff8b723ed80000(0000) knlGS:0000000000000000
[12743.692169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12743.692170] CR2: 00007f885f886000 CR3: 00000007fbc90000 CR4: 00000000003406e0
[12743.692172] Call Trace:
[12743.692231]  amdgpu_dm_atomic_check+0x1b1/0x3b0 [amdgpu]
[12743.692248]  drm_atomic_check_only+0x360/0x4f0 [drm]
[12743.692264]  drm_atomic_commit+0x13/0x50 [drm]
[12743.692278]  drm_atomic_connector_commit_dpms+0xdb/0x100 [drm]
[12743.692292]  drm_mode_obj_set_property_ioctl+0x178/0x280 [drm]
[12743.692307]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[12743.692320]  drm_mode_connector_property_set_ioctl+0x39/0x60 [drm]
[12743.692333]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[12743.692346]  drm_ioctl+0x1b3/0x370 [drm]
[12743.692359]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[12743.692364]  ? _cond_resched+0x15/0x30
[12743.692404]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[12743.692408]  do_vfs_ioctl+0xa4/0x610
[12743.692411]  ksys_ioctl+0x60/0x90
[12743.692414]  ? ksys_read+0x9c/0xb0
[12743.692416]  __x64_sys_ioctl+0x16/0x20
[12743.692420]  do_syscall_64+0x5b/0x160
[12743.692423]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[12743.692425] Code: ff ff e9 21 fe ff ff 48 83 3c 24 00 0f 84 2f fe ff ff 48 8b 3c 24 e8 e7 99 07 00 80 7c 24 0f 00 0f 85 02 fe ff ff e9 40 ff ff ff <0f> 0b 48 83 c4 20 b8 ea ff ff ff 5b 5d 41 5c 41 5d 41 5e 41 5f 
[12743.692510] RIP: dm_update_crtcs_state+0x419/0x480 [amdgpu] RSP: ffffb4e688f07b30
[12743.692525] ---[ end trace fb5e2b69e8f8d9c9 ]---
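
When comparing traces like this across kernel builds, it helps to pull out the source location of each BUG. The sketch below is illustrative only: bug_sites is a hypothetical helper, and the regular expression assumes the "kernel BUG at <path>:<line>!" wording seen in the logs in this report. The path is normalized so that the "amdgpu/../display" detour collapses to the real file.

```python
import posixpath
import re

# Matches the "kernel BUG at <path>:<line>!" line printed before the oops.
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")

# Three lines copied from the dmesg output above.
DMESG = """\
[12743.692027] ------------[ cut here ]------------
[12743.692030] kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4708!
[12743.692039] invalid opcode: 0000 [#1] SMP NOPTI
"""


def bug_sites(text):
    """Return (normalized source path, line number) for every kernel BUG line."""
    return [
        (posixpath.normpath(m["path"]), int(m["line"]))
        for m in BUG_RE.finditer(text)
    ]


print(bug_sites(DMESG))
# [('drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c', 4708)]
```

The line number differs between the reports in this bug (4692, 4708, 4713) while the function, dm_update_crtcs_state, stays the same, which is consistent with the same check moving as the file changed between builds.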

There's no recovery from this; service lightdm restart doesn't restart it. shutdown -r now doesn't reboot either, it just hangs (I can't tell why, since the screens are blank and ssh closes).
Comment 32 L.S.S. 2018-04-30 14:55:54 UTC
So it seems there definitely is a regression on 4.17 on this issue (the patches are not required as the lines were already there in 4.17). This time it isn't a kernel panic, but an "invalid opcode" error caused by dm_update_crtcs_state that was called by amdgpu_dm_atomic_check.

The issue is not 100% reproducible, but it still means that going to standby with an AMD GPU and DC is dangerous and may result in loss of unsaved work.
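
As background on the "invalid opcode" wording: in the Code: line of an oops, the byte in angle brackets marks where RIP pointed, and on x86 the bytes 0f 0b encode the UD2 instruction, which is how BUG() traps. The sketch below only illustrates reading that marker; trapping_opcode and the TRAPS table are hypothetical names, and the table covers just the two undefined-opcode encodings that appear in the logs in this report.

```python
import re

# x86 undefined-opcode instructions seen in this report's Code: lines.
TRAPS = {"0f 0b": "ud2", "0f ff": "ud0"}

# Fragment of the Code: line from the dm_update_crtcs_state trace in Comment 29.
CODE = "Code: 48 c7 04 24 00 00 00 00 e9 16 fe ff ff <0f> 0b 48 83 bb 08 0d 00 00 00"


def trapping_opcode(code_line):
    """Return (byte pair at RIP, mnemonic) from an oops Code: line, or None."""
    m = re.search(r"<([0-9a-f]{2})> ([0-9a-f]{2})", code_line)
    if m is None:
        return None
    pair = f"{m.group(1)} {m.group(2)}"
    return pair, TRAPS.get(pair, "unknown")


print(trapping_opcode(CODE))  # ('0f 0b', 'ud2')
```

So the "invalid opcode" message here is the deliberate trap from a BUG check in the driver rather than corrupted code being executed.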
Comment 33 Öyvind Saether 2018-05-01 13:37:14 UTC
In my case the problem was not having xorg-x11-drv-amdgpu installed (how embarrassing), which made Xorg use xorg-x11-drv-ati. Yes, really. I assumed Fedora 28 beta installed it along with all the other drivers and didn't realize otherwise until comparing the X logs on a box which didn't have the problem with one that did. I did file a Fedora bug kindly asking for xorg-x11-drv-amdgpu to be installed by default.

Simply installing xorg-x11-drv-amdgpu solved this error and the amdgpu kernel crash:
[12743.692030] kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4708!

I realize the immensity of my stupidity in not noticing that xorg-x11-drv-amdgpu was missing will make you laugh, but it's not like the kernel panic message warned me.
Comment 34 Michel Dänzer 2018-05-01 13:56:26 UTC
While it's good to hear that xf86-video-amdgpu doesn't trigger it, the kernel BUG is still a kernel driver bug.
Comment 35 Adam Bolte 2018-06-23 13:26:26 UTC
I believe I've been seeing the same bug as of late.

[Sat Jun 23 23:02:04 2018] ------------[ cut here ]------------
[Sat Jun 23 23:02:04 2018] kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4713!
[Sat Jun 23 23:02:04 2018] invalid opcode: 0000 [#1] SMP PTI
[Sat Jun 23 23:02:04 2018] Modules linked in: ipt_REJECT(E) nf_reject_ipv4(E) tun(E) bridge(E) stp(E) llc(E) fuse(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) snd_hrtimer(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) snd_seq(E) snd_seq_device(E) cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_multiport(E) xt_conntrack(E) iptable_filter(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) xt_CHECKSUM(E) xt_tcpudp(E) iptable_mangle(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) mxm_wmi(E) amdkfd(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) amdgpu(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_hdmi(E)
[Sat Jun 23 23:02:04 2018]  chash(E) gpu_sched(E) snd_hda_intel(E) kvm_intel(E) ttm(E) snd_hda_codec(E) efi_pstore(E) drm_kms_helper(E) snd_hda_core(E) snd_pcsp(E) kvm(E) snd_hwdep(E) snd_pcm_oss(E) drm(E) irqbypass(E) snd_mixer_oss(E) intel_cstate(E) snd_pcm(E) mei_me(E) i2c_algo_bit(E) intel_uncore(E) snd_timer(E) coretemp(E) vhba(OE) snd(E) iTCO_wdt(E) intel_rapl_perf(E) efivars(E) joydev(E) evdev(E) iTCO_vendor_support(E) soundcore(E) shpchp(E) mei(E) sg(E) intel_pch_thermal(E) wmi(E) video(E) acpi_pad(E) button(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) parport_pc(E) ppdev(E) sunrpc(E) lp(E) parport(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) btrfs(E) zstd_decompress(E) zstd_compress(E) xxhash(E) algif_skcipher(E) af_alg(E) raid10(E) raid456(E)
[Sat Jun 23 23:02:04 2018]  async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) multipath(E) linear(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_crypt(E) dm_mod(E) raid0(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sr_mod(E) cdrom(E) sd_mod(E) uas(E) usb_storage(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) ahci(E) xhci_pci(E) aes_x86_64(E) libahci(E) crypto_simd(E) nvme(E) xhci_hcd(E) cryptd(E) glue_helper(E) libata(E) i2c_i801(E) alx(E) mdio(E) nvme_core(E) scsi_mod(E) usbcore(E) fan(E) thermal(E)
[Sat Jun 23 23:02:04 2018] CPU: 2 PID: 1340 Comm: Xorg Tainted: G        W  OE     4.17.2+ #2
[Sat Jun 23 23:02:04 2018] Hardware name: MSI MS-7976/Z170A GAMING M7 (MS-7976), BIOS 1.J0 12/07/2017
[Sat Jun 23 23:02:04 2018] RIP: 0010:dm_update_crtcs_state+0x424/0x4b0 [amdgpu]
[Sat Jun 23 23:02:04 2018] RSP: 0018:ffffb84fc4affa90 EFLAGS: 00010246
[Sat Jun 23 23:02:04 2018] RAX: 0000000000000000 RBX: ffff9d7e34528280 RCX: fffff1505f079c9f
[Sat Jun 23 23:02:04 2018] RDX: 0000000000000017 RSI: ffff9d7e41f63800 RDI: 0000000000000286
[Sat Jun 23 23:02:04 2018] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[Sat Jun 23 23:02:04 2018] R10: ffffb84fc4affa90 R11: 00000000000005a0 R12: ffff9d7e41f63800
[Sat Jun 23 23:02:04 2018] R13: ffff9d7ec0f61800 R14: ffff9d7ec66a8c00 R15: 0000000000000000
[Sat Jun 23 23:02:04 2018] FS:  00007f614ec0ba40(0000) GS:ffff9d7eeec80000(0000) knlGS:0000000000000000
[Sat Jun 23 23:02:04 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jun 23 23:02:04 2018] CR2: 00007f1dbb1ec0c8 CR3: 000000081aa56005 CR4: 00000000003606e0
[Sat Jun 23 23:02:04 2018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Sat Jun 23 23:02:04 2018] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Sat Jun 23 23:02:04 2018] Call Trace:
[Sat Jun 23 23:02:04 2018]  amdgpu_dm_atomic_check+0x1a1/0x3d0 [amdgpu]
[Sat Jun 23 23:02:04 2018]  drm_atomic_check_only+0x3f3/0x4f0 [drm]
[Sat Jun 23 23:02:04 2018]  ? handle_conflicting_encoders+0x26c/0x280 [drm_kms_helper]
[Sat Jun 23 23:02:04 2018]  drm_atomic_commit+0x13/0x50 [drm]
[Sat Jun 23 23:02:04 2018]  drm_atomic_helper_set_config+0x67/0x90 [drm_kms_helper]
[Sat Jun 23 23:02:04 2018]  __drm_mode_set_config_internal+0x67/0x110 [drm]
[Sat Jun 23 23:02:04 2018]  drm_mode_setcrtc+0x452/0x5a0 [drm]
[Sat Jun 23 23:02:04 2018]  ? amdgpu_cs_wait_ioctl+0xe5/0x160 [amdgpu]
[Sat Jun 23 23:02:04 2018]  ? drm_mode_getcrtc+0x170/0x170 [drm]
[Sat Jun 23 23:02:04 2018]  drm_ioctl_kernel+0x67/0xb0 [drm]
[Sat Jun 23 23:02:04 2018]  drm_ioctl+0x2d1/0x390 [drm]
[Sat Jun 23 23:02:04 2018]  ? drm_mode_getcrtc+0x170/0x170 [drm]
[Sat Jun 23 23:02:04 2018]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[Sat Jun 23 23:02:04 2018]  do_vfs_ioctl+0xa2/0x620
[Sat Jun 23 23:02:04 2018]  ? __x64_sys_futex+0x88/0x180
[Sat Jun 23 23:02:04 2018]  ksys_ioctl+0x70/0x80
[Sat Jun 23 23:02:04 2018]  __x64_sys_ioctl+0x16/0x20
[Sat Jun 23 23:02:04 2018]  do_syscall_64+0x55/0x100
[Sat Jun 23 23:02:04 2018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Sat Jun 23 23:02:04 2018] RIP: 0033:0x7f614c650dd7
[Sat Jun 23 23:02:04 2018] RSP: 002b:00007ffd9280d9d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Sat Jun 23 23:02:04 2018] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f614c650dd7
[Sat Jun 23 23:02:04 2018] RDX: 00007ffd9280da10 RSI: 00000000c06864a2 RDI: 000000000000000d
[Sat Jun 23 23:02:04 2018] RBP: 00007ffd9280da10 R08: 0000000000000000 R09: 000055a69128dca0
[Sat Jun 23 23:02:04 2018] R10: 00007ffd9280dad0 R11: 0000000000000246 R12: 00000000c06864a2
[Sat Jun 23 23:02:04 2018] R13: 000000000000000d R14: 000055a690954b90 R15: 000055a69128dca0
[Sat Jun 23 23:02:04 2018] Code: 4c 89 ee 48 89 c7 e8 bc f5 ff ff 84 c0 0f 84 b7 fe ff ff e9 a0 fe ff ff 48 83 b8 08 0d 00 00 00 0f 85 67 ff ff ff e9 f5 fe ff ff <0f> 0b 41 8b 4f 60 48 c7 c2 d0 95 c7 c0 48 c7 c6 a0 57 ca c0 bf 
[Sat Jun 23 23:02:04 2018] RIP: dm_update_crtcs_state+0x424/0x4b0 [amdgpu] RSP: ffffb84fc4affa90
[Sat Jun 23 23:02:04 2018] ---[ end trace 293f9551ffc27adc ]---

This is on a Fiji card. I have a 144Hz FreeSync-capable monitor, and can easily reproduce the error with this command (where 143.86 is the xrandr-advertised maximum frequency):

xrandr --output DisplayPort-0 --mode 2560x1440 --rate 143.86 --set "scaling mode" "Full aspect"

Interestingly xrandr reports 59.95*+ as the current frequency, but my monitor says 144Hz. I tried firing up Grey Goo under Wine and that game reports my monitor running at 144Hz also. If I just run:

xrandr --output DisplayPort-0 --mode 2560x1440 --rate 143.86

then xrandr correctly reports 143.86* indicating that that frequency is now selected.

I can also run the following:

xrandr --output DisplayPort-0 --mode 2560x1440 --set "scaling mode" "Full aspect"

But if I combine these options as per the first command above, I get a GUI crash.

The symptoms are similar. In my case the screen is still on (not blank) but completely frozen. I was able to SSH in to get the above trace from the dmesg command. The machine cannot successfully shut down or reboot, and I need to physically hard reset the box at this point.

As others have said, this is definitely a regression. This didn't happen in older kernels.
Comment 36 Harry Wentland 2018-06-27 18:51:48 UTC
The kernel BUG should be fixed since 4.17 by this commit

commit 20d4ac659c76034586a3ab79489b0940631a65de
Author: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Date:   Tue May 29 09:51:51 2018 -0400

    drm/amd/display: Fix BUG_ON during CRTC atomic check update

    For cases where the CRTC is inactive (DPMS off), where a modeset is not
    required, yet the CRTC is still in the atomic state, we should not
    attempt to update anything on it.

    Previously, we were relying on the modereset_required() helper to check
    the above condition. However, the function returns false immediately if
    a modeset is not required, ignoring the CRTC's enable/active state
    flags. The correct way to filter is by looking at these flags instead.

    Fixes: e277adc5a06c "drm/amd/display: Hookup color management functions"
    Bugzilla: https://bugs.freedesktop.org/106194

    Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Tested-by: Michel Dänzer <michel.daenzer@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


I'll mark this as resolved but please reopen if it still reproduces.
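
The filtering change described in the commit message can be illustrated with a small toy model. This is only a sketch in Python, not the driver code: the class and function names are hypothetical, and the "old" helper simply mirrors the behaviour the commit message describes (return false immediately when no modeset is required, without looking at the enable/active flags).

```python
from dataclasses import dataclass


@dataclass
class ToyCrtcState:
    """Hypothetical stand-in for the CRTC state flags named in the commit message."""
    enable: bool
    active: bool
    mode_changed: bool  # assumption: models "a modeset is required"


def toy_modereset_required(state):
    # Behaviour as described in the commit message: bail out with False
    # when no modeset is required, ignoring enable/active entirely.
    if not state.mode_changed:
        return False
    return not state.enable or not state.active


def old_filter_skips(state):
    """Old approach: skip the CRTC only if the helper says a mode reset is required."""
    return toy_modereset_required(state)


def new_filter_skips(state):
    """New approach: skip any CRTC that is not both enabled and active."""
    return not state.enable or not state.active


# The case from the commit message: CRTC inactive (DPMS off), no modeset required.
dpms_off = ToyCrtcState(enable=True, active=False, mode_changed=False)

if __name__ == "__main__":
    print("old filter skips:", old_filter_skips(dpms_off))
    print("new filter skips:", new_filter_skips(dpms_off))
```

In this model the old filter lets the DPMS-off CRTC through to the update path (where the real driver then hit the BUG_ON), while the flag-based filter skips it.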
Comment 37 L.S.S. 2018-07-05 05:56:23 UTC
I locked the screen and left it blank for a while to see what would happen. The first time there were still VBLANK-related errors, but they appear minor, as the system woke up just fine.

I let the screen blank twice, and the errors did not show up the second time. They appeared only the first time, though this is probably not enough to prove much. Still, I don't see that kernel bug this time, so it must have been fixed.

I have yet to fully assess whether the bug can no longer be reproduced, as I've been avoiding leaving the screen blank: back then this bug caused loss of unsaved work and other problems due to the crash.

Jul 05 13:35:42 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Jul 05 13:35:55 linuxsys blueman-mechanism[2281]: Exiting
Jul 05 13:36:02 linuxsys kernel: [drm:dm_logger_write [amdgpu]] *ERROR* Failed to get VBLANK!
Jul 05 13:36:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Jul 05 13:36:22 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Jul 05 13:36:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Jul 05 13:36:22 linuxsys kernel: WARNING: CPU: 14 PID: 1998 at drivers/gpu/drm/drm_vblank.c:620 drm_calc_vbltimestamp_from_scanoutpos+0x2af/0x2f0 [drm]
Jul 05 13:36:22 linuxsys kernel: Modules linked in: cmac rfcomm fuse bnep vmnet(O) nls_iso8859_1 nls_cp437 vfat fat arc4 amdkfd amd_iommu_v2 amdgpu iwlmvm chash gpu_sched i2c_algo_bit ttm mac80211 drm_kms_helper btusb uvcvideo btrtl iwlwif>
Jul 05 13:36:22 linuxsys kernel:  glue_helper pcspkr k10temp i2c_piix4 rtc_cmos shpchp tpm_tis_core battery ac tpm wmi rng_core asus_wireless gpio_amdpt i2c_hid pinctrl_amd evdev mac_hid acpi_cpufreq vmmon(O) vmw_vmci vboxnetflt(O) vboxnet>
Jul 05 13:36:22 linuxsys kernel: CPU: 14 PID: 1998 Comm: xfwm4 Tainted: G           O      4.17.3-1-MANJARO #1
Jul 05 13:36:22 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC. GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Jul 05 13:36:22 linuxsys kernel: RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x2af/0x2f0 [drm]
Jul 05 13:36:22 linuxsys kernel: RSP: 0018:ffff9bae0b6d7b38 EFLAGS: 00010082
Jul 05 13:36:22 linuxsys kernel: RAX: ffffffffc114e400 RBX: ffff8f4ff7840800 RCX: 0000000000000000
Jul 05 13:36:22 linuxsys kernel: RDX: 0000000000000001 RSI: ffffffffc0e128a0 RDI: 0000000000000001
Jul 05 13:36:22 linuxsys kernel: RBP: ffff9bae0b6d7b98 R08: 0000000000000000 R09: ffffffffc0df1770
Jul 05 13:36:22 linuxsys kernel: R10: 0000000000000000 R11: ffffffffc0f814d0 R12: 0000000000000001
Jul 05 13:36:22 linuxsys kernel: R13: ffff8f4fe8eb01d8 R14: ffff8f4fe8eb0000 R15: ffff9bae0b6d7bac
Jul 05 13:36:22 linuxsys kernel: FS:  00007f4c76bcdfc0(0000) GS:ffff8f4ffe980000(0000) knlGS:0000000000000000
Jul 05 13:36:22 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 05 13:36:22 linuxsys kernel: CR2: 00007f8de800bf98 CR3: 00000007520fc000 CR4: 00000000003406e0
Jul 05 13:36:22 linuxsys kernel: Call Trace:
Jul 05 13:36:22 linuxsys kernel:  drm_get_last_vbltimestamp+0x78/0x90 [drm]
Jul 05 13:36:22 linuxsys kernel:  drm_update_vblank_count+0x79/0x230 [drm]
Jul 05 13:36:22 linuxsys kernel:  drm_vblank_enable+0x101/0x120 [drm]
Jul 05 13:36:22 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Jul 05 13:36:22 linuxsys kernel:  drm_wait_vblank_ioctl+0x138/0x630 [drm]
Jul 05 13:36:22 linuxsys kernel:  ? import_iovec+0x37/0xd0
Jul 05 13:36:22 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Jul 05 13:36:22 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Jul 05 13:36:22 linuxsys kernel:  drm_ioctl+0x1b7/0x370 [drm]
Jul 05 13:36:22 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Jul 05 13:36:22 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Jul 05 13:36:22 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jul 05 13:36:22 linuxsys kernel:  do_vfs_ioctl+0xa4/0x610
Jul 05 13:36:22 linuxsys kernel:  ? __sys_recvmsg+0x83/0xa0
Jul 05 13:36:22 linuxsys kernel:  ksys_ioctl+0x60/0x90
Jul 05 13:36:22 linuxsys kernel:  __x64_sys_ioctl+0x16/0x20
Jul 05 13:36:22 linuxsys kernel:  do_syscall_64+0x5b/0x170
Jul 05 13:36:22 linuxsys kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 05 13:36:22 linuxsys kernel: RIP: 0033:0x7f4c731df667
Jul 05 13:36:22 linuxsys kernel: RSP: 002b:00007ffcd9a51ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 05 13:36:22 linuxsys kernel: RAX: ffffffffffffffda RBX: 00007ffcd9a51b10 RCX: 00007f4c731df667
Jul 05 13:36:22 linuxsys kernel: RDX: 00007ffcd9a51b10 RSI: 00000000c018643a RDI: 000000000000000c
Jul 05 13:36:22 linuxsys kernel: RBP: 00000000017bde80 R08: 0000000001000109 R09: 0000000000000000
Jul 05 13:36:22 linuxsys kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c018643a
Jul 05 13:36:22 linuxsys kernel: R13: 00000000010037c3 R14: 000000000189dd10 R15: 0000000000000000
Jul 05 13:36:22 linuxsys kernel: Code: e9 af fd ff ff 44 89 e2 48 c7 c6 a0 28 e1 c0 bf 01 00 00 00 e8 53 ea ff ff 48 8b 83 98 03 00 00 48 83 78 28 00 0f 84 89 fd ff ff <0f> 0b 45 31 f6 e9 82 fd ff ff 48 c7 c7 68 28 e1 c0 45 31 f6 e8 
Jul 05 13:36:22 linuxsys kernel: ---[ end trace e82ad29a813c3d81 ]---
Jul 05 13:36:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Jul 05 13:36:22 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Jul 05 13:36:22 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '1'!
Comment 38 L.S.S. 2018-07-05 06:03:00 UTC
EDIT: Forgot to mention that I'm currently on the 4.17.3-1-MANJARO kernel, which is the latest at the time of writing.
Comment 39 ecomer 2018-09-06 18:12:33 UTC
I have this same problem with 4.18.5-1-MANJARO x86_64
$ inxi
CPU: Dual Core AMD A6-9500 RADEON R5 8 COMPUTE CORES 2C+6G (-MCP-) 
speed/min/max: 1622/1400/3500 MHz Kernel: 4.18.5-1-MANJARO x86_64 Up: 14m 
Mem: 1365.6/15029.9 MiB (9.1%) Storage: 1.93 TiB (23.8% used) Procs: 189 
Shell: bash 4.4.23 inxi: 3.0.21 
Running an XFCE desktop.
Comment 40 Michel Dänzer 2018-09-07 07:40:40 UTC
(In reply to ecomer from comment #39)
> I have this same problem with 4.18.5-1-MANJARO x86_64

Please file your own report. The reporter of this one says it's fixed.
Comment 41 Ainola 2018-09-07 22:35:13 UTC
ecomer, as I suspect we are both experiencing the same issue, here's a new report you (and anyone else lurking) can follow:

https://bugs.freedesktop.org/show_bug.cgi?id=107863

