Bug 108037

Summary: Turning monitors off and on again makes the kernel panic and system freeze
Product: DRI
Reporter: Öyvind Saether <oyvinds>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: FD, harry.wentland, nicholas.kazlauskas, sunpeng.li
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
kernel panic messages | none
Found that 4.18.9 has the same problem. | none
Turned monitors off, went out for some hours, came back, turned them on, picture didn't come on monitors, instead sleep mode, PC froze | none
Tried kernel 4.19.0-rc7 with Xorg Option "DPMS" "Disable", still happened | none
This is still a problem with kernel 4.20.3, happens sometimes when turning multiple monitors on at the same time | none
Kernel 5.0.0-rc5 still has this problem | none

Description Öyvind Saether 2018-09-24 08:14:38 UTC
Created attachment 141707 [details]
kernel panic messages

>have 3 displays connected with DisplayPort
>turn them off
>turn them on again later
>kernel crashes, RIP: 0010:dal_gpio_service_open+0x1c5/0x220 [amdgpu]
This was with 4.19.0-rc4 on Fedora 29. Graphics card: RX 570. This is new; it did not use to be a problem and isn't a problem with 4.18.5. /var/log/messages had a backtrace, which is attached. One interesting detail from the log: the machine appeared to function normally until 2 minutes before the reboot, which is what makes me sure that turning the monitors on caused the kernel panic.
Comment 1 Michel Dänzer 2018-09-24 08:18:37 UTC
Can you bisect?

P.S. FYI, this is what's called an oops, not a panic.
Comment 2 Alex Deucher 2018-09-24 14:18:45 UTC
When you say turn them off/on, do you mean via software (e.g., dpms) or via the switch on the monitor?
Comment 3 Öyvind Saether 2018-09-24 16:46:31 UTC
(In reply to Michel Dänzer from comment #1)
> Can you bisect?
> P.S. FYI, this is what's called an oops, not a panic.

oops, then. yes I now know how to bisect. I can.

(In reply to Alex Deucher from comment #2)
> When you say turn them off/on, do you mean via software (e.g., dpms) or via
> the switch on the monitor?

I used the actual power button on the monitors that have one and the stupid joystick thing on the third monitor (it works like a button) when this first happened, but I just found that it doesn't matter: I triggered it on 4.18.9 with dpms too.

On 4.18.9:
> set xscreensaver to immediately turn monitors off with dpms
> lock screen
> wait for monitors to go into suspend
> press button on keyboard
> monitors don't come back on, system freeze
Will attach same-problem-4.18.9.txt with that one. 4.18.5 is fine. I notice there were a lot of amdgpu changes between 4.18.7 and 4.18.8 in the log. I have used 4.19 git kernels for some time and this didn't happen to me until 4.19-rc4, but that could simply be because I didn't turn off the monitors(???).

bisecting will take time.
Comment 4 Öyvind Saether 2018-09-24 16:47:03 UTC
Created attachment 141716 [details]
Found that 4.18.9 has the same problem.
Comment 5 Öyvind Saether 2018-09-28 20:43:10 UTC
After 1 day, 12:48 with 4.19.0-rc5-ChaeKyung-April I've yet to trigger this, and I've tried. I don't see why it would magically be fixed, but I can't really provide any useful information. I could trigger it between rc1 and rc3, but not reliably enough to do an accurate bisect. Might as well re-open if it happens again and I can provide some actually useful information.
Comment 6 Öyvind Saether 2018-10-02 15:48:52 UTC
Created attachment 141836 [details]
Turned monitors off, went out for some hours, came back, turned them on, picture didn't come on monitors, instead sleep mode, PC froze
Comment 7 Öyvind Saether 2018-10-02 15:53:10 UTC
Looks like this is still a problem with 4.19.0-rc6-ChaeKyung-April. Happened when turning monitors off and leaving for hours and coming back & turning them on again.

If that's what's required to make this trigger, then bisecting will take ages. Last time I tried I got an irrelevant result, probably because it doesn't happen if I turn them off and turn them on again very soon. It also didn't seem to happen with 4.19.0-rc5, but I didn't try turning them off, leaving for a few hours and turning them on again, since I got the impression that one works fine. Perhaps it doesn't.
Comment 9 Öyvind Saether 2018-10-08 13:28:28 UTC
Created attachment 141939 [details]
Tried kernel 4.19.0-rc7 with Xorg Option "DPMS" "Disable", still happened

Kernel 4.19.0-rc7 and setting Xorg Option "DPMS" "Disable" in the X configuration file (it's actually disabled according to xset) didn't help. I turned off the monitors and turned them back on later; X was frozen, but this time it was possible to ssh into the box and run dmesg>amdgpu-fail.txt. Of course X couldn't be stopped or restarted at that point.

Since it says mod_freesync_set_user_enable+0x11f/0x150 somewhere, I've turned freesync off on the one display that supports it, an ASUS VP28U. The other two are ASUS PB27U.
Comment 10 Öyvind Saether 2018-11-18 13:48:15 UTC
I turned off freesync on one monitor and have been using

Section "Extensions"
    Option      "DPMS" "Disable"
EndSection

in /etc/X11/xorg.conf.d/20-amdgpu.conf, which appears to prevent this from happening. I tried commenting out "DPMS" "Disable" on kernel 4.19.2-ChaeKyung to see what would happen. The box froze when I turned the monitors off for some time and turned them back on again. This is clearly still a bug in 4.19.2, which is unfortunate since that kernel is an LTS kernel.
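
For anyone retesting with and without the DPMS option, a quick way to confirm which state X is actually in is to read the DPMS section of `xset q`. The sketch below is only an illustration: the helper names `dpms_enabled` and `query_xset` are made up for this example, the sample text is hypothetical rather than taken from the reporter's machine, and it assumes xset prints a line reading "DPMS is Enabled" or "DPMS is Disabled".

```python
import re
import subprocess


def dpms_enabled(xset_output):
    """Return True/False for the DPMS state found in `xset q` text, or None if absent."""
    # Assumption: xset reports the state as "DPMS is Enabled" / "DPMS is Disabled".
    match = re.search(r"DPMS is (Enabled|Disabled)", xset_output)
    if match is None:
        return None
    return match.group(1) == "Enabled"


def query_xset():
    """Run `xset q` and return its output (needs a running X session and DISPLAY set)."""
    result = subprocess.run(["xset", "q"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical sample of the relevant part of `xset q` output.
SAMPLE = """DPMS (Energy Star):
  Standby: 0    Suspend: 0    Off: 0
  DPMS is Disabled
"""

if __name__ == "__main__":
    print(dpms_enabled(SAMPLE))
```

On a live session, `dpms_enabled(query_xset())` would report the current state; a result of `None` means the expected line was not found, so the wording assumption should be checked against the local xset.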
Comment 11 Öyvind Saether 2019-01-22 16:32:21 UTC
Created attachment 143199 [details]
This is still a problem with kernel 4.20.3, happens sometimes when turning multiple monitors on at the same time

Turn 3 monitors on after they've been off for a while and this bug likely happens and X is frozen. I can still ssh in. A workaround seems to be to turn one monitor on, wait a bit, then turn on another, and wait a bit before turning on the third.
Comment 12 Öyvind Saether 2019-02-04 15:26:58 UTC
Created attachment 143286 [details]
Kernel 5.0.0-rc5 still has this problem

[27295.165873] Call Trace:
[27295.165917]  ? dc_validate_stream+0x5d/0x90 [amdgpu]
[27295.165921]  ? radix_tree_delete_item+0x69/0xc0
[27295.165958]  dc_stream_release+0x28/0x50 [amdgpu]
[27295.165997]  dc_resource_state_destruct+0x4d/0x70 [amdgpu]
[27295.166035]  dc_state_free+0x15/0x20 [amdgpu]
[27295.166076]  dm_atomic_destroy_state+0x1c/0x30 [amdgpu]
[27295.166089]  drm_atomic_state_default_clear+0x201/0x280 [drm]
[27295.166099]  __drm_atomic_state_free+0x13/0x50 [drm]
[27295.166105]  drm_atomic_helper_set_config+0x5a/0x90 [drm_kms_helper]
[27295.166115]  drm_mode_setcrtc+0x191/0x670 [drm]
[27295.166148]  ? amdgpu_cs_wait_ioctl+0x92/0x160 [amdgpu]
[27295.166157]  ? drm_mode_getcrtc+0x180/0x180 [drm]
[27295.166165]  drm_ioctl_kernel+0xa9/0xf0 [drm]
[27295.166174]  drm_ioctl+0x207/0x3c0 [drm]
[27295.166183]  ? drm_mode_getcrtc+0x180/0x180 [drm]
[27295.166213]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[27295.166216]  do_vfs_ioctl+0xa5/0x620
[27295.166218]  ksys_ioctl+0x60/0x90
[27295.166219]  __x64_sys_ioctl+0x16/0x20
[27295.166221]  do_syscall_64+0x55/0x150
[27295.166224]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27295.166226] RIP: 0033:0x7f78fac8909b
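
When comparing traces like the one above across kernel versions, it can help to reduce each dmesg dump to a list of function names and modules. The following is a minimal sketch, not an established tool: `parse_frames` and `FRAME_RE` are names invented for this example, and the regular expression only covers the `func+0xOFF/0xSIZE [module]` frame format seen in this report. Frames prefixed with `?` are entries the kernel's unwinder found on the stack but could not confirm as part of the call chain, so they are flagged as unreliable.

```python
import re

# One stack frame line, e.g. "[27295.165958]  dc_stream_release+0x28/0x50 [amdgpu]".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"                      # dmesg timestamp
    r"(?P<guess>\?\s+)?"                        # optional "?" marking an unreliable frame
    r"(?P<func>[A-Za-z_][\w.]*)"                # symbol name
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"          # optional module in brackets
)


def parse_frames(log_text):
    """Return one dict per stack frame line found in the given dmesg text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # skips "Call Trace:", "RIP:" and any other non-frame line
        frames.append({
            "func": match.group("func"),
            "module": match.group("module"),  # None for built-in kernel symbols
            "reliable": match.group("guess") is None,
        })
    return frames


# A few lines copied from the trace in this comment.
SAMPLE = """[27295.165873] Call Trace:
[27295.165917]  ? dc_validate_stream+0x5d/0x90 [amdgpu]
[27295.165958]  dc_stream_release+0x28/0x50 [amdgpu]
[27295.166216]  do_vfs_ioctl+0xa5/0x620
[27295.166226] RIP: 0033:0x7f78fac8909b
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame)
```

Filtering the result on `reliable` and `module == "amdgpu"` gives a compact signature of the driver call path that can be compared between the attached logs.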
Comment 13 Öyvind Saether 2019-04-25 06:52:43 UTC
I don't have this problem with 5.1.0-rc6.
