Bug 109668 - ctrl-alt-f2 + alt-f1 always crashes X server
Summary: ctrl-alt-f2 + alt-f1 always crashes X server
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-18 22:04 UTC by Jan Kratochvil
Modified: 2019-09-03 19:45 UTC

See Also:
i915 platform: KBL
i915 features: display/Other


Attachments
kernel log before the crash with: drm.debug=14 log_buf_len=16M (381.61 KB, text/plain)
2019-02-18 22:04 UTC, Jan Kratochvil
libdrm debug output - #26+#27 are the error=-22 ones (1.52 KB, application/octet-stream)
2019-02-18 22:05 UTC, Jan Kratochvil
libdrm debug patch (+errors suppression) (1.91 KB, patch)
2019-02-18 22:06 UTC, Jan Kratochvil
xorg-x11-server-1.20.4-1.fc29 patch (835 bytes, patch)
2019-06-03 11:56 UTC, Jan Kratochvil
i915 successful resume after xlock -dpmoff 1 (5.95 KB, text/plain)
2019-06-11 08:29 UTC, Jan Kratochvil
i915 failed resume after alt-f1 back to the X (1.80 KB, text/plain)
2019-06-11 08:33 UTC, Jan Kratochvil

Description Jan Kratochvil 2019-02-18 22:04:26 UTC
Created attachment 143399 [details]
kernel log before the crash with: drm.debug=14 log_buf_len=16M

00:02.0 0300: 8086:5917 (rev 07)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
stepping	: 10
microcode	: 0x9a
        Manufacturer: LENOVO
        Product Name: 20KGS23S08
        Version: ThinkPad X1 Carbon 6th

There is a suspend-resume bug in Fedora:
modeset(0): failed to set mode: No such file or directory | modeset(0): failed to set mode: Invalid argument
https://bugzilla.redhat.com/show_bug.cgi?id=1662057

The suspend-resume bug happens only sometimes, but a very similar failure is always reproducible with ctrl-alt-f2 + alt-f1 (switching to a text console and back), so I am trying to debug that one first.

The failure occurs in the kernel's handling of DRM_IOCTL_MODE_ATOMIC, presumably in i915.ko.
The ioctl is issued by libdrm's drmModeAtomicCommit().
The kernel returns -22 (EINVAL, "Invalid argument").
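
As a quick cross-check of the numeric value, the return code can be mapped to its symbolic name with Python's standard errno module. This is only an illustrative sketch; the helper name describe_errno is made up for this note and is not part of libdrm or the attached patches.

```python
import errno
import os


def describe_errno(ret):
    """Return (symbolic name, message) for a negative kernel-style return code.

    describe_errno is a hypothetical helper used for illustration only.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The value reported for the failing atomic commit.
name, message = describe_errno(-22)
print(name, "-", message)
```

The same helper can be pointed at other values seen in the logs, for example 2 (ENOENT), whose message text matches the "No such file or directory" wording in the Xorg log.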

If one clears the error (the attached libdrm patch does this), the X server no longer crashes but the display remains black.

Tried these Fedora kernels but they all behave the same:
FAIL kernel-4.20.8-200.fc29.x86_64
FAIL kernel-5.0.0-0.rc6.git1.1.fc30.x86_64
FAIL kernel-4.18.18-200.fc28.x86_64
I also tried a drm-tip kernel, but it hung at the point where it should have asked for the LUKS password on my system, so I could not test drm-tip.  It was built by kernel.spec as vanilla from:
  https://github.com/freedesktop/drm-tip.git
  7f6ace5f10a9d6c5d277b95e39f862eff87fdb45 = drm-tip

The data passed to the failed DRM_IOCTL_MODE_ATOMIC call is (I have not decoded it further):
atomic=0x7ffe3fceaa00 00 05 00 00 03 00 00 00 e0 7d ea ab 97 55 00 00 f0 c2 b1 ab 97 55 00 00 00 a9 33 ab 97 55 00 00 d0 33 d1 ab 97 55 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
objs_ptr=0x5597abea7de0 1c 00 00 00 29 00 00 00 67 00 00 00
count_props_ptr=0x5597abea7de0 0a 00 00 00 02 00 00 00 01 00 00 00
props_ptr=0x5597ab33a900 13 00 00 00 10 00 00 00 0f 00 00 00 0e 00 00 00 0d 00 00 00 0c 00 00 00 0b 00 00 00 0a 00 00 00 09 00 00 00 08 00 00 00 15 00 00 00 14 00 00 00 13 00 00 00
prop_values_ptr=0x5597abd133d0 29 00 00 00 00 00 00 00 6b 00 00 00 00 00 00 00 70 08 00 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70 08 00 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6e 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 29 00 00 00 00 00 00 00
user_data=(nil)
errno=4
ret=-22,errno=22
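
The dump above can be decoded a little further. The following is a minimal sketch, assuming the `atomic=` bytes are a little-endian struct drm_mode_atomic as laid out in the kernel's drm_mode.h UAPI header (two 32-bit fields followed by six 64-bit fields) and that the flag values are the ones from that header. The helper names (decode_atomic, decode_u32_array, flag_names) are hypothetical and exist only for this illustration.

```python
import struct

# struct drm_mode_atomic: __u32 flags, __u32 count_objs, then six __u64 fields.
ATOMIC_FORMAT = "<IIQQQQQQ"
FIELDS = ("flags", "count_objs", "objs_ptr", "count_props_ptr",
          "props_ptr", "prop_values_ptr", "reserved", "user_data")

# Flag bits as defined in drm_mode.h.
FLAG_NAMES = {
    0x0001: "DRM_MODE_PAGE_FLIP_EVENT",
    0x0002: "DRM_MODE_PAGE_FLIP_ASYNC",
    0x0100: "DRM_MODE_ATOMIC_TEST_ONLY",
    0x0200: "DRM_MODE_ATOMIC_NONBLOCK",
    0x0400: "DRM_MODE_ATOMIC_ALLOW_MODESET",
}

# Bytes copied from the "atomic=" line, split per field for readability.
ATOMIC_DUMP = (
    "00 05 00 00 "              # flags
    "03 00 00 00 "              # count_objs
    "e0 7d ea ab 97 55 00 00 "  # objs_ptr
    "f0 c2 b1 ab 97 55 00 00 "  # count_props_ptr
    "00 a9 33 ab 97 55 00 00 "  # props_ptr
    "d0 33 d1 ab 97 55 00 00 "  # prop_values_ptr
    "00 00 00 00 00 00 00 00 "  # reserved
    "00 00 00 00 00 00 00 00"   # user_data
)
# Bytes copied from the "objs_ptr=" and "count_props_ptr=" lines.
OBJS_DUMP = "1c 00 00 00 29 00 00 00 67 00 00 00"
COUNT_PROPS_DUMP = "0a 00 00 00 02 00 00 00 01 00 00 00"


def decode_atomic(hex_dump):
    """Unpack a hex dump of struct drm_mode_atomic into a dict."""
    raw = bytes.fromhex(hex_dump)
    return dict(zip(FIELDS, struct.unpack(ATOMIC_FORMAT, raw)))


def decode_u32_array(hex_dump):
    """Unpack a hex dump of little-endian 32-bit values into a list."""
    raw = bytes.fromhex(hex_dump)
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))


def flag_names(flags):
    """List the known flag names set in the flags field."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]


atomic = decode_atomic(ATOMIC_DUMP)
print("flags:", hex(atomic["flags"]), flag_names(atomic["flags"]))
print("count_objs:", atomic["count_objs"])
print("object ids:", decode_u32_array(OBJS_DUMP))
print("props per object:", decode_u32_array(COUNT_PROPS_DUMP))
```

Under these assumptions the request carries three objects (ids 28, 41 and 103) with 10, 2 and 1 properties respectively, which matches the 13 entries in the props_ptr dump, and the flags field has the test-only and allow-modeset bits set. The count_props_ptr value decoded from the struct differs from the address printed in the `count_props_ptr=` label, which may just be a quirk of the debug output.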
Comment 1 Jan Kratochvil 2019-02-18 22:05:42 UTC
Created attachment 143400 [details]
libdrm debug output - #26+#27 are the error=-22 ones
Comment 2 Jan Kratochvil 2019-02-18 22:06:26 UTC
Created attachment 143401 [details] [review]
libdrm debug patch (+errors suppression)
Comment 3 Jan Kratochvil 2019-02-18 22:10:47 UTC
Screen 0: minimum 320 x 200, current 3840 x 2160, maximum 8192 x 8192
eDP-1 connected (normal left inverted right x axis y axis)
   1920x1080     60.01 +  60.01    59.97    59.96    59.93    48.01  
...
DP-1-2 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 600mm x 340mm
   3840x2160     60.00*+  30.00  
...

Only the external 3840x2160 display is in use; the 1920x1080 one is turned off in X (MATE).
Comment 4 Jan Kratochvil 2019-02-18 22:17:19 UTC
In Xorg.0.log the crash (due to libdrm error) is shown as:

[125371.707] (EE) modeset(0): failed to set mode: No such file or directory
[125371.707] (EE)
Fatal server error:
[125371.707] (EE) EnterVT failed for screen 0
...
[125372.555] (EE) Server terminated with error (1). Closing log file.
Comment 5 Lakshmi 2019-02-19 07:40:01 UTC
Jan, please correct me if I have misunderstood the issue.

Switching to a text console and back (using ctrl-alt-f2 and ctrl-alt-f1) crashes the X server.
However, after applying the attached libdrm debug patch (+errors suppression), you noticed that the X server no longer crashes but the display remains black.
Comment 6 Lakshmi 2019-02-22 11:30:54 UTC
(In reply to Lakshmi from comment #5)
> Jan, correct if I misunderstood the issue.
> 
> switching to text console and back (using ctrl-alt-f2 and ctrl-alt-f1)
> crashes X server.
> But, after applying the attached patch libdrm debug patch (+errors
> suppression), you noticed that X server didn't crash but display remains
> black.

Jan replied "Yes, right." to the above question.
Comment 7 martin+foss 2019-04-10 12:37:32 UTC
I have similar issues. I have seen the same log entry with modeset(0) when switching to a VT.

Sometimes the screens don't come up after sleep.

And sometimes the computer slows down and then almost freezes, to the point where the keyboard doesn't respond but the mouse still reacts occasionally.

This is on two different older Intel laptops.

Most issues are with the one connected to a docking station, which is the one I saw this error on.

It isn't new, but it has been better at some times and worse at others.

Brgds from Martin
Comment 8 Jan Kratochvil 2019-05-02 13:27:08 UTC
(In reply to martin+foss from comment #7)
> Most issues is with the one connected in a docking station,

I have found that the problem may be in the Thunderbolt 3 Lenovo docking station itself: even after power-cycling the laptop (X1 Carbon) it still did not work until I power-cycled the docking station.

This may be because the dock still has the original firmware 1.0. I failed to upgrade it to 2.0 because the updater is Windows-only (and the upgrade failed even in a Windows installation I set up for it). There is a new firmware upgrade tool now, but it is again Windows-only:
https://support.lenovo.com/us/en/solutions/acc100356
Comment 9 Richard Feiler 2019-05-14 22:07:12 UTC
I have the same problem on my T420 with Intel IGP. I have noticed it only when an external monitor is connected via DisplayPort (I do not have a docking station).
Comment 10 Jan Kratochvil 2019-06-03 11:56:30 UTC
Created attachment 144426 [details] [review]
xorg-x11-server-1.20.4-1.fc29 patch

This is a patch against xorg-x11-server-1.20.4-1.fc29, although it is equivalent to what the libdrm patch (attachment 143401) does: it ignores any errors from drmModeAtomicCommit().

I can confirm the problem is unrelated to the docking station.  When I removed the docking station (connected by DisplayPort to an LG 27UK650, Xorg.log id "LG HDR 4K") and connected the display directly to the HDMI port of the Lenovo X1 Carbon, it behaved the same.

Besides switching consoles with ctrl-alt-Fx, the problem also happens after DPMS Off (xlock -dpmsoff). I have now changed it to DPMS Suspend (xlock -dpmssuspend), and it looks as if the locked-up display no longer occurs.

With this patch the screen remains black, and one can recover it by disconnecting and reconnecting the display (after resuming from DPMS Off); the same can be achieved by disconnecting and reconnecting the docking station (or power-cycling it).

I have also updated the main BIOS of the Lenovo X1 Carbon to 1.38, with no effect (it even appears to me that the problem happens more often than with 1.34 before).
Comment 11 Jan Kratochvil 2019-06-11 08:29:40 UTC
Created attachment 144500 [details]
i915 successful resume after xlock -dpmoff 1
Comment 12 Jan Kratochvil 2019-06-11 08:33:33 UTC
Created attachment 144501 [details]
i915 failed resume after alt-f1 back to the X

The failing case reports "enabled/connectors mismatch", but what causes it?
kernel-5.1.8-200.fc29.x86_64

-=working xlock resume
+=black/failing alt-f1
 i915 0000:00:02.0: [drm] crtc[47]: pipe A
 i915 0000:00:02.0: [drm]       enable=0
 i915 0000:00:02.0: [drm]       active=0
 i915 0000:00:02.0: [drm]       planes_changed=0
 i915 0000:00:02.0: [drm]       mode_changed=0
 i915 0000:00:02.0: [drm]       active_changed=0
 i915 0000:00:02.0: [drm]       connectors_changed=0
 i915 0000:00:02.0: [drm]       color_mgmt_changed=0
 i915 0000:00:02.0: [drm]       plane_mask=1
-i915 0000:00:02.0: [drm]       connector_mask=0
-i915 0000:00:02.0: [drm]       encoder_mask=4
+i915 0000:00:02.0: [drm]       connector_mask=1
+i915 0000:00:02.0: [drm]       encoder_mask=1
 i915 0000:00:02.0: [drm]       mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
-i915 0000:00:02.0: [drm] connector[86]: DP-5
-i915 0000:00:02.0: [drm]       crtc=(null)
 [drm:drm_atomic_check_only [drm]] checking 000000004664b9ab
 [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:47:pipe A] mode changed
 [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:47:pipe A] enable changed
 [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:47:pipe A] active changed
-[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Updating routing for [CONNECTOR:86:DP-5]
-[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Disabling [CONNECTOR:86:DP-5]
+[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:47:pipe A] enabled/connectors mismatch
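
To make the differing mask values in the two logs easier to read, they can be expanded into bit positions. This is a small sketch, assuming that bit N of connector_mask and encoder_mask refers to the connector or encoder with index N (which is how the kernel's drm_connector_mask() and drm_encoder_mask() helpers build these masks). The function name mask_to_indices is hypothetical.

```python
def mask_to_indices(mask):
    """Return the bit positions that are set in a DRM object mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Values copied from the "-" (working) and "+" (failing) lines above.
working = {"connector_mask": 0, "encoder_mask": 4}
failing = {"connector_mask": 1, "encoder_mask": 1}

for label, state in (("working", working), ("failing", failing)):
    for field, mask in state.items():
        print(label, field, "->", mask_to_indices(mask))
```

Read this way, the failing state still has the connector with index 0 attached to pipe A while enable=0, whereas the working state has no connector attached.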
Comment 13 Jan Kratochvil 2019-06-15 20:39:59 UTC
It has been fixed (or worked around?) by switching to Driver "intel", as described in:
  https://bugzilla.redhat.com/show_bug.cgi?id=1630367#c18
There is also: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=1697591

It in effect does:
        -LoadModule: "fb"
        -LoadModule: "fbdevhw"
        -LoadModule: "glamoregl"
        -LoadModule: "modesetting"
        +LoadModule: "dri2"
        +LoadModule: "dri3"
        +LoadModule: "intel"
        +LoadModule: "present"
Comment 14 Lakshmi 2019-06-17 06:01:52 UTC
(In reply to Jan Kratochvil from comment #13)
> It has been fixed (workarounded?) by Driver "intel" from:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1630367#c18
> There is also: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=1697591
> 
> It in effect does:
>         -LoadModule: "fb"
>         -LoadModule: "fbdevhw"
>         -LoadModule: "glamoregl"
>         -LoadModule: "modesetting"
>         +LoadModule: "dri2"
>         +LoadModule: "dri3"
>         +LoadModule: "intel"
>         +LoadModule: "present"

Thanks for the feedback. Would you mind if I close this issue?
Comment 15 Jan Kratochvil 2019-06-17 06:33:52 UTC
So it is not a bug in the userland Intel driver but isn't there some bug in the kernel i915 driver? Why is the 'connector' missing there in some cases?
Comment 16 Swati Sharma 2019-08-22 14:21:30 UTC
Hi Kratochvil,

drm_atomic_helper_check_modeset() validates the state object for modeset changes. This function returns -EINVAL, which in turn makes the ioctl fail.

We ended up in this scenario because the blob used to set the mode property for the CRTC is NULL, which results in NOMODE and sets state->enable to false:
[drm:drm_atomic_set_mode_prop_for_crtc [drm]] Set [NOMODE] for [CRTC:47:pipe A] state 00000000b992248f

This later leads to "enabled/connectors mismatch".
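
The consistency rule described here can be illustrated with a simplified model. This is a sketch only, assuming the check amounts to "a CRTC must be enabled exactly when it has at least one connector attached"; check_crtc_state is a hypothetical stand-in, not the kernel implementation, and the inputs are the enable and connector_mask values from the logs in comment 12.

```python
import errno


def check_crtc_state(enable, connector_mask):
    """Simplified model of the enable/connector consistency check.

    Returns 0 when the state is consistent and -EINVAL otherwise.
    """
    has_connectors = connector_mask != 0
    if enable != has_connectors:
        # Corresponds to the "enabled/connectors mismatch" debug message.
        return -errno.EINVAL
    return 0


# Working xlock resume: enable=0, connector_mask=0.
print("working:", check_crtc_state(False, 0))
# Failing alt-f1 switch: enable=0 (NOMODE) but connector_mask=1.
print("failing:", check_crtc_state(False, 1))
```

In this model the failing case is rejected because the CRTC was disabled by the NULL mode blob while a connector was still routed to it.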

It looks like this may not be a driver issue; it could instead be caused by how commits are sent from userland when switching from the console back to graphics mode.

I tried vanilla Ubuntu 16.04 with two external displays and kernel version 5.1.0+, but was not able to reproduce the issue.

If you still see the issue with the latest kernel, please send exact steps to reproduce it together with the configuration: which displays are in use, kernel version, display_info, userland library details, etc.

#assessed

