Bug 109806

Summary: [REGRESSION] X fails to start properly in 4.20.13
Product: DRI
Reporter: Jason Tibbitts <tibbs>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: 2kmm, andreesteve, bugs, intel-gfx-bugs, opensuser, rosenpapazov
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard: Triaged
i915 platform: HSW
i915 features:
Attachments:
Description                                   Flags
dmesg from 4.20.13                            none
Xorg.0.log from 4.20.13                       none
dmesg from 4.20.12                            none
Xorg.0.log from 4.20.12                       none
dmesg from drm-tip                            none
Xorg.0.log from drm-tip                       none
dmesg drm-tip                                 none
Xorg.0.log from drm-tip                       none
Xorg.0.log from drm-tip with drm.debug=0x1f   none
dmesg from drm-tip with drm.debug=0x1f        none
Xorg.0.log from drm-tip working               none
dmesg from drm-tip working                    none

Description Jason Tibbitts 2019-03-01 05:14:46 UTC
After updating from 4.20.12 to 4.20.13, I found that a couple of my machines failed to start X properly.  Things work when booting back to 4.20.12 with no other changes made to the system.  I note that there are a few i915-related patches which are new to 4.20.13.  I will be away for a few days but next week I'll try reverting some of them.

I'm running 4.20.13 on well over a hundred other hosts and haven't seen any complaints about them yet, though I haven't personally checked them all either.  So what's special about these two hosts?

Both machines have i5-4670 CPUs and are running with the integrated graphics, which I guess is Haswell. lspci says:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Certainly not a Xeon but I guess that's a generic PCI ID.

Both machines have two of a particular type of cheap monitor, and I think that's why these two machines are special as none of my other machines have those monitors.  Looking at the EDID info, all of the displays have the same manufacturer, model, year/week of manufacture and serial number ("0").  I've had a few problems in the past with desktop environments trying to be smart about the monitor layout and being confused by the monitors being essentially identical.
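
For anyone wanting to compare monitors the same way, here is a minimal Python sketch that pulls the identity fields (manufacturer, product code, serial number, week and year of manufacture) out of an EDID base block. The field offsets follow the standard EDID layout; the function names and the sysfs glob pattern are assumptions made for illustration, not something taken from this report.

```python
import glob
import struct

EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def parse_edid_identity(edid):
    """Return the identity fields stored in bytes 8-17 of an EDID base block."""
    if len(edid) < 18 or edid[:8] != EDID_HEADER:
        raise ValueError("not an EDID base block")
    # Bytes 8-9: manufacturer ID, big-endian, three 5-bit letters (1 == 'A').
    (mfg,) = struct.unpack(">H", edid[8:10])
    manufacturer = "".join(
        chr(((mfg >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )
    # Bytes 10-11: product code, bytes 12-15: serial number (little-endian).
    product, serial = struct.unpack("<HI", edid[10:16])
    return {
        "manufacturer": manufacturer,
        "product": product,
        "serial": serial,
        "week": edid[16],          # byte 16: week of manufacture
        "year": 1990 + edid[17],   # byte 17: year of manufacture minus 1990
    }


def identities_from_sysfs(pattern="/sys/class/drm/card*-*/edid"):
    """Map each sysfs EDID path to its parsed identity (pattern is an assumption)."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            data = handle.read()
        if data:  # connectors with nothing attached expose an empty file
            found[path] = parse_edid_identity(data)
    return found
```

Two displays that return identical dictionaries, including a serial of 0, cannot be told apart by these fields alone.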

I wonder if it's possible that this new-in-4.20.13 patch is related?
    drm/i915/fbdev: Actually configure untiled displays
I guess that would be the first thing I'd try to revert.

In any case, I have no other machines that have two of those particular annoying monitors.

I'll attach dmesg from each kernel when booted with drm.debug=0x06, as well as the Xorg logs.
Comment 1 Jason Tibbitts 2019-03-01 05:15:22 UTC
Created attachment 143505
dmesg from 4.20.13
Comment 2 Jason Tibbitts 2019-03-01 05:15:48 UTC
Created attachment 143506
Xorg.0.log from 4.20.13
Comment 3 Jason Tibbitts 2019-03-01 05:16:11 UTC
Created attachment 143507
dmesg from 4.20.12
Comment 4 Jason Tibbitts 2019-03-01 05:16:35 UTC
Created attachment 143508
Xorg.0.log from 4.20.12
Comment 5 Jason Tibbitts 2019-03-01 05:39:06 UTC
I should add that when X fails to start properly, it does seem to actually have started, but the displays are black instead of displaying the greeter (which has a colored background).  The mouse cursor does appear and can be moved, but it looks like the display arrangement is off.  By default it would span the two monitors, and it looks like it still tries, but it seems like everything is offset by one monitor to the left.  So the mouse won't go any further right than the right edge of the left monitor, and will go off the left edge into nothingness.

You can switch to a different VT and get a console login, but it is shown only on the left monitor.  Normally I think it would clone.

I'm happy to do any testing which you think might help, but note that I will be away from those machines until Monday.
Comment 6 Chris Wilson 2019-03-01 08:59:10 UTC
(In reply to Jason Tibbitts from comment #0)
> I wonder if it's possible that this new-in-4.20.13 patch is related?
>     drm/i915/fbdev: Actually configure untiled displays
> I guess that would be the first thing I'd try to revert.

Well that is the only thing that changed. But the dmesg and Xorg.log look healthy and don't ostensibly look any different (same configurations). So it's likely due to the modeset being mishandled due to now inheriting state and not being forced to disable during boot.
Comment 7 marty.lists 2019-03-04 14:51:13 UTC
Hey!

I had the same issue but reverting
    drm/i915/fbdev: Actually configure untiled displays
(d179b88deb3bf6fed4991a31fd6f0f2cad21fab5)
Comment 8 marty.lists 2019-03-04 14:52:29 UTC
Hey!

I had the same issue but reverting
    drm/i915/fbdev: Actually configure untiled displays
    (d179b88deb3bf6fed4991a31fd6f0f2cad21fab5)
fixed it for me.
Comment 9 Jason Tibbitts 2019-03-04 18:00:20 UTC
I guess the Xorg log looks healthy until the last line.  I'm not sure if the other differences are just timing.  Not entirely sure what I should be looking for in the boot logs.

But after dropping this on 130 desktops and then taking a long weekend, it turns out that it does break more machines than just those two.  So far the problematic machines have all had 4th gen Intel CPUs (i5-4670 or i3-4130), Asus H81I-plus or H87I-plus motherboards and dual monitors.

Most of the monitors do have proper serial numbers, so that bit was a red herring.
Comment 10 CountMurphy 2019-03-05 14:10:33 UTC
I too am hit with the same bug, though running different hardware.

CPU: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series. 

GPU: VGA compatible controller: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller (rev 21).  The only display is the laptop screen. I am using nomode set like the others though

Compiling the kernel with d179b88deb3bf6fed4991a31fd6f0f2cad21fab5 reverted has fixed this issue for me.
Comment 11 Lakshmi 2019-03-06 07:39:43 UTC
Jason, can you please verify the issue with the latest drm-tip? You can check on any one of the machines first.
(https://cgit.freedesktop.org/drm-tip)

If the issue persists with the latest drm-tip, attach the dmesg and Xorg logs here.
Comment 12 lferi 2019-03-06 12:52:52 UTC
Hit by the same issue on a:

VGA compatible controller: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCI Configuration Registers (rev 36)

Present also with kernel 5.0rc8 for me.

I also confirm that reverting the commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5
fixes the issue on 4.20.13.
Comment 13 Jason Tibbitts 2019-03-07 01:54:28 UTC
Created attachment 143556
dmesg from drm-tip

I cloned and built drm-tip and it appears to have the same issue.  Here's the dmesg with drm.debug=0x06.  The Xorg log will follow.
Comment 14 Jason Tibbitts 2019-03-07 01:54:55 UTC
Created attachment 143557
Xorg.0.log from drm-tip
Comment 15 Jason Tibbitts 2019-03-07 03:19:37 UTC
I noticed that several changes were merged since I originally made the clone, so I pulled and tried again but there was no change to mention.

The dmesg and log I attached were from the tree at 171d156257eeb2ae4171adae3807700da724451d but the test I just did was with the tree at 5d4de376b9a03c2f74e049ee6a8221df96687ba0.  At least assuming I've done this all properly.
Comment 16 lferi 2019-03-07 05:12:56 UTC
Created attachment 143558
dmesg drm-tip
Comment 17 lferi 2019-03-07 05:13:59 UTC
Created attachment 143559
Xorg.0.log from drm-tip
Comment 18 CountMurphy 2019-03-07 21:42:02 UTC
I can confirm that drm-tip still contains this issue. Log files available if needed.
Comment 19 Maarten Lankhorst 2019-03-08 06:41:11 UTC
Could you try running with drm.debug=0x1f and then post dmesg and xorg log from the same run?
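
For reference, drm.debug is a bitmask of log categories, which is why 0x1f produces more output than the 0x06 used for the earlier logs. A small Python sketch of the decoding follows. The bit-to-name table reflects the DRM_UT_* categories as I understand them from the kernel's drm_print.h for kernels of this era and should be treated as an assumption; higher bits exist but are deliberately left undecoded here, and the function name is made up for illustration.

```python
# Low five drm.debug bits (assumed from DRM_UT_* in drm_print.h).
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug bitmask."""
    names = [name for bit, name in DRM_DEBUG_CATEGORIES.items() if mask & bit]
    # Report any bits this table does not cover instead of guessing at them.
    leftover = mask & ~sum(DRM_DEBUG_CATEGORIES)
    if leftover:
        names.append("other(0x%x)" % leftover)
    return names


print(decode_drm_debug(0x06))
print(decode_drm_debug(0x1F))
```

Under that table, 0x06 enables the DRIVER and KMS categories only, while 0x1f additionally enables CORE, PRIME and ATOMIC.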
Comment 20 lferi 2019-03-08 13:44:08 UTC
Created attachment 143595
Xorg.0.log from drm-tip with drm.debug=0x1f
Comment 21 lferi 2019-03-08 13:45:19 UTC
Created attachment 143597
dmesg from drm-tip with drm.debug=0x1f
Comment 22 lferi 2019-03-09 20:36:58 UTC
Created attachment 143602
Xorg.0.log from drm-tip working

Reverting the commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5 with drm-tip still fixes the issue. Logs from a working run with drm-tip attached.
Comment 23 lferi 2019-03-09 20:38:06 UTC
Created attachment 143603
dmesg from drm-tip working
Comment 24 Maarten Lankhorst 2019-03-10 15:38:04 UTC
Not a bug in the kernel: Xorg's modesetting driver needs to set all connectors/CRTCs directly when using atomic. The legacy path disabled CRTC B for you if you stole all its connectors for a different CRTC. In the atomic case you need to disable it yourself.

This is a bug in x.org's modesetting driver.
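
The difference between the two paths can be illustrated with a toy Python model. This is not the DRM API: the state layout, the function names, and the choice to reject an incomplete atomic request with an exception are all assumptions made purely to show where the responsibility for disabling CRTC B sits. How the real kernel reacts to such a request is not spelled out in this thread.

```python
import copy


def legacy_set_crtc(state, crtc, connectors):
    """Toy legacy path: stealing connectors implicitly disables the donor CRTC."""
    new = copy.deepcopy(state)
    wanted = set(connectors)
    for name, cfg in new.items():
        if name != crtc:
            cfg["connectors"] -= wanted
            if not cfg["connectors"]:
                cfg["active"] = False  # cleanup done on the caller's behalf
    new[crtc] = {"active": True, "connectors": wanted}
    return new


def atomic_commit(state, request):
    """Toy atomic path: only what the request spells out is changed."""
    new = copy.deepcopy(state)
    claimed = set()
    for name, cfg in request.items():
        new[name] = {"active": cfg["active"], "connectors": set(cfg["connectors"])}
        claimed |= new[name]["connectors"]
    for name, cfg in new.items():
        if name not in request:
            cfg["connectors"] -= claimed  # the connector moves, nothing else does
    for name, cfg in new.items():
        if cfg["active"] and not cfg["connectors"]:
            # Assumption of this model: an active CRTC without connectors is invalid.
            raise ValueError("CRTC %s left active without connectors" % name)
    return new


boot = {
    "A": {"active": False, "connectors": set()},
    "B": {"active": True, "connectors": {"HDMI-1"}},
}

# Legacy: moving HDMI-1 to CRTC A also turns CRTC B off.
legacy = legacy_set_crtc(boot, "A", ["HDMI-1"])

# Atomic: the same move must state explicitly that CRTC B is disabled.
atomic = atomic_commit(boot, {
    "A": {"active": True, "connectors": ["HDMI-1"]},
    "B": {"active": False, "connectors": []},
})
```

In this model an atomic request that mentions only CRTC A raises an error, which mirrors the point above: userspace has to include the disable of CRTC B in its own request.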
Comment 25 lferi 2019-03-11 05:00:48 UTC
Looks like it has already been submitted to Xorg: https://gitlab.freedesktop.org/xorg/xserver/issues/542
Comment 26 Maarten Lankhorst 2019-03-11 09:25:57 UTC

*** This bug has been marked as a duplicate of bug 107100 ***
Comment 27 ROSEN PAPAZOV 2019-04-08 16:41:06 UTC
*** Bug 110319 has been marked as a duplicate of this bug. ***
Comment 28 Ville Syrjala 2019-05-03 12:20:26 UTC
*** Bug 110591 has been marked as a duplicate of this bug. ***
Comment 29 Ville Syrjala 2019-05-03 12:22:29 UTC
The kernel stuff was reverted:

commit 9fa246256e09dc30820524401cdbeeaadee94025
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Apr 24 10:47:56 2019 +1000

    Revert "drm/i915/fbdev: Actually configure untiled displays"
Comment 30 Lakshmi 2019-05-29 11:23:43 UTC
*** Bug 110030 has been marked as a duplicate of this bug. ***
