Bug 107441 - Black or broken VT when loading i915.ko "late"
Summary: Black or broken VT when loading i915.ko "late"
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Jan-Marek Glogowski
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Reported: 2018-08-01 10:58 UTC by Jan-Marek Glogowski
Modified: 2018-09-07 14:40 UTC



Attachments
Broken LiMux plymouth bootsplash logo (2.45 MB, image/jpeg)
2018-08-01 10:58 UTC, Jan-Marek Glogowski
dmesg 4.4 only (371.60 KB, text/plain)
2018-08-01 11:01 UTC, Jan-Marek Glogowski
dmesg 4.4 with 4.15 load after initramfs (275.22 KB, text/plain)
2018-08-01 11:22 UTC, Jan-Marek Glogowski
dmesg 4.4 with 4.15 load in initramfs (234.40 KB, text/plain)
2018-08-01 11:22 UTC, Jan-Marek Glogowski
Filter script for dmesg to compare drm output (131 bytes, application/x-shellscript)
2018-08-01 11:24 UTC, Jan-Marek Glogowski
dmesg 4.18 with i915.ko in initramfs (579.61 KB, text/plain)
2018-08-01 12:57 UTC, Jan-Marek Glogowski
dmesg 4.18 with i915.ko just in the root fs (393.72 KB, text/plain)
2018-08-01 12:58 UTC, Jan-Marek Glogowski
dmesg 4.18+drmtip with i915.ko in initramfs (593.84 KB, text/plain)
2018-08-13 13:57 UTC, Jan-Marek Glogowski
dmesg 4.18+drmtip with i915.ko just in the root fs (388.88 KB, text/plain)
2018-08-13 14:10 UTC, Jan-Marek Glogowski
initramfs-tools hook to remove i915.ko module (313 bytes, application/x-shellscript)
2018-08-17 11:05 UTC, Jan-Marek Glogowski

Description Jan-Marek Glogowski 2018-08-01 10:58:47 UTC
Created attachment 140911 [details]
Broken LiMux plymouth bootsplash logo

My hardware is a Skylake notebook (Fujitsu U757) and desktop (Acer Veriton n4640g).
I'm on Ubuntu 12.04 with the 14.04 HWE kernel, but this problem also happens with the current 4.18rc7.
Both machines have two monitors connected via DisplayPort, but it also happens with a single monitor.

When the i915.ko driver is loaded late in the boot process (after the initramfs) and inteldrmfb replaces vesafb, the plymouth boot splash breaks in various ways. Sometimes the image offset is wrong (see attached image), sometimes the stride is wrong, and sometimes the screen is black.
If I manually stop and start the plymouth daemon, everything is fine.
Pressing ESC shows a correct VT; pressing ESC again brings back the broken splash.

To me it looks like the main difference between a broken and a working kernel is the changed framebuffer allocation as a result of:

commit 3774eb507e7b7df7f9b7d8d867eea330c7146aaa
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Mon Aug 10 14:57:32 2015 -0300

    drm/i915: fix stolen bios_reserved checks

as this forces the framebuffer to be newly allocated instead of claiming the stolen memory from the VESA BIOS, which is in a reserved area.

After testing various kernels and drivers and checking DRM debug output, I realized it's probably a timing problem:

* If I put the driver into the initramfs, it works correctly
* If I remove the driver from the initramfs, it always breaks

It can be broken in various ways:
* Black screen
* Shifted image due to wrong offset, like in the attached photo
* Shifted image due to a wrong width / stride, which is just a mess
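
As an illustration of the last two failure modes, here is a minimal Python sketch of how reading a linear pixel buffer with the wrong offset or stride produces a shifted or sheared image. It is only a toy model: the `scan_out` helper and the tiny 4x3 buffer are assumptions made up for this example, not code from plymouth or i915.

```python
def scan_out(buf, width, height, stride, offset=0):
    """Read `height` rows of `width` pixels from a linear buffer.

    `stride` is the distance between the starts of two rows and
    `offset` is where the first row starts (both in pixels here).
    """
    rows = []
    for y in range(height):
        start = offset + y * stride
        rows.append(buf[start:start + width])
    return rows


# A 4x3 "image" whose pixel value equals its linear index.
WIDTH, HEIGHT = 4, 3
image = list(range(WIDTH * HEIGHT))

# Reader and writer agree on the layout: rows come out intact.
correct = scan_out(image, WIDTH, HEIGHT, stride=WIDTH)

# Reader assumes a narrower stride: each row starts one pixel too early
# relative to the previous one, so the picture is sheared.
wrong_stride = scan_out(image, 3, HEIGHT, stride=3)

# Reader starts two pixels into the buffer: the whole picture is shifted.
# Two padding pixels are appended so the last row stays in bounds.
wrong_offset = scan_out(image + [0, 0], WIDTH, HEIGHT, stride=WIDTH, offset=2)

for name, rows in (("correct", correct),
                   ("wrong stride", wrong_stride),
                   ("wrong offset", wrong_offset)):
    print(name, rows)
```

If the splash was rendered for one framebuffer layout and scanned out with another, this is the kind of mismatch that would match the photo, though the thread does not confirm which side holds the stale layout.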

I tested this behavior with:
* Ubuntu 12.04 + linux-image-4.4.0-130-generic (4.4.0-130.156~14.04.1) + my 4.15 DRM backport
* Ubuntu 18.04 + linux-image-4.15.0-29-generic (4.15.0-29.31)
* Ubuntu 18.04 + linux-image-4.16.18-041618-generic (4.16.18-041618.201806252030)
* Ubuntu 18.04 + linux-image-4.17.11-041711-generic (4.17.11-041711.201807280505)
* Ubuntu 18.04 + linux-image-4.18.0-041800rc7-generic (4.18.0-041800rc7.201807292230)

Now I don't know if this is really a kernel or a plymouth problem, because the i915 driver in the initramfs is possibly loaded before plymouth.

------ Background info

I'm trying to backport DRM 4.15 to 4.4 (AKA Ubuntu Bionic 18.04 => Ubuntu Trusty 12.04 + Xenial HWE).
I'm mainly interested in the i915 driver and its DisplayPort fixes, as I know Ubuntu 18.04 is able to bring up the 2nd monitor on the Acer HW, which fails with the 4.4 kernel.

I also hope the 4.15 backport fixes a power management wakeup problem, where the monitor doesn't wake up after lunch. Most of the time the workaround is to power-cycle the monitor, but sometimes people also got PIPE underrun errors, which need a reboot. Since this is not really reproducible, I don't know how to debug it, as it only happens to users already running the HW.

I also tried the code from https://launchpad.net/~canonical-hwe-team/+archive/ubuntu/ppa, which contains a port of 4.9 => 4.4.
This works correctly for this problem (it still steals the memory from the vesafb), but doesn't contain the DisplayPort fixes I'm actually interested in. The fixes allow my HW to use the 2nd DP, which works with 4.15 (and which got broken again in 4.17 AFAIK, but that's another bug I'll open).
Comment 1 Jan-Marek Glogowski 2018-08-01 11:01:30 UTC
Created attachment 140912 [details]
dmesg 4.4 only
Comment 2 Chris Wilson 2018-08-01 11:10:06 UTC
Dmesg of before/after 3774eb507e7b7df7f9b7d8d867eea330c7146aaa would be nice.
Comment 3 Jan-Marek Glogowski 2018-08-01 11:22:09 UTC
Created attachment 140913 [details]
dmesg 4.4 with 4.15 load after initramfs
Comment 4 Jan-Marek Glogowski 2018-08-01 11:22:31 UTC
Created attachment 140914 [details]
dmesg 4.4 with 4.15 load in initramfs
Comment 5 Jan-Marek Glogowski 2018-08-01 11:24:14 UTC
Created attachment 140915 [details]
Filter script for dmesg to compare drm output
Comment 6 Chris Wilson 2018-08-01 11:29:33 UTC
So it's the regression from around 3.2 where disabling active outputs that we fail to takeover, with the insistence by the HW people that we had to avoid using stolen offset==0 in bdw.
Comment 7 Chris Wilson 2018-08-01 11:37:03 UTC
commit 011f22eb545a35f972036bb6a245c95c2e7e15a0
Author: Hans de Goede <j.w.r.degoede@gmail.com>
Date:   Fri Apr 20 11:59:33 2018 +0200

    drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated buffers v2
    
    Before this commit the WaSkipStolenMemoryFirstPage workaround code was
    skipping the first 4k by passing 4096 as start of the address range passed
    to drm_mm_init(). This means that calling drm_mm_reserve_node() to try and
    reserve the firmware framebuffer so that we can inherit it would always
    fail, as the firmware framebuffer starts at address 0.
    
    Commit d43537610470 ("drm/i915: skip the first 4k of stolen memory on
    everything >= gen8") says in its commit message: "This is confirmed to fix
    Skylake screen flickering issues (probably caused by the fact that we
    initialized a ring in the first page of stolen, but I didn't 100% confirm
    this theory)."
    
    Which suggests that it is safe to use the first page for a linear
    framebuffer as the firmware is doing (see note below).
    
    This commit always passes 0 as start to drm_mm_init() and works around
    WaSkipStolenMemoryFirstPage in i915_gem_stolen_insert_node_in_range()
    by insuring the start address passed by to drm_mm_insert_node_in_range()
    is always 4k or more. All entry points to i915_gem_stolen.c go through
    i915_gem_stolen_insert_node_in_range(), so that any newly allocated
    objects such as ring-buffers will not be allocated in the first 4k.
    
    The one exception is i915_gem_object_create_stolen_for_preallocated()
    which directly calls drm_mm_reserve_node() which now will be able to
    use the first 4k.
    
    This fixes the i915 driver no longer being able to inherit the firmware
    framebuffer on gen8+, which fixes the video output changing from the
    vendor logo to a black screen as soon as the i915 driver is loaded
    (on systems without fbcon).
    
    Some notes about the mapping of the BIOS framebuffer:
    
    v1 led to some discussion if the assumption of the intel_display.c code
    that the firmware framebuffer is a linear mapping of the stolen memory
    starting at offset 0 is still correct, because that would mean that the
    GOP does not implement the WaSkipStolenMemoryFirstPage workaround.
    
    To verify this the following code was added at the end of
    i915_gem_object_create_stolen_for_preallocated() :
    
    pr_err("first ggtt entry before bind: 0x%016llx\n",
           readq(dev_priv->ggtt.gsm));
    ret = i915_vma_bind(vma,
                HAS_LLC(dev_priv) ? I915_CACHE_LLC : I915_CACHE_NONE,
                PIN_UPDATE);
    pr_err("i915_vma_bind ret %d\n", ret);
    pr_err("first ggtt entry after bind: 0x%016llx\n",
           readq(dev_priv->ggtt.gsm));
    
    Which prints the mapping of the first page, then does a vma_bind() to
    force update the mapping with our linear view of the framebuffer and
    then prints the mapping of the first page again.
    
    On an Asrock B150M Pro4S/D3 mainboard with i5-6500 CPU this prints:
    
    [    1.651141] first ggtt entry before bind: 0x0000000078c00001
    [    1.651151] i915_vma_bind ret 0
    [    1.651152] first ggtt entry after bind: 0x0000000078c00083
    
    And "sudo cat /proc/iomem | grep Stolen" gives:
      78c00000-88bfffff : Graphics Stolen Memory
    
    There are no visual changes with this patch (BIOS vendor logo still
    stays in place when we inherit the BIOS framebuffer), so the vma_bind()
    does not impact which memory is being scanned out.
    
    The address of the first ggtt entry matches with the start of stolen
    and the i915_vma_bind call only changes the first gtt entry's flags,
    or-ing in _PAGE_RW (BIT(1)) and PPAT_CACHED (BIT(7)), which perfectly
    matches what we would expect based on gen8_pte_encode()'s behavior.
    
    So it seems that the GOP indeed does NOT implement the wa and the i915's
    code assuming a linear mapping at the start of stolen for the BIOS fb
    still holds true for gen8+.
    
    I've also tested this on a Cherry Trail based device (a GPD Win)
    with identical results (the flags are 0x1b after the vma_bind
    on CHT, which matches with I915_CACHE_NONE).
    
    Changed in v2: No code changes, extended the commit message with the
    verification that the intel_display.c BIOS framebuffer mapping is still
    correct.
    
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180420095933.16442-1-hdegoede@redhat.com
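
The GGTT entry check described in the commit message above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the three values are copied from the quoted log and /proc/iomem line, while the helper names (`split_pte`, `set_bits`) and the assumption that the low 12 bits of an entry carry the flags are mine.

```python
FLAG_MASK = 0xFFF  # assumption: low 12 bits of a 4 KiB entry hold flags


def split_pte(pte):
    """Split an entry into (page address, flag bits)."""
    return pte & ~FLAG_MASK, pte & FLAG_MASK


def set_bits(value):
    """Return the positions of the bits set in `value`, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Values quoted in the commit message (i5-6500 board).
before_bind = 0x0000000078c00001
after_bind = 0x0000000078c00083
stolen_start = 0x78c00000  # start of "Graphics Stolen Memory"

addr_before, flags_before = split_pte(before_bind)
addr_after, flags_after = split_pte(after_bind)

# The address part is unchanged and equals the start of stolen memory.
same_address = addr_before == addr_after == stolen_start

# Only flag bits differ: bit 1 and bit 7, which the commit message
# names as _PAGE_RW (BIT(1)) and PPAT_CACHED (BIT(7)).
changed_bits = set_bits(flags_before ^ flags_after)

# Flags reported for the Cherry Trail device after the bind.
cht_bits = set_bits(0x1b)

print(same_address, changed_bits, cht_bits)
```

The point of the check is that the bind rewrites only flag bits, so the firmware mapping already pointed at offset 0 of stolen memory.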
Comment 8 Chris Wilson 2018-08-01 11:38:09 UTC
Which made it into 4.18-rc7, so another dmesg from current is required as it may be a different issue.
Comment 9 Jan-Marek Glogowski 2018-08-01 12:00:22 UTC
So I applied the patch

commit 011f22eb545a35f972036bb6a245c95c2e7e15a0
Author: Hans de Goede <j.w.r.degoede@gmail.com>
Date:   Fri Apr 20 11:59:33 2018 +0200

    drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated buffers v2

on top of my DRM backport and now the dmesg output is again

[    5.912030] [drm:intelfb_create [i915_bpo]] re-using BIOS fb
[    5.912121] [drm:intelfb_create [i915_bpo]] allocated 1366x768 fb: 0x00000000

instead of the broken

[    4.244504] [drm:intelfb_create [i915_bpo]] no BIOS fb, allocating a new one
[    4.244538] [drm:intelfb_create [i915_bpo]] allocated fb from stolen memory
[    4.246099] [drm:intelfb_create [i915_bpo]] allocated 1366x768 fb: 0x00040000

Is that what you expected?
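
Why the patch flips the dmesg output from "no BIOS fb" to "re-using BIOS fb" can be illustrated with a toy range manager in Python. This is a sketch of the idea described in the commit message, not the kernel's drm_mm API: `ToyRangeManager`, its methods, and the 4 MiB framebuffer size are all hypothetical; only the 4k skip and the stolen size (derived from the quoted /proc/iomem range) come from the thread.

```python
PAGE = 4096


class ToyRangeManager:
    """Toy stand-in for a range allocator (hypothetical, not drm_mm)."""

    def __init__(self, start, size):
        self.start = start
        self.end = start + size
        self.nodes = []  # sorted list of (start, end) ranges in use

    def reserve(self, start, size):
        """Claim an exact range; fails if unmanaged or overlapping."""
        end = start + size
        if start < self.start or end > self.end:
            return False
        if any(s < end and start < e for s, e in self.nodes):
            return False
        self.nodes.append((start, end))
        self.nodes.sort()
        return True

    def insert(self, size, min_start=0):
        """First-fit allocation at or above `min_start`."""
        cursor = max(self.start, min_start)
        for s, e in self.nodes:
            if e <= cursor:
                continue
            if s - cursor >= size:
                break
            cursor = e
        if cursor + size > self.end:
            return None
        self.nodes.append((cursor, cursor + size))
        self.nodes.sort()
        return cursor


STOLEN_SIZE = 0x10000000  # 78c00000-88bfffff from the quoted iomem line
FB_SIZE = 0x400000        # hypothetical firmware framebuffer size

# Before the patch: the managed range itself starts at 4k, so the
# firmware framebuffer at offset 0 can never be reserved.
old = ToyRangeManager(PAGE, STOLEN_SIZE - PAGE)
old_inherited = old.reserve(0, FB_SIZE)
old_new_fb = old.insert(FB_SIZE)  # a fresh buffer is allocated instead

# After the patch: the range starts at 0, and only new allocations
# are pushed to 4k or above.
new = ToyRangeManager(0, STOLEN_SIZE)
new_inherited = new.reserve(0, FB_SIZE)
new_ring = new.insert(4 * PAGE, min_start=PAGE)

print(old_inherited, hex(old_new_fb), new_inherited, hex(new_ring))
```

In this model the workaround still holds after the change, because nothing newly allocated can land in the first page; only the pre-allocated firmware buffer may cover it.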
Comment 10 Jan-Marek Glogowski 2018-08-01 12:02:53 UTC
Oh - and at least two boots had the correct plymouth boot splash images on the U757. Will test my other HW.
Comment 11 Chris Wilson 2018-08-01 12:10:01 UTC
(In reply to Jan-Marek Glogowski from comment #9)
> So I applied the patch
> 
> commit 011f22eb545a35f972036bb6a245c95c2e7e15a0
> Author: Hans de Goede <j.w.r.degoede@gmail.com>
> Date:   Fri Apr 20 11:59:33 2018 +0200
> 
>     drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated
> buffers v2
> 
> on top of my DRM backport and now the dmesg output is again
> 
> [    5.912030] [drm:intelfb_create [i915_bpo]] re-using BIOS fb
> [    5.912121] [drm:intelfb_create [i915_bpo]] allocated 1366x768 fb:
> 0x00000000
> 
> instead of the broken
> 
> [    4.244504] [drm:intelfb_create [i915_bpo]] no BIOS fb, allocating a new
> one
> [    4.244538] [drm:intelfb_create [i915_bpo]] allocated fb from stolen
> memory
> [    4.246099] [drm:intelfb_create [i915_bpo]] allocated 1366x768 fb:
> 0x00040000
> 
> Is that what you expected?

Yes. But I'm worrying about your earlier result with 4.18rc7 as that includes the patch.
Comment 12 Jan-Marek Glogowski 2018-08-01 12:56:48 UTC
(In reply to Chris Wilson from comment #11)
> Yes. But I'm worrying about your earlier result with 4.18rc7 as that
> includes the patch.

So I've retested the HW. The patch works on the Fujitsu U757 with the Ubuntu 12.04 + kernel 4.4 + my 4.15 DRM backport.

I'm not sure what's happening for 4.18 on the Acer n4640g HW.

* It's working when booting with i915.ko in the initramfs
* The monitor is blank, if booting with i915.ko just in the root fs

$ grep "re-using BIOS fb" dmesg-4.18*
dmesg-4.18_initramfs:[    2.042702] [drm:intelfb_create [i915]] re-using BIOS fb
dmesg-4.18_rootfs:   [   18.468958] [drm:intelfb_create [i915]] re-using BIOS fb

To be more precise, the "blank" screen is a monitor with backlight switched on. I can switch to the VT with a plymouth running with the correct image, which I didn't check earlier.

Now all of this might be a systemd or plymouth problem; but still in the initramfs case I have a splash on the correct VT from the beginning.
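
To compare when the framebuffer takeover happens in the two boots, the timestamps can be pulled out of the grep output above with a short Python sketch. The regular expression and the `fb_events` helper are assumptions written for this example; the two input lines are copied from the grep output in this comment.

```python
import re

# Matches "[   seconds] [drm:intelfb_create [module]] message".
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:intelfb_create \[[^\]]+\]\]\s+(.*)"
)


def fb_events(lines):
    """Return (seconds_since_boot, message) per intelfb_create line."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


grep_output = [
    "dmesg-4.18_initramfs:[    2.042702] [drm:intelfb_create [i915]] re-using BIOS fb",
    "dmesg-4.18_rootfs:   [   18.468958] [drm:intelfb_create [i915]] re-using BIOS fb",
]

events = fb_events(grep_output)
early, late = events[0][0], events[1][0]

# Roughly 16.4 seconds pass between the two takeover points.
print(f"takeover delayed by {late - early:.1f} s when loaded from the root fs")
```

Both boots inherit the BIOS framebuffer; the difference is only how long the firmware framebuffer stays in use before i915 takes over.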
Comment 13 Jan-Marek Glogowski 2018-08-01 12:57:29 UTC
Created attachment 140918 [details]
dmesg 4.18 with i915.ko in initramfs
Comment 14 Jan-Marek Glogowski 2018-08-01 12:58:05 UTC
Created attachment 140919 [details]
dmesg 4.18 with i915.ko just in the root fs
Comment 15 Jan-Marek Glogowski 2018-08-01 13:03:22 UTC
And to be more precise I also see the boot splash in the rootfs case until the i915.ko is loaded (I guess), which is really late on the hardware. It's just a "Celeron® G3900T" and a HDD.
Comment 16 Jani Saarinen 2018-08-13 09:58:14 UTC
Reporter, can you try again with the latest https://cgit.freedesktop.org/drm-tip and attach a dmesg with debug output?
Comment 17 Jan-Marek Glogowski 2018-08-13 13:57:20 UTC
Created attachment 141064 [details]
dmesg 4.18+drmtip with i915.ko in initramfs

commit 3010d040760de7eb87c6eb54985ba6220447bbba
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 13 13:59:02 2018 +0100

    drm-tip: 2018y-08m-13d-12h-57m-37s UTC integration manifest
Comment 18 Jan-Marek Glogowski 2018-08-13 14:10:51 UTC
Created attachment 141065 [details]
dmesg 4.18+drmtip with i915.ko just in the root fs

$ grep "re-using BIOS fb" *tip*
dmesg-4.18-drmtip_initramfs:[    1.727499] [drm:intelfb_create [i915]] re-using BIOS fb
dmesg-4.18-drmtip_rootfs:[   19.842296] [drm:intelfb_create [i915]] re-using BIOS fb

1. i915.ko in initramfs: works as expected (splash animation from boot)

2. i915.ko loaded late from rootfs: I still just see the splash image for a second and then have a black screen. If I switch the VT manually, I can switch to the correct splash image, which is no longer broken.
Comment 19 Maarten Lankhorst 2018-08-15 08:53:35 UTC
Plymouth is loaded before i915 in the rootfs case?
Comment 20 Jani Saarinen 2018-08-17 06:20:01 UTC
Jan-Marek, can you answer Maarten's question?
Comment 21 Jan-Marek Glogowski 2018-08-17 11:05:07 UTC
Created attachment 141167 [details]
initramfs-tools hook to remove i915.ko module

(In reply to Maarten Lankhorst from comment #19)
> Plymouth is loaded before i915 in the rootfs case?

There was a public holiday here in Bavaria / Germany, so sorry for the delay…

Yup. On Ubuntu / Debian plymouthd is started via an initramfs-tools script (/usr/share/initramfs-tools/scripts/init-premount/plymouth), so you get a nice splash screen for cryptsetup.

AFAIK that splash doesn't work without a DRM device, so if I kick the i915.ko module from the initramfs, I get no splash but still plymouthd is started (hook script goes to /etc/initramfs-tools/hooks/)

I updated my system, and now I always end up with a black screen (still Ubuntu 16.04 with a 4.18 kernel).

In the "early case" I get the splash until some point. Then I can manually change the VT from 1 => 2 => 1 and it re-appears.

In the late case, now all VTs are black.

In both cases 'plymouth --quit' quits plymouthd and restores the VTs. That would normally happen anyway, but I disabled that service to keep the splash visible after boot.

All this might be a user space problem. Probably some timing changed due to the new packages. Normally a systemd service would stop plymouth, which I changed to /bin/true for the testing.

If you have any other idea for me to test, feel free to ask me.
Comment 22 Maarten Lankhorst 2018-08-30 05:57:00 UTC
plymouth should probably be started after i915 is, so that would explain it. :)

I'm leaning more towards this being a bug in plymouth than in i915.
Comment 23 Lakshmi 2018-09-04 06:25:25 UTC
Reporter, from the above investigation this does not seem to be an i915 bug. Can I close this bug?
Comment 24 Lakshmi 2018-09-07 14:40:30 UTC
This issue is not related to the i915 driver. Closing as NOTOURBUG.

