Bug 105518 - Framebuffer corruption (briefly) when logging into Xorg sessions ("Xorg -background none") (Intel gen9 only)
Summary: Framebuffer corruption (briefly) when logging into Xorg sessions ("Xorg -back...
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/modesetting (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Daniel Stone
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-15 09:31 UTC by Daniel van Vugt
Modified: 2018-12-13 18:10 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel van Vugt 2018-03-15 09:31:00 UTC
Xorg sessions are suddenly showing corruption upon login, briefly before the shell starts up.

This seems to be related to the use of "-background none" by the DM. If I remove "-background none" from the DM code then the corruption is replaced by a less bad second or two of blackness.

This problem seems to have started when Mesa 18.0.0 was introduced to Ubuntu 18.04. I can't seem to correlate it with anything else, and can't relate it to a specific Xorg update.
Comment 1 Michel Dänzer 2018-03-15 09:34:21 UTC
Please attach the corresponding Xorg log file.
Comment 2 Daniel van Vugt 2018-03-15 09:37:24 UTC
There's a log in the downstream bug:
https://launchpad.net/bugs/1753776
Comment 3 Michel Dänzer 2018-03-15 09:41:27 UTC
FWIW, you're more likely to get action on this if you reassign to the Mesa i965 driver.
Comment 4 Daniel van Vugt 2018-03-15 09:42:48 UTC
I'm not sure if the problem is confined to the Mesa i965 driver. It is possible.

I have verified that the corruption occurs with both Xorg drivers though: modesetting and intel.
Comment 5 Timo Aaltonen 2018-03-15 12:54:52 UTC
seems to be limited to gen9 (and up?), can't reproduce on BDW but can on SKL/KBL/CFL
Comment 6 Daniel van Vugt 2018-03-16 01:24:54 UTC
I agree with that assessment so far. KBL is affected, but HAS is not.
Comment 7 Daniel van Vugt 2018-03-16 01:42:34 UTC
KBL is affected, but HSW is not.
Comment 8 Daniel van Vugt 2018-03-16 05:53:12 UTC
OK, if the problem is only Intel and only gen9 then I am less concerned. But I will try to bisect this by next week. Certainly sounds like a Mesa 18 bug now.
Comment 9 Daniel van Vugt 2018-03-19 01:40:58 UTC
I can now also confirm in more detail that:

HSW is not affected
BDW is not affected
SKL is affected
KBL is affected
Comment 10 Daniel van Vugt 2018-03-19 09:42:52 UTC
Bisected:

1efd73df39b39589d26f44d4927d2c65697bbe6e is the first bad commit
commit 1efd73df39b39589d26f44d4927d2c65697bbe6e
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Tue May 30 17:24:06 2017 +0530

    i965: Advertise the CCS modifier
    
    v2: Rename modifier to be more smart (Jason)
    
    FINISHME: Use the kernel's final choice for the fb modifier
    
    bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube none
    Read bandwidth: 603.91 MiB/s
    Write bandwidth: 615.28 MiB/s
    bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube ytile
    Read bandwidth: 571.13 MiB/s
    Write bandwidth: 555.51 MiB/s
    bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube ccs
    Read bandwidth: 259.34 MiB/s
    Write bandwidth: 337.83 MiB/s
    
    v2: Move all references to the new fourcc code(s) to this patch.
    v3: Rebase, remove Yf_CCS (Daniel)
    
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
    Acked-by: Daniel Stone <daniels@collabora.com>
    Reviewed-by: Chad Versace <chadversary@chromium.org>

:040000 040000 1770769b46d6a9425e80b380e8338059841921e2 f790e3c0d1c5246c4d1385f14eac032b90acf68c M	src
Comment 11 Andriy Khulap 2018-03-19 14:51:16 UTC
I am able to reproduce this issue on Skylake GT2 (0x1916) with Ubuntu 18.04 beta default mesa. Downgrading mesa to mesa-17.3.6 doesn't help.
Reverting the bisected commit (1efd73df39b3) solved the issue. As well as using mesa-17.2.8 where that commit was not applied.
Comment 12 Jason Ekstrand 2018-03-19 15:52:07 UTC
This isn't a mesa bug.  The problem is that, if '-background none' is set, X use whatever random garbage happens to be in the framebuffer when it starts up.  I think what is likely going on is that it's suddenly treating that data as compressed and so it gets corrupted.  In any case, I think this is highly unlikely to be an actual mesa bug.
Comment 13 Daniel Stone 2018-03-19 16:50:12 UTC
That's really odd. -background none will just do GBM_BO_IMPORT_FD (which passes DRM_FORMAT_MOD_INVALID) from the current DRM FB to get a gbm_bo, then import that BO into an EGLImage, which it copies from.

I wonder if the original image we try to import is Y_CCS, but we import it as plain Y (no CCS) because drmModeGetFb doesn't tell us the modifier.

Daniel, do either of the following fix this for your users?

a) inlined kernel patch to refuse getfb for modifiers we can't determine via implicit means, or multi-planar
b) in whatever your DM (GDM? LightDM?) uses to create a KMS display, filtering the supported modifier set to DRM_FORMAT_MOD_LINEAR / I915_FORMAT_MOD_X_TILED / I915_FORMAT_MOD_Y_TILED

I think the kernel patch is the most correct, but then we might have to create a getfb2 to support -background none. :(


diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 92d94d2684f4..ef157585fc32 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13934,6 +13934,16 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb,
 		return -EINVAL;
 	}
 
+	switch (fb->modifier) {
+	case DRM_FORMAT_MOD_LINEAR:
+	case I915_FORMAT_MOD_X_TILED:
+	case I915_FORMAT_MOD_Y_TILED:
+		break;
+	default:
+		DRM_DEBUG_KMS("rejecting getfb for exotic modifier\n");
+		return -EINVAL;
+	}
+
 	return drm_gem_handle_create(file, &obj->base, handle);
 }
Comment 14 Daniel van Vugt 2018-03-20 08:32:54 UTC
(In reply to Daniel Stone from comment #13)

> a) inlined kernel patch to refuse getfb for modifiers we can't determine via
> implicit means, or multi-planar

Your provided kernel patch replaces the corruption with blackness, which was probably expected. So the seamless transition from gdm3 into gnome-shell (or whatever) is gone. We can achieve the same blackness workaround by removing '-background none' from the gdm3 source, but that hurts all users including Broadwell and earlier who are not yet affected by the bug.

> b) in whatever your DM (GDM? LightDM?) uses to create a KMS display,
> filtering the supported modifier set to DRM_FORMAT_MOD_LINEAR /
> I915_FORMAT_MOD_X_TILED / I915_FORMAT_MOD_Y_TILED

It's gdm3 so the code you want is mutter, right? I think we'd lean more toward patching the offending commit out of Mesa before we go proposing more mutter patches. I'm hesitant to do either right now. New ideas are welcome.
Comment 15 Daniel Stone 2018-03-20 08:49:56 UTC
(In reply to Daniel van Vugt from comment #14)
> (In reply to Daniel Stone from comment #13)
> 
> > a) inlined kernel patch to refuse getfb for modifiers we can't determine via
> > implicit means, or multi-planar
> 
> Your provided kernel patch replaces the corruption with blackness, which was
> probably expected. So the seamless transition from gdm3 into gnome-shell (or
> whatever) is gone. We can achieve the same blackness workaround by removing
> '-background none' from the gdm3 source, but that hurts all users including
> Broadwell and earlier who are not yet affected by the bug.

Correct.

> > b) in whatever your DM (GDM? LightDM?) uses to create a KMS display,
> > filtering the supported modifier set to DRM_FORMAT_MOD_LINEAR /
> > I915_FORMAT_MOD_X_TILED / I915_FORMAT_MOD_Y_TILED
> 
> It's gdm3 so the code you want is mutter, right? I think we'd lean more
> toward patching the offending commit out of Mesa before we go proposing more
> mutter patches. I'm hesitant to do either right now. New ideas are welcome.

That's not a long-term fix I'm suggesting, but to very specifically narrow down the problem, i.e. prove the thesis that the issue is caused by attempting to infer the tiling layout of a framebuffer with auxiliary compression planes. The whitelist I suggested would make sure Mutter only picks single-plane formats: if that results in the corruption disappearing, then we know the problem is exactly that and nothing else.

There are exactly three options to fix this:
  a) stop using -background none (I didn't notice this since my DM launches a Wayland session, and Xorg is something I only start from a VT without bg none); or
  b) revert the Mesa commit and take a hit from not having renderbuffer compression enabled; or
  c) add a GetFB2 ioctl to the kernel + libdrm which includes multi-planar and modifier information, use that in Xorg as well as a modifier-aware gbm_bo import interface.

The latter is obviously the most complete, but it's not zero work.
Comment 16 Daniel van Vugt 2018-03-20 08:59:24 UTC
(In reply to Daniel Stone from comment #15)

> There are exactly three options to fix this:
>   a) stop using -background none (I didn't notice this since my DM launches
> a Wayland session, and Xorg is something I only start from a VT without bg
> none); or

I think (a) is worse than doing nothing. At least if we do nothing then a subset of users still have a nice seamless login.

>   b) revert the Mesa commit and take a hit from not having renderbuffer
> compression enabled; or

We might do this in the short term for Ubuntu at least. Not having compression enabled is better than us having to deal with recurring bug reports from users about their corrupt screens.

>   c) add a GetFB2 ioctl to the kernel + libdrm which includes multi-planar
> and modifier information, use that in Xorg as well as a modifier-aware
> gbm_bo import interface.

Sounds like a good long term solution.

Medium-term there's probably also another option:

  d) Add a small feature to mutter to allow the caller (gdm3) to specify that only simple (linear?) formats are allowed, so as to enable easy transitions for all display server types. gdm3 probably cares less about the benefits of compression than gnome-shell-proper does.
Comment 17 Daniel Stone 2018-03-20 09:01:47 UTC
(In reply to Daniel van Vugt from comment #16)
>   d) Add a small feature to mutter to allow the caller (gdm3) to specify
> that only simple (linear?) formats are allowed, so as to enable easy
> transitions for all display server types. gdm3 probably cares less about the
> benefits of compression than gnome-shell-proper does.

It should be enough to exclude any modifier for which gbm_device_get_format_modifier_plane_count() returns > 1. I don't have any plans to do this in Mutter, I can look at the long-term getfb2 solution.
Comment 18 Daniel Stone 2018-03-21 12:08:03 UTC
The first part of the fix to reject multi-plane buffers for getfb is in drm-misc-fixes:
https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=b24791fe00f8b089d5b10cb7bcc4e1ae88b4831b

I'm working on a getfb2 ioctl, but that won't land for a little while.
Comment 19 Daniel van Vugt 2018-03-22 01:43:52 UTC
Thanks.

I slightly fear that the RESOLVED FIXED status might mislead people though. How many of the fixes will we need before the symptom is gone? Does the drm fix restore seamless logins or will that only be after the kernel fix? Or another?
Comment 20 Daniel Stone 2018-03-22 08:00:28 UTC
(In reply to Daniel van Vugt from comment #19)
> I slightly fear that the RESOLVED FIXED status might mislead people though.
> How many of the fixes will we need before the symptom is gone? Does the drm
> fix restore seamless logins or will that only be after the kernel fix? Or
> another?

That will only come after we add a GetFB2 ioctl to the kernel, a wrapper to libdrm, and then make Xorg use it. I've typed most of that up now.
Comment 22 Timo Aaltonen 2018-11-29 08:52:00 UTC
getfb2 is not upstream yet, so reopening
Comment 23 GitLab Migration User 2018-12-13 18:10:01 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/33.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.