Xorg sessions are suddenly showing corruption upon login, briefly before the shell starts up. This seems to be related to the use of "-background none" by the DM. If I remove "-background none" from the DM code then the corruption is replaced by a less bad second or two of blackness. This problem seems to have started when Mesa 18.0.0 was introduced to Ubuntu 18.04. I can't seem to correlate it with anything else, and can't relate it to a specific Xorg update.
Please attach the corresponding Xorg log file.
There's a log in the downstream bug: https://launchpad.net/bugs/1753776
FWIW, you're more likely to get action on this if you reassign to the Mesa i965 driver.
I'm not sure if the problem is confined to the Mesa i965 driver. It is possible. I have verified that the corruption occurs with both Xorg drivers though: modesetting and intel.
seems to be limited to gen9 (and up?), can't reproduce on BDW but can on SKL/KBL/CFL
I agree with that assessment so far. KBL is affected, but HAS is not.
KBL is affected, but HSW is not.
OK, if the problem is only Intel and only gen9 then I am less concerned. But I will try to bisect this by next week. Certainly sounds like a Mesa 18 bug now.
I can now also confirm in more detail that: HSW is not affected BDW is not affected SKL is affected KBL is affected
Bisected: 1efd73df39b39589d26f44d4927d2c65697bbe6e is the first bad commit commit 1efd73df39b39589d26f44d4927d2c65697bbe6e Author: Ben Widawsky <ben@bwidawsk.net> Date: Tue May 30 17:24:06 2017 +0530 i965: Advertise the CCS modifier v2: Rename modifier to be more smart (Jason) FINISHME: Use the kernel's final choice for the fb modifier bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube none Read bandwidth: 603.91 MiB/s Write bandwidth: 615.28 MiB/s bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube ytile Read bandwidth: 571.13 MiB/s Write bandwidth: 555.51 MiB/s bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube ccs Read bandwidth: 259.34 MiB/s Write bandwidth: 337.83 MiB/s v2: Move all references to the new fourcc code(s) to this patch. v3: Rebase, remove Yf_CCS (Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Chad Versace <chadversary@chromium.org> :040000 040000 1770769b46d6a9425e80b380e8338059841921e2 f790e3c0d1c5246c4d1385f14eac032b90acf68c M src
I am able to reproduce this issue on Skylake GT2 (0x1916) with Ubuntu 18.04 beta default mesa. Downgrading mesa to mesa-17.3.6 doesn't help. Reverting the bisected commit (1efd73df39b3) solved the issue. As well as using mesa-17.2.8 where that commit was not applied.
This isn't a mesa bug. The problem is that, if '-background none' is set, X use whatever random garbage happens to be in the framebuffer when it starts up. I think what is likely going on is that it's suddenly treating that data as compressed and so it gets corrupted. In any case, I think this is highly unlikely to be an actual mesa bug.
That's really odd. -background none will just do GBM_BO_IMPORT_FD (which passes DRM_FORMAT_MOD_INVALID) from the current DRM FB to get a gbm_bo, then import that BO into an EGLImage, which it copies from. I wonder if the original image we try to import is Y_CCS, but we import it as plain Y (no CCS) because drmModeGetFb doesn't tell us the modifier. Daniel, do either of the following fix this for your users? a) inlined kernel patch to refuse getfb for modifiers we can't determine via implicit means, or multi-planar b) in whatever your DM (GDM? LightDM?) uses to create a KMS display, filtering the supported modifier set to DRM_FORMAT_MOD_LINEAR / I915_FORMAT_MOD_X_TILED / I915_FORMAT_MOD_Y_TILED I think the kernel patch is the most correct, but then we might have to create a getfb2 to support -background none. :( diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 92d94d2684f4..ef157585fc32 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13934,6 +13934,16 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb, return -EINVAL; } + switch (fb->modifier) { + case DRM_FORMAT_MOD_LINEAR: + case I915_FORMAT_MOD_X_TILED: + case I915_FORMAT_MOD_Y_TILED: + break; + default: + DRM_DEBUG_KMS("rejecting getfb for exotic modifier\n"); + return -EINVAL; + } + return drm_gem_handle_create(file, &obj->base, handle); }
(In reply to Daniel Stone from comment #13) > a) inlined kernel patch to refuse getfb for modifiers we can't determine via > implicit means, or multi-planar Your provided kernel patch replaces the corruption with blackness, which was probably expected. So the seamless transition from gdm3 into gnome-shell (or whatever) is gone. We can achieve the same blackness workaround by removing '-background none' from the gdm3 source, but that hurts all users including Broadwell and earlier who are not yet affected by the bug. > b) in whatever your DM (GDM? LightDM?) uses to create a KMS display, > filtering the supported modifier set to DRM_FORMAT_MOD_LINEAR / > I915_FORMAT_MOD_X_TILED / I915_FORMAT_MOD_Y_TILED It's gdm3 so the code you want is mutter, right? I think we'd lean more toward patching the offending commit out of Mesa before we go proposing more mutter patches. I'm hesitant to do either right now. New ideas are welcome.
(In reply to Daniel van Vugt from comment #14) > (In reply to Daniel Stone from comment #13) > > > a) inlined kernel patch to refuse getfb for modifiers we can't determine via > > implicit means, or multi-planar > > Your provided kernel patch replaces the corruption with blackness, which was > probably expected. So the seamless transition from gdm3 into gnome-shell (or > whatever) is gone. We can achieve the same blackness workaround by removing > '-background none' from the gdm3 source, but that hurts all users including > Broadwell and earlier who are not yet affected by the bug. Correct. > > b) in whatever your DM (GDM? LightDM?) uses to create a KMS display, > > filtering the supported modifier set to DRM_FORMAT_MOD_LINEAR / > > I915_FORMAT_MOD_X_TILED / I915_FORMAT_MOD_Y_TILED > > It's gdm3 so the code you want is mutter, right? I think we'd lean more > toward patching the offending commit out of Mesa before we go proposing more > mutter patches. I'm hesitant to do either right now. New ideas are welcome. That's not a long-term fix I'm suggesting, but to very specifically narrow down the problem, i.e. prove the thesis that the issue is caused by attempting to infer the tiling layout of a framebuffer with auxiliary compression planes. The whitelist I suggested would make sure Mutter only picks single-plane formats: if that results in the corruption disappearing, then we know the problem is exactly that and nothing else. There are exactly three options to fix this: a) stop using -background none (I didn't notice this since my DM launches a Wayland session, and Xorg is something I only start from a VT without bg none); or b) revert the Mesa commit and take a hit from not having renderbuffer compression enabled; or c) add a GetFB2 ioctl to the kernel + libdrm which includes multi-planar and modifier information, use that in Xorg as well as a modifier-aware gbm_bo import interface. The latter is obviously the most complete, but it's not zero work.
(In reply to Daniel Stone from comment #15) > There are exactly three options to fix this: > a) stop using -background none (I didn't notice this since my DM launches > a Wayland session, and Xorg is something I only start from a VT without bg > none); or I think (a) is worse than doing nothing. At least if we do nothing then a subset of users still have a nice seamless login. > b) revert the Mesa commit and take a hit from not having renderbuffer > compression enabled; or We might do this in the short term for Ubuntu at least. Not having compression enabled is better than us having to deal with recurring bug reports from users about their corrupt screens. > c) add a GetFB2 ioctl to the kernel + libdrm which includes multi-planar > and modifier information, use that in Xorg as well as a modifier-aware > gbm_bo import interface. Sounds like a good long term solution. Medium-term there's probably also another option: d) Add a small feature to mutter to allow the caller (gdm3) to specify that only simple (linear?) formats are allowed, so as to enable easy transitions for all display server types. gdm3 probably cares less about the benefits of compression than gnome-shell-proper does.
(In reply to Daniel van Vugt from comment #16) > d) Add a small feature to mutter to allow the caller (gdm3) to specify > that only simple (linear?) formats are allowed, so as to enable easy > transitions for all display server types. gdm3 probably cares less about the > benefits of compression than gnome-shell-proper does. It should be enough to exclude any modifier for which gbm_device_get_format_modifier_plane_count() returns > 1. I don't have any plans to do this in Mutter, I can look at the long-term getfb2 solution.
The first part of the fix to reject multi-plane buffers for getfb is in drm-misc-fixes: https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=b24791fe00f8b089d5b10cb7bcc4e1ae88b4831b I'm working on a getfb2 ioctl, but that won't land for a little while.
Thanks. I slightly fear that the RESOLVED FIXED status might mislead people though. How many of the fixes will we need before the symptom is gone? Does the drm fix restore seamless logins or will that only be after the kernel fix? Or another?
(In reply to Daniel van Vugt from comment #19) > I slightly fear that the RESOLVED FIXED status might mislead people though. > How many of the fixes will we need before the symptom is gone? Does the drm > fix restore seamless logins or will that only be after the kernel fix? Or > another? That will only come after we add a GetFB2 ioctl to the kernel, a wrapper to libdrm, and then make Xorg use it. I've typed most of that up now.
Lo and behold: https://lists.freedesktop.org/archives/dri-devel/2018-March/170512.html https://lists.x.org/archives/xorg-devel/2018-March/056363.html
getfb2 is not upstream yet, so reopening
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/33.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.