Bug 102806 - [bisected] Kwin compositing broken after 4486d199bd3bcb5b2b8ad9bc54eb11604d9bd653
Summary: [bisected] Kwin compositing broken after 4486d199bd3bcb5b2b8ad9bc54eb11604d9b...
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Ext/GLX (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Thomas Hellström
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-16 19:04 UTC by Nick Sarnie
Modified: 2017-11-04 04:03 UTC (History)
5 users (show)

See Also:
i915 platform:
i915 features:


Attachments
bad (60.86 KB, image/png)
2017-09-16 19:04 UTC, Nick Sarnie
no flags Details
good (64.19 KB, image/png)
2017-09-16 19:05 UTC, Nick Sarnie
no flags Details
glxinfo bad (101.84 KB, text/plain)
2017-09-18 22:53 UTC, Nick Sarnie
no flags Details
glxinfogood (101.84 KB, text/plain)
2017-09-18 22:53 UTC, Nick Sarnie
no flags Details
Patch to make the compositing visual non-srgb (695 bytes, patch)
2017-09-18 23:46 UTC, Thomas Hellström
no flags Details | Splinter Review
attachment-17113-0.html (2.44 KB, text/html)
2017-09-21 19:41 UTC, Thomas Hellström
no flags Details
Patch that duplicates fbconfigs for compositing visuals (6.62 KB, patch)
2017-09-26 21:27 UTC, Thomas Hellström
no flags Details | Splinter Review
glxinfo with patch applied (37.54 KB, text/plain)
2017-10-01 12:29 UTC, Tobias Klausmann
no flags Details

Description Nick Sarnie 2017-09-16 19:04:46 UTC
Created attachment 134277 [details]
bad

Hi all,

If I use 4486d199bd3bcb5b2b8ad9bc54eb11604d9bd653, my KDE Plasma 5 menus and UI elements become dark gray, instead of light gray. It is fixed if I revert the above commit. Also, it works if I include all xorg commits, but revert the accompanying mesa changes that were committed by Thomas in August.

I am using Mesa git, LLVM git, Xorg git, Kernel 4.13.2, and radeonsi on RX480.

Please see the attached screenshots.

Thanks,
Sarnex
Comment 1 Nick Sarnie 2017-09-16 19:05:00 UTC
Created attachment 134278 [details]
good
Comment 2 Nick Sarnie 2017-09-16 19:06:34 UTC
It is also fixed if I disable KWin compositing. If I use XRender instead of OpenGL for the rendering backend, the color is not light nor dark gray, it is medium gray which is still incorrect.
Comment 3 Thomas Hellström 2017-09-18 20:26:55 UTC
Nick, is there a way you could turn off dri3 and see if the problem persists?

There might be a Radeon xserver driver option to do that or to have kwin launched with the LIBGL_DRI3_DISABLE=1 environment variable set.

/Thomas
Comment 4 Thomas Hellström 2017-09-18 22:21:33 UTC
Could you please also attach the output of 'glxinfo' both with xorg master and with the failing commit reverted?

Thanks,
Thomas
Comment 5 Nick Sarnie 2017-09-18 22:53:25 UTC
Created attachment 134322 [details]
glxinfo bad

Hi Thomas,

Unfortunately, it only complicates from here. It seems if I use DRI2(with or without this patch reverted) at all, I get really strange corruption which is unrelated to this issue, which you can see in these videos:

Enabling DRI2 in the amdgpu driver or by setting LIBGL_DRI3_DISABLE=true in the env: https://www.youtube.com/watch?v=zND2GS_UYdU

Starting Kwin with LIBGL_DRI3_DISABLE=true: https://www.youtube.com/watch?v=d9GO1jBAAIk

In the second example which is using Xorg with this commit (so the color is bad), the color of the menu bar DOES return to normal after running the command, so I think if we ignore the screen corruption which I think is unrelated, it works on DRI2.

I've attached the good and bad glxinfos.

Thanks,
Sarnex
Comment 6 Nick Sarnie 2017-09-18 22:53:36 UTC
Created attachment 134323 [details]
glxinfogood
Comment 7 Nick Sarnie 2017-09-18 22:58:38 UTC
If it matters, I should mention that unrelated corruption can be fixed by disabling compositing.
Comment 8 Thomas Hellström 2017-09-18 23:33:12 UTC
(In reply to Nick Sarnie from comment #5)
> Created attachment 134322 [details]
> glxinfo bad
> 
> Hi Thomas,
> [...]
> I've attached the good and bad glxinfos.
> 
> Thanks,
> Sarnex

So it looks like the compositing visual in the "bad case" ends up to be sRGB capable, which it is not in the "good case".
Comment 9 Nick Sarnie 2017-09-18 23:44:49 UTC
(In reply to Thomas Hellström from comment #8)
> (In reply to Nick Sarnie from comment #5)
> > Created attachment 134322 [details]
> > glxinfo bad
> > 
> > Hi Thomas,
> > [...]
> > I've attached the good and bad glxinfos.
> > 
> > Thanks,
> > Sarnex
> 
> So it looks like the compositing visual in the "bad case" ends up to be sRGB
> capable, which it is not in the "good case".

Hey Thomas,

Thanks for taking a look. This is a bit out of my knowledge area, so if you have any suggestions on further steps, patches, or ideas on why it works on DRI2, please let me know and I'm happy to test anything.

Sarnex
Comment 10 Thomas Hellström 2017-09-18 23:46:52 UTC
Created attachment 134325 [details] [review]
Patch to make the compositing visual non-srgb

Could you try the attached patch? I haven't compile-tested, though, since I don't have an xserver compile environment where I'm currently at.
Comment 11 Nick Sarnie 2017-09-18 23:53:54 UTC
(In reply to Thomas Hellström from comment #10)
> Created attachment 134325 [details] [review] [review]
> Patch to make the compositing visual non-srgb
> 
> Could you try the attached patch? I haven't compile-tested, though, since I
> don't have an xserver compile environment where I'm currently at.

Hi Thomas,

Thanks for the quick patch. Unfortunately it has no effect. The colors are still dark gray when using xserver git + this patch.

Sarnex
Comment 12 Thomas Hellström 2017-09-19 22:57:34 UTC
OK. 

Downloading Fedora-KDE now to see if I can replicate.
Comment 13 Thomas Hellström 2017-09-20 02:15:11 UTC
OK, been able to reproduce the problem with vmware/svga.

Although I see it both with dri3 and dri2, so this should be purely a visual selection problem.

/Thomas
Comment 14 Nick Sarnie 2017-09-20 02:59:49 UTC
(In reply to Thomas Hellström from comment #13)
> OK, been able to reproduce the problem with vmware/svga.
> 
> Although I see it both with dri3 and dri2, so this should be purely a visual
> selection problem.
> 
> /Thomas

Nice work, let me know if you have a patch for me to try.
Comment 15 Thomas Hellström 2017-09-20 13:40:47 UTC
Actually, everything here points to an application problem, it looks like the application side visual / fbconfig selection code isn't particularly robust and more or less works by chance.

Even if I back out all related changes in mesa and xorg, and then restrict the number of visuals / fbconfigs in the dri driver by only supporting a single swap method, the bars become dark grey. Note that some dri drivers, (like classic radeon I think) already has this restriction.

The change in Xorg that triggers this basically only reorder configs.

So you should file a bug against the application (kwin?) and if it turns out that they have a special visual / fbconfig combination requirement, then we could see what we can do on the GLX side.
Comment 16 Nick Sarnie 2017-09-20 13:51:41 UTC
(In reply to Thomas Hellström from comment #15)
> Actually, everything here points to an application problem, it looks like
> the application side visual / fbconfig selection code isn't particularly
> robust and more or less works by chance.
> 
> Even if I back out all related changes in mesa and xorg, and then restrict
> the number of visuals / fbconfigs in the dri driver by only supporting a
> single swap method, the bars become dark grey. Note that some dri drivers,
> (like classic radeon I think) already has this restriction.
> 
> The change in Xorg that triggers this basically only reorder configs.
> 
> So you should file a bug against the application (kwin?) and if it turns out
> that they have a special visual / fbconfig combination requirement, then we
> could see what we can do on the GLX side.

I've reported it to Kwin, and I'll keep this bug open until we know if they need any more changes in Xserver.

KWin bug link: https://bugs.kde.org/show_bug.cgi?id=384882

Thanks,
Sarnex
Comment 17 Martin Flöser 2017-09-20 14:54:46 UTC
KWin's framebuffer selection code is:

bool GlxBackend::initFbConfig()
{
    const int attribs[] = {
        GLX_RENDER_TYPE,    GLX_RGBA_BIT,
        GLX_DRAWABLE_TYPE,  GLX_WINDOW_BIT,
        GLX_RED_SIZE,       1,
        GLX_GREEN_SIZE,     1,
        GLX_BLUE_SIZE,      1,
        GLX_ALPHA_SIZE,     0,
        GLX_DEPTH_SIZE,     0,
        GLX_STENCIL_SIZE,   0,
        GLX_CONFIG_CAVEAT,  GLX_NONE,
        GLX_DOUBLEBUFFER,   true,
        0
    };

    // Try to find a double buffered configuration
    int count = 0;
    GLXFBConfig *configs = glXChooseFBConfig(display(), DefaultScreen(display()), attribs, &count);

    struct FBConfig {
        GLXFBConfig config;
        int depth;
        int stencil;
    };

    std::deque<FBConfig> candidates;

    for (int i = 0; i < count; i++) {
        int depth, stencil;
        glXGetFBConfigAttrib(display(), configs[i], GLX_DEPTH_SIZE,   &depth);
        glXGetFBConfigAttrib(display(), configs[i], GLX_STENCIL_SIZE, &stencil);

        candidates.emplace_back(FBConfig{configs[i], depth, stencil});
    }

    if (count > 0)
        XFree(configs);

    std::stable_sort(candidates.begin(), candidates.end(), [](const FBConfig &left, const FBConfig &right) {
        if (left.depth < right.depth)
            return true;

        if (left.stencil < right.stencil)
            return true;

        return false;
    });

    if (candidates.size() > 0) {
        fbconfig = candidates.front().config;

        int fbconfig_id, visual_id, red, green, blue, alpha, depth, stencil;
        glXGetFBConfigAttrib(display(), fbconfig, GLX_FBCONFIG_ID,  &fbconfig_id);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_VISUAL_ID,    &visual_id);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_RED_SIZE,     &red);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_GREEN_SIZE,   &green);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_BLUE_SIZE,    &blue);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_ALPHA_SIZE,   &alpha);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_DEPTH_SIZE,   &depth);
        glXGetFBConfigAttrib(display(), fbconfig, GLX_STENCIL_SIZE, &stencil);

        qCDebug(KWIN_X11STANDALONE, "Choosing GLXFBConfig %#x X visual %#x depth %d RGBA %d:%d:%d:%d ZS %d:%d",
                fbconfig_id, visual_id, visualDepth(visual_id), red, green, blue, alpha, depth, stencil);
    }

    if (fbconfig == nullptr) {
        qCCritical(KWIN_X11STANDALONE) << "Failed to find a usable framebuffer configuration";
        return false;
    }

    return true;
}
Comment 18 Nick Sarnie 2017-09-21 17:44:15 UTC
Hey Thomas,

I have reproduced this issue using the OpenGL ES Compositor on Kwin, so Martin believes that this is in fact an Xserver bug. See his post on the KDE Bugzilla for more info.
 
Let me know how to proceed.

Thanks,
Sarnex
Comment 19 Thomas Hellström 2017-09-21 19:41:09 UTC
Created attachment 134415 [details]
attachment-17113-0.html

Hi. It's definitely not a new Xserver bug since it's possible to trigger it using simple dri driver capability advertising changes. What happens is that Kwin (or whatever is drawing this) is getting a different visual or fbconfig than what it's used to. So what's needed is to understand why that new visual renders differently than the old one.



21 sep. 2017 kl. 10:44 skrev "bugzilla-daemon@freedesktop.org<mailto:bugzilla-daemon@freedesktop.org>" <bugzilla-daemon@freedesktop.org<mailto:bugzilla-daemon@freedesktop.org>>:


Comment # 18<https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D102806-23c18&d=DwMFaQ&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=MAZVOEDTef653675sRScCjmWxpVn4wXK9pUnQn2H8DA&s=JlbiD9lyTsQZRcNUM6aOEjK7jfRIIT9XD51Fpfx21dk&e=> on bug 102806<https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D102806&d=DwMFaQ&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=MAZVOEDTef653675sRScCjmWxpVn4wXK9pUnQn2H8DA&s=qtbifUDN0DCfXij6E5kZrQaYNBn5nTECwecqvJhIR08&e=> from Nick Sarnie<mailto:commendsarnex@gmail.com>

Hey Thomas,

I have reproduced this issue using the OpenGL ES Compositor on Kwin, so Martin
believes that this is in fact an Xserver bug. See his post on the KDE Bugzilla
for more info.

Let me know how to proceed.

Thanks,
Sarnex

________________________________
You are receiving this mail because:

  *   You are on the CC list for the bug.
  *   You are the assignee for the bug.
Comment 20 Nick Sarnie 2017-09-21 20:32:08 UTC
Are you saying the KWin developers should investigate to determine why the rendering is different? 

Hopefully we can determine who's bug this is :)
Comment 21 Thomas Hellström 2017-09-21 22:44:27 UTC
Yes I think Kwin developers should take a look at this. I'll attach a patch later that breaks Kwin hitting the same issue. IF it turns out that a particular visual they want has too limited functionality we can look at the problem from there
Comment 22 Martin Flöser 2017-09-22 13:41:22 UTC
I'm sorry to say, but it is unrealistic that KWin developers will investigate this. We do not use X anymore, we use Wayland. We don't have a development setup of the xserver stack to just test. We can only test once the change has been released and is shipped in the distros the devs use (in my case that would be Debian testing). That's a point in time way too late.

It's also unrealistic that we can adjust the code. Our code needs to support:
* current Xorg
* future Xorg
* NVIDIA blob
* AMD blob

Getting a combination which works is difficult and fragile. If we adjust anything here to make it work again with the new Xorg than it might break with other setups.

I'm totally honest here: I don't care why it broke or whether our code is wrong or not. It used to work and now it doesn't. Writing this here in the bug tracker is already more time I have to spend on the issue than I want to. If our code breaks due to changes in X, it's X's fault. Be pragmatic here, take a look at how the kernel handles user space promises.
Comment 23 Martin Flöser 2017-09-22 13:44:04 UTC
Oh and just saying the obvious: if KWin gets broken you are lucky that you have a prominent software getting broken used by many users. Just think about how many more software you might have broken with the change here. All those devs will have a hard time handling the bug reports and trying to fix it, not understanding why. Lot's of software will stay broken, due to not being maintained anymore, devs not getting time to fix it, etc. Please think about whether the change which broke things is really required.
Comment 24 Thomas Hellström 2017-09-22 14:36:24 UTC
(In reply to Martin Flöser from comment #23)
> Oh and just saying the obvious: if KWin gets broken you are lucky that you
> have a prominent software getting broken used by many users. Just think
> about how many more software you might have broken with the change here. All
> those devs will have a hard time handling the bug reports and trying to fix
> it, not understanding why. Lot's of software will stay broken, due to not
> being maintained anymore, devs not getting time to fix it, etc. Please think
> about whether the change which broke things is really required.

The thing here is that with KWin's visual selection It will at some point be broken anyway. If, for example, DRI3 drivers would have done the right thing and disabled SWAP_COPY functionality from the start, You would have hit this bug. And I think you will hit it in the future, depending on what set of configs the DRI driver will decide to expose, and finally anybody running the classic Radeon driver would probably see it now, even without my changes.

Now having *you* debug *your* app, and telling me what you need would hopefully have helped me to find a way for slightly modify the glx layer to craft a visual / fbconfig list that would have worked for everybody.

Having said that, I will ask Ajax to back out this change.  But that will probably break other not so prominent software requesting a GLX_SWAP_COPY fbconfig.

Thanks,
Thomas


I
Comment 25 Thomas Hellström 2017-09-22 14:55:52 UTC
(In reply to Thomas Hellström from comment #24)
> (In reply to Martin Flöser from comment #23)

[...]

> 
> Having said that, I will ask Ajax to back out this change.  But that will
> probably break other not so prominent software requesting a GLX_SWAP_COPY
> fbconfig.
> 
> Thanks,
> Thomas
> 
> 

And BTW, backing out this will cause the compositing visual to be GLX_SWAP_COPY, which means all swapbuffers using that visual will typically see an extra copy on DRI3.

And also what you are actually implying is that all dri drivers must agree to never alter their exposed config list, because that might break Kwin.
Just FYI.

/Thomas
Comment 26 Martin Flöser 2017-09-22 15:19:19 UTC
As far as I can see KWin does not request GLX_SWAP_COPY. I posted KWin's selection code above.
Comment 27 Thomas Hellström 2017-09-22 16:35:20 UTC
Yes, that's true, but fixing GLX_SWAP_METHOD_OML, revealed that the transparent 32 bit visual typically becomes GLX_SWAP_COPY. It has never really worked before.

This has the implication that if you select GLX_SWAP_COPY, you typically get the compositing visual, which is typically really bad. It also has the implication that if you select the transparent visual, you will get GLX_SWAP_COPY which may or may not be bad.

So in the fix for that, I made sure the swap method wouldn't correlate with the transparent visual. 

That reordered the fbconfigs and that's what broke Kwin. Obviously there are other ways to reorder the fbconfigs, or even limit their numbers and some of them will definitely break kwin.
Comment 28 Adam Jackson 2017-09-22 19:06:26 UTC
(In reply to Martin Flöser from comment #22)

> I'm totally honest here: I don't care why it broke or whether our code is
> wrong or not.

You sound like a lovely person. I'm definitely motivated to work with you to fix your bugs.

It turns out that sRGB fbconfig sorting is completely unspecified, so Xorg implements... something? What it implements is just walking through the list the DRI driver returns in order and convert them into visuals. Then, since sRGB sorting is unspecified, Mesa's fbconfig sorting treats it as a "don't care", and you get whatever comes first in the list.

Now sRGB is probably something you do not want unless you explicitly ask for it, so the server could punish srgb fbconfigs into a worse select group (which mesa already internally sorts). And Mesa's libGL could be changed to match fbconfigs as exact instead of don't-care.

In a practical sense Mesa and Xorg should do whatever nvidia's driver does though, since that's effectively the spec at this point. So I've asked them, and when I get an answer, we might know what to do.
Comment 29 Thomas Hellström 2017-09-25 16:40:23 UTC
OK, I have what I think is a "make everybody happy" solution, that also fixes the problem with Kwin. (which I think is due to the fact that Kwin is very picky about the capabilities of the compositing visual).

Will take a couple of days to finalize and clean up, though.

/Thomas
Comment 30 Thomas Hellström 2017-09-26 21:27:27 UTC
Created attachment 134498 [details] [review]
Patch that duplicates fbconfigs for compositing visuals

So here's a patch that fixes the kwin issue at my end. Please test.
Comment 31 Nick Sarnie 2017-09-26 23:26:02 UTC
(In reply to Thomas Hellström from comment #30)
> Created attachment 134498 [details] [review] [review]
> Patch that duplicates fbconfigs for compositing visuals
> 
> So here's a patch that fixes the kwin issue at my end. Please test.

Hi Thomas,

Tested-By: Nick Sarnie <commendsarnex@gmail.com>

Thanks,
Sarnex
Comment 32 Tobias Klausmann 2017-09-30 15:29:33 UTC
(In reply to Thomas Hellström from comment #30)
> Created attachment 134498 [details] [review] [review]
> Patch that duplicates fbconfigs for compositing visuals
> 
> So here's a patch that fixes the kwin issue at my end. Please test.

Mh, does not help on my end for some reason. If you have any hints about further actions, let me know!
Comment 33 Thomas Hellström 2017-09-30 19:50:54 UTC
(In reply to Tobias Klausmann from comment #32)
> (In reply to Thomas Hellström from comment #30)
> > Created attachment 134498 [details] [review] [review] [review]
> > Patch that duplicates fbconfigs for compositing visuals
> > 
> > So here's a patch that fixes the kwin issue at my end. Please test.
> 
> Mh, does not help on my end for some reason. If you have any hints about
> further actions, let me know!

Please post the output of glxconfig.
Comment 34 Tobias Klausmann 2017-10-01 12:29:22 UTC
Created attachment 134593 [details]
glxinfo with patch applied
Comment 35 Thomas Hellström 2017-10-01 16:16:24 UTC
Tobias, 

Are you 100% sure the Intel driver you are using actually worked correctly before commit 4486d199? Due to the limited number of fbconfigs it exposes, from what I can tell, commit 4486d199 shouldn't have altered the exported visuals or fbconfigs in any way?
Comment 36 Tobias Klausmann 2017-10-01 18:25:19 UTC
(In reply to Thomas Hellström from comment #35)
> Tobias, 
> 
> Are you 100% sure the Intel driver you are using actually worked correctly
> before commit 4486d199? Due to the limited number of fbconfigs it exposes,
> from what I can tell, commit 4486d199 shouldn't have altered the exported
> visuals or fbconfigs in any way?

So after checking the commit hash, i found myself embarrassed, as it is an commit to the xsever repo, not mesa. The wrong colors (looking the same as shown in the above "bad" screenshot) started showing up after a mesa git commit around the same time (I asked about it in IRC the time around you seemed to initially talk about this bug). My apologies for the false clues! I'll investigate the problematic git commit and make a new bug if necessary.
Comment 37 Thomas Hellström 2017-10-09 10:06:44 UTC
So the Intel (i965) driver is a different issue altogether. While the panel and menu look similar to those in this bug, the dark grey is actually transparent, and the correct fbconfigs / visuals chosen.

I did a mesa bisect and the first failing mesa commit is

Author: Jason Ekstrand <jason.ekstrand@intel.com>  2017-09-12 23:26:04
Committer: Jason Ekstrand <jason.ekstrand@intel.com>  2017-09-18 21:16:55
Parent: e97f4b748094466567c7f3bad1a02ecee13db9c8 (i965: Reset miptree aux state on update_image_buffer)
Branches: master, remotes/fdo/master
Follows: 17.2-branchpoint
Precedes: 

    i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2

/Thomas
Comment 38 Nick Sarnie 2017-11-04 04:03:00 UTC
Fixed in f84e59a4f474d22860bac8aec2947798a86db69b


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.