Bug 107117

Summary: mesa-18.1: regression with TFP on intel with modesettings and glamor acceleration
Product: Mesa
Reporter: Olivier Fourdan <fourdan>
Component: Drivers/DRI/i965
Assignee: Jason Ekstrand <jason>
Status: RESOLVED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: andrey.simiklit, corsac, daniel, kenneth, mark.a.janes
Version: git
Keywords: bisected, regression
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=107287
https://bugs.freedesktop.org/show_bug.cgi?id=106910
Whiteboard:
i915 platform: i915 features:
Attachments: Simple reproducer
Simple reproducer
Piglit patch to add GL_TEXTURE_RECTANGLE support to the tfp test
results from piglit patch on Mesa 17.2.8 and Mesa 18.2.0-devel
Simple reproducer
[PATCH] dri3: Do not get supported modifiers on pixmaps
[PATCH v2] dri3: For 1.2, use root window instead of pixmap drawable
[PATCH v3] dri3: For 1.2, use root window instead of pixmap drawable

Description Olivier Fourdan 2018-07-05 07:03:24 UTC
Created attachment 140469 [details]
Simple reproducer

Description:

I received a report downstream (xfce) that the GL backend of the compositor produces a blank screen after upgrading Mesa to 18.1 (it works with 18.0).

I am by no means a GL expert, so it might be an issue with my code; a simple reproducer is attached.

How reproducible:

Always

Steps to reproduce:

1. Build and run the attached reproducer
   $ gcc -g tfp-test.c -o tfp-test $(pkg-config --cflags --libs epoxy xrender xfixes xcomposite x11)
   $ ./tfp-test

Actual result:

A blank window.

Expected result:

A window with a red square.

Additional information:

 * Using texture type GL_TEXTURE_RECTANGLE_ARB
 * Using texture target GLX_TEXTURE_RECTANGLE_EXT
 * Using texture format GLX_TEXTURE_FORMAT_RGBA_EXT
 * And renderer is “Mesa DRI Intel(R) Haswell Mobile ”

I ran a “git bisect”, but half of the steps ended with an assertion failure in Mesa, so I am not entirely sure the bisection is accurate (I marked the crashing steps as bad as well, lacking an actual result to look at).

The failed assertion is:

intel_mipmap_tree.c:1301: intel_miptree_match_image: Assertion `image->TexObject->Target == mt->target' failed.

Eventually, “git bisect” gave:

00926a2730190500a6a854659b193b022b92db2b is the first bad commit
commit 00926a2730190500a6a854659b193b022b92db2b
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Sep 12 14:26:04 2017 -0700

    i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2
    
    The old code made a new miptree that referenced the same BO as the
    renderbuffer and just trusted in the memory aliasing to work.  There are
    only two ways in which the new miptree is liable to differ from the one
    in the renderbuffer and neither of them matter:
    
     1) It may have a different target.  The only targets that we can ever
        see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
        and the difference between the two doesn't matter as far as the
        miptree is concerned; genX(update_sampler_state) only looks at the
        gl_texture_object and not the miptree when determining whether or
        not to use normalized coordinates.
    
     2) It may have a very slightly different format.  Again, this doesn't
        matter because we've supported texture views for quite some time so
        we always look at the gl_texture_object format instead of the
        miptree format for hardware setup anyway.
    
    On the other hand, because we were recreating the miptree, we were using
    intel_miptree_create_for_bo which doesn't understand modifiers.  We
    really want this function to work without doing a resolve so long as you
    have modifiers so we need to fix that.
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Chad Versace <chadversary@chromium.org>

:040000 040000 ed8106be921e3dcba1f784d4bb28b8983a23ebeb d095e5ec1ce035cae4770fc2a8d8b06433cfb1d7 M	src
Comment 1 asimiklit 2018-07-10 11:40:27 UTC
Hi, I managed to reproduce this bug using your reproducer.
I have suggested a simple fix for it:
https://patchwork.freedesktop.org/patch/237490/
Comment 2 Mark Janes 2018-07-10 15:27:22 UTC
Does this bug require a piglit test, to prevent similar regressions in the future?
Comment 3 Jason Ekstrand 2018-07-10 15:34:24 UTC
(In reply to Mark Janes from comment #2)
> Does this bug require a piglit test, to prevent similar regressions in the
> future?

Yes, we should either improve the current TFP test to also test GL_TEXTURE_RECTANGLE or turn the test provided here into a piglit test.  Olivier, would it be ok to include that code in piglit under a BSD license?
Comment 4 Olivier Fourdan 2018-07-10 16:41:57 UTC
(In reply to Jason Ekstrand from comment #3)
> (In reply to Mark Janes from comment #2)
> > Does this bug require a piglit test, to prevent similar regressions in the
> > future?
> 
> Yes, we should either improve the current TFP test to also test
> GL_TEXTURE_RECTANGLE or turn the test provided here into a piglit test.
> Olivier, would it be ok to include that code in piglit under a BSD license?

Sure, no problem for me, but it would probably require a bit of cleanup; it's just a quick (and ugly) hack to demonstrate the issue.

However, I still see the issue even after applying https://patchwork.freedesktop.org/patch/237490/
Comment 5 Olivier Fourdan 2018-07-10 16:48:42 UTC
Reading the patch, I think I understand better: this patch fixes the assertion failure I hit during the “git bisect” while trying to identify the commit that first caused the rendering issue. It is not about fixing the actual rendering problem.
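
To illustrate the distinction, here is a minimal sketch of what relaxing the target check could look like. It is not the patch from patchwork; the helper names are hypothetical, and it only assumes what the commit message in the description states, namely that GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE are the only targets reaching intelSetTexBuffer2 and that the difference does not matter to the miptree.

```c
#include <assert.h>
#include <stdbool.h>

/* Enum values as defined in the OpenGL headers. */
#define GL_TEXTURE_2D        0x0DE1
#define GL_TEXTURE_RECTANGLE 0x84F5

/* Hypothetical helper (not a Mesa function): true for the two targets
 * that the commit message says can reach intelSetTexBuffer2. */
static bool is_tfp_target(unsigned target)
{
    return target == GL_TEXTURE_2D || target == GL_TEXTURE_RECTANGLE;
}

/* Hypothetical relaxed comparison: identical targets always match, and
 * 2D and RECTANGLE are treated as interchangeable. */
static bool targets_compatible(unsigned image_target, unsigned miptree_target)
{
    if (image_target == miptree_target)
        return true;
    return is_tfp_target(image_target) && is_tfp_target(miptree_target);
}

/* Stand-in for the check that fails at intel_mipmap_tree.c:1301. */
static void check_image_target(unsigned image_target, unsigned miptree_target)
{
    assert(targets_compatible(image_target, miptree_target));
}
```

A check like this only silences the assertion; as noted above, it says nothing about why the window ends up blank.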
Comment 6 Olivier Fourdan 2018-07-10 17:28:13 UTC
Created attachment 140540 [details]
Simple reproducer
Comment 7 Jason Ekstrand 2018-07-10 18:11:46 UTC
Created attachment 140541 [details] [review]
Piglit patch to add GL_TEXTURE_RECTANGLE support to the tfp test

I modified the current piglit TFP test to also test texture rectangle and it works if you remove the assert in intel_mipmap_tree.c.  Clearly, this isn't just a GL_TEXTURE_RECTANGLE issue.
Comment 8 asimiklit 2018-07-11 11:18:54 UTC
Created attachment 140557 [details]
results from piglit patch on Mesa 17.2.8 and Mesa 18.2.0-devel

(In reply to Jason Ekstrand from comment #7)
> Created attachment 140541 [details] [review]
> Piglit patch to add GL_TEXTURE_RECTANGLE support to the tfp test
> 
> I modified the current piglit TFP test to also test texture rectangle and it
> works if you remove the assert in intel_mipmap_tree.c.  Clearly, this isn't
> just a GL_TEXTURE_RECTANGLE issue.

Hi, I have tested your piglit patch and the second version of the "Simple reproducer" from Olivier Fourdan on:

   OpenGL vendor string: Intel Open Source Technology Center
   OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2) 
   OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.8

and 
 
   OpenGL vendor string: Intel Open Source Technology Center
   OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2) 
   OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.2.0-devel (git-04fff21c62)
   with my patch, which just fixes the unnecessary assert on a GL_TEXTURE_2D / GL_TEXTURE_RECTANGLE target mismatch.


I could not reproduce the "rendering" issue on either Mesa 17.2.8 or 18.2.0-devel; I got the identical, correct result on both. So I think the "rendering" issue was fixed somewhere between 18.1 and 18.2.0-devel. However, Mesa 18.2.0-devel still contains the "assertion" issue, which should be fixed by my patch.

Note: the first version of the "Simple reproducer" from Olivier Fourdan produced a black screen for me on both Mesa 17.2.8 and 18.2.0-devel, so I fixed it. After that it no longer reproduces the "rendering" issue on either Mesa; only the "assertion" issue is reproduced.

Could somebody also check whether this "rendering" issue is still reproducible on Mesa 18.2.0-devel?
Comment 9 Denis 2018-07-11 13:39:06 UTC
(In reply to Olivier Fourdan from comment #6)
> Created attachment 140540 [details]
> Simple reproducer

Hi Olivier. I checked the file you attached on Manjaro (with xfce) against Mesa release versions 18.0.4, 18.1.0 and 18.1.3.
In all of them I saw the red square.
Could it be a CPU-specific issue? I have two SKL machines (Ubuntu and Manjaro), and on both Mesa 18.1.0 draws the red square.
Comment 10 Olivier Fourdan 2018-07-11 14:17:39 UTC
(In reply to Denis from comment #9)
> (In reply to Olivier Fourdan from comment #6)
> > Created attachment 140540 [details]
> > Simple reproducer
> 
> Hi Olivier. I checked the file you attached on Manjaro (with xfce) against
> Mesa release versions 18.0.4, 18.1.0 and 18.1.3.
> In all of them I saw the red square.

But TFP is used (optionally) only in xfce from git master, so unless you run the current (unreleased) version of xfce and force GL rendering, you won't see this bug.

> Could it be a CPU-specific issue? I have two SKL machines (Ubuntu and
> Manjaro), and on both Mesa 18.1.0 draws the red square.

Could be; I have also had reports from other xfce users who were facing this bug.
Comment 11 Olivier Fourdan 2018-07-11 14:23:55 UTC
(In reply to asimiklit from comment #8)
> [...]
> Could somebody also check if this "rendering" issue is still reproducible on
> Mesa 18.2.0-devel?

FWIW, I could reproduce with master (from July 10), so either this is CPU/GPU related or it got fixed very recently. I shall retry as soon as I get back to my computer, and report back.
Comment 12 Denis 2018-07-13 11:37:53 UTC
>But TFP is used (optionally) only in xfce from git master, so unless you run the current (unreleased) version of xfce and force GL rendering, you won't see this bug.

As I understand it, the attached file should be more than enough to reproduce the issue, yes?

>Could be, I had also reports from other xfce users who were facing this bug.
Could you please give some information about your hardware? I ran the test on three different CPUs: SKL, KBL and SNB.
Comment 13 Olivier Fourdan 2018-07-13 12:48:55 UTC
(In reply to Denis from comment #12)
> As I understand it, the attached file should be more than enough to
> reproduce the issue, yes?

Yes, absolutely.

> Could you please give some information about your hardware? I ran the test
> on three different CPUs: SKL, KBL and SNB.

Sure:

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 220c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 46
	Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 3000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
	Kernel modules: i915
Comment 14 Olivier Fourdan 2018-07-16 09:47:50 UTC
Hmm, now I wonder, could this be related to the modesetting DDX and glamor somehow?

I tried with the intel DDX instead of the modesetting DDX and the test succeeds on the same hardware... Which DDX are you using?
Comment 15 Denis 2018-07-16 12:38:39 UTC
Olivier, could you please advise how to verify which driver is in use? I searched for "xf86-video" in Synaptic and did not see modesetting among the results marked as installed.

A simpler search does not show anything related to modesetting either:


den@den-Latitude-E5470:~$ locate xf86-video
/usr/lib/xserver-xorg-video-intel-hwe-16.04/xf86-video-intel-backlight-helper
/usr/share/apport/package-hooks/source_xf86-video-displaylink.py
/usr/share/apport/package-hooks/source_xf86-video-msm.py
/usr/share/apport/package-hooks/source_xf86-video-omapfb.py
/usr/share/polkit-1/actions/org.x.xf86-video-intel.backlight-helper.policy
Comment 16 Olivier Fourdan 2018-07-16 13:13:23 UTC
(In reply to Denis from comment #15)
> Olivier, could you please advise how to verify which driver is in use?

The DDX can be seen in the Xorg logs, either in "~/.local/share/xorg/Xorg.*.log" or in "/var/log/Xorg.*.log" depending on your system.

There, you should see either "modeset" or "intel" depending on which DDX you've selected in your xorg.conf (or xorg.conf.d/).

e.g. for modesetting:

   (II) modesetting: Driver for Modesetting Kernel Drivers: kms
   (II) modeset(0): using drv /dev/dri/card0

for intel:

   (II) intel: Driver for Intel(R) Iris(TM) Graphics
   (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics
   (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20180308
   (--) intel(0): Integrated Graphics Chipset: Intel(R) HD Graphics 4400

Point is, something has changed in Mesa 18.1 which causes attachment 140540 [details] to render a white window with the Xorg modesetting driver with glamor but not with the intel DDX, at least on my hardware (using Fedora 28 here with xserver-1.20)... Many distributions have switched to the modesetting driver by default with Intel hardware.
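
As an illustration of the log check described above, here is a minimal sketch that classifies a single Xorg log line. The function and enum names are hypothetical. It assumes that per-screen lines of the form "modeset(0):" or "intel(0):" indicate the DDX actually driving a screen, whereas a line such as "(II) modesetting: Driver for ..." only shows that the module was loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the DDX named in one Xorg log line. */
enum ddx { DDX_UNKNOWN, DDX_MODESETTING, DDX_INTEL };

/* Look for the per-screen prefix ("modeset(N)" or "intel(N)") that the
 * examples above show for a driver that is in use on a screen. Lines
 * that merely list a loaded module do not carry this prefix. */
static enum ddx ddx_from_log_line(const char *line)
{
    assert(line != NULL);
    if (strstr(line, " modeset(") != NULL)
        return DDX_MODESETTING;
    if (strstr(line, " intel(") != NULL)
        return DDX_INTEL;
    return DDX_UNKNOWN;
}
```

Running this over every line of the log and keeping the first result other than DDX_UNKNOWN would give the DDX in use, under the assumption stated above.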
Comment 17 Denis 2018-07-16 13:29:11 UTC
Thanks for highlighting this. I checked my Xorg log; the part related to modesetting is below.

According to it, I have the modesetting driver and still cannot reproduce the issue. I may try the same operation (swapping the modesetting driver for intel) and run the test again.


[620297.866] (II) LoadModule: "modesetting"
[620297.867] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[620297.867] (II) Module modesetting: vendor="X.Org Foundation"
[620297.867] 	compiled for 1.19.5, module version = 1.19.5
[620297.867] 	Module class: X.Org Video Driver
[620297.867] 	ABI class: X.Org Video Driver, version 23.0
[620297.867] (II) LoadModule: "fbdev"
[620297.868] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[620297.868] (II) Module fbdev: vendor="X.Org Foundation"
[620297.868] 	compiled for 1.19.3, module version = 0.4.4
[620297.868] 	Module class: X.Org Video Driver
[620297.868] 	ABI class: X.Org Video Driver, version 23.0
[620297.868] (II) LoadModule: "vesa"
[620297.869] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[620297.869] (II) Module vesa: vendor="X.Org Foundation"
[620297.870] 	compiled for 1.19.3, module version = 2.3.4
[620297.870] 	Module class: X.Org Video Driver
[620297.870] 	ABI class: X.Org Video Driver, version 23.0
[620297.870] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
Comment 18 Denis 2018-07-16 14:17:11 UTC
Hmm, in your first comment you noted this:
>Additional information:
> * Using texture type GL_TEXTURE_RECTANGLE_ARB

In my case, however, running the test prints this:
>Using texture type GL_TEXTURE_2D
That is a possible reason why the test passes for me. I am investigating why it picks this texture type.
Comment 19 Olivier Fourdan 2018-07-16 14:25:11 UTC
(In reply to Denis from comment #17)
> [...]
> According to it, I have the modesetting driver and still cannot reproduce
> the issue.

Yes, but you're using xserver-1.19.5, not 1.20.

This is where it gets even more interesting: I cannot reproduce with modesetting/glamor from xserver-1.19 either.

I understand this is getting confusing, considering the number of possibilities and packages/configuration involved. I don't want to confuse things even more...

Yet, this is what I've been able to come up with, so far:

                                | Mesa-18.0.x | Mesa-18.1.x | Mesa (git)  |
--------------------------------+-------------+-------------+-------------+
xserver-1.19 + modesetting DDX  |      OK     |      OK     |      OK     |
--------------------------------+-------------+-------------+-------------+
xserver-1.19 + intel DDX        |      OK     |      OK     |      OK     |
--------------------------------+-------------+-------------+-------------+
xserver-1.20 + modesetting DDX  |      OK     |     FAIL    |     FAIL    |
--------------------------------+-------------+-------------+-------------+
xserver-1.20 + intel DDX        |      OK     |      OK     |      OK     |
--------------------------------+-------------+-------------+-------------+
Comment 20 Denis 2018-07-18 10:23:41 UTC
Olivier, could you please share how you built xserver 1.20?
I used this manual: http://www.linuxfromscratch.org/blfs/view/svn/x/xorg-server.html

After building and installing, my machine got stuck on the welcome screen. I had to reinstall Ubuntu afterwards (rolling back via chroot did not help the second time).

I still want to reproduce the same conditions on my local machine.
Comment 21 Olivier Fourdan 2018-07-18 11:37:19 UTC
(In reply to Denis from comment #20)
> [...]
> 
> I still want to reproduce the same conditions on my local machine.

My advice would be to never replace the binaries from your distribution with ones you've compiled, but to install into a completely separate directory tree.

If that helps, this issue was first reported (downstream in xfce) by Debian buster/sid users after the switch to Mesa 18.1.1:

  https://bugzilla.xfce.org/show_bug.cgi?id=14475#c17
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=901789
Comment 22 Olivier Fourdan 2018-07-18 13:42:02 UTC
Some more info: this is related to the use of "glamor"...

xserver-1.20 + modesetting DDX + glamor : FAILS
xserver-1.20 + modesetting DDX without glamor : OK

That explains why the intel DDX works: it doesn't use glamor.

The same holds with Wayland: weston started with "--use-pixman" (which disables acceleration in Xwayland as well) works, whereas weston with Xwayland and glamor acceleration fails, just like Xorg with modesetting and glamor acceleration.

Now, I don't know if the issue is actually in glamor or in Mesa...
Comment 23 Olivier Fourdan 2018-07-19 09:33:03 UTC
So, as this is also reproducible with an example from khronos.org with Xwayland, I bisected the Xserver and was able to pinpoint the first bad commit in glamor.

I filed https://bugs.freedesktop.org/show_bug.cgi?id=107287 for glamor.
Comment 24 Olivier Fourdan 2018-07-20 11:40:58 UTC
Quick follow-up: while investigating bug 107287 (affecting Xwayland, which also uses glamor), it became apparent that the issue was more or less related to the introduction of multiple planes and format modifier support in both Mesa and the Xserver.

So I reran a new bisection (now avoiding the assert) and this time was able to find the first bad commit:

069fdd5f9facbd72fb6a289696c7b74e3237e70f is the first bad commit
commit 069fdd5f9facbd72fb6a289696c7b74e3237e70f
Author: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Date:   Fri Jul 7 02:54:26 2017 -0400

    egl/x11: Support DRI3 v1.1
    
    Add support for DRI3 v1.1, which allows pixmaps to be backed by
    multi-planar buffers, or those with format modifiers. This is both
    for allocating render buffers, as well as EGLImage imports from a
    native pixmap (EGL_NATIVE_PIXMAP_KHR).
    
    Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
    Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Reviewed-by: Daniel Stone <daniels@collabora.com>

:040000 040000 24a88d838ccccdcd3aadb977d50c3fa59b9373bc 938aabed7e3e5a41fd0e3167a6afdceac69a37fd M	src


This explains why this requires both Mesa >= 18.1 and xserver >= 1.20 (as format modifier support landed in those versions) and also why this affects only intel (as no other driver implements format modifiers yet, afaik).
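
The conditions gathered so far in this report can be summarised as a small predicate. This is only a sketch of the observations in this thread (Mesa >= 18.1, xserver >= 1.20, glamor in use, Intel hardware); the struct and function names are hypothetical and nothing here queries a real system.

```c
#include <assert.h>
#include <stdbool.h>

struct version { int major; int minor; };

/* True if v is at least major.minor. */
static bool version_at_least(struct version v, int major, int minor)
{
    assert(v.major >= 0 && v.minor >= 0);
    return v.major > major || (v.major == major && v.minor >= minor);
}

/* Hypothetical description of a test setup, filled in by hand. */
struct setup {
    struct version mesa;
    struct version xserver;
    bool intel_gpu;   /* i965 driver in use */
    bool glamor;      /* DDX (or Xwayland) uses glamor acceleration */
};

/* Combination reported as failing in this bug: format modifier support
 * on both sides (Mesa >= 18.1 and xserver >= 1.20) plus glamor on Intel. */
static bool likely_affected(const struct setup *s)
{
    return s->intel_gpu && s->glamor &&
           version_at_least(s->mesa, 18, 1) &&
           version_at_least(s->xserver, 1, 20);
}
```

With this model, the intel DDX rows of the earlier table come out as unaffected simply because glamor is false there.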
Comment 25 Olivier Fourdan 2018-07-20 15:59:16 UTC
Created attachment 140735 [details]
Simple reproducer

An even simpler reproducer based on https://www.khronos.org/opengl/wiki/Programming_OpenGL_in_Linux:_Using_texture_from_pixmap_extension

The only change there is the use of GLX_DOUBLEBUFFER to trigger the issue.
Comment 26 Olivier Fourdan 2018-07-25 09:12:46 UTC
Ok, some progress...

→ The rendering fails because dri3_alloc_render_buffer() fails.

→ dri3_alloc_render_buffer() fails because xcb_dri3_get_supported_modifiers_reply() returns NULL.

→ xcb_dri3_get_supported_modifiers_reply() returns NULL because proc_dri3_get_supported_modifiers() fails.

→ proc_dri3_get_supported_modifiers() fails because dixLookupWindow() fails.

→ dixLookupWindow() fails because Mesa passed a pixmap and not a window (draw->drawable is a pixmap, not a window)
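
The chain above can be modelled with a toy resource table. This is a simplified sketch, not Xserver or Mesa code: the resource IDs, the table and all function names are hypothetical stand-ins for dixLookupWindow(), the modifiers request and dri3_alloc_render_buffer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum res_type { RES_WINDOW, RES_PIXMAP };

struct resource { uint32_t id; enum res_type type; };

/* Toy stand-in for the server's resource database (IDs are made up). */
static const struct resource resources[] = {
    { 0x100, RES_WINDOW },  /* e.g. the root window */
    { 0x200, RES_PIXMAP },  /* the pixmap used as GLX drawable */
};

/* Models a window lookup: an ID that names a pixmap is rejected. */
static const struct resource *lookup_window(uint32_t id)
{
    assert(id != 0);  /* 0 is None in X11 */
    for (size_t i = 0; i < sizeof(resources) / sizeof(resources[0]); i++) {
        if (resources[i].id == id && resources[i].type == RES_WINDOW)
            return &resources[i];
    }
    return NULL;
}

/* Models the modifiers request: 0 on success, -1 when the lookup fails
 * (the client would then see a NULL reply). */
static int get_supported_modifiers(uint32_t drawable)
{
    return lookup_window(drawable) != NULL ? 0 : -1;
}

/* Models buffer allocation, which gives up when the request fails. */
static int alloc_render_buffer(uint32_t drawable)
{
    return get_supported_modifiers(drawable) == 0 ? 0 : -1;
}
```

Passing the pixmap ID fails at the first step and the failure propagates all the way up, which matches the blank rendering seen with pixmap drawables.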
Comment 27 Olivier Fourdan 2018-07-25 14:14:21 UTC
Created attachment 140813 [details] [review]
[PATCH] dri3: Do not get supported modifiers on pixmaps

This patch fixes the issue for me.

Also sent to mesa-devel, patch at https://patchwork.freedesktop.org/series/47217/
Comment 28 Olivier Fourdan 2018-07-26 07:35:11 UTC
Created attachment 140817 [details] [review]
[PATCH v2] dri3: For 1.2, use root window instead of pixmap drawable
Comment 29 Olivier Fourdan 2018-07-26 07:49:13 UTC
Created attachment 140818 [details] [review]
[PATCH v3] dri3: For 1.2, use root window instead of pixmap drawable
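
Going only by the title of the v2 and v3 patches, the idea can be sketched as follows. This is not the attached patch; the helper name and parameters are hypothetical, and how Mesa actually obtains the root window is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick the ID to pass as the window of the
 * supported-modifiers request. A window drawable is used as is; for a
 * pixmap drawable, the root window is used instead, so that the
 * server-side window lookup can succeed. */
static uint32_t window_for_modifier_query(uint32_t drawable,
                                          bool drawable_is_window,
                                          uint32_t root_window)
{
    assert(root_window != 0);  /* 0 is None in X11 */
    return drawable_is_window ? drawable : root_window;
}
```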
Comment 30 vadym 2018-08-08 13:51:11 UTC
The patch works well for me. It looks like the bug can be closed.
