Bug 99400

Summary: [nouveau] garbled rendering with glamor on G71
Product: Mesa Reporter: Olivier Fourdan <fourdan>
Component: Drivers/DRI/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED QA Contact: Nouveau Project <nouveau>
Severity: normal    
Priority: medium CC: fdsfgs
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: Screenshot of the issue
GLAMOR_DEBUG=1 weston

Description Olivier Fourdan 2017-01-13 15:21:00 UTC
Created attachment 128933 [details]
Screenshot of the issue

Description:

While investigating rendering bugs on Xwayland with nouveau specifically ([1] and [2]), I realized that some gtk2 output (thus via Xwayland) is completely garbled.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1411447
[2] https://bugzilla.gnome.org/show_bug.cgi?id=776255

How reproducible:

Always

Steps to reproduce:

1. Log in to a Wayland session (either Weston or GNOME Shell) on a Dell M1710 with a GeForce Go 7950 GTX
2. Run gtk-demo (the one from gtk2, i.e. not Wayland native but relying on Xwayland)
3. Start the "Assistant" demo (second starting from the top)

Actual result:

The window is black with random content

Expected result:

The rendering is correct

Additional data:

This issue does not occur when using "shm" and CPU rendering instead of glamor in Xwayland.

I am not aware of any other hardware where glamor produces such garbled output, so I suspect an issue with nouveau, but if you reckon it's an issue with glamor instead, please flip this bug back to xserver/glamor

I have seen other issues with black icons being reported with glamor and nouveau as well, and these do not occur when using the CPU renderer in weston, for example (https://bugzilla.redhat.com/show_bug.cgi?id=1411447#c12)

In my case, hardware is VGA compatible controller: NVIDIA Corporation G71M [GeForce Go 7950 GTX] (rev a1) (prog-if 00 [VGA controller])

01:00.0 0300: 10de:0297 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: 1028:019b
Comment 1 Olivier Fourdan 2017-01-13 15:26:24 UTC
apitrace of Xwayland available here:

https://people.freedesktop.org/~ofourdan/xwayland-apitrace-bug-99400.bz2
Comment 2 Ilia Mirkin 2017-01-13 15:33:40 UTC
Not that it's of much consolation, but the nv30 GL driver sucks. There are a number of unhandled situations, and I wouldn't be overly surprised if glamor were to hit one or several of them.

A "big hammer" approach to fixing some of the issues is running with NV30_SWTNL=1, although there's plenty more issues that this won't have an effect on. That usually fixes "bad geometry" issues, however the screenshot seems to suggest that perhaps incompatible color and depth buffers are being mixed - we print a warning about those in debug builds.

I'll have a look at the apitrace later to see if I can try reproducing on my NV34, although it's possible that won't work this time.
Comment 3 Olivier Fourdan 2017-01-13 15:42:36 UTC
(In reply to Ilia Mirkin from comment #2)
> Not that it's of much consolation, but the nv30 GL driver sucks. There are a
> number of unhandled situations, and I wouldn't be overly surprised if glamor
> were to hit one or several of them.

Could be a regression, I don't remember seeing those /before/ and the downstream bugs about black icons are recent as well.

And there've been some glamor changes in the xserver as well, so it could be that /something/ in glamor has changed and doesn't play well with the driver.

> A "big hammer" approach to fixing some of the issues is running with
> NV30_SWTNL=1, although there's plenty more issues that this won't have an
> effect on. That usually fixes "bad geometry" issues, however the screenshot
> seems to suggest that perhaps incompatible color and depth buffers are being
> mixed

Yes, that makes me think of that as well, could explain the black icons being reported downstream.

> - we print a warning about those in debug builds.

Good, so I'll rebuild with "--enable-debug" and see what I get.
 
> I'll have a look at the apitrace later to see if I can try reproducing on my
> NV34, although it's possible that won't work this time.

Thanks a bunch!
Comment 4 Ilia Mirkin 2017-01-13 15:50:08 UTC
(In reply to Olivier Fourdan from comment #3)
> (In reply to Ilia Mirkin from comment #2)
> > Not that it's of much consolation, but the nv30 GL driver sucks. There are a
> > number of unhandled situations, and I wouldn't be overly surprised if glamor
> > were to hit one or several of them.
> 
> Could be a regression, I don't remember seeing those /before/ and the
> downstream bugs about black icons are recent as well.
> 
> And there've been some glamor changes in the xserver as well, so it could be
> that /something/ in glamor has changed and doesn't play well with the driver.

This may come as some surprise, but nv30 is not a hotbed of development activity. If you can figure out a change in mesa that broke it, that'd be very helpful.

Separately, check dmesg - NVIDIA cards tend to yell loudly when we do something wrong.
Comment 5 Olivier Fourdan 2017-01-13 16:53:08 UTC
(In reply to Ilia Mirkin from comment #4)
> This may come as some surprise, but nv30 is not a hotbed of development

Understandable :)

> activity. If you can figure out a change in mesa that broke it, that'd be
> very helpful.

Actually, I was thinking more about glamor here.

> Separately, check dmesg - NVIDIA cards tend to yell loudly when we do
> something wrong.

I have tried a build with --enable-debug, but I see nothing wrong being reported in either dmesg or journalctl.
Comment 6 Ilia Mirkin 2017-01-13 18:47:36 UTC
(In reply to Olivier Fourdan from comment #1)
> apitrace of Xwayland available here:
> 
> https://people.freedesktop.org/~ofourdan/xwayland-apitrace-bug-99400.bz2

How did you get this trace? I can't seem to replay it - none of the context setup stuff is there. (And it seems to like doing glFlush *a lot*...)
Comment 7 Olivier Fourdan 2017-01-13 19:22:40 UTC
(In reply to Ilia Mirkin from comment #6)
> How did you get this trace? I can't seem to replay it - none of the context
> setup stuff is there. (And it seems to like doing glFlush *a lot*...)

Directly on Xwayland.

  apitrace trace /usr/bin/Xwayland "$@"
Comment 8 Ilia Mirkin 2017-01-14 00:33:00 UTC
(In reply to Olivier Fourdan from comment #7)
> (In reply to Ilia Mirkin from comment #6)
> > How did you get this trace? I can't seem to replay it - none of the context
> > setup stuff is there. (And it seems to like doing glFlush *a lot*...)
> 
> Directly on Xwayland.
> 
>   apitrace trace /usr/bin/Xwayland "$@"

OK, well, if I can't replay the trace then it's not of much use. I don't know anything about Wayland or XWayland or how they work, unfortunately.

Perhaps you might have success in tracing Xvfb + glamor or Xnest + glamor or something?
Comment 9 Olivier Fourdan 2017-01-16 16:17:17 UTC
Created attachment 128984 [details]
GLAMOR_DEBUG=1 weston

Humm, using GLAMOR_DEBUG, I see this error:

  glamor_composite_choose_shader:       Unsupported source picture format.
    glamor_composite_with_shader:       glamor_composite_choose_shader failed

This appears when the issue occurs, so it's possible that glamor is not checking all of its own requirements in glamor_init().
Comment 10 Olivier Fourdan 2017-01-16 16:36:47 UTC
(In reply to Olivier Fourdan from comment #9)
>   glamor_composite_choose_shader:       Unsupported source picture format.
>     glamor_composite_with_shader:       glamor_composite_choose_shader failed

I see the same with intel and yet rendering is correct there, so it's not that.
Comment 11 Olivier Fourdan 2017-01-17 11:03:27 UTC
(In reply to Olivier Fourdan from comment #10)
> (In reply to Olivier Fourdan from comment #9)
> >   glamor_composite_choose_shader:       Unsupported source picture format.
> >     glamor_composite_with_shader:       glamor_composite_choose_shader failed
> 
> I see the same with intel and yet rendering is correct there, so it's not
> that.

Unless the issue is with the fallback code in glamor, i.e.:

https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_render.c#n1699

Calling the fallback code unconditionally (i.e. "goto fail;" at the beginning of glamor_composite() ) still exhibits the issue on nvidia/nouveau and not on intel, but then it doesn't seem to be specific to nouveau... I am confused now.
Comment 12 Olivier Fourdan 2017-01-27 15:13:44 UTC
I traced it down to the use of the text input field in gtk-demo's assistant.

The issue seems to be with one particular call to glamor_poly_fill_rect_gl() with more than one prect (nrect > 1).

https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_rects.c#n42

Considering that the GLSL version is < 130 with this hardware, we are in the else case:

https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_rects.c#n83

If, in that case, with nrect > 1, we bail (to SW) then rendering is fine.

But that same code gives the correct result in many other cases with multiple rects, so it does not seem to be a bug in the glamor code... Besides, the same code as used in Xephyr works fine.

I captured the apitrace of both Xephyr -glamor and Xwayland along with the mesa debug and glamor debug messages with:

  WAYLAND_DISPLAY=wayland-0 MESA_DEBUG=1,incomplete_tex,incomplete_fbo,context apitrace trace ~/local/bin/Xwayland -noreset :1 |& tee Xwayland.log &
  DISPLAY=:1 GDK_BACKEND=x11 gtk-demo

Then same with Xephyr:

  DISPLAY=:0 MESA_DEBUG=1,incomplete_tex,incomplete_fbo,context  apitrace trace ~/local/bin/Xephyr -glamor -noreset :1 |& tee Xephyr.log &
  DISPLAY=:1 GDK_BACKEND=x11 gtk-demo

Mesa complains that "Texture Obj xxx incomplete because: TexImage[1] is missing" but that occurs with both Xephyr and Xwayland so I guess it should not be the problem.

By logging the prect coordinates, we can match it with the apitrace dump in both cases. But again, I don't see much difference between the two (Xephyr -glamor vs. Xwayland).

Logs and traces are available here:

   https://people.freedesktop.org/~ofourdan/bug99400/
Comment 13 Ilia Mirkin 2017-01-28 15:57:26 UTC
Olivier,

I'd like to help you, but I'm just not sure how -- I don't have Wayland (or any interest in setting that up), and your traces appear incomplete. I don't know why - I've never seen that before. But they don't create contexts. Compare your Xephyr trace, which has

1 glXChooseFBConfig
...
9 glXCreateContext
10 glXMakeCurrent

While the Xwayland trace has none of those. Paradoxically, it starts at call 31. Perhaps you have a non-egl-enabled apitrace? Also note that there's not a single instance of UseProgram or any other shader-related items in the XWayland trace.

Does the Xephyr trace exhibit the problems for you when replayed? If so, I can investigate that. However on my GK208 (or with llvmpipe), when replayed, it doesn't appear to render anything.
Comment 14 Ilia Mirkin 2017-01-28 16:17:11 UTC
Note that there's a shotgun-debugging approach here -

nv30_vbo_validate - force it to take the vertex->need_conversion path (like we do on BE). No clue if that'll help, but it's a start.

Also did you check whether NV30_SWTNL=1 fixes things?
Comment 15 Olivier Fourdan 2017-01-30 09:41:42 UTC
(In reply to Ilia Mirkin from comment #13)
> I'd like to help you, but I'm just not sure how

I know, and very much appreciate your help!

I'm sorry I can't be of more help. I spent the past 5 days or so on this, and all I could come up with is pretty much summarized in comment 12, i.e. not much.

> [...] I've never seen that before. But they don't create contexts.
> Compare your Xephyr trace, which has glXChooseFBConfig, glXCreateContext,
> glXMakeCurrent

I think this is normal, this is Wayland not X so it doesn't need/use GLX.

Xwayland is an X server for the X11 clients and a Wayland client to the Wayland compositor, so that X11 application can still work on Wayland.

> While the Xwayland trace has none of those. Paradoxically, it starts at call
> 31. Perhaps you have a non-egl-enabled apitrace? Also note that there's not
> a single instance of UseProgram or any other shader-related items in the
> XWayland trace.

Yes, that's puzzling... I rebuilt apitrace current from github (instead of using the one from Fedora) and it's the same.

> Does the Xephyr trace exhibit the problems for you when replayed? If so, I
> can investigate that. However on my GK208 (or with llvmpipe), when replayed,
> it doesn't appear to render anything.

Unfortunately I could never make apitrace replay anything here either.

(In reply to Ilia Mirkin from comment #14)
> nv30_vbo_validate - force it to take the vertex->need_conversion path (like
> we do on BE). No clue if that'll help, but it's a start.
> 
> Also did you check whether NV30_SWTNL=1 fixes things?

Neither forcing conversion in nv30_vbo_validate() nor using NV30_SWTNL=1 changes anything.
Comment 16 Olivier Fourdan 2017-01-30 12:41:54 UTC
(In reply to Ilia Mirkin from comment #13)
> I'd like to help you, but I'm just not sure how -- I don't have Wayland (or
> any interest in setting that up), and your traces appear incomplete. I don't
> know why - I've never seen that before. But they don't create contexts.
> [...]
> While the Xwayland trace has none of those. Paradoxically, it starts at call
> 31. Perhaps you have a non-egl-enabled apitrace? Also note that there's not
> a single instance of UseProgram or any other shader-related items in the
> XWayland trace.
Oh right... Blimey! You got me thinking here, and you're right: those Xwayland traces are just wrong. It's libepoxy not playing nice with "apitrace --api-egl" (https://github.com/anholt/libepoxy/issues/68), so I picked up the forked version of libepoxy, rebuilt apitrace and Xwayland, and was able to capture a much better trace!
Comment 17 Olivier Fourdan 2017-01-30 12:50:18 UTC
Updated traces and logs here:

https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace

With this, I see interesting errors like:

Mesa: User error: GL_INVALID_VALUE in glCopyTexSubImage2D(xoffset+width)
(EE) glamor0: GL error: GL_INVALID_VALUE in glCopyTexSubImage2D(xoffset+width)
Comment 18 Ilia Mirkin 2017-01-30 19:49:10 UTC
Curious. It should perhaps be noted that there are no instances of glCopyTexSubImage2D in your trace. That means they happen some other way...

It should be noted that the warning in question can only come out as

      if (yoffset + subHeight > (GLint) destImage->Height) {
         _mesa_error(ctx, GL_INVALID_VALUE, "%s(yoffset+height)", func);
         return GL_TRUE;
      }

Anyways, looking at your recent trace, while it contains various useful items, I can't get any of the framebuffers to show particularly useful data when replaying to specific draws on an intel system. Can you find a draw that should have a useful image?
Comment 19 Ilia Mirkin 2017-01-30 19:51:12 UTC
OH! DUH!

apitrace: warning: glMapBufferRange: MAP_COHERENT_BIT|MAP_WRITE_BIT unsupported <https://git.io/vV9kM>

Right, so you need to run this with MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage. Or find some way to force glamor to disable it. apitrace doesn't like it. I'm sure that's why the textures are garbage.
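As a rough sketch of the override suggested above, a small wrapper can hide the extension from whatever is being traced. The function name without_buffer_storage is made up for illustration; only MESA_EXTENSION_OVERRIDE and the extension name come from this thread.

```shell
# Hypothetical helper: run any command with GL_ARB_buffer_storage hidden.
# MESA_EXTENSION_OVERRIDE is read by Mesa; a leading "-" disables the
# named extension for the process that inherits the variable.
without_buffer_storage() {
    MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage" "$@"
}
```

It would be used as a prefix to the capture command from comment 12, for example: without_buffer_storage apitrace trace ~/local/bin/Xwayland -noreset :1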
Comment 20 Olivier Fourdan 2017-01-31 09:11:41 UTC
(In reply to Ilia Mirkin from comment #19)
> Right, so you need to run this with
> MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage. Or find some way to force
> glamor to disable it. apitrace doesn't like it. I'm sure that's why the
> textures are garbage.

Haha! yes! I can see the content now, on intel.

I cannot load the file on nouveau where it was captured, it says

 | 2 @0 eglGetPlatformDisplayEXT(platform = EGL_PLATFORM_GBM_KHR, native_display = 0x26926b0, attrib_list = {}) = 0x26d8250
 | 2: warning: unsupported eglGetPlatformDisplayEXT call
error: failed to create OpenGL 3.1 context.

Same links, updated files:

https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace

For example frame 8932 (toward the end) is interesting imho, because it shows the icon looks wrong (which is what led me to this issue in the first place).

But the rest of the window looks fine on intel, which is not what I see on nouveau with the G71, it looks all garbled there (as in attachment  128933 [details])
Comment 21 Olivier Fourdan 2017-01-31 12:37:26 UTC
(In reply to Olivier Fourdan from comment #20)
> 
> I cannot load the file on nouveau where it was captured, it says
> 
>  | 2 @0 eglGetPlatformDisplayEXT(platform = EGL_PLATFORM_GBM_KHR,
> native_display = 0x26926b0, attrib_list = {}) = 0x26d8250
>  | 2: warning: unsupported eglGetPlatformDisplayEXT call
> error: failed to create OpenGL 3.1 context.

Quick follow up on this, this is because of the way glamor works, it first creates an EGL context with some config attributes (which include the OpenGL version 3.1) and if it fails, it retries with no config attributes:

https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_egl.c#n819

or for Xwayland:

https://cgit.freedesktop.org/xorg/xserver/tree/hw/xwayland/xwayland-glamor.c#n310

So back to apitrace: as the config attributes specify OpenGL 3.1, context creation fails and apitrace just stops there. So I changed Xwayland to not do the first call (which on this HW returns NULL anyway) and now apitrace is a happy buddy, I can replay the trace on nouveau as well.

Same links, updated files:

https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace

> For example frame 8932 (toward the end) is interesting imho, [...]

Let's forget about this, now that the new trace can replay on both intel and nouveau.

The problem I see is rather that I see no problem in apitrace, the frames look fine, AFAICT.

And yet the rendering was wrong on screen when I captured the trace...
Comment 22 Ilia Mirkin 2017-02-02 06:26:35 UTC
(In reply to Olivier Fourdan from comment #21)
> Same links, updated files:
> 
> https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
> https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace
> 
> > For example frame 8932 (toward the end) is interesting imho, [...]
> 
> Let's forget about this, now that the new trace can replay on both intel and
> nouveau.
> 
> The problem I see is rather that I see no problem in apitrace, the frames
> look fine, AFAICT.

Hmmm... this is very unfortunate. And furthermore unfortunate is that this trace makes use of NPOT textures, which my NV34 doesn't support (at least not without some pretty nasty workarounds).

If replaying the trace yields the desired results, then this is all for naught. You can try running

  apitrace dump-images --calls='*/draw' Xwayland.trace

and see if any of the screenshots generated show the artifacts. If not, then this is a much more insidious problem.
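If the dumped images need to be compared between two machines (say the nouveau box and an intel one), a strict byte-for-byte comparison is one option. This is a minimal sketch, assuming both dumps were produced from the same trace with the same apitrace version and copied into two directories; compare_dumps and the directory layout are hypothetical and not part of apitrace.

```shell
# Hypothetical helper: compare same-named PNGs in two directories.
# Prints each file that differs or is missing from the second directory
# and returns non-zero if any such file was found.
compare_dumps() {
    ref_dir=$1
    test_dir=$2
    status=0
    for ref in "$ref_dir"/*.png; do
        [ -e "$ref" ] || continue          # empty directory: nothing to do
        name=$(basename "$ref")
        if [ ! -e "$test_dir/$name" ]; then
            echo "missing: $name"
            status=1
        elif ! cmp -s "$ref" "$test_dir/$name"; then
            echo "differs: $name"
            status=1
        fi
    done
    return $status
}
```

For example: compare_dumps ./dump-intel ./dump-nouveau. Identical output only says that the replay renders the same on both drivers; it says nothing about what was on screen during capture.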
Comment 23 Ilia Mirkin 2017-03-18 22:06:27 UTC
OK, I now have a NV4A (NV44A). Running apitrace dump-images and comparing them to the gk208 output yielded identical results.

If there's some way that doesn't involve me installing a *ton* of software, I'd be happy to look at this further. Perhaps you can come up with a set of repro steps that I can use. Please note that I've never used Wayland or anything related.
Comment 24 Olivier Fourdan 2017-03-20 08:29:20 UTC
One possibility would be to run a live image of Fedora 25 (which comes with Wayland by default), at least it would allow for a quick test (no need for you to install everything if it's not reproducible on your hardware setup...)

However, gtk-demo (which is the simple way to reproduce) comes with the devel packages, which won't be on a live image... But this should be installable, even on a live image, as long as a network connection is available.
Comment 25 poma 2017-03-20 11:13:08 UTC
Daily/nightly composes can be found here:

Fedora 26
https://dl.fedoraproject.org/pub/fedora/linux/development/26/Spins/x86_64/iso/
https://dl.fedoraproject.org/pub/fedora/linux/development/26/Workstation/x86_64/iso/

Rawhide - Fedora 27
https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Spins/x86_64/iso/
https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/iso/

...........................................................

How to write LiveDVD ISO image to portable USB flash drive:

e.g.
# dd if=Fedora*Live*.iso of=/dev/disk/by-id/usb-<ID_SERIAL> \
  bs=1M iflag=direct oflag=direct conv=fdatasync status=progress

If /dev/sdz represents portable USB flash drive
$ udevadm info -q env -n sdz | grep ID_SERIAL=

Using ID_SERIAL makes sure the exact device that will be written to is known

WARNING: using 'dd' will erase all partitions and -data- on portable USB flash drive

Ref.
https://fedoraproject.org/wiki/How_to_create_and_use_Live_USB

..........................................

How to install demonstrative GTK+ widgets:
# dnf install gtk2-devel gtk3-devel
Comment 26 Ilia Mirkin 2017-03-20 18:11:40 UTC
(In reply to Olivier Fourdan from comment #24)
> One possibility would be to run a live image of Fedora 25 (which comes with
> Wayland by default), at least it would allow for a quick test (no need for
> you to install everything if it's not reproducible on your hardware setup...)
> 
> However, gtk-demo (which is the simple way to reproduce) comes with the
> devel packages, which won't be on a live image... But this should be
> installable, even on a live image, as long as a network connection is
> available.

I was hoping to avoid rebooting :) But like... would weston + gtk2-demo be sufficient? Or do I need something else, like, say, GNOME or KDE or something? (If the latter, that's pretty much a non-starter.)
Comment 27 Olivier Fourdan 2017-03-20 18:45:38 UTC
(In reply to Ilia Mirkin from comment #26)
> I was hoping to avoid rebooting :) But like... would weston + gtk2-demo be
> sufficient? Or do I need something else, like, say, GNOME or KDE or
> something? (If the latter, that's pretty much a non-starter.)

Sure, no need for GNOME or KDE, weston + Xwayland + gtk+-2.24.x should be sufficient.
Comment 28 Ilia Mirkin 2017-03-26 02:03:44 UTC
OK, this weston stuff is a no-go. No way to run it against a specific dri card without intensive udev work (which I have no interest in learning), and apparently won't even let me start without some env vars (XDG_RUNTIME_DIR, perhaps others) after I tried starting it under X.

So... I can't repro with the given instructions (because I can't follow the instructions successfully). Perhaps there are other instructions that reproduce the issue? Does Xephyr -glamor reproduce the issue? Something else that I can run?
Comment 29 Olivier Fourdan 2017-03-27 13:26:43 UTC
(In reply to Ilia Mirkin from comment #28)
> OK, this weston stuff is a no-go. No way to run it against a specific dri
> card without intensive udev work (which I have no interest in learning), and
> apparently won't even let me start without some env vars (XDG_RUNTIME_DIR,
> perhaps others) after I tried starting it under X.

You need Xwayland, and therefore a Wayland compositor to connect to. But Xwayland can run standalone (within a Wayland compositor) so once/if you can get a Wayland compositor to run, you could run Xwayland from there and (hopefully) reproduce, e.g.:

    Xwayland :30 &
    DISPLAY=:30 gtk-demo

But the requirement is to have a Wayland compositor so that Xwayland can run.

Weston is not the only Wayland compositor; apart from GNOME, there are other smaller Wayland compositors such as sway (http://swaywm.org/) which can run nested as well (so you can test from an existing X11 session), but I haven't played much with those.

> So... I can't repro with the given instructions (because I can't follow the
> instructions successfully). Perhaps there are other instructions that
> reproduce the issue? Does Xephyr -glamor reproduce the issue? Something else
> that I can run?

Unfortunately no, "Xephyr -glamor" does not reproduce the issue, which makes things even more confusing, admittedly.
Comment 30 Olivier Fourdan 2017-03-27 14:12:42 UTC
One more thing, maybe: Xephyr opens /dev/dri/card0 whereas Xwayland uses EGL with render node /dev/dri/renderD128.
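One way to confirm which device node each server really has open is to look at its file descriptors under /proc. This is a minimal sketch, assuming Linux and permission to read the target process's fd directory; dri_nodes_of is a hypothetical helper.

```shell
# Hypothetical helper: list the /dev/dri nodes a process has open.
# Relies on the Linux /proc/<pid>/fd symlinks; prints nothing if the
# process does not exist or has no DRI node open.
dri_nodes_of() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            /dev/dri/*) echo "$target" ;;
        esac
    done | sort -u
}
```

For example, with a single Xwayland process running: dri_nodes_of "$(pidof Xwayland)", and likewise for Xephyr.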
Comment 31 GitLab Migration User 2019-09-18 20:44:43 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1124.
