Summary: | [nouveau] garbled rendering with glamor on G71 | ||
---|---|---|---|
Product: | Mesa | Reporter: | Olivier Fourdan <fourdan> |
Component: | Drivers/DRI/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED MOVED | QA Contact: | Nouveau Project <nouveau> |
Severity: | normal | ||
Priority: | medium | CC: | fdsfgs |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Screenshot of the issue
GLAMOR_DEBUG=1 weston |
Description
Olivier Fourdan
2017-01-13 15:21:00 UTC
apitrace of Xwayland available here: https://people.freedesktop.org/~ofourdan/xwayland-apitrace-bug-99400.bz2

Not that it's of much consolation, but the nv30 GL driver sucks. There are a number of unhandled situations, and I wouldn't be overly surprised if glamor were to hit one or several of them.

A "big hammer" approach to fixing some of the issues is running with NV30_SWTNL=1, although there's plenty more issues that this won't have an effect on. That usually fixes "bad geometry" issues, however the screenshot seems to suggest that perhaps incompatible color and depth buffers are being mixed - we print a warning about those in debug builds.

I'll have a look at the apitrace later to see if I can try reproducing on my NV34, although it's possible that won't work this time.

(In reply to Ilia Mirkin from comment #2)
> Not that it's of much consolation, but the nv30 GL driver sucks. There are a number of unhandled situations, and I wouldn't be overly surprised if glamor were to hit one or several of them.

Could be a regression, I don't remember seeing those /before/ and the downstream bugs about black icons are recent as well.

And there've been some glamor changes in the xserver as well, so it could be /something/ in glamor has changed and doesn't play well with the driver.

> A "big hammer" approach to fixing some of the issues is running with NV30_SWTNL=1, although there's plenty more issues that this won't have an effect on. That usually fixes "bad geometry" issues, however the screenshot seems to suggest that perhaps incompatible color and depth buffers are being mixed

Yes, that makes me think of that as well, could explain the black icons being reported downstream.

> - we print a warning about those in debug builds.

Good, so I'll rebuild with "--enable-debug" and see what I get.

> I'll have a look at the apitrace later to see if I can try reproducing on my NV34, although it's possible that won't work this time.

Thanks a bunch!
(In reply to Olivier Fourdan from comment #3)
> (In reply to Ilia Mirkin from comment #2)
> > Not that it's of much consolation, but the nv30 GL driver sucks. There are a number of unhandled situations, and I wouldn't be overly surprised if glamor were to hit one or several of them.
>
> Could be a regression, I don't remember seeing those /before/ and the downstream bugs about black icons are recent as well.
>
> And there've been some glamor changes in the xserver as well, so it could be /something/ in glamor has changed and doesn't play well with the driver.

This may come as some surprise, but nv30 is not a hotbed of development activity. If you can figure out a change in mesa that broke it, that'd be very helpful.

Separately, check dmesg - NVIDIA cards tend to yell loudly when we do something wrong.

(In reply to Ilia Mirkin from comment #4)
> This may come as some surprise, but nv30 is not a hotbed of development

Understandable :)

> activity. If you can figure out a change in mesa that broke it, that'd be very helpful.

Actually, I was thinking more about glamor here.

> Separately, check dmesg - NVIDIA cards tend to yell loudly when we do something wrong.

I have tried a build with --enable-debug, but I see nothing wrong being reported in either dmesg or journalctl.

(In reply to Olivier Fourdan from comment #1)
> apitrace of Xwayland available here:
>
> https://people.freedesktop.org/~ofourdan/xwayland-apitrace-bug-99400.bz2

How did you get this trace? I can't seem to replay it - none of the context setup stuff is there. (And it seems to like doing glFlush *a lot*...)

(In reply to Ilia Mirkin from comment #6)
> How did you get this trace? I can't seem to replay it - none of the context setup stuff is there. (And it seems to like doing glFlush *a lot*...)

Directly on Xwayland.

apitrace trace /usr/bin/Xwayland "$@"

(In reply to Olivier Fourdan from comment #7)
> (In reply to Ilia Mirkin from comment #6)
> > How did you get this trace? I can't seem to replay it - none of the context setup stuff is there. (And it seems to like doing glFlush *a lot*...)
>
> Directly on Xwayland.
>
> apitrace trace /usr/bin/Xwayland "$@"

OK, well, if I can't replay the trace then it's not of much use. I don't know anything about Wayland or XWayland or how they work, unfortunately. Perhaps you might have success in tracing Xvfb + glamor or Xnest + glamor or something?

Created attachment 128984 [details]
GLAMOR_DEBUG=1 weston
Humm, using GLAMOR_DEBUG, I see this error:
glamor_composite_choose_shader: Unsupported source picture format.
glamor_composite_with_shader: glamor_composite_choose_shader failed
This shows up when the issue occurs, so it's possible that glamor is not checking all of its own requirements in glamor_init().
(In reply to Olivier Fourdan from comment #9)
> glamor_composite_choose_shader: Unsupported source picture format.
> glamor_composite_with_shader: glamor_composite_choose_shader failed

I see the same with intel and yet rendering is correct there, so it's not that.

(In reply to Olivier Fourdan from comment #10)
> (In reply to Olivier Fourdan from comment #9)
> > glamor_composite_choose_shader: Unsupported source picture format.
> > glamor_composite_with_shader: glamor_composite_choose_shader failed
>
> I see the same with intel and yet rendering is correct there, so it's not that.

Unless the issue is with the fallback code in glamor, i.e.: https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_render.c#n1699

Calling the fallback code unconditionally (i.e. "goto fail;" at the beginning of glamor_composite()) still exhibits the issue on nvidia/nouveau and not on intel, but then it doesn't seem to be specific to nouveau... I am confused now.

I traced it down to the use of the text input field in gtk-demo's assistant.

The issue seems to be with one particular call to glamor_poly_fill_rect_gl() with more than one prect (nrect > 1). https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_rects.c#n42

Considering that the GLSL version is < 130 with this hardware, we are in the else case: https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_rects.c#n83

If, in that case, with nrect > 1, we bail (to SW) then rendering is fine. But that same code gives the correct result in many other cases, with multiple rects, so it does not seem to be a bug in the glamor code though...

Besides, the same code as used in Xephyr works fine.
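
For readers unfamiliar with the non-instanced (GLSL < 130) path mentioned above, the following is a minimal sketch of how a list of rectangles can be expanded into four corner vertices each before being drawn. It is an illustration only and not the glamor source: `rect_t` and `expand_rects` are hypothetical names, and the corner order is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an X rectangle (x, y, width, height). */
typedef struct {
    short x, y;
    unsigned short width, height;
} rect_t;

/*
 * Expand nrect rectangles into four (x, y) corner vertices each,
 * i.e. 8 shorts per rectangle, written to out.
 * Returns the number of shorts written.
 * Corner order (an assumption): top-left, bottom-left,
 * bottom-right, top-right.
 */
size_t expand_rects(const rect_t *rects, size_t nrect, short *out)
{
    assert(rects != NULL && out != NULL);

    size_t n = 0;
    for (size_t i = 0; i < nrect; i++) {
        const rect_t *r = &rects[i];
        short x2 = (short)(r->x + r->width);
        short y2 = (short)(r->y + r->height);

        out[n++] = r->x; out[n++] = r->y;   /* top-left */
        out[n++] = r->x; out[n++] = y2;     /* bottom-left */
        out[n++] = x2;   out[n++] = y2;     /* bottom-right */
        out[n++] = x2;   out[n++] = r->y;   /* top-right */
    }
    return n;
}
```

With nrect > 1, every rectangle's vertices end up in the same buffer, which is the case the comment above singles out.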
I captured the apitrace of both Xephyr -glamor and Xwayland along with the mesa debug and glamor debug messages with:

WAYLAND_DISPLAY=wayland-0 MESA_DEBUG=1,incomplete_tex,incomplete_fbo,context apitrace trace ~/local/bin/Xwayland -noreset :1 |& tee Xwayland.log &
DISPLAY=:1 GDK_BACKEND=x11 gtk-demo

Then same with Xephyr:

DISPLAY=:0 MESA_DEBUG=1,incomplete_tex,incomplete_fbo,context apitrace trace ~/local/bin/Xephyr -glamor -noreset :1 |& tee Xephyr.log &
DISPLAY=:1 GDK_BACKEND=x11 gtk-demo

Mesa complains that "Texture Obj xxx incomplete because: TexImage[1] is missing" but that occurs with both Xephyr and Xwayland so I guess it should not be the problem.

By logging the prect coordinates, we can match it with the apitrace dump in both cases. But again, I don't see much difference between the two (Xephyr -glamor vs. Xwayland)

Logs and traces are available here: https://people.freedesktop.org/~ofourdan/bug99400/

Olivier, I'd like to help you, but I'm just not sure how -- I don't have Wayland (or any interest in setting that up), and your traces appear incomplete. I don't know why - I've never seen that before. But they don't create contexts. Compare your Xephyr trace, which has

1 glXChooseFBConfig
...
9 glXCreateContext
10 glXMakeCurrent

While the Xwayland trace has none of those. Paradoxically, it starts at call 31. Perhaps you have a non-egl-enabled apitrace? Also note that there's not a single instance of UseProgram or any other shader-related items in the XWayland trace.

Does the Xephyr trace exhibit the problems for you when replayed? If so, I can investigate that. However on my GK208 (or with llvmpipe), when replayed, it doesn't appear to render anything.

Note that there's a shotgun-debugging approach here - nv30_vbo_validate - force it to take the vertex->need_conversion path (like we do on BE). No clue if that'll help, but it's a start.

Also did you check whether NV30_SWTNL=1 fixes things?
(In reply to Ilia Mirkin from comment #13)
> I'd like to help you, but I'm just not sure how

I know, and very much appreciate your help! I'm sorry I can't be of more help, I spent like the past 5 days or so on this and all I could come up with is pretty much summarized in comment 12, i.e. not much.

> [...] I've never seen that before. But they don't create contexts.
> Compare your Xephyr trace, which has glXChooseFBConfig, glXCreateContext, glXMakeCurrent

I think this is normal, this is Wayland not X so it doesn't need/use GLX. Xwayland is an X server for the X11 clients and a Wayland client to the Wayland compositor, so that X11 applications can still work on Wayland.

> While the Xwayland trace has none of those. Paradoxically, it starts at call 31. Perhaps you have a non-egl-enabled apitrace? Also note that there's not a single instance of UseProgram or any other shader-related items in the XWayland trace.

Yes, that's puzzling... I rebuilt the current apitrace from github (instead of using the one from Fedora) and it's the same.

> Does the Xephyr trace exhibit the problems for you when replayed? If so, I can investigate that. However on my GK208 (or with llvmpipe), when replayed, it doesn't appear to render anything.

Unfortunately I could never make apitrace replay anything here either.

(In reply to Ilia Mirkin from comment #14)
> nv30_vbo_validate - force it to take the vertex->need_conversion path (like we do on BE). No clue if that'll help, but it's a start.
>
> Also did you check whether NV30_SWTNL=1 fixes things?

Neither forcing conversion in nv30_vbo_validate() nor using NV30_SWTNL=1 changes anything.

(In reply to Ilia Mirkin from comment #13)
> I'd like to help you, but I'm just not sure how -- I don't have Wayland (or any interest in setting that up), and your traces appear incomplete. I don't know why - I've never seen that before. But they don't create contexts.
> [...]
> While the Xwayland trace has none of those. Paradoxically, it starts at call 31. Perhaps you have a non-egl-enabled apitrace? Also note that there's not a single instance of UseProgram or any other shader-related items in the XWayland trace.

Oh right... Blimey! You got me thinking here, you're right, those Xwayland traces are just wrong, it's libepoxy not playing nice with "apitrace --api-egl" (https://github.com/anholt/libepoxy/issues/68) so I picked up the forked version of libepoxy, rebuilt apitrace and Xwayland and was able to capture a much better trace!

Updated traces and logs here:
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace

With this, I see interesting errors like:

Mesa: User error: GL_INVALID_VALUE in glCopyTexSubImage2D(xoffset+width)
(EE) glamor0: GL error: GL_INVALID_VALUE in glCopyTexSubImage2D(xoffset+width)

Curious. It should perhaps be noted that there are no instances of glCopyTexSubImage2D in your trace. That means they happen some other way... It should be noted that the warning in question can only come out as

    if (yoffset + subHeight > (GLint) destImage->Height) {
        _mesa_error(ctx, GL_INVALID_VALUE, "%s(yoffset+height)", func);
        return GL_TRUE;
    }

Anyways, looking at your recent trace, while it contains various useful items, I can't get any of the framebuffers to show particularly useful data when replaying to specific draws on an intel system. Can you find a draw that should have a useful image?

OH! DUH!

apitrace: warning: glMapBufferRange: MAP_COHERENT_BIT|MAP_WRITE_BIT unsupported <https://git.io/vV9kM>

Right, so you need to run this with MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage. Or find some way to force glamor to disable it. apitrace doesn't like it. I'm sure that's why the textures are garbage.

(In reply to Ilia Mirkin from comment #19)
> Right, so you need to run this with MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage. Or find some way to force glamor to disable it. apitrace doesn't like it. I'm sure that's why the textures are garbage.

Haha! yes! I can see the content now, on intel.

I cannot load the file on nouveau where it was captured, it says

| 2 @0 eglGetPlatformDisplayEXT(platform = EGL_PLATFORM_GBM_KHR, native_display = 0x26926b0, attrib_list = {}) = 0x26d8250
| 2: warning: unsupported eglGetPlatformDisplayEXT call
error: failed to create OpenGL 3.1 context.

Same links, updated files:
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace

For example frame 8932 (toward the end) is interesting imho, because it shows the icon looks wrong (which is what led me to this issue in the first place). But the rest of the window looks fine on intel, which is not what I see on nouveau with the G71, it looks all garbled there (as in attachment 128933 [details])

(In reply to Olivier Fourdan from comment #20)
> I cannot load the file on nouveau where it was captured, it says
>
> | 2 @0 eglGetPlatformDisplayEXT(platform = EGL_PLATFORM_GBM_KHR, native_display = 0x26926b0, attrib_list = {}) = 0x26d8250
> | 2: warning: unsupported eglGetPlatformDisplayEXT call
> error: failed to create OpenGL 3.1 context.

Quick follow up on this, this is because of the way glamor works, it first creates an EGL context with some config attributes (which include the OpenGL version 3.1) and if it fails, it retries with no config attributes: https://cgit.freedesktop.org/xorg/xserver/tree/glamor/glamor_egl.c#n819

or for Xwayland: https://cgit.freedesktop.org/xorg/xserver/tree/hw/xwayland/xwayland-glamor.c#n310

So back to apitrace, as the config attributes specify OpenGL 3.1, it fails and apitrace just stops there.

So I changed Xwayland to not do the first call (which on this HW returns NULL anyway) and now apitrace is a happy buddy, I can replay the trace on nouveau as well.
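
The try-then-retry context creation described above can be sketched roughly as follows. This is not the glamor or Xwayland code: `create_ctx_fn`, `create_context_with_fallback` and `stub_create_legacy_only` are hypothetical names, and the callback merely stands in for the real EGL context-creation call so that the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for the real context-creation call:
 * it takes an attribute list (or NULL) and returns a context or NULL. */
typedef void *(*create_ctx_fn)(const int *attribs);

/*
 * Try the versioned attribute list first, then retry with no attributes.
 * *used_fallback is set to 1 when the second attempt was needed.
 */
void *create_context_with_fallback(create_ctx_fn create,
                                   const int *versioned_attribs,
                                   int *used_fallback)
{
    assert(create != NULL && used_fallback != NULL);

    *used_fallback = 0;
    void *ctx = create(versioned_attribs);
    if (ctx == NULL) {
        *used_fallback = 1;
        ctx = create(NULL);
    }
    return ctx;
}

/* Stub for illustration only: refuses any explicit attribute list,
 * mimicking what the thread reports for this hardware. */
static int stub_ctx;

void *stub_create_legacy_only(const int *attribs)
{
    return attribs == NULL ? (void *)&stub_ctx : NULL;
}
```

Passing `stub_create_legacy_only` exercises the fallback branch. Per the comment above, a replay tool that treats the first failed attempt as fatal never reaches the second call, which would explain the aborted replay.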
Same links, updated files:
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace

> For example frame 8932 (toward the end) is interesting imho, [...]

Let's forget about this, now that the new trace can replay on both intel and nouveau.

The problem I see is rather that I see no problem in apitrace, the frames look fine, AFAICT. And yet the rendering was wrong on screen when I captured the trace...

(In reply to Olivier Fourdan from comment #21)
> Same links, updated files:
>
> https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.log
> https://people.freedesktop.org/~ofourdan/bug99400/Xwayland.trace
>
> > For example frame 8932 (toward the end) is interesting imho, [...]
>
> Let's forget about this, now that the new trace can replay on both intel and nouveau.
>
> The problem I see is rather that I see no problem in apitrace, the frames look fine, AFAICT.

Hmmm... this is very unfortunate. And furthermore unfortunate is that this trace makes use of NPOT textures, which my NV34 doesn't support (at least not without some pretty nasty workarounds).

If replaying the trace yields the desired results, then this is all for naught. You can try running

apitrace dump-images --calls='*/draw' Xwayland.trace

and see if any of the screenshots generated show the artifacts. If not, then this is a much more insidious problem.

OK, I now have a NV4A (NV44A). Running apitrace dump-images and comparing them to the gk208 output yielded identical results.

If there's some way that doesn't involve me installing a *ton* of software, I'd be happy to look at this further. Perhaps you can come up with a set of repro steps that I can use. Please note that I've never used Wayland or anything related.
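
The thread does not say how the dumped images from the two GPUs were compared. One simple possibility, sketched here purely as an assumption, is a byte-for-byte comparison of corresponding dump files; `files_identical` is a hypothetical helper and not part of apitrace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Compare two files byte for byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened.
 */
int files_identical(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;

    if (a != NULL && b != NULL) {
        int ca, cb;
        /* Read in lockstep until a mismatch or the end of the first file. */
        do {
            ca = fgetc(a);
            cb = fgetc(b);
        } while (ca == cb && ca != EOF);
        /* Identical only if both files hit EOF at the same position. */
        result = (ca == cb) ? 1 : 0;
    }

    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

Byte-identical files imply identical pixels, but the reverse does not hold: two image files can differ in encoding or metadata while showing the same picture, so a result of 0 would call for a pixel-level comparison.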
One possibility would be to run a live image of Fedora 25 (which comes with Wayland by default), at least it would allow for a quick test (no need for you to install everything if it's not reproducible on your hardware setup...)

However, gtk-demo (which is the simple way to reproduce) comes with the devel packages, which won't be on a live image... But this should be installable, even on a live image, as long as a network connection is available.

Daily/nightly composes can be found here:

Fedora 26
https://dl.fedoraproject.org/pub/fedora/linux/development/26/Spins/x86_64/iso/
https://dl.fedoraproject.org/pub/fedora/linux/development/26/Workstation/x86_64/iso/

Rawhide - Fedora 27
https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Spins/x86_64/iso/
https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/iso/

How to write LiveDVD ISO image to portable USB flash drive, e.g.:

# dd if=Fedora*Live*.iso of=/dev/disk/by-id/usb-<ID_SERIAL> \
     bs=1M iflag=direct oflag=direct conv=fdatasync status=progress

If /dev/sdz represents the portable USB flash drive:

$ udevadm info -q env -n sdz | grep ID_SERIAL=

Using the known ID_SERIAL ensures that the exact intended device is used.

WARNING: using 'dd' will erase all partitions and -data- on the portable USB flash drive

Ref. https://fedoraproject.org/wiki/How_to_create_and_use_Live_USB

How to install demonstrative GTK+ widgets:

# dnf install gtk2-devel gtk3-devel

(In reply to Olivier Fourdan from comment #24)
> One possibility would be to run a live image of Fedora 25 (which comes with Wayland by default), at least it would allow for a quick test (no need for you to install everything if it's not reproducible on your hardware setup...)
>
> However, gtk-demo (which is the simple way to reproduce) comes with the devel packages, which won't be on a live image... But this should be installable, even on a live image, as long as a network connection is available.

I was hoping to avoid rebooting :) But like... would weston + gtk2-demo be sufficient? Or do I need something else, like, say, GNOME or KDE or something? (If the latter, that's pretty much a non-starter.)

(In reply to Ilia Mirkin from comment #26)
> I was hoping to avoid rebooting :) But like... would weston + gtk2-demo be sufficient? Or do I need something else, like, say, GNOME or KDE or something? (If the latter, that's pretty much a non-starter.)

Sure, no need for GNOME or KDE, weston + Xwayland + gtk+-2.24.x should be sufficient.

OK, this weston stuff is a no-go. No way to run it against a specific dri card without intensive udev work (which I have no interest in learning), and apparently won't even let me start without some env vars (XDG_RUNTIME_DIR, perhaps others) after I tried starting it under X.

So... I can't repro with the given instructions (because I can't follow the instructions successfully). Perhaps there are other instructions that reproduce the issue? Does Xephyr -glamor reproduce the issue? Something else that I can run?

(In reply to Ilia Mirkin from comment #28)
> OK, this weston stuff is a no-go. No way to run it against a specific dri card without intensive udev work (which I have no interest in learning), and apparently won't even let me start without some env vars (XDG_RUNTIME_DIR, perhaps others) after I tried starting it under X.

You need Xwayland, and therefore a Wayland compositor to connect to.

But Xwayland can run standalone (within a Wayland compositor) so once/if you can get a Wayland compositor to run, you could run Xwayland from there and (hopefully) reproduce, e.g.:

Xwayland :30 &
DISPLAY=:30 gtk-demo

But the requirement is to have a Wayland compositor so that Xwayland can run.
Weston is not the only Wayland compositor, apart from GNOME, there are other smaller Wayland compositors such as sway (http://swaywm.org/) which can run nested as well (so you can test from an existing X11 session), but I haven't played much with those.

> So... I can't repro with the given instructions (because I can't follow the instructions successfully). Perhaps there are other instructions that reproduce the issue? Does Xephyr -glamor reproduce the issue? Something else that I can run?

Unfortunately no, "Xephyr -glamor" does not reproduce the issue, which makes things even more confusing, admittedly.

One more thing, maybe, Xephyr opens /dev/dri/card0 whereas Xwayland uses EGL with render node /dev/dri/renderD128

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1124. |