Bug 95266 - Geometry missing from rendering, only when using gallivm.
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/llvmpipe
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
 
Reported: 2016-05-04 17:54 UTC by David Lonie
Modified: 2019-09-18 18:32 UTC


Attachments
Reproduction of issue (7.22 MB, application/bzip2)
2016-05-04 17:54 UTC, David Lonie
TestTranslucentLUTDepthPeeling.trace (91.97 KB, application/octet-stream)
2016-05-04 20:53 UTC, Jose Fonseca

Description David Lonie 2016-05-04 17:54:11 UTC
Created attachment 123465
Reproduction of issue

VTK's new depth peeling implementation has been tested on numerous platforms (Mac, Windows, Linux, iOS/Android ES3) and with various OpenGL implementations (nVidia, ATI, Intel, and Mesa); it fails only when using Mesa.

This issue was first described in another bug report: 

https://bugs.freedesktop.org/show_bug.cgi?id=94955

but the issue has moved away from the original topic of that report. From the discussion there, it appears to be a problem with the floating-point textures used in the new peeling code, as the samplers are returning NaNs (if I understand the discussion correctly). Indeed, the old peeling implementation (which works on Mesa) only used 32-bit RGBA textures.

I've provided an executable from VTK that reproduces the issue, a compiled mesa libGL.so, and sample outputs using nVidia and Mesa OpenGL implementations, along with details on how VTK and Mesa were configured.

The README included in the tarball is pasted below for further details.

---------------------------------

This directory contains a reproduction of a Mesa OpenGL bug that
affects VTK's new depth peeling code. It contains the following files:

libGL.so - A compiled mesa OpenGL implementation, built against 38fcf7cb.
run-mesa.sh - A script to run the test executable with the included mesa library.
run-sysGL.sh - A script to run the test executable using the system OpenGL library.
TestTranslucentLUTDepthPeeling-valid.png - Example valid output for test.
TestTranslucentLUTDepthPeeling-mesa.png - Incorrect output for test from mesa.
vtkRenderingCoreCxxTests - The test executable.

# Reproducing the bug

Execute run-sysGL.sh. This will run the TestTranslucentLUTDepthPeeling
test in the vtkRenderingCoreCxxTests executable using the libGL.so
found in LD_LIBRARY_PATH. This code has been successfully tested on
windows, mac, iOS/android GL ES3, and linux, using nVidia, intel, and
ATI OpenGL implementations. The output should resemble the included
'valid' png.

Execute run-mesa.sh. This runs the test using the included libGL.so
(change the LD_PRELOAD value in the script to test a different
library). The output will resemble the included 'mesa' png -- just the
background, with no visible geometry.

# Notes: Building Mesa

The included mesa libGL.so was compiled with the options:

./autogen.sh \
    --enable-debug \
    --prefix=/ssd/src/mesa-master-install-nomangle \
    --disable-dri \
    --disable-egl \
    --disable-gles1 \
    --disable-gles2 \
    --disable-shared-glapi \
    --enable-xlib-glx \
    --enable-gallium-osmesa \
    --with-gallium-drivers=swrast \
    --enable-gallium-llvm=yes \
    LLVM_CONFIG=/path/to/llvm-3.8.0/llvm-config \
    --enable-llvm-shared-libs

with debugging symbols stripped later to reduce file size.

# Notes: Building VTK

The VTK test can be built by downloading the VTK source code from our
git repo:

https://gitlab.kitware.com/vtk/vtk.git

I built against HEAD acd85fe48d.

There is currently a workaround that falls back to an older depth
peeling implementation for Mesa 11.3.0(-devel). To reproduce the bug,
edit Rendering/OpenGL2/vtkOpenGLRenderer.cxx and remove the version check in
vtkOpenGLRenderer::DeviceRenderTranslucentPolygonalGeometry, around line 307:

      std::string glVersion =
          reinterpret_cast<const char *>(glGetString(GL_VERSION));
      if (glVersion.find("Mesa 11.3.0") != std::string::npos)
        {
        vtkDebugMacro("Disabling dual depth peeling -- mesa bug detected. "
                      "GL_VERSION = " << glVersion);
        dualDepthPeelingSupported = false;
        }
                                                                            
Configure using cmake:

mkdir vtk-build
cd vtk-build
cmake path/to/vtk/sourcecode/ \
      -DVTK_RENDERING_BACKEND=OpenGL2 \
      -DOPENGL_INCLUDE_DIR=/path/to/installed/mesa/include \
      -DOPENGL_gl_LIBRARY=/path/to/installed/mesa/lib/libGL.so \
      -DOPENGL_glu_LIBRARY=""

Run 'make' to build, and run the depth peeling tests:

cd vtk-build
ctest -R DepthPeel

(use "ctest -V -R 'SomeTestName'" to see the actual executables/arguments/output)
Comment 1 David Lonie 2016-05-04 17:58:06 UTC
The "Reproduction of issue" attachment is a tar.bz2 file, btw. When I try to download it, it just dumps the binary as character data to my browser... Let me know if I should re-upload it and do something differently.
Comment 2 Ilia Mirkin 2016-05-04 18:02:06 UTC
You might consider including an apitrace that reproduces the issue. People are probably going to be less inclined to run some random binary off the interwebs.

Also, is this only a bug with llvmpipe (and/or gallivm), or have you tested this with various hw drivers in mesa? (Sorry, you wrote a lot of text and I only skimmed it.)
Comment 3 David Lonie 2016-05-04 18:09:49 UTC
I included an apitrace on the last bug, and was told that it was not suitable for reproducing bugs (something about apitrace forcing a 4.5 context IIRC). I can make one for this issue if anyone needs it.

I share concerns about running random binaries, and have included instructions for building VTK from official sources to reproduce the issue for the more wary developers among us :)

I've only tested using the mesa configuration options provided in the report. If there are other configurations you'd like me to test, I'd be happy to do so. Just let me know the exact options to specify -- mesa's configure options are a bit of a black box to me.
Comment 4 Ilia Mirkin 2016-05-04 18:14:30 UTC
(In reply to David Lonie from comment #3)
> I included an apitrace on the last bug, and was told that it was not
> suitable for reproducing bugs (something about apitrace forcing a 4.5
> context IIRC). I can make one for this issue if anyone needs it.

Looking at it...

2 @0 glXCreateContextAttribsARB(dpy = 0x6c4f20, config = 0x72c1c0, share_context = NULL, direct = True, attrib_list = [GLX_CONTEXT_MAJOR_VERSION_ARB, 4, GLX_CONTEXT_MINOR_VERSION_ARB, 5, 0]) = 0x74cf10

don't do that? :) Mesa does not support GL 4.5, was that trace captured on some proprietary driver that does? Can you change VTK (even temporarily) to only request GL 3.0? Or alternatively flip to requesting a core context (which that trace isn't doing) and request GL 3.2?
Comment 5 Ilia Mirkin 2016-05-04 18:16:13 UTC
FWIW I was able to run your trace from the other bug with

MESA_GL_VERSION_OVERRIDE=4.5 MESA_GLSL_VERSION_OVERRIDE=450

so I suspect that a trace of this issue would be nice to have as well.
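
A minimal sketch of applying those two overrides to a single replay, assuming glretrace is on the PATH. The helper name with_gl_override and the trace filename in the example are hypothetical. The variables are set only for the one command, so the override does not leak into the rest of the shell session:

```shell
#!/bin/bash
# Hypothetical helper: run one command with Mesa's version overrides set.
#   $1 = GL version to advertise (e.g. 4.5)
#   $2 = GLSL version to advertise (e.g. 450)
#   remaining arguments = the command to run
with_gl_override() {
    local gl_version="$1"
    local glsl_version="$2"
    shift 2
    MESA_GL_VERSION_OVERRIDE="$gl_version" \
    MESA_GLSL_VERSION_OVERRIDE="$glsl_version" \
        "$@"
}

# Example (not executed here; the trace filename is a placeholder):
#   with_gl_override 4.5 450 glretrace some.trace
```

The override only changes what Mesa advertises; it does not add functionality the driver lacks, so a replay can still fail on calls Mesa does not implement.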
Comment 6 Jose Fonseca 2016-05-04 18:45:43 UTC
(In reply to David Lonie from comment #3)
> I included an apitrace on the last bug, and was told that it was not
> suitable for reproducing bugs (something about apitrace forcing a 4.5
> context IIRC). I can make one for this issue if anyone needs it.

I agree a trace would be useful.

If the issue repros with Mesa + llvmpipe, then just record the trace with Mesa + llvmpipe.

If you don't record the trace with the exact same OpenGL implementation, then all sorts of things can go wrong.
Comment 7 David Lonie 2016-05-04 19:17:59 UTC
I'm trying to get a trace, but apitrace is failing an assertion:

$ apitrace trace --api gl bin/vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling"

vtkRenderingCoreCxxTests: /build/apitrace/src/apitrace-7.1/build/wrappers/glxtrace.cpp:97195: void (* _wrapProcAddress(const GLubyte*, __GLXextFuncPtr))(): Assertion `procPtr != (__GLXextFuncPtr)&glXCreateContextAttribsARB' failed.

This is only happening on Mesa (I get a successful trace using my system nVidia drivers), and it also happens when following the "Trace manually" instructions here:

https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown#tracing-manually

Is this issue familiar to anyone? I'll keep trying to get this working, but would appreciate any tips if this is a known issue :)
Comment 8 Jose Fonseca 2016-05-04 20:12:23 UTC
(In reply to David Lonie from comment #7)
> Is this issue familiar to anyone? I'll keep trying to get this working, but
> would appreciate any tips if this is a known issue :)

It looks like apitrace is about to enter an infinite loop: the address of the supposedly "real" glXCreateContextAttribsARB function is actually pointing to the fake glXCreateContextAttribsARB wrapper function.

I'll take a look.  Thanks for trying.


It looks like doing `MESA_GL_VERSION_OVERRIDE=4.5 glretrace -S frame vtkRenderingOpenGL2CxxTests.4075.trim.trace` works.
Comment 9 Roland Scheidegger 2016-05-04 20:26:46 UTC
FWIW it was not the samplers returning NaN for the float texture, but rather the _coords_ being NaNs (thus the results from sampling could be anything, though it is possible some implementations would use 0.0 as coord instead). But those coords came from the vs, which seemed simple enough. So possibly the NaNs might have been the result of a previous render pass. (It is of course also possible some shader is simply bogus and relying on NaNs getting flushed to zero somewhere on output.)
It would indeed be helpful to know if hw drivers are affected too.
(Using vtk sources to reproduce is pretty easy.)
Comment 10 Jose Fonseca 2016-05-04 20:50:44 UTC
(In reply to Jose Fonseca from comment #8)
> It looks like apitrace is about to enter an infinite loop: the address of
> the supposedly "real" glXCreateContextAttribsARB function is actually
> pointing to the fake glXCreateContextAttribsARB wrapper function.
> 
> I'll take a look.  Thanks for trying.
> 
> 
> It looks like doing `MESA_GL_VERSION_OVERRIDE=4.5 glretrace -S frame
> vtkRenderingOpenGL2CxxTests.4075.trim.trace` works.

I suspect the problem is that you're loading Mesa libGL.so via LD_PRELOAD. From attached mesa_bug.tar.bz2:

  $ cat run-mesa.sh 
  #!/bin/bash
  
  echo "Running test against mesa libGL."
  LD_PRELOAD=libGL.so ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" -I

This will definitely interfere with apitrace since apitrace needs to use LD_PRELOAD to inject.

The proper way of using a custom libGL.so is to simply set LD_LIBRARY_PATH to the directory that has it:

  ln -sf libGL.so libGL.so.1
  LD_LIBRARY_PATH=$PWD ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" -I

and all should work fine.
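
The two steps above can be wrapped in a small helper. This is only a sketch: the function name stage_libgl is hypothetical, and it assumes the directory already contains a Mesa build named libGL.so, as in the attached tarball.

```shell
#!/bin/bash
# Hypothetical helper: create (or replace) <dir>/libGL.so.1 as a symlink
# to <dir>/libGL.so, so the dynamic loader can pick up the custom Mesa
# build through LD_LIBRARY_PATH.
stage_libgl() {
    local dir="$1"
    if [ ! -e "$dir/libGL.so" ]; then
        echo "stage_libgl: no libGL.so in $dir" >&2
        return 1
    fi
    # The link target is relative, so the directory can be moved later.
    ln -sf libGL.so "$dir/libGL.so.1"
}

# Example (not executed here):
#   stage_libgl "$PWD" &&
#   LD_LIBRARY_PATH="$PWD" ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" -I
```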
Comment 11 Jose Fonseca 2016-05-04 20:53:21 UTC
Created attachment 123472
TestTranslucentLUTDepthPeeling.trace

Trace.
Comment 12 David Lonie 2016-05-04 21:13:35 UTC
(In reply to Jose Fonseca from comment #10)
> The proper way of using a custom libGL.so is to simply set LD_LIBRARY_PATH
> to the directory that has it:
> 
>   ln -sf libGL.so libGL.so.1
>   LD_LIBRARY_PATH=$PWD ./vtkRenderingCoreCxxTests
> "TestTranslucentLUTDepthPeeling" -I

Just tried that here, no such luck: 

$ ln -s libGL.so libGL.so.1
$ LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH apitrace trace --api gl ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling"
apitrace: loaded into /usr/bin/apitrace
apitrace: unloaded from /usr/bin/apitrace
apitrace: loaded into /ssd/src/VTK/build-tmp/mesa_bug/vtkRenderingCoreCxxTests
apitrace: tracing to /ssd/src/VTK/build-tmp/mesa_bug/vtkRenderingCoreCxxTests.18.trace
vtkRenderingCoreCxxTests: /build/apitrace/src/apitrace-7.1/build/wrappers/glxtrace.cpp:97195: void (* _wrapProcAddress(const GLubyte*, __GLXextFuncPtr))(): Assertion `procPtr != (__GLXextFuncPtr)&glXCreateContextAttribsARB' failed.

Looks like you did get a trace from it, though, so I wonder what's different. Thanks for uploading that, hopefully that will make it easier to track down!
Comment 13 Ilia Mirkin 2016-05-04 21:18:38 UTC
FWIW this is what it looks like on i965: http://i.imgur.com/LMmcGkM.png

I'm guessing that's correct, but the "correct" screenshot is inside an archive and I'm lazy. It is indeed totally messed up with llvmpipe.
Comment 14 David Lonie 2016-05-04 21:24:15 UTC
(In reply to Ilia Mirkin from comment #13)
> FWIW this is what it looks like on i965: http://i.imgur.com/LMmcGkM.png
> 
> I'm guessing that's correct, but the "correct" screenshot is inside an
> archive and I'm lazy. It is indeed totally messed up with llvmpipe.

Yep, that's what it should look like.
Comment 15 Emil Velikov 2016-05-04 21:45:04 UTC
Just pointing out something that may not be that obvious - David is using the xlib powered gallium libGL. That one uses the (iirc) unsupported split shared LLVM libraries.
Comment 16 Jose Fonseca 2016-05-04 22:08:14 UTC
(In reply to David Lonie from comment #12)
> Just tried that here, no such luck: 
> 
> $ ln -s libGL.so libGL.so.1
> $ LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH apitrace trace --api gl
> ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling"                 
> 
> apitrace: loaded into /usr/bin/apitrace                                     

apitrace is being injected into apitrace...

This can only happen if you have stuff you shouldn't have on LD_PRELOAD and LD_LIBRARY_PATH variables...  :-/

Please start from a clean terminal.  Don't do anything other than I described.  In particular don't set LD_PRELOAD, and only set LD_LIBRARY_PATH=$PWD.

Honestly it's better not to fiddle with LD_PRELOAD directly yourself at all.  It just makes everything unnecessarily harder.  Just let the `apitrace trace` command do it.


I repeat: 

- the right way of loading libGL.so is to set LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives/

- the right way of tracing is to `apitrace trace myapp`

- and the right way of doing both is to

  export LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives/
  apitrace trace myapp


(Also note, the right name for libGL is not libGL.so as you had in the tarball.  It needs to be libGL.so.1.  I suspect that this mismatch is what led you to resort to LD_PRELOAD to start with.  But if the name is libGL.so.1 that should never be necessary.)
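
A sketch of a pre-flight check that encodes the rules above before tracing. The function name gl_env_ok and the paths in the example are hypothetical; the check only looks at LD_PRELOAD and at the presence of libGL.so.1 in the given directory, and does not verify that the library is a working Mesa build.

```shell
#!/bin/bash
# Hypothetical pre-flight check before running `apitrace trace`:
# succeeds only if LD_PRELOAD is empty and <dir>/libGL.so.1 exists.
gl_env_ok() {
    local dir="$1"
    if [ -n "${LD_PRELOAD:-}" ]; then
        echo "LD_PRELOAD is set ($LD_PRELOAD); unset it first" >&2
        return 1
    fi
    if [ ! -e "$dir/libGL.so.1" ]; then
        echo "no libGL.so.1 in $dir" >&2
        return 1
    fi
    return 0
}

# Example (not executed here; paths are placeholders):
#   gl_env_ok /path/to/where/libGL.so.1/lives &&
#   LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives apitrace trace myapp
```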
Comment 17 David Lonie 2016-05-05 13:48:31 UTC
Thanks for the info Jose, I'll make a note of that for the future.
Comment 18 Ilia Mirkin 2016-05-06 03:19:26 UTC
And to further confirm, trace replays fine on both a GF108 and GT215 with nouveau's nvc0 and nv50 drivers. So this is a gallivm issue, not mesa. Adjusting subject.
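
One possible way to narrow this down further on a machine without the relevant hardware, sketched under assumptions: Mesa's LIBGL_ALWAYS_SOFTWARE and GALLIUM_DRIVER environment variables select a software rasterizer, so the same trace can be replayed under llvmpipe and softpipe and the output compared. The helper name with_sw_driver is hypothetical. Note that in an LLVM-enabled build softpipe may still run vertex processing through gallivm in the draw module unless DRAW_USE_LLVM=0 is set, so that variable is included in the softpipe example.

```shell
#!/bin/bash
# Hypothetical helper: run one command with a specific Mesa software
# rasterizer selected.
#   $1 = driver name (llvmpipe or softpipe)
#   remaining arguments = the command to run
with_sw_driver() {
    local driver="$1"
    shift
    LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER="$driver" "$@"
}

# Examples (not executed here):
#   with_sw_driver llvmpipe glretrace TestTranslucentLUTDepthPeeling.trace
#   with_sw_driver softpipe env DRAW_USE_LLVM=0 glretrace TestTranslucentLUTDepthPeeling.trace
```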
Comment 19 GitLab Migration User 2019-09-18 18:32:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/240.

