Bug 36282

Summary: 34a5d3b9f4740601708c82093e2114356d749e65: glxgears segfaults when compiled with shared glapi
Product: Mesa Reporter: Andrew Randrianasulu <randrik>
Component: Mesa core Assignee: mesa-dev
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: alexandre.f.demers, ianmllgn, krzysztof.krakowiak, marvin24, ojab, pedretti.fabio, vlee
Version: git   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: config.log from Mesa's source tree
Backtrace on r600g (64 bit OS)
backtrace with r300g
screenshot of glxgears with MESA_DEBUG=all

Description Andrew Randrianasulu 2011-04-15 20:04:37 UTC
Using Mesa from git (commit 41b38bd21c1031e65799c888a97d8a0c14ea2aaa - "translate: s/varient/variant/"), I see the following backtrace in gdb:

(gdb) bt full
#0  0xb7980f17 in loopback_VertexAttribs2fvNV (index=2, n=138117492, 
    v=0x83b8154) at main/api_loopback.c:1204
        i = 138117491
#1  0xb79a634f in execute_list (ctx=0x8068480, list=<value optimized out>)
    at main/dlist.c:8161
        opcode = <value optimized out>
        dlist = <value optimized out>
        n = 0x83b816c
        done = <value optimized out>
#2  0xb79a9b39 in _mesa_CallList (list=1243057392) at main/dlist.c:8511
        save_compile_flag = 16 '\020'
#3  0x0804946e in ?? ()
No symbol table info available.
#4  0x0804a762 in ?? ()
No symbol table info available.
#5  0xb7cd5bd6 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#6  0x08049181 in ?? ()
No symbol table info available.

The bug is also present with the nouveau_vieux_dri.so DRI driver (but mplayer -vo gl works, so GL is not completely broken).

Mesa up to c6e33ca285f9eba26cae2fdd74fb5cc694f1e74b ("Disable direct rendering on Cygwin") was fine.

Maybe this bug is actually due to an old gcc (gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3) or old binutils (GNU assembler (GNU Binutils for Ubuntu) 2.20.1-system.20100303).

Or I just compiled Mesa incorrectly (I'll attach config.log in the next message).
Comment 1 Andrew Randrianasulu 2011-04-15 20:09:40 UTC
Created attachment 45688 [details]
config.log from Mesa's source tree

I ran ./autogen.sh first, then make distclean a few times. This machine is a bit slow for git bisect (Celeron 400), but I can do it if necessary.

libdrm was compiled manually (commit be8802a9414e85ba07ae257fccadd245fcf7c7b6 in libdrm tree), then mesa from git master and X server from 1.9 branch.

Mesa's config line was:
make distclean && CPPFLAGS=-I/usr/include/nouveau ./configure --prefix=/usr --with-dri-drivers=nouveau,swrast --enable-shared-dricore --disable-egl --disable-gallium --enable-texture-float
Comment 2 Alexandre Demers 2011-04-16 07:45:45 UTC
I'll help you on that one. I also have the same problem here on r600g. At first, I thought it was something with Natty (I just upgraded to beta 2). But it's very similar. I also have tested with swrast and it also crashed. I'll be bisecting on my side and see if I find the same thing as you.
Comment 3 Alexandre Demers 2011-04-16 09:43:26 UTC
Bisecting gave me the following:

34a5d3b9f4740601708c82093e2114356d749e65 is the first bad commit
commit 34a5d3b9f4740601708c82093e2114356d749e65
Author: Brian Paul <brianp@vmware.com>
Date:   Sun Apr 10 12:48:28 2011 -0600

    mesa: plug in new functions for GL_ARB_sampler_objects
    
    Build the new sources, plug the new functions into the dispatch table,
    implement display list support.  And enable extension in the gallium
    state tracker.

:040000 040000 50d8747e0dc22e1634407703563447bd0e1c44d3 be44081fa9ea33563f8582df370cfb13a0d5c256 M      src
Comment 4 Alexandre Demers 2011-04-16 09:53:28 UTC
I forgot to mention I'm running kernel 2.6.39-rc3 and libdrm 2.4.25 (latest) on a 64 bit OS.
Comment 5 Brian Paul 2011-04-17 11:09:42 UTC
Can any other r600 users say if they're running into this?  There's nothing r600-specific about the commit in question and I don't recall other DRI users reporting this issue.  Thanks.
Comment 6 Alexandre Demers 2011-04-17 11:48:40 UTC
(In reply to comment #5)
> Can any other r600 users say if they're running into this?  There's nothing
> r600-specific about the commit in question and I don't recall other DRI users
> reporting this issue.  Thanks.

It is not r600-specific, since Andrew has the same problem, but his system uses an NVIDIA card and he said in the original title that the crash occurred in swrast. Maybe he can confirm it was with the swrast driver, and he could also test with Nouveau.

We have a very similar backtrace (I can add mine if needed).
Comment 7 Alexandre Demers 2011-04-17 12:09:28 UTC
Created attachment 45739 [details]
Backtrace on r600g (64 bit OS)
Comment 8 Alexandre Demers 2011-04-17 12:11:59 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Can any other r600 users say if they're running into this?  There's nothing
> > r600-specific about the commit in question and I don't recall other DRI users
> > reporting this issue.  Thanks.
> 
> It is not r600-specific, since Andrew has the same problem, but his system uses
> an NVIDIA card and he said in the original title that the crash occurred in
> swrast. Maybe he can confirm it was with the swrast driver, and he could also
> test with Nouveau.
> 
> We have a very similar backtrace (I can add mine if needed).

I should have re-read comment #1: nouveau-vieux also has the problem...
Comment 9 almos 2011-04-17 12:38:17 UTC
Created attachment 45740 [details]
backtrace with r300g

I don't know if it helps, but I attached a backtrace created with r300g at d2afae33f896ece1af0c8953ac9ce141c39f6dd2

A bit different from the previous ones.
Comment 10 Benjamin Bellec 2011-04-17 14:44:22 UTC
I'm running F15 x86-PAE (libdrm 2.4.25) with the 2.6.39-rc3.fc16 kernel + mesa git (r600g - RV770), and I don't have this kind of problem.
glxgears is running fine.
Comment 11 Marc Dietrich 2011-04-19 01:40:14 UTC
only happens here when shared-glapi is enabled (r600g)
Comment 12 Alexandre Demers 2011-04-19 07:36:02 UTC
(In reply to comment #11)
> only happens here when shared-glapi is enabled (r600g)

Confirmed also here that the bug only appears when Mesa is compiled with --enable-shared-glapi.
Comment 13 Marek Olšák 2011-04-20 08:43:01 UTC
*** Bug 36403 has been marked as a duplicate of this bug. ***
Comment 14 Marek Olšák 2011-04-20 08:43:10 UTC
*** Bug 36318 has been marked as a duplicate of this bug. ***
Comment 15 Marek Olšák 2011-04-20 08:59:14 UTC
I ran into this problem when upgrading my distribution and fixed it easily.

Make sure glxgears picks up the correct libGL.so. You may have multiple such files on your system at various locations, like:

/usr/lib/libGL.so
/usr/lib/mesa/libGL.so
/usr/local/...
etc.

I had libGL from git in /usr/lib, but my system always picks up the one in /usr/lib/mesa, which contained the libGL provided by my distribution. I made a symlink from /usr/lib/mesa/libGL* to my libGL from git and that fixed it.

Do: ldd /usr/bin/glxgears

and see which libGL.so file is used. If it's not the one you compiled and installed, overwrite it by the correct one.
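The ldd check described above can also be scripted. The following is a minimal Python sketch, not part of Mesa: the helper names parse_ldd and gl_libraries are hypothetical, and it assumes the usual "name => path (address)" ldd output format.

```python
import re
import subprocess

# One resolved dependency per line: "<soname> => <path> (0x<address>)".
# Entries without a path (for example linux-vdso) deliberately do not match.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each soname in `ldd` output to the path it resolved to."""
    resolved = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def gl_libraries(binary):
    """Run `ldd` on `binary` and keep only the libGL and libglapi entries."""
    output = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    ).stdout
    return {
        name: path
        for name, path in parse_ldd(output).items()
        if name.startswith(("libGL.so", "libglapi.so"))
    }
```

Calling gl_libraries("/usr/bin/glxgears") should then show whether the resolved libGL.so.1 is the one that was just built and installed, or a copy shipped by the distribution.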

Closing as NOT A BUG.
Comment 16 almos 2011-04-20 11:21:55 UTC
(In reply to comment #15)
> 
> Make sure glxgears picks up the correct libGL.so. You may have multiple such
> files on your system at various locations, like:
> ...
> I had libGL from git in /usr/lib, but my system always picks up the one in
> /usr/lib/mesa, which contained the libGL provided by my distribution. I made a
> symlink from /usr/lib/mesa/libGL* to my libGL from git and that fixed it.
> ...
> and see which libGL.so file is used. If it's not the one you compiled and
> installed, overwrite it by the correct one.

I don't get it. I have mesa 7.10.2 installed in /usr/lib, and git version in /home/almos/sources/mesa/. I do this:
$ export LIBGL_DRIVERS_PATH=/home/almos/sources/mesa/lib/gallium
and glxinfo reports:
OpenGL vendor string: X.Org R300 Project
OpenGL renderer string: Gallium 0.4 on ATI RV350
OpenGL version string: 2.1 Mesa 7.11-devel (git-d2afae3)
...

Checking some executables:
$ ldd `which glxgears`
	libGL.so.1 => /usr/lib/libGL.so.1 (0xb769f000)
        ...
$ ldd `which glxinfo`
	libGL.so.1 => /usr/lib/libGL.so.1 (0xb774c000)
        ...
$ ldd `which foobillard`
	libGL.so.1 => /usr/lib/libGL.so.1 (0xb76f6000)
        ...
$ ldd ./googleearth-bin
        libGL.so.1 => /usr/lib/libGL.so.1 (0xb574e000)
        ...

But only glxgears segfaults, and nothing else. Am I missing something?
Comment 17 Marc Dietrich 2011-04-20 12:22:42 UTC
I don't have these linker tricks at all here (opensuse), so I don't think this is the reason.
Comment 18 Alexandre Demers 2011-04-20 12:43:24 UTC
ldd points to the correct library location here. I tested gltron and it doesn't crash, RendererFeatTest.bin64 also works fine, and glxinfo reports the correct compiled version. glxgears is the only application crashing right now when Mesa is compiled with --enable-shared-glapi (when Mesa is compiled without that parameter, glxgears doesn't crash). So I'm reopening this bug since it's not fixed.

By the way, may I suggest not closing a bug before people reporting it confirm the fix or solution is working for them?
Comment 19 Andrew Randrianasulu 2011-04-20 13:17:49 UTC
(In reply to comment #15)
> I ran into this problem when upgrading my distribution and fixed it easily.
> 
> Make sure glxgears picks up the correct libGL.so. You may have multiple such
> files on your system at various locations, like:
> 
> /usr/lib/libGL.so
> /usr/lib/mesa/libGL.so
> /usr/local/...
> etc.
> 
> I had libGL from git in /usr/lib, but my system always picks up the one in
> /usr/lib/mesa, which contained the libGL provided by my distribution. I made a
> symlink from /usr/lib/mesa/libGL* to my libGL from git and that fixed it.
> 
> Do: ldd /usr/bin/glxgears
> 
> and see which libGL.so file is used. If it's not the one you compiled and
> installed, overwrite it by the correct one.
> 
> Closing as NOT A BUG.

I can confirm my glxgears is spinning again, but people below this comment still have problems...

Sorry for long silince, i've tested few onfigurations without shared dri, with LD_LIBRARY_PATH trick, tried some bisect and make clean instead of make distclean ... none worked, but deleting wrong libGL (and symlinking right one) fixed things here.
Comment 20 Andrew Randrianasulu 2011-04-20 13:26:00 UTC
Oops: "silence", not "silince", and "configurations", not "onfigurations".

I saw one strange thing: after my build was interrupted and restarted, there was a link error in mesa/src/glsl, but it was ignored by the build system. Deleting the corrupted *.o file fixed things, without make clean. Should Mesa's build system detect errors from linking shared objects, or rely on an "always run make clean before rebuilding" rule?
Comment 21 Marek Olšák 2011-04-21 00:04:46 UTC
You may also try 'git clean -fdx' instead of 'make clean'. It deletes all files and directories not tracked by git, as if you had made a clean clone. Nevertheless, the crash in loopback_* is caused by the wrong libGL.so being installed.

> By the way, may I suggest not closing a bug before people reporting it confirm
> the fix or solution is working for them?

Anyone can always re-open it anyway.

Also try this:

LIBGL_DEBUG=verbose glxinfo

It prints which driver is loaded.
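For comparing several machines, the verbose output can be collected and filtered programmatically. This is a hedged Python sketch: tried_drivers and run_verbose are hypothetical helper names, the "libGL: OpenDriver: trying <path>" line format is assumed from the libGL output quoted in this report, and stderr is merged into stdout on the assumption that libGL writes its debug messages there.

```python
import os
import re
import subprocess

# Assumed line format: "libGL: OpenDriver: trying <path>".
TRYING = re.compile(r"^libGL: OpenDriver: trying (\S+)\s*$")


def tried_drivers(log):
    """Return the driver paths libGL reported trying, in order."""
    return [m.group(1) for m in map(TRYING.match, log.splitlines()) if m]


def run_verbose(program="glxinfo"):
    """Run `program` with LIBGL_DEBUG=verbose; return stdout and stderr merged."""
    env = dict(os.environ, LIBGL_DEBUG="verbose")
    proc = subprocess.run(
        [program],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.stdout
```

tried_drivers(run_verbose()) lists the candidate driver files in search order; a path that is not followed by a dlopen error in the raw log is presumably the one that was loaded.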
Comment 22 Alexandre Demers 2011-04-21 00:19:54 UTC
(In reply to comment #21)
> You may also try 'git clean -fdx' instead of 'make clean'. It deletes all
> files and directories not tracked by git, as if you had made a clean clone.
> Nevertheless, the crash in loopback_* is caused by the wrong libGL.so being
> installed.
> 
> > By the way, may I suggest not closing a bug before people reporting it confirm
> > the fix or solution is working for them?
> 
> Anyone can always re-open it anyway.
> 
> Also try this:
> 
> LIBGL_DEBUG=verbose glxinfo
> 
> It prints which driver is loaded.

I'm using make realclean when compiling and I always hard-reset git when fetching. Is that enough, or is "git clean -fdx" better?

Nevertheless, LIBGL_DEBUG=verbose glxinfo gives me:
name of display: :0
libGL: OpenDriver: trying /usr/lib/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/r600_dri.so
libGL error: dlopen /usr/lib/r600_dri.so failed (/usr/lib/r600_dri.so: cannot open shared object file: No such file or directory)
libGL: OpenDriver: trying /usr/lib/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/dri/r600_dri.so

ls -la  /usr/lib/dri gives:
lrwxrwxrwx 1 root root 30 2011-04-21 02:15 /usr/lib/dri -> /usr/lib/x86_64-linux-gnu/dri/

I deleted all libGL and DRI files and folders (which broke my setup at first) before rebuilding and reinstalling. I made sure it's really using the correct files. Otherwise, how could we explain that building Mesa with --enable-shared-glapi shows the problem, while disabling it doesn't? It means at least that libGL.so is being replaced. That leaves libglapi.so... I had a look at libglapi.so and here again, everything points to the correct file:
ldd /usr/bin/glxinfo
        linux-vdso.so.1 =>  (0x00007fff4e7ff000)
        libGL.so.1 => /usr/lib/x86_64-linux-gnu/libGL.so.1 (0x00007fb3ad9a9000)
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fb3ad66f000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb3ad2da000)
        libglapi.so.0 => /usr/lib/x86_64-linux-gnu/libglapi.so.0 (0x00007fb3ad0b6000)
...

What should we do to help you more?
Comment 23 Marek Olšák 2011-04-21 00:26:31 UTC
No idea, but hard-resetting is useless for you as it only resets the source files.
Comment 24 Fabio Pedretti 2011-04-21 01:03:22 UTC
I have a similar problem that happens on r300g/r300/swrastg/swrast when using shared-glapi. It still affects only glxgears but rather than crashing it shows psychedelic colors.

https://bugs.freedesktop.org/show_bug.cgi?id=35268 may be related. I also tried applying the patch attached there, but it doesn't fix my problem.
Comment 25 Alexandre Demers 2011-04-22 15:21:23 UTC
Since the problem introduced by commit 34a5d3b9f4740601708c82093e2114356d749e65 happens only when shared glapi is enabled, could it be caused by something not being properly exported in (or linked to) libglapi.so? Looking at the commits just before the culprit (there are 7 of them), some targeted the glapi and required regenerating lists and values. Could it be that something changed there, but was not updated where it is "exported" in the common shared glapi? Could it be something that is supposed to be exported in libglapi, but is not?
Comment 26 almos 2011-04-30 01:01:02 UTC
Here's some update on this. I found another application that segfaults: neverball. I also found that setting LD_LIBRARY_PATH=/home/almos/sources/mesa/lib solves the problem (as I mentioned earlier, I used to set LIBGL_DRIVERS_PATH only), which might indicate that it's indeed caused by libGL.so mismatch, but I still don't understand why only these two segfault. BTW I also did a bisect, and I can confirm that revision 34a5d3b9f4740601708c82093e2114356d749e65 is the first bad commit.

I'm not sure about that glapi thing, though. I compile Mesa with:
./autogen.sh --with-state-trackers=dri,glx,egl --enable-debug --with-dri-drivers= --prefix=/usr --enable-texture-float
and I have no libglapi.so anywhere. What does that do anyways?
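The difference between setting only LIBGL_DRIVERS_PATH and also setting LD_LIBRARY_PATH, as described above, can be made explicit in a small launcher. This is a minimal Python sketch under stated assumptions: mesa_env and run_with_mesa are hypothetical helper names, and the lib and lib/gallium subdirectories are taken from the paths quoted in this report, so other build layouts may differ.

```python
import os
import subprocess


def mesa_env(mesa_root, base=None):
    """Build an environment where libGL and the DRI driver come from one tree.

    Assumes the build tree layout mentioned in this thread:
    <mesa_root>/lib for libGL.so and <mesa_root>/lib/gallium for drivers.
    """
    env = dict(os.environ if base is None else base)
    lib_dir = os.path.join(mesa_root, "lib")
    previous = env.get("LD_LIBRARY_PATH")
    # Put the freshly built libGL ahead of any system copy.
    env["LD_LIBRARY_PATH"] = (
        lib_dir if not previous else lib_dir + os.pathsep + previous
    )
    # Load the matching DRI driver from the same build.
    env["LIBGL_DRIVERS_PATH"] = os.path.join(lib_dir, "gallium")
    return env


def run_with_mesa(mesa_root, argv):
    """Run a GL program against the given Mesa build and return its exit code."""
    return subprocess.run(argv, env=mesa_env(mesa_root)).returncode
```

For example, run_with_mesa("/home/almos/sources/mesa", ["glxgears"]) would start glxgears with both variables pointing at the same build, so that libGL.so and the driver cannot come from different Mesa versions.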
Comment 27 Chistopher Krakowiak 2011-04-30 06:28:09 UTC
*** Bug 36708 has been marked as a duplicate of this bug. ***
Comment 28 Alexandre Demers 2011-04-30 07:56:27 UTC
(In reply to comment #26)
> Here's some update on this. I found another application that segfaults:
> neverball. I also found that setting
> LD_LIBRARY_PATH=/home/almos/sources/mesa/lib solves the problem (as I mentioned
> earlier, I used to set LIBGL_DRIVERS_PATH only), which might indicate that it's
> indeed caused by libGL.so mismatch, but I still don't understand why only these
> two segfault. BTW I also did a bisect, and I can confirm that revision
> 34a5d3b9f4740601708c82093e2114356d749e65 is the first bad commit.
> 
> I'm not sure about that glapi thing, though. I compile Mesa with:
> ./autogen.sh --with-state-trackers=dri,glx,egl --enable-debug
> --with-dri-drivers= --prefix=/usr --enable-texture-float
> and I have no libglapi.so anywhere. What does that do anyways?

--enable-shared-glapi creates another library (libglapi.so). This library contains the code shared between OpenGL (let's say desktop), OpenGL ES and maybe even OpenVG, instead of having duplicated code in each GL implementation.

However, in my case, it doesn't have to do with a library left somewhere. As I tested and showed, everything is pointing to the right ones.
Comment 29 almos 2011-05-15 10:37:40 UTC
Now I tried this again with current Mesa, and neither glxgears nor neverball segfaults. They now run both with and without --enable-shared-glapi, and LD_LIBRARY_PATH is not needed.
Comment 30 Alexandre Demers 2011-05-15 11:35:38 UTC
(In reply to comment #29)
> Now I tried this again with current Mesa, and neither glxgears nor neverball
> segfaults. They now run both with and without --enable-shared-glapi, and
> LD_LIBRARY_PATH is not needed.

Your flag (LD_LIBRARY_PATH=...) was forcing the use of the libraries under that path. If you remove it without doing "make install", then you are only using the GL files provided by your distribution. May I suggest another test? Could you try compiling Mesa without --enable-shared-glapi and run "ldd /usr/bin/glxgears"? It shouldn't list libglapi. Then compile Mesa again with --enable-shared-glapi and run "ldd /usr/bin/glxgears" again. Now it should list libglapi. Let me know what your results are.
Comment 31 almos 2011-05-15 12:11:30 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > Now I tried this again with current Mesa, and neither glxgears nor neverball
> > segfaults. They now run both with and without --enable-shared-glapi, and
> > LD_LIBRARY_PATH is not needed.
> 
> Your flag (LD_LIBRARY_PATH=...) was forcing to use the libraries under that
> path. If you remove it without doing "make install", then you are only using
> the GL files provided by your distribution. May I suggest another test? Could
> you try compiling mesa without --enable-shared-glapi and run "ldd
> /usr/bin/glxgears". It shouldn't list libglapi. Then compile mesa again with
> --enable-shared-glapi, run again "ldd /usr/bin/glxgears". Now, it should list
> libglapi. Let me know what your results are.

Up until now I was compiling mesa without --enable-shared-glapi, and I usually never use LD_LIBRARY_PATH (i.e. the compiled r300_dri.so is loaded by the system libGL.so), and it works fine.

This bug was reported to only occur if shared glapi is enabled, but I experienced that it is independent of it. I also found that it doesn't crash if LD_LIBRARY_PATH is pointing to the compiled libGL.so. Now with current Mesa master it doesn't crash with any combination (I might have skipped some, though) for me. It does have rendering errors, though, see #37234.

Now that I have mesa compiled with --enable-shared-glapi, ldd `which glxgears` lists libglapi.so, if I set LD_LIBRARY_PATH. Does this answer your question?
Comment 32 Alexandre Demers 2011-05-15 14:47:35 UTC
Problem still present when using --enable-shared-glapi here, not when disabled.
Comment 33 Marc Dietrich 2011-05-30 05:00:16 UTC
Created attachment 47309 [details]
screenshot of glxgears with MESA_DEBUG=all
Comment 34 Marc Dietrich 2011-05-30 05:01:37 UTC
Tried with today's git HEAD and found that the segfault does not occur with MESA_DEBUG=all, but there seems to be some lighting problem (see the screenshot in the previous comment).
Comment 35 Chia-I Wu 2011-06-08 06:50:12 UTC
34a5d3b9f4740601708c82093e2114356d749e65 is indeed the bad commit.  The bug is triggered when libglapi and libdricore use different struct glapi_table.  This happens when shared glapi is enabled.  Or when mixing libGL built from 7.10 with a DRI driver built from git head.  I will work on a fix.
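Why a table mismatch produces crashes inside unrelated functions can be illustrated with a toy model. The Python sketch below is not Mesa code: the entry-point lists, build_table and call are hypothetical, and it only assumes that a dispatch table is an array of function slots that the caller indexes using its own idea of the layout.

```python
# Toy model of a GL dispatch table: an array of function slots addressed
# by index. Both layouts below are invented for illustration.
OLD_LAYOUT = ["NewList", "EndList", "CallList", "Vertex3f", "Normal3f"]
# A newer layout where an added entry point shifts the slots after it.
NEW_LAYOUT = ["NewList", "EndList", "CallList", "BindSampler",
              "Vertex3f", "Normal3f"]


def build_table(layout):
    """Driver side: fill one handler per slot, in `layout` order."""
    return [lambda name=name: name for name in layout]


def call(table, caller_layout, name):
    """Caller side: find the slot using the caller's own layout, then call it."""
    return table[caller_layout.index(name)]()


table = build_table(NEW_LAYOUT)
# Both sides agree on the layout: the intended handler runs.
print(call(table, NEW_LAYOUT, "Vertex3f"))
# The caller was built against the old layout: a different handler runs.
print(call(table, OLD_LAYOUT, "Vertex3f"))
```

In this model the second call silently lands in the BindSampler slot. In a real C dispatch table the wrongly selected function would also receive arguments meant for another entry point, which is consistent with a crash in a loopback_* function that the application never called directly.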
Comment 36 Chia-I Wu 2011-06-08 08:22:38 UTC
The symptom I had was the same as Marc's.  I've pushed several changes to fix it and to avoid future breakage.  Please test again.
Comment 37 ojab 2011-06-08 10:18:33 UTC
glxgears works fine for me (r600g) with mesa aea2236 compiled with --enable-shared-glapi. Looks like fixed.
Comment 38 Alexandre Demers 2011-06-08 10:22:07 UTC
I'll test it later today when I get home.
Comment 39 Alexandre Demers 2011-06-08 13:55:29 UTC
Seems good here too.
Comment 40 Marc Dietrich 2011-06-09 00:36:06 UTC
yup - also fixed here.
Comment 41 Fabio Pedretti 2011-06-09 01:32:08 UTC
Just a note: configure says shared glapi is experimental:

  --enable-shared-glapi   EXPERIMENTAL. Enable shared glapi for OpenGL
                          [default=no]

Is that still true for some reason, or was configure simply not updated?
