Bug 98428

Summary: Undefined non-weak-symbol in dri-drivers
Product: Mesa Reporter: NicolasChauvet <kwizart>
Component: Mesa coreAssignee: mesa-dev
Status: RESOLVED FIXED QA Contact: mesa-dev
Severity: normal    
Priority: medium CC: ajax, devurandom, drinkcat, dz1125.bug.tracker, eugene.shatokhin, idr, jdieter, kyle.brenneman, lordheavym
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Avoid undefined non-weak-symbol for libglapi in dri drivers
Updated patch that applies to both mesa and gallium drivers
glapi: v3 Link with glapi when built shared
Revert "gbm: dlopen libglapi so gbm_create_device works"
mesa: glapi: Clean-up dlopening glapi as we are building shared by default

Description NicolasChauvet 2016-10-25 08:02:45 UTC
Created attachment 127532 [details] [review]
Avoid undefined non-weak-symbol for libglapi in dri drivers

This issue was initialy reported in:
https://github.com/NVIDIA/libglvnd/issues/104

There are undefined non-weak-symbol in dri drivers as shown bellow:
---
ldd  -r /usr/lib64/dri/i965_dri.so 
	linux-vdso.so.1 (0x00007ffec66d8000)
	libdrm_intel.so.1 => /lib64/libdrm_intel.so.1 (0x00007fc7edf1f000)
	libdrm_nouveau.so.2 => /lib64/libdrm_nouveau.so.2 (0x00007fc7edd16000)
	libdrm_radeon.so.1 => /lib64/libdrm_radeon.so.1 (0x00007fc7edb0a000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fc7ed8e3000)
	libdrm.so.2 => /lib64/libdrm.so.2 (0x00007fc7ed6d4000)
	libexpat.so.1 => /lib64/libexpat.so.1 (0x00007fc7ed4a7000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc7ed28b000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fc7ed087000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fc7ecd7d000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fc7ecb66000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fc7ec7a4000)
	/lib64/ld-linux-x86-64.so.2 (0x0000564f41df9000)
	libpciaccess.so.0 => /lib64/libpciaccess.so.0 (0x00007fc7ec599000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fc7ec326000)
undefined symbol: _glapi_get_dispatch_table_size	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_get_context	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_add_dispatch	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_check_multithread	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_tls_Context	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_set_context	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_set_dispatch	(/usr/lib64/dri/i965_dri.so)
undefined symbol: _glapi_tls_Dispatch	(/usr/lib64/dri/i965_dri.so)
---

This doesn't lead to issue (probably) because dri drivers are linked to libGL.so which is linked to libglapi, so the symbols are resolved at runtime dynamically.

But with glvnd enabled mesa, the libglvnd's libGL isn't linked to libglapi anymore (libGLX_mesa is). This lead process like Xorg server to miss the symbol resolution and fail to load the appropriate DRI driver.

The suggested fix links to a shared libglapi everywhere (not only on ANDROID).
Comment 1 NicolasChauvet 2016-10-25 08:23:39 UTC
I'm still testing this patch
But at least, the previous commit (0cbc90c57c) seems to indicate some rationale.
Quoting:
    On non-Android (non-bionic) platform, EGL uses the following
    workflow, which works fine:
      dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
      dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);

@Nicolas
Can you comment on why this is better to dlopen instead of to link ?
Comment 2 Emil Velikov 2016-10-25 11:36:28 UTC
Correction:

"dri drivers are _not_ linked to libGL.so"

Quick fix is to retroactively fix xserver libglx.so to dlopen libGLX_mesa/libOpenGL (both are required iirc), and hope for the best if we fail.
Comment 3 Emil Velikov 2016-10-25 11:42:23 UTC
On a more comprehensive note:

One of the goals behind GLVND is to reuse mesa's GLAPI and allow us to drop our "copy". The latter seems to have diminished and my attempts to get it back on board have been shot down, sort of speak.

The original issue:

Ajax would be one of the better people to answer this, but the gist is that there can/could be than one provider* for the entry points - which one gets picked is (implicitly) determined by the specific DRI loader.

Thus if we link against the "wrong" provider things will end up badly.

At the same time - DRI loader/driver interface is (and was last time I've tried) stable. So if one is to use a) old libGL.so which provides the symbols and b) new DRI module (linked against libglapi.so) you get symbol collision.


If we are to break the stability we should consider:

 - DRI drivers which don't receive much love these days, don't support latest version of __DRI_EXTENSION v19 or are missing the some extensions all together.

 - dri1,2,3 capability/support

 - Support for old server(s) - eg. make libEGL/x11 require DRI2 minor >= 3

 - Dropping {barely ,un}used codepaths - eg. loaders mandate __DRI_DRI2 >= 4 and similar on the driver side.

 - And last but not least - reuse GLVND dispatch. For this we would need to convince Kyle that GLdisplatch.so can be accessible by vendors and it gets some ABI stability (note the GLVND libraries in general seem ABI unstable).
Comment 4 Adam Jackson 2016-10-25 19:08:41 UTC
(In reply to Emil Velikov from comment #3)
> On a more comprehensive note:
> 
> One of the goals behind GLVND is to reuse mesa's GLAPI and allow us to drop
> our "copy". The latter seems to have diminished and my attempts to get it
> back on board have been shot down, sort of speak.

Shot down? How?

> The original issue:
> 
> Ajax would be one of the better people to answer this, but the gist is that
> there can/could be than one provider* for the entry points - which one gets
> picked is (implicitly) determined by the specific DRI loader.

"The" entry points. Which, glapi? If so, no, the DRI drivers really should be linking against it.

> At the same time - DRI loader/driver interface is (and was last time I've
> tried) stable. So if one is to use a) old libGL.so which provides the
> symbols and b) new DRI module (linked against libglapi.so) you get symbol
> collision.

If you're using shared glapi - and you really should be - then no. Both libGL and the DRI driver would be linked against libglapi and that's just fine, same way everything in the world is linked against libc and there's no collision.

>  - And last but not least - reuse GLVND dispatch. For this we would need to
> convince Kyle that GLdisplatch.so can be accessible by vendors and it gets
> some ABI stability (note the GLVND libraries in general seem ABI unstable).

We're likely to want libGLdispatch to be stable API anyway, yes. Consider this bit of discussion:

https://github.com/NVIDIA/libglvnd/pull/100#issuecomment-254588402
Comment 5 Emil Velikov 2016-10-28 15:43:20 UTC
(In reply to ajax at nwnk dot net from comment #4)
> (In reply to Emil Velikov from comment #3)
> > On a more comprehensive note:
> > 
> > One of the goals behind GLVND is to reuse mesa's GLAPI and allow us to drop
> > our "copy". The latter seems to have diminished and my attempts to get it
> > back on board have been shot down, sort of speak.
> 
> Shot down? How?
> 
> > The original issue:
> > 
> > Ajax would be one of the better people to answer this, but the gist is that
> > there can/could be than one provider* for the entry points - which one gets
> > picked is (implicitly) determined by the specific DRI loader.
> 
> "The" entry points. Which, glapi? If so, no, the DRI drivers really should
> be linking against it.
> 
Yes, the glapi entry points.

Are old xservers (ones which includes their own glapi dispatch) + glapi linked dri modules going to work ?

I thought the answer was no ? In that case we might want to check again with Ian, if the (his?) idea of keeping the DRI loader/driver is still feasible.

And if we go for breaking things, there's a _ton_ of things we can drop and/or fix ;-)

> > At the same time - DRI loader/driver interface is (and was last time I've
> > tried) stable. So if one is to use a) old libGL.so which provides the
> > symbols and b) new DRI module (linked against libglapi.so) you get symbol
> > collision.
> 
> If you're using shared glapi - and you really should be - then no. Both
> libGL and the DRI driver would be linked against libglapi and that's just
> fine, same way everything in the world is linked against libc and there's no
> collision.
> 
I do agree that we want to link against glapi, although one needs to convince distros and(?) Ian that breaking old libGL (admittedly only static ones) is fine.

> >  - And last but not least - reuse GLVND dispatch. For this we would need to
> > convince Kyle that GLdisplatch.so can be accessible by vendors and it gets
> > some ABI stability (note the GLVND libraries in general seem ABI unstable).
> 
> We're likely to want libGLdispatch to be stable API anyway, yes. Consider
> this bit of discussion:
> 
> https://github.com/NVIDIA/libglvnd/pull/100#issuecomment-254588402

I've mentioned a similar thing [1] [2] and it seems that one "has to" go through the winsys library/ies [2].

Even convincing to version GLdispatch.so ended up quite a process [3].

My personal favourite: "To clarify: libGLdispatch.so does not have an ABI." 
Sure it doesn't ... ;-)

[1] https://bugs.freedesktop.org/show_bug.cgi?id=92877#c2
[2] https://github.com/NVIDIA/libglvnd/pull/85
[3] https://github.com/NVIDIA/libglvnd/issues/59.
Comment 6 Adam Jackson 2016-10-28 19:05:07 UTC
(In reply to Emil Velikov from comment #5)

> Are old xservers (ones which includes their own glapi dispatch) + glapi
> linked dri modules going to work ?

Who cares? Anyone still running 1.14 is already in a state of sin.

I _suspect_ that 1.14 servers would still work fine in this case, since the copy of glapi in libglx would be ahead of libglapi proper in the resolution search order, so assuming xserver's copy then is compatible with glapi's interface now... If I really need to spin up a Fedora 20 vm to test that theory I suppose I can.
  
> I do agree that we want to link against glapi, although one needs to
> convince distros and(?) Ian that breaking old libGL (admittedly only static
> ones) is fine.

Again, I'm not sure this ends up being a break. If you had an old libGL that linked glapi statically, that copy should win out over the dynamic libglapi from your magically new DRI driver [1], unless I'm seriously misremembering how ELF works. It'd be fragile I admit, since libGL's copy of glapi would need to be compatible with the dynamic one, but that ought to be true.

[1] - Seriously though, nobody deploys driver updates like this. If you got a new driver you were already in a position to have updated libGL that you're sure is compatible with the new driver. The interesting compatibility path is that new libGL can load _old_ drivers, in my mind. The other direction seems like it's mostly useful for developers, who are perfectly capable of setting LD_LIBRARY_PATH as well as LIBGL_DRIVERS_PATH while testing.
Comment 7 Ian Romanick 2016-10-28 21:46:07 UTC
So... how does it work if the linked libglapi and libGL provide different, incompatible implementations of, say, _glapi_set_dispatch?
Comment 8 Adam Jackson 2016-10-31 20:26:22 UTC
(In reply to Ian Romanick from comment #7)
> So... how does it work if the linked libglapi and libGL provide different,
> incompatible implementations of, say, _glapi_set_dispatch?

Define "incompatible". If they have the same interface from the caller's perspective (same function names and signatures, and either compatible or opaque data types), then everything's fine, ld.so will consistently pick the same library in the search scope to provide those functions. I don't think the glapi caller actually knows the layout of the dispatch table, for example, so that could be whatever the implementation wants.
Comment 9 Kyle Brenneman 2016-10-31 21:31:37 UTC
> > >  - And last but not least - reuse GLVND dispatch. For this we would need to
> > > convince Kyle that GLdisplatch.so can be accessible by vendors and it gets
> > > some ABI stability (note the GLVND libraries in general seem ABI unstable).
> > 
> > We're likely to want libGLdispatch to be stable API anyway, yes. Consider
> > this bit of discussion:
> > 
> > https://github.com/NVIDIA/libglvnd/pull/100#issuecomment-254588402
> 
> I've mentioned a similar thing [1] [2] and it seems that one "has to" go
> through the winsys library/ies [2].
> 
> Even convincing to version GLdispatch.so ended up quite a process [3].
> 
> My personal favourite: "To clarify: libGLdispatch.so does not have an ABI." 
> Sure it doesn't ... ;-)
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=92877#c2
> [2] https://github.com/NVIDIA/libglvnd/pull/85
> [3] https://github.com/NVIDIA/libglvnd/issues/59.

libGLdispatch.so doesn't have an ABI because it has no direct interface *at all* to applications or drivers. The entire interface is defined in libGLX.so and libEGL.so, including anything that passes through to libGLdispatch.so.

Now, providing more direct control over the dispatch tables is something that's come up before. This is the current version of that:
https://github.com/NVIDIA/libglvnd/pull/89

I don't have a good enough understanding of what Mesa needs out of it to be sure if that's sufficient, but if that overall design looks like it would work, then I'd be happy to dust it off and get it updated for the current EGL and GLX interfaces.
Comment 10 Emil Velikov 2016-11-01 13:18:38 UTC
(In reply to Kyle Brenneman from comment #9)
> > > >  - And last but not least - reuse GLVND dispatch. For this we would need to
> > > > convince Kyle that GLdisplatch.so can be accessible by vendors and it gets
> > > > some ABI stability (note the GLVND libraries in general seem ABI unstable).
> > > 
> > > We're likely to want libGLdispatch to be stable API anyway, yes. Consider
> > > this bit of discussion:
> > > 
> > > https://github.com/NVIDIA/libglvnd/pull/100#issuecomment-254588402
> > 
> > I've mentioned a similar thing [1] [2] and it seems that one "has to" go
> > through the winsys library/ies [2].
> > 
> > Even convincing to version GLdispatch.so ended up quite a process [3].
> > 
> > My personal favourite: "To clarify: libGLdispatch.so does not have an ABI." 
> > Sure it doesn't ... ;-)
> > 
> > [1] https://bugs.freedesktop.org/show_bug.cgi?id=92877#c2
> > [2] https://github.com/NVIDIA/libglvnd/pull/85
> > [3] https://github.com/NVIDIA/libglvnd/issues/59.
> 
> libGLdispatch.so doesn't have an ABI because it has no direct interface *at
> all* to applications or drivers. The entire interface is defined in
> libGLX.so and libEGL.so, including anything that passes through to
> libGLdispatch.so.
> 
libGLdispatch.so _has_ ABI - that's how libGLX.so and others use it. Sure it's private, unstable, etc. and that's fine. It is an ABI regardless and ignoring that won't make it disappear ;-)

> Now, providing more direct control over the dispatch tables is something
> that's come up before. This is the current version of that:
> https://github.com/NVIDIA/libglvnd/pull/89
> 
> I don't have a good enough understanding of what Mesa needs out of it to be
> sure if that's sufficient, but if that overall design looks like it would
> work, then I'd be happy to dust it off and get it updated for the current
> EGL and GLX interfaces.
As said a couple of times before (iirc Adam also said it as well) - exposing it via the winsys is not going to work.

The DRI modules also use the dispatch (due to $reasons) and they are winsys agnostic. If you're interested in specifics check the following:

$ git grep "\<\(CALL\|SET\)_[A-Z][a-z][a-zA-Z0-9]*"

To iterate - we _want_ to make the GLdispatch API/ABI accessible for vendors, since everyone will benefit from it. This is something that was implicitly mentioned on the initial design stages/proposal.
Comment 11 Kyle Brenneman 2016-11-01 20:33:46 UTC
(In reply to Emil Velikov from comment #10)
> (In reply to Kyle Brenneman from comment #9)
> > > > >  - And last but not least - reuse GLVND dispatch. For this we would need to
> > > > > convince Kyle that GLdisplatch.so can be accessible by vendors and it gets
> > > > > some ABI stability (note the GLVND libraries in general seem ABI unstable).
> > > > 
> > > > We're likely to want libGLdispatch to be stable API anyway, yes. Consider
> > > > this bit of discussion:
> > > > 
> > > > https://github.com/NVIDIA/libglvnd/pull/100#issuecomment-254588402
> > > 
> > > I've mentioned a similar thing [1] [2] and it seems that one "has to" go
> > > through the winsys library/ies [2].
> > > 
> > > Even convincing to version GLdispatch.so ended up quite a process [3].
> > > 
> > > My personal favourite: "To clarify: libGLdispatch.so does not have an ABI." 
> > > Sure it doesn't ... ;-)
> > > 
> > > [1] https://bugs.freedesktop.org/show_bug.cgi?id=92877#c2
> > > [2] https://github.com/NVIDIA/libglvnd/pull/85
> > > [3] https://github.com/NVIDIA/libglvnd/issues/59.
> > 
> > libGLdispatch.so doesn't have an ABI because it has no direct interface *at
> > all* to applications or drivers. The entire interface is defined in
> > libGLX.so and libEGL.so, including anything that passes through to
> > libGLdispatch.so.
> > 
> libGLdispatch.so _has_ ABI - that's how libGLX.so and others use it. Sure
> it's private, unstable, etc. and that's fine. It is an ABI regardless and
> ignoring that won't make it disappear ;-)
> 
> > Now, providing more direct control over the dispatch tables is something
> > that's come up before. This is the current version of that:
> > https://github.com/NVIDIA/libglvnd/pull/89
> > 
> > I don't have a good enough understanding of what Mesa needs out of it to be
> > sure if that's sufficient, but if that overall design looks like it would
> > work, then I'd be happy to dust it off and get it updated for the current
> > EGL and GLX interfaces.
> As said a couple of times before (iirc Adam also said it as well) - exposing
> it via the winsys is not going to work.
> 
> The DRI modules also use the dispatch (due to $reasons) and they are winsys
> agnostic. If you're interested in specifics check the following:
> 
> $ git grep "\<\(CALL\|SET\)_[A-Z][a-z][a-zA-Z0-9]*"
> 
> To iterate - we _want_ to make the GLdispatch API/ABI accessible for
> vendors, since everyone will benefit from it. This is something that was
> implicitly mentioned on the initial design stages/proposal.

I'll read through and look for the CALL/SET_ calls. The critical question for libGLdispatch, though: Does Mesa rely on changing the entries in dispatch tables on the fly, or does it just need to construct a few in advance and switch between them?

As for the details of what libGLdispatch would need to do, we should probably move the discussion over to the libglvnd thread instead of cluttering up this one.
Comment 12 Jonathan Dieter 2016-11-30 19:47:08 UTC
Created attachment 128291 [details] [review]
Updated patch that applies to both mesa and gallium drivers

I'm not quite sure what the final consensus is on this, but I'm trying to get a multiseat system working with some seats using the Intel and AMD open drivers and another using NVIDIA's binary driver.  I've gotten it working, but it required the attached patch (built on kwizart's original patch), so just thought I'd submit it.
Comment 13 NicolasChauvet 2017-01-13 17:05:34 UTC
Created attachment 128934 [details] [review]
glapi: v3 Link with glapi when built shared
Comment 14 NicolasChauvet 2017-01-13 17:06:09 UTC
Created attachment 128935 [details] [review]
Revert "gbm: dlopen libglapi so gbm_create_device works"
Comment 15 NicolasChauvet 2017-01-13 17:06:45 UTC
Created attachment 128936 [details] [review]
mesa: glapi: Clean-up dlopening glapi as we are building shared by default
Comment 16 NicolasChauvet 2017-01-13 17:14:12 UTC
I went deeper in this issue assuming the real fix is to build glapi shared and link to it anyway. (I'm not sure if there is any users that will find the dlopening glapi only as needed useful over not building glapi completely).

If there are real users, maybe this can be made a build time configuration switch ? (In this case I will try to make a v4).

For Fedora users, there is a tests repo with this patch added:
https://copr.fedorainfracloud.org/coprs/kwizart/glvnd/
dnf copr enable kwizart/glvnd
dnf update
Comment 17 Darek 2017-01-15 16:54:53 UTC
Segfault after '0003-mesa-glapi-Clean-up-dlopening-glapi-as-we-are-buildi.patch'

---
$ weston
               GL_EXT_copy_image GL_EXT_draw_elements_base_vertex
               GL_EXT_polygon_offset_clamp GL_EXT_texture_border_clamp
               GL_KHR_context_flush_control GL_OES_copy_image
               GL_OES_draw_elements_base_vertex GL_OES_texture_border_clamp
               GL_OES_texture_stencil8 GL_EXT_blend_func_extended
               GL_EXT_clip_cull_distance GL_EXT_window_rectangles
               GL_MESA_shader_integer_functions
Segmentation fault (core dumped)
---
---
$ dmesg
[ 1516.138552] sway[25188]: segfault at 0 ip    (null) sp 00007ffdf089a008 error 14 in sway[400000+3a000]
[ 1541.545016] es2tri[25201]: segfault at 0 ip  (null) sp 00007fff272a05d8 error 14 in es2tri[400000+3000]
[ 1564.770186] weston[25214]: segfault at 0 ip  (null) sp 00007ffecd7b8fd8 error 14 in weston[400000+d000]
---

Steps to reproduce:

- build mesa-git-17.0.0_devel.88149.c9b74f3f03 + 0001 + 0002 + 0003
  Note: mesa-git + 0001 + 0002 without issue.
- run weston or es2tri
Comment 18 Emil Velikov 2017-05-04 17:26:16 UTC
Should be fixed in master and I'll pick the patches for stable.

FTR - 1/3 is incomplete while 2/3 and 3/3 are wrong (as seen in comment 17)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.