Bug 400

Summary: [TRACKER] dlloader issues
Product: xorg
Reporter: Mike A. Harris <mharris>
Component: Server/DDX/Xorg/dlloader
Assignee: Adam Jackson <ajax>
Status: RESOLVED FIXED
QA Contact:
Severity: enhancement
Priority: low
CC: ajax, basic, dberkholz, eric, gad.kadosh, solar, swtaylor, tseng
Version: git   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 377, 393, 962, 978, 1054, 1114, 1288, 1324, 1760, 2787, 3016, 3196, 3567, 4361, 5023    
Bug Blocks: 600    
Attachments:
Description                                    Flags
Patch to enable dlopen loader use on Solaris   none
version 1                                      none
fixes for ati drivers                          none
minor update                                   none

Description Mike A. Harris 2004-04-01 14:55:05 UTC
This is a master tracking bug to track all issues related to circular
dependencies between modules causing the dlopen() based loader to fail
to get a running X server, when the server is compiled with:

#define MakeDllModules YES


My understanding of this issue, feel free to correct me if I'm wrong, is
that the Dll loader has never been used by default in any official XFree86
release, and has only been used specifically by developers when trying to
bootstrap a new architecture to work with the server prior to the ELF loader
being enhanced to properly support the new architecture.

To my knowledge, at present, the DLL loader is not an officially supported
module loading mechanism, and is not used on any platform/architecture
combination by default, which is why it is broken currently.

There are several people who would like to use this loader, and to see it
become the default module loading mechanism in the future, as that would
make debugging easier, and would eliminate all of the complexities of the
ELF loader, such as problems with NX stack/heap.

Nobody to my knowledge has expressed interest in officially owning and
fixing this problem yet, however it is useful to have a master tracker for
the issue, so that each bug that gets filed can be flagged as blocking this
master bug.

Having said that, someone who is interested in seeing this actually work,
or who has a requirement that it work is most likely going to have to
stand up and volunteer to investigate the issues and fix them and submit
patches for review.

Also note, that the dlopen() loader is not portable to all architectures
supported by the X server, and so it may need to be handled specially
for each architecture that differs.
Comment 1 Brandon Hale 2004-04-04 18:26:55 UTC
From my experience with the dlloader, it seems the best way to attack it is to
relink the modules as shared objects linked to all appropriate libs. Currently
afaik X modules are not linked, so dlloader cannot perform symbol resolution.
Comment 2 Alan Coopersmith 2004-04-07 14:59:53 UTC
Actually, I've got the dlopen loader running on Solaris x86 and it seems to work
fine.  To make it work I had to write a script that runs after the build finishes,
matches up the symbols and requirements, generates the correct linker flags, and 
then relinks all the modules with the flags.  I haven't tried this on anything
but Solaris x86, so don't know if it will work anywhere else, but will attach the
files that are working for me in case they are useful for someone else.
Comment 3 Alan Coopersmith 2004-04-07 15:13:48 UTC
Created attachment 176
Patch to enable dlopen loader use on Solaris

(This is against XORG-RELEASE-1 from before the -TM merge, hence the XFree86
references.)
Comment 4 Adam Jackson 2004-06-08 09:26:15 UTC
i've been doing similar work fixing up the ati driver to work with the dlopen
loader:

http://freedesktop.org/~ajax/

basically, drivers just love accessing data in other modules directly.  that's a
bad thing.  with libdl, symbolic data references have to be resolvable at
dlopen() time - unlike functions, which can be redirected to the linker and thus
resolved lazily.

usually this is a cry for refactoring; global variables are bad.  in libati i've
fixed this by one of two methods:

- move the functionality into a function inside the appropriate module (preferred)
- wrap the variable in an accessor function (ugly)

with those changes i can launch X, using only the libdl loader, on linux/x86,
and without special link-time tricks.  as an added bonus the code functions
identically between loaders, so it'll still work with the old loader if you
really want to.

alternatively you can tag the offending variables as weak references, but that's
a portability nightmare i'd just as soon avoid.

which i suppose means i'm volunteering to fix this.  patches to follow shortly.
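
For illustration only, the accessor approach described in this comment might look roughly like the sketch below. All names (provider_clocks, ProviderGetClocks, DriverMaxClock) are hypothetical and are not taken from the ati driver or from any attached patch. The point is that the consumer binds to a function symbol, which the dynamic linker can resolve lazily, instead of to a data symbol that must resolve at dlopen() time.

```c
#include <stddef.h>

/* ---- provider module (hypothetical stand-in for a shared module) ---- */

/* Assumption: this table used to be an exported global that drivers read
 * directly. It is now private to the module. */
static const int provider_clocks[] = { 25175, 28322, 31500 };

#define PROVIDER_NUM_CLOCKS \
    (sizeof(provider_clocks) / sizeof(provider_clocks[0]))

/* Accessor: returns the table and, optionally, its length. Callers depend
 * only on this function symbol, not on the data symbol. */
const int *ProviderGetClocks(size_t *count)
{
    if (count != NULL)
        *count = PROVIDER_NUM_CLOCKS;
    return provider_clocks;
}

/* ---- consumer (hypothetical driver code) ---- */

/* Reads the table through the accessor instead of touching the global. */
int DriverMaxClock(void)
{
    size_t n = 0, i;
    const int *clocks = ProviderGetClocks(&n);
    int max = 0;

    for (i = 0; i < n; i++) {
        if (clocks[i] > max)
            max = clocks[i];
    }
    return max;
}
```

The preferred fix named above, moving the functionality into the owning module, would correspond to exporting something like DriverMaxClock from the provider itself so the table never leaves it.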
Comment 5 Brandon Hale 2004-06-09 22:31:23 UTC
As a side note, this is related to fdo bug #600
Comment 6 Adam Jackson 2004-06-25 23:07:47 UTC
accepting, i've started fixing this in debrix.
Comment 7 Adam Jackson 2004-07-21 22:51:45 UTC
reassign
Comment 8 Adam Jackson 2004-07-27 23:26:39 UTC
Created attachment 534
version 1

this patch covers all the video drivers under hw/xfree86 that build on x86 by
default, with the exception of ati (fixed in debrix CVS, haven't pulled the fix
yet) and glide (superseded by the voodoo driver as far as i'm concerned).  all
the drivers at least load themselves to the point of probing for hardware and
failing.  vesa, vga, and tdfx all load successfully at all color depths.  i've
also got an i740, mach64, and s3virge i need to test on.  the GLX/DRI issues
were resolved earlier in bug 377.

known issues:

- not all framebuffer modules have been tested yet.  untested formats:
xf8_32wid (sunffb), xf8_32bpp (mga, glint), xf8_16bpp (chips), cfb* (s3virge),
xf24_32bpp (s3virge), afb (vesa, fbdev).  i'll take the s3virge ones, the
others i'll need testers for.

- this first version uses LoaderSymbol() rather prodigiously.  technically ISO
C forbids assigning a void* to a function pointer, so we get some warnings in
gcc due to our use of -ansi -pedantic; other compilers may have more serious
problems, though if they do i wonder how they support dlopen at all.  more
seriously, LoaderSymbol is very heavyweight, walking the whole symbol table for
the named function.  i don't believe any of the LoaderSymbol calls are on fast
paths, though.  the general alternative is to create weak resolver functions
that return the address of the desired function, but that gets ugly quick.  in
some instances this can be worked around entirely (see: cirrus driver).

- several arrays in XAA, mfb, and xf[14]bpp were wrapped with accessor
functions.  it's unknown how much performance impact this will have; i suspect
it's down in the noise region.

- some of the fbdevhw users could be made to use the new FillInScreenInfo
function.

- though i've tried hard to preserve the ABI (despite numerous temptations), i
may have missed something.

- no input drivers besides keyboard and mouse have been attempted.

- ati fixes from debrix haven't been imported.

please test.
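
As a rough illustration of the void* to function pointer issue mentioned in the known issues above, the sketch below uses plain dlsym() as a stand-in for LoaderSymbol(); it is not the server's code, and the helper names (resolve_strlen, call_strlen) are hypothetical. It uses the cast idiom from the POSIX dlsym() examples to avoid a direct void* to function pointer assignment, and caches the result so the expensive lookup happens once. It assumes a platform where dlopen(NULL, ...) exposes libc symbols, such as Linux with glibc (older glibc versions need -ldl at link time).

```c
#include <dlfcn.h>
#include <stddef.h>

/* Function pointer type for the symbol we want to look up. */
typedef size_t (*strlen_fn)(const char *);

/* Resolve "strlen" by name once and cache it. dlsym() stands in for
 * LoaderSymbol() here; that substitution is an assumption of this sketch. */
static strlen_fn resolve_strlen(void)
{
    static strlen_fn cached = NULL;

    if (cached == NULL) {
        /* NULL asks for a handle covering the main program and the
         * shared objects loaded with it. */
        void *self = dlopen(NULL, RTLD_LAZY);

        if (self != NULL) {
            /* POSIX idiom: store through a void** instead of assigning
             * a void* to a function pointer directly. */
            *(void **)(&cached) = dlsym(self, "strlen");
        }
    }
    return cached;
}

/* Call through the resolved pointer; returns 0 if the lookup failed. */
size_t call_strlen(const char *s)
{
    strlen_fn fn = resolve_strlen();

    return (fn != NULL) ? fn(s) : 0;
}
```

Caching the pointer is one way to keep by-name lookups off any repeated path; it does not remove the underlying dependence on string-based symbol resolution.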
Comment 9 Adam Jackson 2004-07-29 22:57:14 UTC
Created attachment 546
fixes for ati drivers

this version imports the ati fixes from debrix.  r128 and radeon drivers still
can't be loaded directly due to some R*Chipset stupidity, but they'll load if
you say Driver "ati".  i know the fix, it's just not pretty, and i want to get
this out and tested quickly.  other than that, all drivers load and probe
successfully.  (note the loading issues are only applicable for people using
dlloader, elfloader users are unaffected.)

input drivers turn out not to be a problem, since none of them call
LoadSubModule.  i'll sanity check this patch on the hardware i have, and if it
passes i'll commit.  s3virge is also affected by the XAA changes, so i'll run
x11perf before and after to see what sort of damage we're looking at.
Comment 10 Adam Jackson 2004-07-30 13:17:49 UTC
Created attachment 549
minor update

now actually tested on an ati card.  works on all the hardware i have here. 
cfb and the overlay framebuffer formats still need fixing, and radeon and r128
still need to be loaded from ati.

i don't know of any bugs besides that, so i'm pushing it to the tree.  i'm
keeping a copy here should catastrophe strike and we need to back it out.

the LoaderSymbol() abuse will disappear shortly.
Comment 11 Adam Jackson 2004-08-18 10:08:58 UTC
A quick list of known dlloader issues is also maintained at
http://freedesktop.org/~ajax/dlloader-status.txt
Comment 12 Gad Kadosh 2005-06-08 02:26:02 UTC
I've been using dlloader with xorg-6.8.2 for quite some time now, with no
problem at all (Gentoo..)
Recently I wanted to give 6.8.99.8 snapshot a try. It gave me unresolved symbols
errors on some modules (not all of them), like glx, dri, vga, vesa, radeon. I
only assume it's related to dlloader, but haven't yet tried to recompile it
without dlloader to check.
Comment 13 Adam Jackson 2005-06-08 08:17:08 UTC
that's almost certainly because you're using gentoo's hardened toolchain, which
sets -z now in the linker by default.  this is guaranteed to fail, and their
ebuild should correctly work around it.
Comment 15 Gad Kadosh 2005-06-08 13:58:44 UTC
Then why was it working for me before, compiled with the dlloader USE flag and
gcc-hardened 3.3.5?
Comment 16 Adam Jackson 2005-12-15 06:39:44 UTC
moving off the 7.0 tracker, all the known bugs blocking this one have been resolved.
Comment 17 Adam Jackson 2005-12-25 17:35:36 UTC
since dlloader is now the default i'm closing the tracker.  please file further
dlloader bugs individually against the appropriate component.
