Bug 2749

Summary: Dynamic modules need visibility cleanups
Product: xorg
Reporter: Adam Jackson <ajax>
Component: Server/DDX/Xorg
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED INVALID
QA Contact:
Severity: enhancement
Priority: medium
CC: alan.coopersmith, idr, michel, roland.mainz, Seongbae.Park
Version: git
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Bug Depends on: 3360
Bug Blocks:
Attachments:
Description Flags
visibility-for-fb-1.patch none

Description Adam Jackson 2005-03-16 13:06:08 UTC
fb exports 330 symbols but has an effective public API of only 8.  exporting
just those 8 symbols reduces the object's size by 20k (about 10%) on x86.  the
function call entry for the other 322 symbols is shorter as a result, so
function calls get significantly cheaper.  i measured about a 6% speedup in
Render with render_bench with an early version of the attached patch.

fb will show the most performance improvement from this sort of cleanup, since
most other modules aren't CPU-intensive.  however the footprint reduction would
be similar across all modules.  excluding GLcore (which has its own set of
issues) this would drop runtime code footprint by about 160k on x86 assuming 10%
is typical.

this is potentially an ABI breaking change.

when i have solid performance numbers i'll post them here.
Comment 1 Adam Jackson 2005-03-16 13:07:29 UTC
Created attachment 2128
visibility-for-fb-1.patch

gcc-only, dirty, contains bits of other changes but should work.  to make it
take effect, tweak fb's {i,}makefile to include -fvisibility=hidden and compile
with gcc 3.4 or later.
Comment 2 Roland Mainz 2005-03-16 14:10:01 UTC
seongbae/alanc:
Does Sun Workshop/Forte have a similar flag to hide symbols?
Comment 3 Alan Coopersmith 2005-03-16 14:21:45 UTC
I don't know if there's a compiler flag - we usually use linker mapfiles to 
control symbol visibility in our Solaris builds.   Of course, for things only
referenced from a single file, static works well on any compiler I know of, so
those changes are no problem.
Comment 4 Seongbae Park 2005-03-16 14:24:46 UTC
# cc -flags | grep scope
-xldscope=<a>         Indicates the appropriate linker scoping within the source
                      program; <a>={global|symbolic|hidden}

-xldscope=hidden will do what -fvisibility=hidden does in gcc.

This sets the default linker scoping for symbols.
You can set linker scoping for each symbol in the source by putting the
__global, __symbolic, or __hidden specifier on the symbol's declaration in C/C++.

Of course, as Alan pointed out, you can use the map file.
Comment 5 Adam Jackson 2005-03-16 14:36:19 UTC
very cool, didn't know about -xldscope.  it and the tags don't seem to be in the
Forte 7 C developers manual:

http://docs.sun.com/source/816-2454/index.html

the only problem with using the map file is that it takes effect after code
generation has been done.  the symbol won't show up in the dynamic symbol table
but the call prologue for it will still be the same as if it were default
visibility, so you still hit the PLT/GOT indirection overhead.  better than nothing.
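
A minimal sketch of the compile-time alternative, assuming a gcc new enough to accept the visibility attribute (the `__GNUC__ >= 4` check is an assumption; some 3.4 builds also carry the feature). All names here are hypothetical and not taken from the fb sources. Marking a helper hidden in the source tells the compiler, before code generation, that the symbol cannot be preempted from outside the object, so it is free to call it directly. How much that saves depends on the architecture and toolchain.

```c
/* Hypothetical names throughout; this is an illustration, not fb code. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
# define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
# define DEMO_HIDDEN /* assumed: no compile-time visibility control */
#endif

/*
 * Internal helper.  With hidden visibility the compiler knows at code
 * generation time that this symbol is local to the shared object, so a
 * call to it does not need to be routed through the PLT.
 */
DEMO_HIDDEN int demo_scale(int x)
{
    return x * 2;
}

/* Public entry point.  Default visibility, so it stays in the dynamic
 * symbol table and other modules can resolve it. */
int demo_entry(int x)
{
    return demo_scale(x) + 1;
}
```

Building this with `-fPIC -shared` and inspecting the result with `nm -D` should list `demo_entry` but not `demo_scale`.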

i'd probably pursue a hybrid approach of adding both static and EXPORTED tags,
with EXPORTED #defined to nothing for compilers with no visibility control and
to __global or equivalent for good compilers.  this provides an easy transition
path, because when we detect a good compiler we can just add the option to
CFLAGS and win.  this is the approach i used in Mesa and it works well.
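
The hybrid approach described above could look roughly like the following. This is a sketch under stated assumptions, not the contents of the attached patch: the version checks (`__GNUC__ >= 4`, `__SUNPRO_C >= 0x550` for Studio 8) and every function name are assumptions made for illustration. Only `EXPORTED`, `static`, `__global`, `-fvisibility=hidden` and `-xldscope=hidden` come from this discussion.

```c
/*
 * EXPORTED marks the small public API.  It expands to nothing on
 * compilers with no visibility control, so the code builds everywhere.
 */
#if defined(__GNUC__) && (__GNUC__ >= 4)           /* assumed version check */
# define EXPORTED __attribute__((visibility("default")))
#elif defined(__SUNPRO_C) && (__SUNPRO_C >= 0x550) /* assumed: Studio 8+ */
# define EXPORTED __global
#else
# define EXPORTED
#endif

/* Used from this file only: plain static works on any compiler. */
static int demo_clamp_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/*
 * Shared between files of the module but not public API: left untagged.
 * It becomes hidden once -fvisibility=hidden (gcc) or -xldscope=hidden
 * (Sun) is added to CFLAGS, and stays exported on other compilers.
 */
int demo_internal_add(int a, int b)
{
    return demo_clamp_byte(a + b);
}

/* Public entry point (hypothetical name): stays visible even when the
 * default is switched to hidden. */
EXPORTED int demo_public_blend(int a, int b)
{
    return demo_internal_add(a, b);
}
```

With this layout, turning the optimization on for a capable compiler is a one-line CFLAGS change, and nothing changes for compilers that define `EXPORTED` to nothing.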
Comment 6 Seongbae Park 2005-03-16 15:00:46 UTC
-xldscope was first introduced in Studio 8 compiler.
I don't think using mapfiles would incur PLT overhead (at least on SPARC) - 
the only disadvantage of using mapfiles is a bit of link time overhead, 
and missing compiler optimizations such as inlining (or any optimization that
requires the exact caller/callee relationship) but no more than that.
Comment 7 Adam Jackson 2005-03-16 15:05:37 UTC
(In reply to comment #6)
> -xldscope was first introduced in Studio 8 compiler.

that would explain it.

> I don't think using mapfiles would incur PLT overhead (at least on SPARC) - 
> the only disadvantage of using mapfiles is a bit of link time overhead, 
> and missing compiler optimizations such as inlining (or any optimization that
> requires the exact caller/callee relationship) but no more than that.

my worldview is heavily x86 biased (though i'm trying to change that), so you're
probably right.  thanks for the hints.  when i do this for real i'll be sure to
add the bits for the Sun compiler.
Comment 8 Adam Jackson 2005-03-19 17:30:02 UTC
quick performance numbers using render_bench and Xvfb, with imlib2 numbers culled.  before:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 8.254 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 3.650 sec.
*** ROUND 2 ***
---------------------------------------------------------------
Test: Test Xrender doing 1/2 scaled Over blends
Time: 7.601 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing 1/2 scaled Over blends
Time: 7.626 sec.
*** ROUND 3 ***
---------------------------------------------------------------
Test: Test Xrender doing 2* smooth scaled Over blends
Time: 170.836 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing 2* smooth scaled Over blends
Time: 171.429 sec.

and after:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 7.622 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 3.493 sec.
*** ROUND 2 ***
---------------------------------------------------------------
Test: Test Xrender doing 1/2 scaled Over blends
Time: 7.520 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing 1/2 scaled Over blends
Time: 7.291 sec.
*** ROUND 3 ***
---------------------------------------------------------------
Test: Test Xrender doing 2* smooth scaled Over blends
Time: 168.337 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing 2* smooth scaled Over blends
Time: 169.048 sec.

the Render software path is extremely function-call intensive in the unscaled case, so the Round 1 
numbers are what's interesting here.  nearly 8% faster, not bad.  the speedup won't be as big on 
hardware-backed servers since framebuffer reads are slow and Render does a lot of them.
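
For reference, the relative change can be worked out from the timings above with a few lines of C. The helper and table names below are made up for this note; the timings are copied from the two runs. For Round 1 this gives roughly 7.7% for the onscreen test and 4.3% for the offscreen test.

```c
#include <stdio.h>

/* Percentage by which the run time dropped, relative to the "before" time. */
double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Timings in seconds, copied from the before/after runs in this comment. */
static const struct {
    const char *test;
    double before;
    double after;
} demo_results[] = {
    { "non-scaled",                 8.254,   7.622 },
    { "non-scaled (offscreen)",     3.650,   3.493 },
    { "1/2 scaled",                 7.601,   7.520 },
    { "1/2 scaled (offscreen)",     7.626,   7.291 },
    { "2* smooth",                170.836, 168.337 },
    { "2* smooth (offscreen)",    171.429, 169.048 },
};

/* Print one line per test with the relative reduction in run time. */
void print_render_bench_summary(void)
{
    size_t i;
    size_t n = sizeof(demo_results) / sizeof(demo_results[0]);

    for (i = 0; i < n; i++) {
        printf("%-26s %.1f%% less time\n",
               demo_results[i].test,
               percent_reduction(demo_results[i].before,
                                 demo_results[i].after));
    }
}
```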
Comment 9 Adam Jackson 2006-04-15 05:39:39 UTC
this bug is too confused to be "fixed".
