Bug 4197

Summary:	fix TEXTRELs in libGL.so
Product:	Mesa	Reporter:	Mike Frysinger <vapier>
Component:	Mesa core	Assignee:	mesa-dev
Status:	CLOSED NOTABUG	QA Contact:
Severity:	normal
Priority:	high	CC:	bugzillas, dberkholz, joshuabaergen, junk_mail, pageexec, solar
Version:	git
Hardware:	x86 (IA32)
OS:	Linux (All)
Whiteboard:
i915 platform:		i915 features:
Attachments:	xorg-x11-pic-notextrel.patch fixed version of the mesa pic patch updated patch updated patch take 2 GLX_NO_TEXREL to enable/disable. GLX_NO_TEXREL to enable/disable. (version 2)

Description Mike Frysinger 2005-08-22 17:00:47 UTC

find attached a patch that applies to cvs head in
xorg/xc/extras/Mesa/src/mesa/x86/ ... this was last tested against Gentoo
xorg-x11-6.8.2-r2 on x86, but the patch applies pretty cleanly to cvs head

the patch (by the PaX team afaik) updates the x86 asm to be PIC friendly so that
we dont end up with TEXTRELs in libGL.so

the only file that changed was extras/Mesa/src/mesa/x86/glapi_x86.S to include
TLS support, but a quick check looks like the TLS code should be PIC friendly so
it does not need additional work

Comment 1 Mike Frysinger 2005-08-22 17:01:12 UTC

Created attachment 2998 [details] [review]
xorg-x11-pic-notextrel.patch

Comment 2 PaX Team 2005-08-23 03:12:48 UTC

Created attachment 3003 [details] [review]
fixed version of the mesa pic patch

oops, this was the buggy patch, don't apply it (it changes the stub size that
will quickly make GL apps crash when the dispatcher indexes into the stub
array)! instead look at this new one, it should get things right. also it
changes gl_x86_asm.py that generates glapi_x86.S, which is the right way, i
guess. note a trick in get_dispatchbase: it assumes that it is called by
relative 'call' insn (which is a given in this case since it's called only from
the stubs) so that it can extract the 4 byte relative offset from it and
compute its own position in memory, without having to do another round of
call/pop or call/mov/retn.

Comment 3 Alex Deucher 2005-08-23 12:23:11 UTC

this should probably be assigned to upstream Mesa.

Comment 4 Adam Jackson 2005-11-30 16:37:55 UTC

patch doesn't apply against 6.4, haven't checked if that's a trivial failure or
not yet.

Comment 5 PaX Team 2005-11-30 21:26:08 UTC

Created attachment 3951 [details] [review]
updated patch

here's a quick rediff against the standalone Mesa-6.4 package. note that i
couldn't find the gl_x86_asm.py script so its patch is omitted, whoever has it
will have to update it per the changes in glapi_x86.S.

Comment 6 PaX Team 2005-11-30 23:24:53 UTC

Created attachment 3954 [details] [review]
updated patch take 2

ignore the previous one, it had some bugs (blind forward port wasn't a good
idea ;-). also, there're much more textrels due to -fPIC being explicitly
omitted from the x86 DRI makefiles, i fixed that by patching
configs/linux-dri-x86 but i'm not sure if it's the correct way.

on a sidenote, i don't see how GLX_USE_TLS stub handling can work at all, for
some reason it wants to 'relocate' the stubs at runtime but they're in .text
which is not marked as writable (neither in the ELF program header nor via
mprotect(). not that if it could work with an enforcement like PaX, mind you.
the more important question is, why that runtime patching is needed at all. i
think @NTPOFF is resolved at link time so that whole runtime relocation thing
can go and the stubs can directly reference %gs:_glapi_tls_Dispatch@NTPOFF.

Comment 7 Ian Romanick 2005-12-01 09:05:22 UTC

(In reply to comment #6)

FYI, gl_x86_asm.py is located in src/mesa/glapi.

> ignore the previous one, it had some bugs (blind forward port wasn't a good
> idea ;-). also, there're much more textrels due to -fPIC being explicitly
> omitted from the x86 DRI makefiles, i fixed that by patching
> configs/linux-dri-x86 but i'm not sure if it's the correct way.

At the very least, this patch needs some conditionals to disable it.  I suspect
that a lot of this will break on other platforms (e.g., Windows, cygwin, etc.).
 I also expect that it will kill the performance of quite a few apps.  There's a
reason that we use custom assembly for this part of the code, and this patch
seems to defeat most of that purpose.

Given the choice between removing TEXTRELs and improving (or maintaining)
performance, I will pick performance every time.

> on a sidenote, i don't see how GLX_USE_TLS stub handling can work at all, for
> some reason it wants to 'relocate' the stubs at runtime but they're in .text
> which is not marked as writable (neither in the ELF program header nor via
> mprotect(). not that if it could work with an enforcement like PaX, mind you.
> the more important question is, why that runtime patching is needed at all. i
> think @NTPOFF is resolved at link time so that whole runtime relocation thing
> can go and the stubs can directly reference %gs:_glapi_tls_Dispatch@NTPOFF.

Grep for 'wtext' in glapi_x86.S.  The run-time patching is done to inline the
call to _x86_get_dispatch and thereby avoid the extra overhead.  Yes, there are
applications that would see measurable performance degredation from leaving that
as a call (rather than inline).

If I recall correctly, having the 'movl %gs:_glapi_tls_Dispatch@NTPOFF, %eax'
inline at build-time had a significant impact on the disk size of libGL.so.

Comment 8 PaX Team 2005-12-18 12:36:03 UTC

first of all, to daniel@freedesktop.org: please do not remove the dependency of
1690, it was an explicit request from ajax back then. second, if you do remove
it, then at least give some food for thought for discussion. mine's below,
regarding the importance of text relocations.

(In reply to comment #7)
> (In reply to comment #6)
> 
> FYI, gl_x86_asm.py is located in src/mesa/glapi.

i can't find it in my copy of MesaLib-6.4.tar.bz2 (md5sum:
85a84e47a3f718f752f306b9e0954ef6), else i would have modified it as i did before.

> > ignore the previous one, it had some bugs (blind forward port wasn't a good
> > idea ;-). also, there're much more textrels due to -fPIC being explicitly
> > omitted from the x86 DRI makefiles, i fixed that by patching
> > configs/linux-dri-x86 but i'm not sure if it's the correct way.
> 
> At the very least, this patch needs some conditionals to disable it.
>  I suspect that a lot of this will break on other platforms (e.g., Windows,
cygwin, etc.).

some of the patch explicitly depends on __PIC__ already, mmx_blend* and
glapi_x86.S don't. i'll modify mmx_blend* to accomodate non-PIC capable
platforms if that code can be used on them(?), but i'm unsure about glapi_x86.S
as it already has an explicit ifndef __WIN32__ and DJGPP's Makefile.DJ doesn't
use it at all. are there any other x86 platforms that use this file but are not
capable of PIC?

>  I also expect that it will kill the performance of quite a few apps.  There's a
> reason that we use custom assembly for this part of the code, and this patch
> seems to defeat most of that purpose.

if you *really* cared about performance (instead of trying to find an excuse)
then you'd have looked hard at the whole GL API/ABI picture already. let's see
what a GL API call ends up in on linux/i386.

  app:          call glNewList@PLT
  app.plt:      jmp [glNewList@GOT]
  gl.dispatch:  mov eax,[_glapi_Dispatch]
                test eax,eax
                jz .get_dispatch
                jmp [eax+0x...]
                .get_dispatch:
                call _glapi_get_dispatch
                jmp [eax+0x...]

on the fast path, that's 6 insns, with no less than 3 memory accesses (potential
cache misses), 2 of which are indirect control flow changes (potential branch
misprediction). and all that just to get to the first insn of the actual GL API
(which will do its prologue/epilogue on top of doing real work, mind you). if
that's not an overkill then i don't know what is.

my patch adds 5 insns to this, 3 of which are memory accesses. one memory access
is the same as the old code (same potential cache miss), the other two are
guaranteed to be cached, one is the return address on the stack, the other is
the call insn executed just before the access. in other words, the execution
overhead in absolute terms (clock cycles) is minimal. where you can have a
measurable impact is when the absolut overhead is comparable to the given GL API
execution time itself. i recall ajax mentioned that some of them are very short
(in terms of asm insns), but those are also the APIs that already suffer the 6
insns/3 memory access 'custom assembly' that you had for this purpose (again, in
addition to the API's prologue/epilogue). adding 5 more will make it worse, but
it's already bad and can't be the reason for outright rejection. if you really
wanted to fix it, then you'd find a way for inlining such short API calls, that
will not only eliminate the API call overhead (including prologue/epilogue code)
but also help the compiler optimize register/memory accesses in the *caller* -
can't get better than that for performance.

> Given the choice between removing TEXTRELs and improving (or maintaining)
> performance, I will pick performance every time.

i think you don't realize what choice you're making (and i definitely disagree
that you should be making that choice for all users and deny them a textrel-free
X/GL). textrels are a subset of runtime code generation which itself is a
privilege that plays a very important role in security, in particular,
exploiting memory corruption based bugs (stack/heap overflow, integer handling,
etc). this privilege (which is granted to all userland on all contemporary OSs,
unless you use PaX) is the one that allows exploiting these bugs via executing
remotely injected code. therefore taking away this privilege prevents a huge
class of exploits from working, in practice you'll find that about 99.9...% of
exploits rely on this privilege. eliminating them for good is good for security,
that is, if you care about it. 

it so happens that there're quite a few people who do and yet they'd also like
to enjoy the usual benefits of their systems. one reason this privilege is
needed when some code has textrels, that's why we (in the hardened gentoo
project) have been actively eliminating them for some years now. X has been a
particularly painful exercise due to its elfloader, so we quickly switched to
dlloader (as far as i know, we were the first distro to use it), we supported
x.org to move to it by default (big thanks to ajax for his work), and we would
of course like to get it all right now that X is officially moving to the
dlloader. anyway, the thing is, be careful with your choices, voting for
performance gains (rather questionable ones at that, considering the above
discussion) and textrels will expose every single app using GL to the most
widely used exploit methods. i and many others chose differently, hence these
patches and our hope is that we can find a way to make them available in X (else
they'll continue to live in gentoo's portage as they did so far).

if you're still not convinced about the role of textrels, read Ulrich Drepper's
DSO howto (http://people.redhat.com/drepper/dsohowto.pdf). if even that's not
enough then try to convince him that glibc should be compiled without -fPIC for
extra performance, and see how far you get.

> > i think @NTPOFF is resolved at link time so that whole runtime relocation thing
> > can go and the stubs can directly reference %gs:_glapi_tls_Dispatch@NTPOFF.
> 
> Grep for 'wtext' in glapi_x86.S.

thanks, i missed that while looking at the code.

>  The run-time patching is done to inline the
> call to _x86_get_dispatch and thereby avoid the extra overhead.  Yes, there are
> applications that would see measurable performance degredation from leaving that
> as a call (rather than inline).

this is the same wrong performance argument that i discussed above. fix the
API/ABI properly if you truly care about performance.

> If I recall correctly, having the 'movl %gs:_glapi_tls_Dispatch@NTPOFF, %eax'
> inline at build-time had a significant impact on the disk size of libGL.so.

what's 'significant' here? and in any case, relocation data is used only once
during, well, relocation, the kernel's page cache will discard those pages
quickly and there's no impact otherwise. so this runtime generated code can be
eliminated.

Comment 9 Daniel Stone 2005-12-18 13:18:09 UTC

'pax team': it is too late for the release at this point; it is frozen hard. 
mesa is separate from xorg, so if this really needs to be fixed, then first of
all get your patch actually accepted by mesa (there's no point saying 'oh my
god, we absolutely cannot release without this fix, despite the fact this will
smash the release cycle and push the release into 2006' when no-one agrees on
how to fix it), and then co-ordinate with the mesa guys to get that into a mesa
release.  when all that's done, it will get into the next xorg release
automatically.  easy!

Comment 10 Allen Akin 2005-12-18 13:33:49 UTC

Just FYI, the reasons GL function calls aren't inlined are (1) the code executed
for a call depends on the current RenderMode, whether display list construction
is active, etc., so inlined code would have to include an expensive multiway
test; (2) the code executed for a call is device-dependent in the general case,
so there's no reliable way to know what to inline at compile time even for a
single "preferred" RenderMode.

The current way of doing things evolved to handle those problems, plus API
functions that are supported by some drivers but not others (a subject that may
be more complicated than we need to get into at the moment, so I'll leave it at
that).

Why are the fine-grained calls in the API in the first place?  To (1) reduce
cache pressure for apps that mix graphics-related and non-graphics data; (2)
keep data flowing from the CPU to the GPU so that parallelism is maximized. 
Newer apps tend to be designed differently, so don't use the fine-grained calls
heavily, but there are a lot of large older apps that really care about their
performance.

Carry on...

Allen

Comment 11 PaX Team 2005-12-19 10:11:59 UTC

(In reply to comment #9)
> 'pax team': it is too late for the release at this point; it is frozen hard. 
> mesa is separate from xorg, so if this really needs to be fixed, then first of
> all get your patch actually accepted by mesa

that's exactly what we've been trying to do since august. the mesa bugzilla is
here, this bug is assigned to them, and you saw all the responses on it... what
else are we expected to do? anyway, lack of interest means that i'll stick to
(hardened) gentoo from now on.

Comment 12 Ian Romanick 2005-12-19 12:19:00 UTC

(In reply to comment #8)

> (In reply to comment #7)
> > (In reply to comment #6)
> >  I also expect that it will kill the performance of quite a few apps.  There's a
> > reason that we use custom assembly for this part of the code, and this patch
> > seems to defeat most of that purpose.
> 
> if you *really* cared about performance (instead of trying to find an excuse)
> then you'd have looked hard at the whole GL API/ABI picture already. let's see
> what a GL API call ends up in on linux/i386.

First of all, you will attract many more bees with honey than with vinegar.

Second of all, OpenGL is an industry-wide defined application interface.  We
aren't at liberty to change it willy-nilly.

Third of all, people on the graphics industry care about two things: 
correctness of rendering and the time required to do it.

>   app:          call glNewList@PLT
>   app.plt:      jmp [glNewList@GOT]
>   gl.dispatch:  mov eax,[_glapi_Dispatch]
>                 test eax,eax
>                 jz .get_dispatch
>                 jmp [eax+0x...]
>                 .get_dispatch:
>                 call _glapi_get_dispatch
>                 jmp [eax+0x...]
> 
> on the fast path, that's 6 insns, with no less than 3 memory accesses (potential
> cache misses), 2 of which are indirect control flow changes (potential branch
> misprediction). and all that just to get to the first insn of the actual GL API
> (which will do its prologue/epilogue on top of doing real work, mind you). if
> that's not an overkill then i don't know what is.

This isn't the fast path.  The fast path is the TLS case.  What is your point?

> my patch adds 5 insns to this, 3 of which are memory accesses. one memory access
> is the same as the old code (same potential cache miss), the other two are

So, your patch roughly doubles the worst-case function call overhead.  There are
a number of drivers (e.g., the open-source R200 driver) where the ABI overhead
is *more* instructions than the actual function being called (e.g., the run-time
codegen immediate mode functions).  Doubling (or even increasing by 30%) that
overhead will make a measurable impact on industry standard benchmarks.  I can
guarnantee that people that care about graphics care way the hell more about
that they do about texrels.

> guaranteed to be cached, one is the return address on the stack, the other is
> the call insn executed just before the access. in other words, the execution
> overhead in absolute terms (clock cycles) is minimal. where you can have a
> measurable impact is when the absolut overhead is comparable to the given GL API
> execution time itself. i recall ajax mentioned that some of them are very short
> (in terms of asm insns), but those are also the APIs that already suffer the 6
> insns/3 memory access 'custom assembly' that you had for this purpose (again, in
> addition to the API's prologue/epilogue). adding 5 more will make it worse, but
> it's already bad and can't be the reason for outright rejection. if you really
> wanted to fix it, then you'd find a way for inlining such short API calls, that
> will not only eliminate the API call overhead (including prologue/epilogue code)

You clearly have zero understanding of the OpenGL ABI works on Linux.  We
*CAN'T* inline this stuff in the application.  There is no way to know at
compile time what the actual function will be!  This layer provides an level of
indirection similar to a C++ virtual function table.  Are you going to suggest
that G++ somehow inline virtual functions at compile time?  There's no way to
know what function will be called because:

1. It depends on the driver that is currently loaded.
2. It depends on the libGL being used (Nvidia's libGL handles some of these
cases very differently than we do).
3. It depends on the current GL state.

> > Given the choice between removing TEXTRELs and improving (or maintaining)
> > performance, I will pick performance every time.
> 
> i think you don't realize what choice you're making (and i definitely disagree
> that you should be making that choice for all users and deny them a textrel-free
> X/GL). textrels are a subset of runtime code generation which itself is a

I think that penalizing the majority of our users to satisfy a minority is a
mistake.  If some particular disto wants a textrel-free libGL, I welcome them to
build the C version of the API dispatch routines with -fPIC.

End of story.

Comment 13 PaX Team 2005-12-19 13:12:41 UTC

(In reply to comment #12)
> I think that penalizing the majority of our users to satisfy a minority is a
> mistake.  If some particular disto wants a textrel-free libGL, I welcome them to
> build the C version of the API dispatch routines with -fPIC.

i guess you missed the whole point of this bug entry. it wasn't at all about
"penalizing the majority", it was about making it possible for some "minority"
to have a secure X server (and applications, and eventually, a whole desktop).
when you learn to conduct intelligent discussions, we can get back to the topic.
btw, even virtual functions can be inlined - at runtime.

Comment 14 Declan Moriarty 2006-01-26 04:12:08 UTC

*** Bug 5710 has been marked as a duplicate of this bug. ***

Comment 15 Declan Moriarty 2006-01-26 21:01:34 UTC

(In reply to comment #14)
> *** Bug 5710 has been marked as a duplicate of this bug. *** The take 2 patch
works on mesa-6.4.1 also. In view of some of the vitriol above, I felt I should
outline a perspective of the user (not a programmer)of a hardened system.

Hardened systems are for resisting attackers, and the main application is
'always on' servers. As such, I have no business installing X, or DRI. It is
because mine is a learning exercise that these went in. Quake, kde, gnome, or
any other bug rich eye candy will never be installed. Not here. 
     When dri failed to install, I had 2 choices
          1. Forget DRI (Over the textrels)
          2. Recompile a kernel without CONFIG_PAX_NOELFRELOCS
This is such a major weakness that no hardened system would contemplate it for
long. Attackers intrude by exploiting some program; key to the continuance of
the attack is rewriting a pointer to  point at their own code. Relocating
position independent code with the kernel randomising location of the code makes
this latter step exceptionally difficult and noisy, as an attacker won't know
beforehand where anything is. No sitting ducks in ram.
     I will be quite happy with 75% of the performance of a gaming box, or 50%
of it. I don't ever want to see this adapted into the mainstream Mesa code
unless the various protections (Grsecurity, Pax, ssp, etc) are also adopted by
the mainstream development. But I would always like the patch available. I don't
care if it breaks with other boxes, or the API paper spec, as long as things work.

Comment 16 Kevin Day 2006-06-21 11:12:30 UTC

Created attachment 6009 [details] [review]
GLX_NO_TEXREL to enable/disable.

I modified patch: 3954: updated patch take 2.
The only difference here is that #ifdef GLX_NO_TEXREL was added to every single
case in the said patch.
This should allow for developers to explicitly pass -DGLX_NO_TEXREL
My hope here is that this can be accepted upstream such that those who donot
want NO_TEXREL security, will not get it by default, but those who do, can
simply pass -DGLX_NO_TEXREL.

I was careful not to apply any other changes in this patch, but to help push
this one upstream, I removed the configs/linux-dri-x86 portion of the patch as
that would also have to be left alone to be accepted upstream.

I wish for this to be re-opened so that it can be accepted upstream, but I
leave that task to those who started/continued this thread, if they think it is
worth trying to push this patch upstream.

Comment 17 Kevin Day 2006-06-21 12:22:55 UTC

Comment on attachment 6009 [details] [review]
GLX_NO_TEXREL to enable/disable.

This is royally screwed up.  Use the -2 version of this patch.

Comment 18 Kevin Day 2006-06-21 12:25:36 UTC

Created attachment 6010 [details] [review]
GLX_NO_TEXREL to enable/disable. (version 2)

This is the corrected version.
I screwed up the #ifdefs all over the place for some reason.
This does compile.

Comment 19 Adam Jackson 2009-08-24 12:23:21 UTC

Mass version move, cvs -> git

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.