find attached a patch that applies to cvs head in xorg/xc/extras/Mesa/src/mesa/x86/ ... this was last tested against Gentoo xorg-x11-6.8.2-r2 on x86, but the patch applies pretty cleanly to cvs head the patch (by the PaX team afaik) updates the x86 asm to be PIC friendly so that we dont end up with TEXTRELs in libGL.so the only file that changed was extras/Mesa/src/mesa/x86/glapi_x86.S to include TLS support, but a quick check looks like the TLS code should be PIC friendly so it does not need additional work
Created attachment 2998 [details] [review] xorg-x11-pic-notextrel.patch
Created attachment 3003 [details] [review] fixed version of the mesa pic patch oops, this was the buggy patch, don't apply it (it changes the stub size that will quickly make GL apps crash when the dispatcher indexes into the stub array)! instead look at this new one, it should get things right. also it changes gl_x86_asm.py that generates glapi_x86.S, which is the right way, i guess. note a trick in get_dispatchbase: it assumes that it is called by relative 'call' insn (which is a given in this case since it's called only from the stubs) so that it can extract the 4 byte relative offset from it and compute its own position in memory, without having to do another round of call/pop or call/mov/retn.
this should probably be assigned to upstream Mesa.
patch doesn't apply against 6.4, haven't checked if that's a trivial failure or not yet.
Created attachment 3951 [details] [review] updated patch here's a quick rediff against the standalone Mesa-6.4 package. note that i couldn't find the gl_x86_asm.py script so its patch is omitted, whoever has it will have to update it per the changes in glapi_x86.S.
Created attachment 3954 [details] [review] updated patch take 2 ignore the previous one, it had some bugs (blind forward port wasn't a good idea ;-). also, there're much more textrels due to -fPIC being explicitly omitted from the x86 DRI makefiles, i fixed that by patching configs/linux-dri-x86 but i'm not sure if it's the correct way. on a sidenote, i don't see how GLX_USE_TLS stub handling can work at all, for some reason it wants to 'relocate' the stubs at runtime but they're in .text which is not marked as writable (neither in the ELF program header nor via mprotect(). not that if it could work with an enforcement like PaX, mind you. the more important question is, why that runtime patching is needed at all. i think @NTPOFF is resolved at link time so that whole runtime relocation thing can go and the stubs can directly reference %gs:_glapi_tls_Dispatch@NTPOFF.
(In reply to comment #6) FYI, gl_x86_asm.py is located in src/mesa/glapi. > ignore the previous one, it had some bugs (blind forward port wasn't a good > idea ;-). also, there're much more textrels due to -fPIC being explicitly > omitted from the x86 DRI makefiles, i fixed that by patching > configs/linux-dri-x86 but i'm not sure if it's the correct way. At the very least, this patch needs some conditionals to disable it. I suspect that a lot of this will break on other platforms (e.g., Windows, cygwin, etc.). I also expect that it will kill the performance of quite a few apps. There's a reason that we use custom assembly for this part of the code, and this patch seems to defeat most of that purpose. Given the choice between removing TEXTRELs and improving (or maintaining) performance, I will pick performance every time. > on a sidenote, i don't see how GLX_USE_TLS stub handling can work at all, for > some reason it wants to 'relocate' the stubs at runtime but they're in .text > which is not marked as writable (neither in the ELF program header nor via > mprotect(). not that if it could work with an enforcement like PaX, mind you. > the more important question is, why that runtime patching is needed at all. i > think @NTPOFF is resolved at link time so that whole runtime relocation thing > can go and the stubs can directly reference %gs:_glapi_tls_Dispatch@NTPOFF. Grep for 'wtext' in glapi_x86.S. The run-time patching is done to inline the call to _x86_get_dispatch and thereby avoid the extra overhead. Yes, there are applications that would see measurable performance degredation from leaving that as a call (rather than inline). If I recall correctly, having the 'movl %gs:_glapi_tls_Dispatch@NTPOFF, %eax' inline at build-time had a significant impact on the disk size of libGL.so.
first of all, to daniel@freedesktop.org: please do not remove the dependency of 1690, it was an explicit request from ajax back then. second, if you do remove it, then at least give some food for thought for discussion. mine's below, regarding the importance of text relocations. (In reply to comment #7) > (In reply to comment #6) > > FYI, gl_x86_asm.py is located in src/mesa/glapi. i can't find it in my copy of MesaLib-6.4.tar.bz2 (md5sum: 85a84e47a3f718f752f306b9e0954ef6), else i would have modified it as i did before. > > ignore the previous one, it had some bugs (blind forward port wasn't a good > > idea ;-). also, there're much more textrels due to -fPIC being explicitly > > omitted from the x86 DRI makefiles, i fixed that by patching > > configs/linux-dri-x86 but i'm not sure if it's the correct way. > > At the very least, this patch needs some conditionals to disable it. > I suspect that a lot of this will break on other platforms (e.g., Windows, cygwin, etc.). some of the patch explicitly depends on __PIC__ already, mmx_blend* and glapi_x86.S don't. i'll modify mmx_blend* to accomodate non-PIC capable platforms if that code can be used on them(?), but i'm unsure about glapi_x86.S as it already has an explicit ifndef __WIN32__ and DJGPP's Makefile.DJ doesn't use it at all. are there any other x86 platforms that use this file but are not capable of PIC? > I also expect that it will kill the performance of quite a few apps. There's a > reason that we use custom assembly for this part of the code, and this patch > seems to defeat most of that purpose. if you *really* cared about performance (instead of trying to find an excuse) then you'd have looked hard at the whole GL API/ABI picture already. let's see what a GL API call ends up in on linux/i386. app: call glNewList@PLT app.plt: jmp [glNewList@GOT] gl.dispatch: mov eax,[_glapi_Dispatch] test eax,eax jz .get_dispatch jmp [eax+0x...] .get_dispatch: call _glapi_get_dispatch jmp [eax+0x...] on the fast path, that's 6 insns, with no less than 3 memory accesses (potential cache misses), 2 of which are indirect control flow changes (potential branch misprediction). and all that just to get to the first insn of the actual GL API (which will do its prologue/epilogue on top of doing real work, mind you). if that's not an overkill then i don't know what is. my patch adds 5 insns to this, 3 of which are memory accesses. one memory access is the same as the old code (same potential cache miss), the other two are guaranteed to be cached, one is the return address on the stack, the other is the call insn executed just before the access. in other words, the execution overhead in absolute terms (clock cycles) is minimal. where you can have a measurable impact is when the absolut overhead is comparable to the given GL API execution time itself. i recall ajax mentioned that some of them are very short (in terms of asm insns), but those are also the APIs that already suffer the 6 insns/3 memory access 'custom assembly' that you had for this purpose (again, in addition to the API's prologue/epilogue). adding 5 more will make it worse, but it's already bad and can't be the reason for outright rejection. if you really wanted to fix it, then you'd find a way for inlining such short API calls, that will not only eliminate the API call overhead (including prologue/epilogue code) but also help the compiler optimize register/memory accesses in the *caller* - can't get better than that for performance. > Given the choice between removing TEXTRELs and improving (or maintaining) > performance, I will pick performance every time. i think you don't realize what choice you're making (and i definitely disagree that you should be making that choice for all users and deny them a textrel-free X/GL). textrels are a subset of runtime code generation which itself is a privilege that plays a very important role in security, in particular, exploiting memory corruption based bugs (stack/heap overflow, integer handling, etc). this privilege (which is granted to all userland on all contemporary OSs, unless you use PaX) is the one that allows exploiting these bugs via executing remotely injected code. therefore taking away this privilege prevents a huge class of exploits from working, in practice you'll find that about 99.9...% of exploits rely on this privilege. eliminating them for good is good for security, that is, if you care about it. it so happens that there're quite a few people who do and yet they'd also like to enjoy the usual benefits of their systems. one reason this privilege is needed when some code has textrels, that's why we (in the hardened gentoo project) have been actively eliminating them for some years now. X has been a particularly painful exercise due to its elfloader, so we quickly switched to dlloader (as far as i know, we were the first distro to use it), we supported x.org to move to it by default (big thanks to ajax for his work), and we would of course like to get it all right now that X is officially moving to the dlloader. anyway, the thing is, be careful with your choices, voting for performance gains (rather questionable ones at that, considering the above discussion) and textrels will expose every single app using GL to the most widely used exploit methods. i and many others chose differently, hence these patches and our hope is that we can find a way to make them available in X (else they'll continue to live in gentoo's portage as they did so far). if you're still not convinced about the role of textrels, read Ulrich Drepper's DSO howto (http://people.redhat.com/drepper/dsohowto.pdf). if even that's not enough then try to convince him that glibc should be compiled without -fPIC for extra performance, and see how far you get. > > i think @NTPOFF is resolved at link time so that whole runtime relocation thing > > can go and the stubs can directly reference %gs:_glapi_tls_Dispatch@NTPOFF. > > Grep for 'wtext' in glapi_x86.S. thanks, i missed that while looking at the code. > The run-time patching is done to inline the > call to _x86_get_dispatch and thereby avoid the extra overhead. Yes, there are > applications that would see measurable performance degredation from leaving that > as a call (rather than inline). this is the same wrong performance argument that i discussed above. fix the API/ABI properly if you truly care about performance. > If I recall correctly, having the 'movl %gs:_glapi_tls_Dispatch@NTPOFF, %eax' > inline at build-time had a significant impact on the disk size of libGL.so. what's 'significant' here? and in any case, relocation data is used only once during, well, relocation, the kernel's page cache will discard those pages quickly and there's no impact otherwise. so this runtime generated code can be eliminated.
'pax team': it is too late for the release at this point; it is frozen hard. mesa is separate from xorg, so if this really needs to be fixed, then first of all get your patch actually accepted by mesa (there's no point saying 'oh my god, we absolutely cannot release without this fix, despite the fact this will smash the release cycle and push the release into 2006' when no-one agrees on how to fix it), and then co-ordinate with the mesa guys to get that into a mesa release. when all that's done, it will get into the next xorg release automatically. easy!
Just FYI, the reasons GL function calls aren't inlined are (1) the code executed for a call depends on the current RenderMode, whether display list construction is active, etc., so inlined code would have to include an expensive multiway test; (2) the code executed for a call is device-dependent in the general case, so there's no reliable way to know what to inline at compile time even for a single "preferred" RenderMode. The current way of doing things evolved to handle those problems, plus API functions that are supported by some drivers but not others (a subject that may be more complicated than we need to get into at the moment, so I'll leave it at that). Why are the fine-grained calls in the API in the first place? To (1) reduce cache pressure for apps that mix graphics-related and non-graphics data; (2) keep data flowing from the CPU to the GPU so that parallelism is maximized. Newer apps tend to be designed differently, so don't use the fine-grained calls heavily, but there are a lot of large older apps that really care about their performance. Carry on... Allen
(In reply to comment #9) > 'pax team': it is too late for the release at this point; it is frozen hard. > mesa is separate from xorg, so if this really needs to be fixed, then first of > all get your patch actually accepted by mesa that's exactly what we've been trying to do since august. the mesa bugzilla is here, this bug is assigned to them, and you saw all the responses on it... what else are we expected to do? anyway, lack of interest means that i'll stick to (hardened) gentoo from now on.
(In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6) > > I also expect that it will kill the performance of quite a few apps. There's a > > reason that we use custom assembly for this part of the code, and this patch > > seems to defeat most of that purpose. > > if you *really* cared about performance (instead of trying to find an excuse) > then you'd have looked hard at the whole GL API/ABI picture already. let's see > what a GL API call ends up in on linux/i386. First of all, you will attract many more bees with honey than with vinegar. Second of all, OpenGL is an industry-wide defined application interface. We aren't at liberty to change it willy-nilly. Third of all, people on the graphics industry care about two things: correctness of rendering and the time required to do it. > app: call glNewList@PLT > app.plt: jmp [glNewList@GOT] > gl.dispatch: mov eax,[_glapi_Dispatch] > test eax,eax > jz .get_dispatch > jmp [eax+0x...] > .get_dispatch: > call _glapi_get_dispatch > jmp [eax+0x...] > > on the fast path, that's 6 insns, with no less than 3 memory accesses (potential > cache misses), 2 of which are indirect control flow changes (potential branch > misprediction). and all that just to get to the first insn of the actual GL API > (which will do its prologue/epilogue on top of doing real work, mind you). if > that's not an overkill then i don't know what is. This isn't the fast path. The fast path is the TLS case. What is your point? > my patch adds 5 insns to this, 3 of which are memory accesses. one memory access > is the same as the old code (same potential cache miss), the other two are So, your patch roughly doubles the worst-case function call overhead. There are a number of drivers (e.g., the open-source R200 driver) where the ABI overhead is *more* instructions than the actual function being called (e.g., the run-time codegen immediate mode functions). Doubling (or even increasing by 30%) that overhead will make a measurable impact on industry standard benchmarks. I can guarnantee that people that care about graphics care way the hell more about that they do about texrels. > guaranteed to be cached, one is the return address on the stack, the other is > the call insn executed just before the access. in other words, the execution > overhead in absolute terms (clock cycles) is minimal. where you can have a > measurable impact is when the absolut overhead is comparable to the given GL API > execution time itself. i recall ajax mentioned that some of them are very short > (in terms of asm insns), but those are also the APIs that already suffer the 6 > insns/3 memory access 'custom assembly' that you had for this purpose (again, in > addition to the API's prologue/epilogue). adding 5 more will make it worse, but > it's already bad and can't be the reason for outright rejection. if you really > wanted to fix it, then you'd find a way for inlining such short API calls, that > will not only eliminate the API call overhead (including prologue/epilogue code) You clearly have zero understanding of the OpenGL ABI works on Linux. We *CAN'T* inline this stuff in the application. There is no way to know at compile time what the actual function will be! This layer provides an level of indirection similar to a C++ virtual function table. Are you going to suggest that G++ somehow inline virtual functions at compile time? There's no way to know what function will be called because: 1. It depends on the driver that is currently loaded. 2. It depends on the libGL being used (Nvidia's libGL handles some of these cases very differently than we do). 3. It depends on the current GL state. > > Given the choice between removing TEXTRELs and improving (or maintaining) > > performance, I will pick performance every time. > > i think you don't realize what choice you're making (and i definitely disagree > that you should be making that choice for all users and deny them a textrel-free > X/GL). textrels are a subset of runtime code generation which itself is a I think that penalizing the majority of our users to satisfy a minority is a mistake. If some particular disto wants a textrel-free libGL, I welcome them to build the C version of the API dispatch routines with -fPIC. End of story.
(In reply to comment #12) > I think that penalizing the majority of our users to satisfy a minority is a > mistake. If some particular disto wants a textrel-free libGL, I welcome them to > build the C version of the API dispatch routines with -fPIC. i guess you missed the whole point of this bug entry. it wasn't at all about "penalizing the majority", it was about making it possible for some "minority" to have a secure X server (and applications, and eventually, a whole desktop). when you learn to conduct intelligent discussions, we can get back to the topic. btw, even virtual functions can be inlined - at runtime.
*** Bug 5710 has been marked as a duplicate of this bug. ***
(In reply to comment #14) > *** Bug 5710 has been marked as a duplicate of this bug. *** The take 2 patch works on mesa-6.4.1 also. In view of some of the vitriol above, I felt I should outline a perspective of the user (not a programmer)of a hardened system. Hardened systems are for resisting attackers, and the main application is 'always on' servers. As such, I have no business installing X, or DRI. It is because mine is a learning exercise that these went in. Quake, kde, gnome, or any other bug rich eye candy will never be installed. Not here. When dri failed to install, I had 2 choices 1. Forget DRI (Over the textrels) 2. Recompile a kernel without CONFIG_PAX_NOELFRELOCS This is such a major weakness that no hardened system would contemplate it for long. Attackers intrude by exploiting some program; key to the continuance of the attack is rewriting a pointer to point at their own code. Relocating position independent code with the kernel randomising location of the code makes this latter step exceptionally difficult and noisy, as an attacker won't know beforehand where anything is. No sitting ducks in ram. I will be quite happy with 75% of the performance of a gaming box, or 50% of it. I don't ever want to see this adapted into the mainstream Mesa code unless the various protections (Grsecurity, Pax, ssp, etc) are also adopted by the mainstream development. But I would always like the patch available. I don't care if it breaks with other boxes, or the API paper spec, as long as things work.
Created attachment 6009 [details] [review] GLX_NO_TEXREL to enable/disable. I modified patch: 3954: updated patch take 2. The only difference here is that #ifdef GLX_NO_TEXREL was added to every single case in the said patch. This should allow for developers to explicitly pass -DGLX_NO_TEXREL My hope here is that this can be accepted upstream such that those who donot want NO_TEXREL security, will not get it by default, but those who do, can simply pass -DGLX_NO_TEXREL. I was careful not to apply any other changes in this patch, but to help push this one upstream, I removed the configs/linux-dri-x86 portion of the patch as that would also have to be left alone to be accepted upstream. I wish for this to be re-opened so that it can be accepted upstream, but I leave that task to those who started/continued this thread, if they think it is worth trying to push this patch upstream.
Comment on attachment 6009 [details] [review] GLX_NO_TEXREL to enable/disable. This is royally screwed up. Use the -2 version of this patch.
Created attachment 6010 [details] [review] GLX_NO_TEXREL to enable/disable. (version 2) This is the corrected version. I screwed up the #ifdefs all over the place for some reason. This does compile.
Mass version move, cvs -> git
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.