Attached is a patch that speeds up some operations in the Render extension by using gcc 3.4 MMX intrinsics. A benchmark rendering a paragraph of component alpha text to a pixmap gave these results on a 1200 MHz laptop with an i830 chip running Fedora Core I: Unmodified X server and the pixmap in system RAM: [ssp@localhost x]$ ./a.out total time: 41.394618 average rect time: 0.683200 worst rect: 9 average glyph time: 3.550500 with the MMX optimizations: [ssp@localhost x]$ ./a.out total time: 22.972553 average rect time: 0.677900 worst rect: 9 average glyph time: 1.692000 Ie., text rendering is more than twice as fast. The 'average glyph time' here is the time it takes to render the entire paragraph of text. With the pixmap in video RAM, the speedup is not quite as spectacular: Unmodified X server: [ssp@localhost x]$ ./a.out total time: 95.900768 average rect time: 0.003300 worst rect: 1 average glyph time: 9.693500 With MMXified compositing: total time: 66.559287 average rect time: 0.015100 worst rect: 6 average glyph time: 6.720500 But still a nice improvement. The patch includes improved code for these cases: Subpixel text: - (constant color) in (component alpha mask) over 565 destination - (constant color) in (component alpha mask) over 32bit destination - (32 bit component alpha) Saturate (32 bit destination) Regular antialiased text: - (8 bit alpha) Saturate (8 bit destination) - (constant color) in (8 bit alpha mask) over 565 destination - (constant color) in (8 bit alpha mask) over 32bit destination GdkPixbuf: - (reversed, non-premultiplied source) over 32bit destination - (reversed, non-premultiplied source) over 565 destination Alpha rectangle (e.g., Nautilus selection rectangle): - (constant color) over 32bit destination - (constant color) over 565 destination Solid fill - solid fill of 32 bit drawable - solid fill of 16 bit drawable The code can optionally be compiled to use the pshufw instruction, which is only available on pentium III. One question: The patch has a bad hack where it redefines DefaultCCOptions for all of the framebuffer code. How should this be done properly? The problem with the existing DefaultCCOptions is that they include -pedantic which doesn't work with the MMX intrinsics. Attaching the patch, two new files and the benchmark.
Created attachment 452 [details] fbmmx.c
Created attachment 453 [details] fbmmx.h
Created attachment 454 [details] [review] The patch
Created attachment 455 [details] The benchmark
Is there any particular reason why you didn't use mmintrin.h ?
> Is there any particular reason why you didn't use mmintrin.h ? Well, that file is not installed on my system.
(In reply to comment #6) > > Is there any particular reason why you didn't use mmintrin.h ? > > Well, that file is not installed on my system. I don't know how you could manage to install gcc without it's system headers; are you sure it's not in the gcc install dir? Try running the following: ls $(gcc -print-search-dirs | grep install | awk '{ print $2 }')/include/*intrin.h
Yeah, I got the file. I'll have to change my story then to "No particular reason, I just didn't know about the file". It looks like it contains a lot of useful stuff that I duplicated in fbmmx.c.
(In reply to comment #3) > Created an attachment (id=454) > The patch > Instead of surrounding all of the calls to the *mmx functions in fbpict.c with USE_GCC34_MMX why not just have fbHaveMMX() return 0 when USE_GCC34_MMX is not defined?
Hi Søren, This looks like a great patch, but I'm wondering, why not change things so there's an fb_asm.c (or whatever) which appropriately #includes another source file with the MMX code? This would make things cleaner for other people who want to do the same thing for other platforms (PowerPC's Altivec, IA64's multimedia instructions etc). On that note, I think the potential speed gains for text rendering will be even more significant on these other platforms, since GCC does a worse job there, and vendor-accelerated RENDER extensions (e.g. NVIDIA's) are nowhere to be seen.
> Instead of surrounding all of the calls to the *mmx functions in fbpict.c with > USE_GCC34_MMX why not just have fbHaveMMX() return 0 when USE_GCC34_MMX is not > defined? The problem doing that is thew all the MMX functions are not even defined if we don't have gcc 3.4 on i386. From Duraid Madina: > This looks like a great patch, but I'm wondering, why not change things so > there's an fb_asm.c (or whatever) which appropriately #includes another source > file with the MMX code? This would make things cleaner for other people who > want That might not be a bad idea, but I'd rather wait until someone actually creates optimizations for those other platforms.
The patch is committed now.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.