This is a patch to use special instructions on the Alpha architecture to speed up rendering operations. It is based on the similar work for i386 using MMX instructions (Bug 839). Instead of modifying fbmmx.c, I chose to reimplement mmintrin.h for Alpha. This should already give noticeably better code in most cases. The patch is not quite finished, but I was hoping to get some feedback. In particular, is there any documentation on the functions in fbmmx.c, for example on the expected alignment of arguments? I'm currently running into unaligned accesses, for example in fbCompositeSrcAdd_8888x8888mmx. Apparently i386 does not care, but on the Alpha architecture each unaligned access traps to the OS, which is why this patch doesn't really speed anything up yet...
Created attachment 1737: Patch for using MVI instructions in rendering
It is not true that i386 does "not care" about unaligned accesses. They don't generate traps, but they are much slower than aligned accesses. The code in fbCompositeSrcAdd_8888x8888mmx() assumes that the drawables are aligned on 4-byte boundaries. Since the drawables are 32 bits per pixel, anything else would be insane. Are you seeing drawables that are aligned on a 16-bit boundary?
(In reply to comment #2)
> i386 does not "not care" about unaligned access. They don't generate traps, but
> they are much slower than aligned accesses. The code in
> fbCompositeSrcAdd_8888x8888mmx() assumes that the drawables are aligned on 4
> byte boundaries. Since the drawables are 32 bits per pixel, anything else would
> be insane.
>
> Are you seeing drawables that are aligned on a 16 bit boundary?

No. However, it happens frequently that src and dst aren't co-aligned. That is, the loop "while (w && (unsigned long)dst & 7)" makes dst 8-byte-aligned, but src can still end up with (src % 8) == 4. Is it possible to somehow ensure co-alignment? Otherwise, I would have to compensate for this in the source (and that would probably also be helpful for i386, depending on how slow unaligned accesses there really are...)
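To make the co-alignment problem concrete, here is a minimal sketch in plain C of one way to compensate in the source. It is not taken from the patch or from fbmmx.c: `load_2px` and `copy_row` are hypothetical names, a plain copy stands in for the real compositing operator, and a little-endian layout (as on Alpha and i386) is assumed. The idea is that after the head loop has aligned dst to 8 bytes, the two source pixels are fetched with 4-byte-aligned loads and combined, so no 64-bit load ever touches a misaligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch two 32-bit pixels as one 64-bit word using
 * only 4-byte-aligned loads. Assumes a little-endian layout. */
static uint64_t load_2px(const uint32_t *src)
{
    return (uint64_t)src[0] | ((uint64_t)src[1] << 32);
}

/* Copy w pixels from src to dst. The copy stands in for a real
 * compositing operator; only the alignment handling matters here. */
static void copy_row(uint32_t *dst, const uint32_t *src, size_t w)
{
    /* Both pointers are assumed to be at least 4-byte aligned. */
    assert(((uintptr_t)dst & 3) == 0 && ((uintptr_t)src & 3) == 0);

    /* Head: advance one pixel at a time until dst is 8-byte aligned. */
    while (w && ((uintptr_t)dst & 7)) {
        *dst++ = *src++;
        w--;
    }

    /* Body: dst is 8-byte aligned, src may be only 4-byte aligned. */
    while (w >= 2) {
        uint64_t v = load_2px(src);
        memcpy(dst, &v, sizeof v); /* 64-bit store to an aligned address */
        dst += 2;
        src += 2;
        w -= 2;
    }

    /* Tail: at most one pixel left. */
    if (w)
        *dst = *src;
}
```

When src happens to be co-aligned with dst, a real implementation could branch to a faster loop that loads 64 bits directly; the sketch always takes the safe path.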
Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.
I've been playing with implementing Alpha fast paths for pixman, but writing code using MVI instructions is very difficult because MVI is very limited. Basic operations such as addition need to be simulated. To compound this, MVI instructions have a latency of 3 cycles on EV6, which is awful (only 2 on PCA56), so to get good performance on both platforms, separate code paths need to be written for each to reduce stalling. Nothing to show yet. May not be worthwhile at all.
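As an illustration of what "simulating addition" involves, the following is a sketch in portable C of a saturating add on eight unsigned bytes packed into a 64-bit word, built only from ordinary integer operations. It is not MVI code and not taken from any pixman work; the function name `sat_add_u8x8` is made up for this example.

```c
#include <stdint.h>

/* Saturating add of eight unsigned bytes packed in a 64-bit word,
 * using only ordinary 64-bit integer operations. */
static uint64_t sat_add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t HI = 0x8080808080808080ULL; /* top bit of each byte  */
    const uint64_t LO = 0x7f7f7f7f7f7f7f7fULL; /* low 7 bits of each byte */

    /* Per-byte wrapping sum: add the low 7 bits (cannot carry into the
     * next byte), then fold the top bits back in with XOR. */
    uint64_t sum = ((a & LO) + (b & LO)) ^ ((a ^ b) & HI);

    /* Carry out of bit 7 of each byte, i.e. the bytes that overflowed. */
    uint64_t carry = ((a & b) | ((a | b) & ~sum)) & HI;

    /* Expand each carry bit to 0xff and force overflowed bytes to 255. */
    return sum | ((carry >> 7) * 0xff);
}
```

For example, `sat_add_u8x8(0xff807f01000010f0, 0x0180010100002020)` yields `0xffff8002000030ff`: the bytes that would exceed 255 are clamped, the others are summed normally.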
If you do write fast paths for Alpha, please file a new bug against pixman, or send mail to cairo@cairographics.org.