This is a patch to use special instructions on the Alpha architecture to speed
up rendering operations. It is based on the similar work for i386 using MMX
instructions (Bug 839). Instead of modifying fbmmx.c, I chose to reimplement
mmintrin.h for Alpha. This should already give noticeably better code for most
The patch is not quite finished, but I was hoping to get some feedback. In
particular, is there any documentation on the functions in fbmmx.c, for example
on the expected alignment of arguments? I'm currently running into unaligned
accesses, for example in fbCompositeSrcAdd_8888x8888mmx. Apparently i386 does
not care, but on the Alpha architecture each unaligned access traps to the OS,
which is why this patch doesn't really speed anything up yet.
Created attachment 1737
Patch for using MVI instructions in rendering
i386 does not "not care" about unaligned access. They don't generate traps, but
they are much slower than aligned accesses. The code in
fbCompositeSrcAdd_8888x8888mmx() assumes that the drawables are aligned on 4
byte boundaries. Since the drawables are 32 bits per pixel, anything else would
be insane.
Are you seeing drawables that are aligned on a 16 bit boundary?
(In reply to comment #2)
> i386 does not "not care" about unaligned access. They don't generate traps, but
> they are much slower than aligned accesses. The code in
> fbCompositeSrcAdd_8888x8888mmx() assumes that the drawables are aligned on 4
> byte boundaries. Since the drawables are 32 bits per pixel, anything else would
> be insane.
> Are you seeing drawables that are aligned on a 16 bit boundary?
No. However, it frequently happens that src and dst aren't co-aligned. That is,
    while (w && (unsigned long)dst & 7)
will make dst 8-byte-aligned, but leave (src % 8) == 4. Is it possible to somehow
ensure co-alignment? Otherwise, I would have to compensate for this in the
source (and that would probably also help on i386, depending on how
slow unaligned accesses really are there).
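
For illustration, here is a minimal sketch of what compensating in the source could look like. It is plain portable C, not the code from fbmmx.c or from the attached patch, and the names add_sat_8888 and add_row_8888 are hypothetical. It assumes both pointers are 4-byte aligned but not necessarily co-aligned modulo 8. dst is walked to an 8-byte boundary one pixel at a time, and src is then read through memcpy, which leaves it to the compiler to pick a load sequence that is safe for an unaligned address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: per-channel saturating add of two 8888 pixels. */
static uint32_t add_sat_8888(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (s > 0xff)
            s = 0xff;               /* clamp the channel at 255 */
        r |= s << shift;
    }
    return r;
}

/* Hypothetical row routine: dst[i] = saturate(dst[i] + src[i]) for w pixels.
 * Assumption: dst and src are 4-byte aligned, but (src % 8) may differ
 * from (dst % 8). */
static void add_row_8888(uint32_t *dst, const uint32_t *src, size_t w)
{
    /* Head: single pixels until dst sits on an 8-byte boundary. */
    while (w && ((uintptr_t)dst & 7)) {
        *dst = add_sat_8888(*dst, *src);
        dst++; src++; w--;
    }

    /* Body: two pixels per iteration. dst is 8-byte aligned here, but
     * src may only be 4-byte aligned, so it is never dereferenced as a
     * 64-bit pointer; memcpy performs the access instead. */
    while (w >= 2) {
        uint64_t s, d;
        memcpy(&s, src, sizeof s);
        memcpy(&d, dst, sizeof d);

        /* Each 32-bit half is one pixel; the same split is used for
         * both operands, so the result is independent of endianness. */
        uint32_t lo = add_sat_8888((uint32_t)d, (uint32_t)s);
        uint32_t hi = add_sat_8888((uint32_t)(d >> 32), (uint32_t)(s >> 32));
        d = ((uint64_t)hi << 32) | lo;

        memcpy(dst, &d, sizeof d);
        dst += 2; src += 2; w -= 2;
    }

    /* Tail: at most one pixel left. */
    if (w)
        *dst = add_sat_8888(*dst, *src);
}
```

Whether this is actually faster than taking the trap depends on what the compiler emits for the memcpy calls on the target, which this sketch does not measure.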
I've been playing with implementing Alpha fast paths for pixman, but writing code using MVI instructions is very difficult because MVI is very limited: even basic operations such as addition need to be simulated.
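
To give an idea of what simulating addition involves, here is a minimal sketch of bytewise addition on a 64-bit word using only ordinary integer operations. It is generic portable C, not tuned for MVI or for any particular Alpha core and not taken from any pixman fast path. The function names add_wrap_u8x8 and add_sat_u8x8 are hypothetical.

```c
#include <stdint.h>

#define LOW7  UINT64_C(0x7f7f7f7f7f7f7f7f)  /* low 7 bits of every byte */
#define HIGH1 UINT64_C(0x8080808080808080)  /* top bit of every byte    */

/* Hypothetical helper: add eight unsigned bytes lane by lane, wrapping
 * modulo 256, without letting carries cross byte boundaries. */
static uint64_t add_wrap_u8x8(uint64_t a, uint64_t b)
{
    /* Adding the low 7 bits cannot overflow a byte (0x7f + 0x7f = 0xfe);
     * bit 7 of each byte now holds the carry into bit 7. */
    uint64_t low = (a & LOW7) + (b & LOW7);
    /* Fold in the top bits with XOR, which drops the carry out. */
    return low ^ ((a ^ b) & HIGH1);
}

/* Hypothetical helper: the same addition, but each byte saturates at
 * 0xff instead of wrapping. */
static uint64_t add_sat_u8x8(uint64_t a, uint64_t b)
{
    uint64_t low = (a & LOW7) + (b & LOW7);
    uint64_t sum = low ^ ((a ^ b) & HIGH1);

    /* Carry out of bit 7 is the majority of (a7, b7, carry into bit 7). */
    uint64_t carry = ((a & b) | ((a | b) & low)) & HIGH1;

    /* Turn each 0x80 carry flag into a full 0xff byte mask. The shift
     * leaves 0x01 in every overflowed byte, and 0x01 * 0xff fits in a
     * byte, so the multiply does not spill into neighbouring lanes. */
    uint64_t mask = (carry >> 7) * 0xff;

    return sum | mask;
}
```

The saturating version needs roughly a dozen dependent integer operations per 64-bit word, so it would have to be weighed against simply processing one channel at a time.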
To compound this, MVI instructions have a latency of 3 cycles on EV6 (versus only 2 on PCA56), which is awful, so getting good performance on both platforms requires a separate code path for each to reduce stalls.
Nothing to show yet. May not be worthwhile at all.
If you do write fast paths for Alpha, please file a new bug against pixman, or send mail to email@example.com.