Bug 2357

Summary:	Speeding up render on Alpha with MVI instructions
Product:	pixman	Reporter:	Falk Hueffner <falk>
Component:	pixman	Assignee:	Xorg Project Team <xorg-team>
Status:	RESOLVED NOTABUG	QA Contact:	Xorg Project Team <xorg-team>
Severity:	normal
Priority:	high	CC:	ajax, mattst88, roland.mainz
Version:	other
Hardware:	Alpha
OS:	Linux (All)
Attachments:	Patch for using MVI instructions in rendering

Description Falk Hueffner 2005-01-22 13:11:17 UTC

This is a patch to use special instructions on the Alpha architecture to speed
up rendering operations. It is based on the similar work for i386 using MMX
instructions (Bug 839). Instead of modifying fbmmx.c, I chose to reimplement
mmintrin.h for Alpha. This should already give noticeably better code for most
cases.

The patch is not quite finished, but I was hoping to get some feedback. In
particular, is there any documentation on the functions in fbmmx.c, for example
about the expected alignment of arguments? I'm currently running into unaligned
accesses, for example in fbCompositeSrcAdd_8888x8888mmx. Apparently i386 does
not care, but on the Alpha architecture this incurs a trap to the OS, which
is why this patch doesn't really speed anything up yet...

Comment 1 Falk Hueffner 2005-01-22 13:12:58 UTC

Created attachment 1737
Patch for using MVI instructions in rendering

Comment 2 Søren Sandmann Pedersen 2005-01-22 16:43:22 UTC

i386 does not "not care" about unaligned access. They don't generate traps, but
they are much slower than aligned accesses. The code in
fbCompositeSrcAdd_8888x8888mmx() assumes that the drawables are aligned on 4
byte boundaries. Since the drawables are 32 bits per pixel, anything else would
be insane.

Are you seeing drawables that are aligned on a 16 bit boundary?

Comment 3 Falk Hueffner 2005-01-23 13:34:51 UTC

(In reply to comment #2)
> i386 does not "not care" about unaligned access. They don't generate traps, but
> they are much slower than aligned accesses. The code in
> fbCompositeSrcAdd_8888x8888mmx() assumes that the drawables are aligned on 4
> byte boundaries. Since the drawables are 32 bits per pixel, anything else would
> be insane.
> 
> Are you seeing drawables that are aligned on a 16 bit boundary?

No. However, it happens frequently that src and dst aren't co-aligned, that is,
    while (w && (unsigned long)dst & 7)
will make dst 8-byte-aligned, but (src % 8) == 4. Is it possible to somehow
ensure co-alignment? Otherwise, I would have to compensate for this in the
source (and that would probably also be helpful for i386, depending on how
slow unaligned accesses there really are...)
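
A minimal sketch of the co-alignment problem described in this comment, written
in portable C rather than taken from fbmmx.c or the attached patch. The names
co_aligned8, sat_add_8888, load64_any and src_add_row are hypothetical. The head
loop aligns dst to 8 bytes as in the snippet above; the body then reads src
through memcpy, so the load stays legal even when (src % 8) == 4, and the
compiler is left to choose an unaligned-safe load sequence.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if dst and src share the same offset
   within an 8-byte word, i.e. aligning one also aligns the other. */
static int co_aligned8(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & 7) == 0;
}

/* Scalar saturating add of one 32-bit pixel, one byte channel at a time. */
static uint32_t sat_add_8888(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8) {
        uint32_t s = ((a >> i) & 0xff) + ((b >> i) & 0xff);
        if (s > 0xff)
            s = 0xff;
        r |= s << i;
    }
    return r;
}

/* Load 64 bits from an address that may not be 8-byte aligned. */
static uint64_t load64_any(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* dst[i] = saturate(dst[i] + src[i]) for w pixels. */
static void src_add_row(uint32_t *dst, const uint32_t *src, int w)
{
    /* Head: single pixels until dst is 8-byte aligned. */
    while (w && ((uintptr_t)dst & 7)) {
        *dst = sat_add_8888(*dst, *src);
        dst++; src++; w--;
    }
    /* Body: two pixels per step. dst is aligned here; src may still
       sit at offset 4, so both sides go through memcpy. */
    while (w >= 2) {
        uint64_t d = load64_any(dst);
        uint64_t s = load64_any(src);
        uint64_t lo = sat_add_8888((uint32_t)d, (uint32_t)s);
        uint64_t hi = sat_add_8888((uint32_t)(d >> 32), (uint32_t)(s >> 32));
        d = lo | (hi << 32);
        memcpy(dst, &d, sizeof d);
        dst += 2; src += 2; w -= 2;
    }
    /* Tail: at most one pixel left. */
    if (w)
        *dst = sat_add_8888(*dst, *src);
}
```

A caller could test co_aligned8(dst, src) once per scanline and choose a plain
aligned loop when it holds, falling back to a loop like the one above otherwise.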

Comment 4 Daniel Stone 2007-02-27 01:25:12 UTC

Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.

Comment 5 Matt Turner 2009-01-31 10:11:13 UTC

I've been playing with implementing Alpha fast paths for pixman, but writing code using MVI instructions is very difficult as MVI is very limited. Basic operations such as addition need to be simulated.
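
As an illustration of what simulating addition can look like, here is a small
portable C sketch that is not taken from any pixman code. It treats a 64-bit
word as eight packed unsigned bytes. The saturating variant relies on a
byte-wise unsigned minimum, which is the operation the MVI instruction MINUB8
provides; minub8_emul is a hypothetical stand-in for it so that the example runs
anywhere. All function names here are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for a byte-wise unsigned minimum of
   eight packed bytes (what MINUB8 computes in hardware). */
static uint64_t minub8_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        uint64_t x = (a >> i) & 0xff;
        uint64_t y = (b >> i) & 0xff;
        r |= (x < y ? x : y) << i;
    }
    return r;
}

/* Saturating add of eight packed bytes. Each byte of b is clamped to
   the headroom left in the matching byte of a (255 - a, which is ~a),
   so the ordinary 64-bit add cannot carry from one byte into the next. */
static uint64_t sat_add_u8x8(uint64_t a, uint64_t b)
{
    return a + minub8_emul(b, ~a);
}

/* Wrapping (modulo 256) add of eight packed bytes: add the low 7 bits
   of every byte, then patch in the top bit with XOR so that no carry
   crosses a byte boundary. */
static uint64_t wrap_add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}
```

Both variants take several operations for what is a single instruction on MMX,
which is the kind of overhead this comment refers to.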

To compound this, MVI instructions have a latency of 3 cycles on EV6, which is awful (it is only 2 on PCA56), so to get good performance on both platforms, separate code paths need to be written for each to reduce stalling.

Nothing to show yet. May not be worthwhile at all.

Comment 6 Søren Sandmann Pedersen 2009-06-05 03:36:55 UTC

If you do write fast paths for Alpha, please file a new bug against pixman, or send mail to cairo@cairographics.org.
