Summary: | Use SSE2/MMX for fbFetch_x8r8g8b8() | ||
---|---|---|---|
Product: | pixman | Reporter: | Steve Snyder <fdt0317.20.swsnyder> |
Component: | pixman | Assignee: | Søren Sandmann Pedersen <soren.sandmann> |
Status: | RESOLVED FIXED | QA Contact: | Søren Sandmann Pedersen <soren.sandmann> |
Severity: | normal | ||
Priority: | medium | ||
Version: | other | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Use SSE2/MMX to copy ARGB pixels, setting alpha to 0xFF |
Description
Steve Snyder
2009-03-17 07:04:07 UTC
A few random notes: 1. I am aware that the SSE2/SSE/MMX code should be in file pixman-[simd].c, but that requires architectural decisions that I didn't want to make. The patch is a drop-in enhancement that doesn't require runtime analysis of CPU capabilities. 2. MSVC8 generates really bad code for MMX (a lot of superfluous copies between registers). Using x86 ASM instead of compiler intrinsics would yield an improvement. But then you'd have the headache of AT&T/Microsoft syntax to deal with. 3. Overall performance would greatly benefit from *guaranteeing* 16-byte alignment of both source and destination buffers. 4. The other fbFetch_xxxxxx{} routines in pixman-access.c would likely benefit from SIMD handling as well, but I didn't see much use of them in my environment. A couple of comments - You may want to find out *why* you are seeing so much time spent in fbFetch_a8r8g8b8(). If this activity only comes from a few operations, maybe a better approach is to write a special-case fast path for those? - If this activity is actually coming from transformed compositing, then maybe the patch that Andre Tupinamba recently posted to cairo-list could help. - Any use of MMX and SSE should go into the appropriate files. It is important that people are able to build the library such that any special CPU features are runtime detected. Edited to correct function name. It is actually fbFetch_x8r8g8b8() that the patch addresses, not fbFetch_a8r8g8b8(). To move forward here, we need a way for pixman_implementations to plug in fetchers to the general code. Ie., implementations will have to be involved when images are created and be given the opportunity to set the fetch_scanline variations to something CPU specific. Not only will that allow this and the a8 SSE2 fetchers, it will also make it possible to have CPU specific gradients and bilinear filters. Fixed in master. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.