Summary: | multsum_f64 is not optimized | ||
---|---|---|---|
Product: | liboil | Reporter: | Marcus Brubaker <aurelius.marcus> |
Component: | unknown | Assignee: | David Schleef <ds> |
Status: | REOPENED --- | QA Contact: | David Schleef <ds> |
Severity: | enhancement | ||
Priority: | high | CC: | aurelius.marcus, jdc+fd |
Version: | HEAD | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: | multsum_f64_unroll8 and multsum_f64_sse2_unroll4; multsum.patch | ||
Description
Marcus Brubaker
2006-05-18 04:03:22 UTC
Created attachment 5660: multsum_f64_unroll8 and multsum_f64_sse2_unroll4

Applied.

Created attachment 5770: multsum.patch

This is a new version of the patch against the latest anoncvs. It does two things.

First, it fixes the previously broken SSE2 implementation. This version actually speeds things up notably. Witness:

multsum_f64
  multsum_f64_unroll8        ave=576      std=1.14286
  multsum_f64_sse2_unrollb   ave=573      std=1.14286
  multsum_f64_sse2_unrolla   ave=568      std=1.14286
  multsum_f64_sse2           ave=576      std=1.14286
  multsum_f64_ref            ave=850.444  std=1.16741

Second, it introduces an unstrided version of multsum for f32 and f64, with SSE2 optimized versions. Results:

multsum_f32_ns
  multsum_f32_ns_sse            ave=330      std=11.3928
  multsum_f32_ns_ref            ave=737      std=1.125
multsum_f64_ns
  multsum_f64_ns_sse2_unroll2   ave=372      std=1.14286
  multsum_f64_ns_sse2_unroll    ave=382      std=1.14286
  multsum_f64_ns_sse2           ave=463.444  std=3.59953
  multsum_f64_ns_ref            ave=734      std=1.125

Reopening because of new patch.

There appears to be a bug in the unrolled SSE2 versions. It doesn't manifest on my main dev machine (a Pentium M laptop) but shows up on another machine (a Pentium 4). I'm trying to track it down now and will report back when it's resolved.

Patch doesn't apply.
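For readers who don't have the attachments, here is a minimal sketch of what "strided" versus "unstrided" multsum means, in plain C. It is not the code from either patch. The function names, the byte-based stride convention, and the two-accumulator unrolling are assumptions made for illustration only, and no SSE2 intrinsics are used.

```c
#include <stddef.h>

/* Hypothetical names for illustration; these are not liboil's functions. */

/* Strided reference: sum of a[i] * b[i], where each input advances by a
 * caller-supplied stride. Strides are assumed to be in bytes. */
double multsum_strided_ref(const double *a, ptrdiff_t astr,
                           const double *b, ptrdiff_t bstr, size_t n)
{
    const char *pa = (const char *)a;
    const char *pb = (const char *)b;
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += *(const double *)pa * *(const double *)pb;
        pa += astr;
        pb += bstr;
    }
    return sum;
}

/* Unstrided ("ns") variant: both inputs are contiguous, so the loop can be
 * unrolled by two with independent accumulators. A two-lane SIMD version
 * would keep its partial sums in a similar way. */
double multsum_ns_unroll2(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        s0 += a[i] * b[i];

    return s0 + s1;
}
```

Calling multsum_strided_ref with both strides set to sizeof(double) walks the same elements as multsum_ns_unroll2. Because the unrolled variant adds the products in a different order, its result can differ from the reference in the last bits for general floating-point input, so a comparison against a reference implementation usually needs a small tolerance.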