Bug 6957 - multsum_f64 is not optimized
Status: REOPENED
Alias: None
Product: liboil
Classification: Unclassified
Component: unknown
Version: HEAD
Hardware: x86 (IA32) Linux (All)
Importance: high enhancement
Assignee: David Schleef
QA Contact: David Schleef
 
Reported: 2006-05-18 04:03 UTC by Marcus Brubaker
Modified: 2008-02-22 17:16 UTC (History)



Attachments
multsum_f64_unroll8 and multsum_f64_sse2_unroll4 (4.55 KB, patch)
2006-05-18 04:04 UTC, Marcus Brubaker
multsum.patch (10.38 KB, patch)
2006-05-31 11:31 UTC, Marcus Brubaker

Description Marcus Brubaker 2006-05-18 04:03:22 UTC
multsum_f64 has only a reference implementation.  A patch with two optimized
versions will be attached.

Producing other levels of loop unrolling is easily doable.  Is it a good idea to
add these other versions?
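
For readers unfamiliar with the function, here is a minimal sketch of what a
strided multiply-and-sum and an unrolled variant could look like.  It is not the
attached patch.  The function names, the argument order, and the convention that
strides are given in bytes are assumptions made for illustration.

```c
#include <stddef.h>

/* Load a double located byte_offset bytes past base.
 * Assumption: strides are expressed in bytes, not in elements. */
static double load_f64(const double *base, ptrdiff_t byte_offset)
{
    return *(const double *)((const char *)base + byte_offset);
}

/* Hypothetical reference version: *dest = sum of src1[i] * src2[i]. */
void multsum_f64_ref_sketch(double *dest,
                            const double *src1, int sstr1,
                            const double *src2, int sstr2, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += load_f64(src1, (ptrdiff_t)i * sstr1) *
               load_f64(src2, (ptrdiff_t)i * sstr2);
    }
    *dest = sum;
}

/* Hypothetical variant unrolled by 4.  Four independent accumulators
 * let consecutive multiply-adds overlap instead of forming one long
 * dependency chain. */
void multsum_f64_unroll4_sketch(double *dest,
                                const double *src1, int sstr1,
                                const double *src2, int sstr2, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;

    for (; i + 3 < n; i += 4) {
        s0 += load_f64(src1, (ptrdiff_t)(i + 0) * sstr1) *
              load_f64(src2, (ptrdiff_t)(i + 0) * sstr2);
        s1 += load_f64(src1, (ptrdiff_t)(i + 1) * sstr1) *
              load_f64(src2, (ptrdiff_t)(i + 1) * sstr2);
        s2 += load_f64(src1, (ptrdiff_t)(i + 2) * sstr1) *
              load_f64(src2, (ptrdiff_t)(i + 2) * sstr2);
        s3 += load_f64(src1, (ptrdiff_t)(i + 3) * sstr1) *
              load_f64(src2, (ptrdiff_t)(i + 3) * sstr2);
    }
    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        s0 += load_f64(src1, (ptrdiff_t)i * sstr1) *
              load_f64(src2, (ptrdiff_t)i * sstr2);
    }
    *dest = (s0 + s1) + (s2 + s3);
}
```

Because the unrolled version adds the products in a different order, its result
can differ from the reference in the last bits for general floating-point input.
Other unroll factors follow the same pattern with more accumulators.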
Comment 1 Marcus Brubaker 2006-05-18 04:04:30 UTC
Created attachment 5660
multsum_f64_unroll8 and multsum_f64_sse2_unroll4
Comment 2 David Schleef 2006-05-19 16:57:33 UTC
Applied.
Comment 3 Marcus Brubaker 2006-05-31 11:31:24 UTC
Created attachment 5770
multsum.patch

This is a new version of the patch against latest anoncvs.  It does two things.
First, it fixes the previously broken SSE2 implementation.  This version is
actually notably faster than the reference (roughly 1.5x in the run below).
Witness:

multsum_f64
  multsum_f64_unroll8
    ave=576 std=1.14286
  multsum_f64_sse2_unrollb
    ave=573 std=1.14286
  multsum_f64_sse2_unrolla
    ave=568 std=1.14286
  multsum_f64_sse2
    ave=576 std=1.14286
  multsum_f64_ref
    ave=850.444 std=1.16741

Second, it introduces an unstrided version of multsum for f32 and f64 with SSE2
optimized versions.  Results:

multsum_f32_ns
  multsum_f32_ns_sse
    ave=330 std=11.3928
  multsum_f32_ns_ref
    ave=737 std=1.125

multsum_f64_ns
  multsum_f64_ns_sse2_unroll2
    ave=372 std=1.14286
  multsum_f64_ns_sse2_unroll
    ave=382 std=1.14286
  multsum_f64_ns_sse2
    ave=463.444 std=3.59953
  multsum_f64_ns_ref
    ave=734 std=1.125
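
To illustrate what "unstrided" buys, here is a minimal sketch of an SSE2 dot
product over contiguous arrays.  It is not the code from multsum.patch.  The
function name and signature are hypothetical, and the sketch uses unaligned
loads so that it makes no assumption about buffer alignment.  A scalar fallback
is included for builds without SSE2.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical unstrided version: *dest = sum of src1[i] * src2[i],
 * with both inputs stored contiguously. */
void multsum_f64_ns_sse2_sketch(double *dest,
                                const double *src1,
                                const double *src2, int n)
{
    double sum = 0.0;
    int i = 0;

#if defined(__SSE2__)
    /* Two doubles per 128-bit register; acc holds two partial sums. */
    __m128d acc = _mm_setzero_pd();
    for (; i + 1 < n; i += 2) {
        __m128d a = _mm_loadu_pd(src1 + i);  /* unaligned load */
        __m128d b = _mm_loadu_pd(src2 + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];
#endif

    /* Scalar tail: the odd last element, or everything without SSE2. */
    for (; i < n; i++) {
        sum += src1[i] * src2[i];
    }
    *dest = sum;
}
```

With contiguous data, each load fetches two adjacent elements at once, which a
strided layout does not allow in general.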
Comment 4 Marcus Brubaker 2006-05-31 11:33:35 UTC
Reopening because of new patch.
Comment 5 Marcus Brubaker 2006-06-07 14:06:34 UTC
There appears to be a bug in the unrolled SSE2 versions.  It doesn't manifest
itself on my main dev machine (a Pentium M laptop) but shows up on another
machine (a Pentium 4).  I'm trying to track it down now.  Will report back when
it's resolved.
Comment 6 David Schleef 2008-02-18 18:24:55 UTC
Patch doesn't apply.

