Bug 94758 - Add support for aarch64 neon optimization
Summary: Add support for aarch64 neon optimization
Alias: None
Product: pixman
Classification: Unclassified
Component: pixman
Version: git master
Hardware: ARM All
Importance: medium normal
Assignee: Oded Gabbay
QA Contact:
Depends on:
Reported: 2016-03-30 09:46 UTC by Mizuki Asakura
Modified: 2016-04-17 11:40 UTC

Proposed patch (236.35 KB, patch)
2016-03-30 09:48 UTC, Mizuki Asakura
Benchmark result without neon (47.20 KB, text/plain)
2016-03-30 09:55 UTC, Mizuki Asakura
Benchmark result with neon (47.21 KB, text/plain)
2016-03-30 09:56 UTC, Mizuki Asakura
Benchmark result of aarch32-neon (47.21 KB, text/plain)
2016-04-04 10:18 UTC, Mizuki Asakura
Proposed patch v3 (191.71 KB, patch)
2016-04-08 11:33 UTC, Mizuki Asakura
Proposed patch v4 (232.79 KB, patch)
2016-04-14 13:36 UTC, Mizuki Asakura
Benchmark result of proposal patch v4 (47.21 KB, text/plain)
2016-04-17 10:58 UTC, Mizuki Asakura
Patches containing bavison's optimization (239.12 KB, patch)
2016-04-17 11:28 UTC, Mizuki Asakura
benchmark result of v4 + Ben's optimizations (47.21 KB, text/plain)
2016-04-17 11:29 UTC, Mizuki Asakura

Description Mizuki Asakura 2016-03-30 09:46:24 UTC
Since aarch64 uses a different NEON syntax from aarch32 and does not support the (older) arm-simd code,
pixman has no SIMD acceleration on aarch64.

We need new implementations.
Comment 1 Mizuki Asakura 2016-03-30 09:48:38 UTC
Created attachment 122634 [details] [review]
Proposed patch

Proposed patch.
Comment 2 Mizuki Asakura 2016-03-30 09:55:59 UTC
Created attachment 122635 [details]
Benchmark result without neon
Comment 3 Mizuki Asakura 2016-03-30 09:56:29 UTC
Created attachment 122636 [details]
Benchmark result with neon
Comment 4 Mizuki Asakura 2016-03-30 09:59:33 UTC
Typical benchmark scores (first pair without NEON, second pair with NEON):
            src_n_8_x888 =  L1:  38.33  L2:  40.58  M: 39.91 ( 11.87%)  HT: 31.31  VT: 30.42  R: 29.14  RT: 18.14 ( 171Kops/s)
            src_n_8_8888 =  L1:  38.37  L2:  40.61  M: 39.92 ( 11.87%)  HT: 31.30  VT: 30.41  R: 29.14  RT: 18.11 ( 171Kops/s)

            src_n_8_x888 =  L1: 344.76  L2: 348.59  M:275.93 ( 80.42%)  HT:116.32  VT:109.72  R: 92.61  RT: 40.25 ( 348Kops/s)
            src_n_8_8888 =  L1: 346.17  L2: 348.63  M:276.15 ( 80.48%)  HT:116.43  VT:109.72  R: 92.48  RT: 40.28 ( 348Kops/s)
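
To make the gain easier to read, the two src_n_8_x888 lines can be turned into per-field ratios. The following is only a rough sketch in Python, not part of the patch. It assumes that the first line is the run without NEON and the later one the run with NEON (as the order of the attachments suggests). The helper names parse_scores and speedups are made up for this illustration; the field names (L1, L2, M, HT, VT, R, RT) are taken as-is from the lines above.

```python
import re

# Lines copied from the benchmark output above. Which one is "without" and
# which is "with" NEON is an assumption based on the attachment order.
WITHOUT_NEON = ("src_n_8_x888 =  L1:  38.33  L2:  40.58  M: 39.91 ( 11.87%)  "
                "HT: 31.31  VT: 30.42  R: 29.14  RT: 18.14 ( 171Kops/s)")
WITH_NEON = ("src_n_8_x888 =  L1: 344.76  L2: 348.59  M:275.93 ( 80.42%)  "
             "HT:116.32  VT:109.72  R: 92.61  RT: 40.25 ( 348Kops/s)")

# A field is a known label, a colon, optional spaces, and a decimal number.
FIELD = re.compile(r"\b(L1|L2|M|HT|VT|R|RT):\s*([0-9.]+)")


def parse_scores(line):
    """Return (operation name, {field label: score}) for one result line."""
    name, _, rest = line.partition("=")
    return name.strip(), {key: float(value) for key, value in FIELD.findall(rest)}


def speedups(before, after):
    """Return {field label: after / before} for two result lines."""
    _, before_scores = parse_scores(before)
    _, after_scores = parse_scores(after)
    return {key: after_scores[key] / before_scores[key] for key in before_scores}


if __name__ == "__main__":
    for key, ratio in speedups(WITHOUT_NEON, WITH_NEON).items():
        print(f"{key}: {ratio:.2f}x")
```

Under that assumption the L1 score improves by roughly a factor of nine, while the RT score improves by a little more than a factor of two.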
Comment 5 Mizuki Asakura 2016-03-30 10:33:48 UTC
Additional note:

The above benchmarks were run on a Qualcomm DragonBoard 410c (4x Cortex-A53, 1.2GHz).
Comment 6 Mizuki Asakura 2016-03-30 10:56:33 UTC
> We need new implementations.

The patch is not a "new implementation", but just a conversion of the original pixman-arm-neon-XXX.S code.
Some architectural changes from aarch32 to aarch64 add overhead to this conversion. In particular, each NEON register is now independent: v30 / v31 are no longer the low / high halves of v15.

However, the larger number of independent registers may be useful for further aarch64-specific optimizations. That should be a future plan.
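
The register layout difference can be illustrated with a toy model. This is only a sketch in Python and not pixman code; the class names AArch32Neon and AArch64Neon are hypothetical. It models the architectural rule that on aarch32 Qn overlays D(2n) and D(2n+1), while on aarch64 each Vn is a separate 128-bit register and Dn is just the low 64 bits of Vn.

```python
class AArch32Neon:
    """Toy aarch32 model: one 256-byte file, Qn overlays D(2n) and D(2n+1)."""

    def __init__(self):
        self.file = bytearray(16 * 16)  # 16 Q registers = 32 D registers

    def write_d(self, n, data):
        assert len(data) == 8
        self.file[8 * n:8 * n + 8] = data

    def read_q(self, n):
        return bytes(self.file[16 * n:16 * n + 16])


class AArch64Neon:
    """Toy aarch64 model: 32 independent 128-bit V registers."""

    def __init__(self):
        self.v = [bytearray(16) for _ in range(32)]

    def write_d(self, n, data):
        assert len(data) == 8
        # A 64-bit write to Dn sets the low half of Vn and zeroes the high half.
        self.v[n][:] = data + bytes(8)

    def read_v(self, n):
        return bytes(self.v[n])


if __name__ == "__main__":
    lo, hi = bytes(range(8)), bytes(range(8, 16))

    a32 = AArch32Neon()
    a32.write_d(30, lo)
    a32.write_d(31, hi)
    print("aarch32 q15:", a32.read_q(15).hex())  # built from d30 and d31

    a64 = AArch64Neon()
    a64.write_d(30, lo)
    a64.write_d(31, hi)
    print("aarch64 v15:", a64.read_v(15).hex())  # untouched by d30 / d31
```

Converted code that relied on filling d30 and d31 and then reading q15 therefore needs an explicit step to combine the two halves on aarch64, which is one source of the overhead mentioned above.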
Comment 7 Mizuki Asakura 2016-04-04 10:18:44 UTC
Created attachment 122695 [details]
Benchmark result of aarch32-neon

Compiled for armeabihf with NEON.
The attached benchmark result was obtained in the same environment.
Comment 8 Mizuki Asakura 2016-04-08 11:33:35 UTC
Created attachment 122816 [details] [review]
Proposed patch v3

This patch contains Siarhei's optimizations.

It also adds a configuration flag that controls the use of cache prefetching.
Please check PREFETCH_MODE in pixman-arma64-neon-asm.h.
Comment 9 Mizuki Asakura 2016-04-14 13:36:20 UTC
Created attachment 122937 [details] [review]
Proposed patch v4

The patch now contains all nearest / bilinear implementations.
The bilinear code is (almost) identical to the original aarch32 implementation
(but still needs some modifications to avoid register conflicts).
Comment 10 Mizuki Asakura 2016-04-17 10:58:46 UTC
Created attachment 123008 [details]
Benchmark result of proposal patch v4

Almost identical to the original result, but with some improvements.
Comment 11 Mizuki Asakura 2016-04-17 11:28:32 UTC
Created attachment 123009 [details] [review]
Patches containing bavison's optimization

I also tested Ben's series of optimizations on aarch64,
and the results are impressively good.

Please also check the following benchmark result.
Comment 12 Mizuki Asakura 2016-04-17 11:29:26 UTC
Created attachment 123010 [details]
benchmark result of v4 + Ben's optimizations
Comment 13 Oded Gabbay 2016-04-17 11:39:59 UTC
Hi Mizuki,
Thanks for the patches but you don't need to file a bz on it. I monitor the mailing list and I can see your patches :)
I'm closing the bug, and let's continue this in the pixman mailing list.