Since aarch64 uses a different NEON syntax from aarch32 and does not support the (older) arm-simd code,
pixman has no SIMD acceleration on aarch64.
We need new implementations.
Created attachment 122634 [details] [review]
Created attachment 122635 [details]
Benchmark result without neon
Created attachment 122636 [details]
Benchmark result with neon
Typical benchmark scores (the first two lines are without NEON, the last two with NEON):
src_n_8_x888 = L1: 38.33 L2: 40.58 M: 39.91 ( 11.87%) HT: 31.31 VT: 30.42 R: 29.14 RT: 18.14 ( 171Kops/s)
src_n_8_8888 = L1: 38.37 L2: 40.61 M: 39.92 ( 11.87%) HT: 31.30 VT: 30.41 R: 29.14 RT: 18.11 ( 171Kops/s)
src_n_8_x888 = L1: 344.76 L2: 348.59 M:275.93 ( 80.42%) HT:116.32 VT:109.72 R: 92.61 RT: 40.25 ( 348Kops/s)
src_n_8_8888 = L1: 346.17 L2: 348.63 M:276.15 ( 80.48%) HT:116.43 VT:109.72 R: 92.48 RT: 40.28 ( 348Kops/s)
The above benchmarks were run on a Qualcomm DragonBoard 410c (4x Cortex-A53, 1.2 GHz).
> We need new implementations.
The patch is not a "new implementation", just code converted from the original pixman-arm-neon-XXX.S files.
Some architectural changes from aarch32 to aarch64 add overhead to this conversion. In particular, every NEON register is now independent: v30 / v31 are no longer the low / high halves of v15, the way d30 / d31 were the halves of q15 on aarch32.
However, the larger number of independent registers may allow more aarch64-specific optimizations. That should be future work.
Created attachment 122695 [details]
Benchmark result of aarch32-neon
Compiled for armeabihf with NEON.
The attached benchmark result is from the same environment.
Created attachment 122816 [details] [review]
Proposed patch v3
This patch contains Siarhei's optimizations.
It also adds a configuration flag that controls cache prefetching.
Please check PREFETCH_MODE in pixman-arma64-neon-asm.h.
Created attachment 122937 [details] [review]
Proposed patch v4
The patch now contains all nearest / bilinear implementations.
The bilinear code is (almost) identical to the original aarch32 implementation
(but still needs some modifications to avoid register conflicts).
Created attachment 123008 [details]
Benchmark result of proposed patch v4
Almost identical to the original result, but with some improvements.
Created attachment 123009 [details] [review]
Patches containing bavison's optimizations
I also tested Ben's series of optimizations on aarch64,
and the results are impressively good.
Please also check the following benchmark result.
Created attachment 123010 [details]
benchmark result of v4 + Ben's optimizations
Thanks for the patches, but you don't need to file a bz for them. I monitor the mailing list and I can see your patches :)
I'm closing the bug, and let's continue this in the pixman mailing list.