Since aarch64 uses a different NEON syntax from aarch32 and has no support for the (older) arm-simd, pixman has no SIMD acceleration on aarch64. We need new implementations.
Created attachment 122634 [details] [review] Proposed patch
Created attachment 122635 [details] Benchmark result without neon
Created attachment 122636 [details] Benchmark result with neon
Typical benchmark score:
normal:
src_n_8_x888 = L1: 38.33 L2: 40.58 M: 39.91 ( 11.87%) HT: 31.31 VT: 30.42 R: 29.14 RT: 18.14 ( 171Kops/s)
src_n_8_8888 = L1: 38.37 L2: 40.61 M: 39.92 ( 11.87%) HT: 31.30 VT: 30.41 R: 29.14 RT: 18.11 ( 171Kops/s)
neon:
src_n_8_x888 = L1: 344.76 L2: 348.59 M:275.93 ( 80.42%) HT:116.32 VT:109.72 R: 92.61 RT: 40.25 ( 348Kops/s)
src_n_8_8888 = L1: 346.17 L2: 348.63 M:276.15 ( 80.48%) HT:116.43 VT:109.72 R: 92.48 RT: 40.28 ( 348Kops/s)
Additional note: the above benchmarks were run on a Qualcomm DragonBoard 410c (4x Cortex-A53, 1.2 GHz).
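To compare the two result sets, one can divide each neon figure by its normal counterpart. The following is only a minimal sketch in C: the names `speedup` and `print_src_n_8_x888_speedups` are made up for illustration and are not part of pixman or its benchmark tools. The input values are copied from the src_n_8_x888 rows quoted above.

```c
#include <stdio.h>

/* Hypothetical helper, not part of pixman: ratio of an accelerated
 * benchmark figure to its baseline. Returns 0.0 when the baseline is
 * not positive, so the caller never divides by zero. */
double speedup(double accelerated, double baseline)
{
    if (baseline <= 0.0)
        return 0.0;
    return accelerated / baseline;
}

/* Print the neon/normal ratios for the src_n_8_x888 figures quoted
 * in this report (L1, M and RT columns only). */
void print_src_n_8_x888_speedups(void)
{
    printf("L1: %.2fx\n", speedup(344.76, 38.33));
    printf("M:  %.2fx\n", speedup(275.93, 39.91));
    printf("RT: %.2fx\n", speedup(40.25, 18.14));
}
```

With the figures quoted here, this works out to roughly 9x for L1, about 6.9x for M and about 2.2x for RT.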
> We need new implementations.
The patch is not a "new implementation", just a conversion of the original pixman-arm-neon-XXX.S code. Some architectural changes from aarch32 to aarch64 add overhead to this conversion. In particular, every NEON register is now independent: v30 / v31 are no longer the low / high halves of v15 (as d30 / d31 were the halves of q15 on aarch32). On the other hand, the larger number of independent registers may be useful for more aarch64-specific optimizations. That should be a future plan.
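The point about v30 / v31 and v15 can be made concrete with a small C model of the two register files. This is an illustrative sketch only: the type and function names are hypothetical, nothing here comes from the patch, and real code would use assembly rather than a union. It assumes the well-known aarch32 layout in which q(n) overlays d(2n) and d(2n+1).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the aarch32 NEON register file: the same
 * 256 bytes are visible as 32 D registers or as 16 Q registers,
 * with q[n][0] overlaying d[2n] and q[n][1] overlaying d[2n+1]. */
typedef union {
    uint64_t d[32];
    uint64_t q[16][2];
} a32_neon_file;

/* Hypothetical model of the aarch64 register file: 32 independent
 * 128-bit registers v0..v31, each stored as two 64-bit halves. */
typedef struct {
    uint64_t v[32][2];
} a64_neon_file;

/* aarch32 model: writing d30 and d31 changes q15. Returns 1 if so. */
int a32_q15_aliases_d30_d31(void)
{
    a32_neon_file r;
    memset(&r, 0, sizeof r);
    r.d[30] = 0x1111111111111111ULL;
    r.d[31] = 0x2222222222222222ULL;
    return r.q[15][0] == 0x1111111111111111ULL &&
           r.q[15][1] == 0x2222222222222222ULL;
}

/* aarch64 model: writing v30 and v31 leaves v15 untouched. Returns 1 if so. */
int a64_v15_is_independent(void)
{
    a64_neon_file r;
    memset(&r, 0, sizeof r);
    r.v[30][0] = 0x1111111111111111ULL;
    r.v[31][0] = 0x2222222222222222ULL;
    return r.v[15][0] == 0 && r.v[15][1] == 0;
}
```

Code that relied on the aliasing, for example filling two D registers and then operating on the combined Q register, needs an explicit extra step in the aarch64 model, which is one plausible source of the conversion overhead described above.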
Created attachment 122695 [details] Benchmark result of aarch32-neon Compiled for armeabihf with NEON. The attached benchmark result was taken in the same environment.
Created attachment 122816 [details] [review] Proposed patch v3 This patch includes Siarhei's optimizations. It also adds a configuration flag that controls the use of cache prefetching. Please check PREFETCH_MODE in pixman-arma64-neon-asm.h.
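As a rough illustration of what a compile-time prefetch switch can look like, here is a minimal C sketch. It is not the code from the patch, which does this in assembly macros: every `SKETCH_*` name, the prefetch distance and the function itself are assumptions made up for this example, and the actual values accepted by PREFETCH_MODE should be read from the header. `__builtin_prefetch` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode names; these are NOT the macros from
 * pixman-arma64-neon-asm.h. */
#define SKETCH_PREFETCH_NONE   0
#define SKETCH_PREFETCH_SIMPLE 1

#ifndef SKETCH_PREFETCH_MODE
#define SKETCH_PREFETCH_MODE SKETCH_PREFETCH_SIMPLE
#endif

/* Arbitrary prefetch distance in pixels, chosen only for illustration. */
#define SKETCH_PREFETCH_AHEAD 64

/* Copy x888 pixels to 8888, forcing the alpha byte to 0xff.
 * In SIMPLE mode, issue one read-prefetch hint per 16 pixels
 * (64 bytes) for data SKETCH_PREFETCH_AHEAD pixels further on. */
void sketch_copy_x888_to_8888(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
#if SKETCH_PREFETCH_MODE == SKETCH_PREFETCH_SIMPLE
        if ((i % 16) == 0 && i + SKETCH_PREFETCH_AHEAD < n)
            __builtin_prefetch(&src[i + SKETCH_PREFETCH_AHEAD], 0, 0);
#endif
        dst[i] = src[i] | 0xff000000u;
    }
}
```

Building with `-DSKETCH_PREFETCH_MODE=0` compiles the hint out entirely, which makes it easy to benchmark both variants on the same hardware.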
Created attachment 122937 [details] [review] Proposed patch v4 The patch now contains all nearest / bilinear implementations. The bilinear code is (almost) identical to the original aarch32 implementation, but still needs some modifications to avoid register conflicts.
Created attachment 123008 [details] Benchmark result of proposed patch v4 Almost identical to the original result, but with some improvements.
Created attachment 123009 [details] [review] Patches containing bavison's optimization I also tested Ben's series of optimizations on aarch64, and the results are impressively good. Please also check the following benchmark result.
Created attachment 123010 [details] benchmark result of v4 + Ben's optimizations
Hi Mizuki, Thanks for the patches but you don't need to file a bz on it. I monitor the mailing list and I can see your patches :) I'm closing the bug, and let's continue this in the pixman mailing list. Thanks, Oded