Summary: | Add support for aarch64 neon optimization | | |
---|---|---|---|
Product: | pixman | Reporter: | Mizuki Asakura <ed6e117f> |
Component: | pixman | Assignee: | Oded Gabbay <oded.gabbay> |
Status: | RESOLVED NOTABUG | QA Contact: | |
Severity: | normal | | |
Priority: | medium | | |
Version: | git master | | |
Hardware: | ARM | | |
OS: | All | | |
Attachments:
Proposed patch
Benchmark result without neon
Benchmark result with neon
Benchmark result of aarch32-neon
Proposed patch v3
Proposed patch v4
Benchmark result of proposal patch v4
Patches containing bavison's optimization
benchmark result of v4 + Ben's optimizations
Description
Mizuki Asakura
2016-03-30 09:46:24 UTC
Created attachment 122634
Proposed patch

Proposed patch.

Created attachment 122635
Benchmark result without neon

Created attachment 122636
Benchmark result with neon
Typical benchmark score:

normal:
src_n_8_x888 = L1: 38.33 L2: 40.58 M: 39.91 ( 11.87%) HT: 31.31 VT: 30.42 R: 29.14 RT: 18.14 ( 171Kops/s)
src_n_8_8888 = L1: 38.37 L2: 40.61 M: 39.92 ( 11.87%) HT: 31.30 VT: 30.41 R: 29.14 RT: 18.11 ( 171Kops/s)

neon:
src_n_8_x888 = L1: 344.76 L2: 348.59 M:275.93 ( 80.42%) HT:116.32 VT:109.72 R: 92.61 RT: 40.25 ( 348Kops/s)
src_n_8_8888 = L1: 346.17 L2: 348.63 M:276.15 ( 80.48%) HT:116.43 VT:109.72 R: 92.48 RT: 40.28 ( 348Kops/s)

Additional note: the above benchmarks were run on a Qualcomm DragonBoard 410c (Cortex-A53*4, 1.2GHz).

> We need new implementations.
The patch is not a "new implementation", just code converted from the original pixman-arm-neon-XXX.S files.
Some architectural changes from aarch32 to aarch64 introduce overhead in the converted code. In particular, every NEON register is now independent: in aarch32, d30 / d31 are the low / high halves of q15, but in aarch64, v30 / v31 are not the low / high halves of v15.
However, the larger number of independent registers may allow further aarch64-specific optimizations. That is left as a future plan.
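The register-layout difference described above can be illustrated with a toy C model. This is only a sketch for explanation, not code from the patch or from pixman: the type names `a32_neon_regs` / `a64_neon_regs` and both helper functions are hypothetical, and the model only captures the aliasing rule (aarch32 q<n> overlays d<2n>:d<2n+1>, aarch64 v<n> registers do not overlap each other).

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the aarch32 NEON register file (hypothetical names).
 * The 16 Q registers overlay the 32 D registers: q<n> = d<2n>:d<2n+1>. */
typedef union {
    uint64_t d[32];     /* d0..d31, 64 bits each */
    uint64_t q[16][2];  /* q0..q15: [0] = low half, [1] = high half */
} a32_neon_regs;

/* Toy model of the aarch64 register file (hypothetical names).
 * 32 independent 128-bit V registers; none is a half of another. */
typedef struct {
    uint64_t v[32][2];  /* v0..v31: [0] = low half, [1] = high half */
} a64_neon_regs;

/* aarch32: writing d30 and d31 is visible through q15.
 * Returns 1 when q15 holds exactly the two values that were written. */
int a32_write_d30_d31_changes_q15(void)
{
    a32_neon_regs r;
    memset(&r, 0, sizeof r);
    r.d[30] = 0x1111;
    r.d[31] = 0x2222;
    return r.q[15][0] == 0x1111 && r.q[15][1] == 0x2222;
}

/* aarch64: writing v30 and v31 leaves v15 untouched.
 * Returns 1 when v15 still holds its original contents. */
int a64_write_v30_v31_keeps_v15(void)
{
    a64_neon_regs r;
    memset(&r, 0, sizeof r);
    r.v[15][0] = 0xAAAA;
    r.v[15][1] = 0xBBBB;
    r.v[30][0] = 0x1111;
    r.v[31][0] = 0x2222;
    return r.v[15][0] == 0xAAAA && r.v[15][1] == 0xBBBB;
}
```

Both functions return 1. In porting terms, aarch32 code that fills two D registers and then operates on the enclosing Q register has no direct aarch64 equivalent, so the converted code presumably needs extra instructions or a different register allocation at such points.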
Created attachment 122695
Benchmark result of aarch32-neon

Compiled for armeabihf with NEON enabled.
The attached benchmark result was measured in the same environment.
Created attachment 122816
Proposed patch v3

This patch contains Siarhei's optimizations. It also adds a configuration flag for the use of cache prefetching. Please check PREFETCH_MODE in pixman-arma64-neon-asm.h.

Created attachment 122937
Proposed patch v4

The patch now contains all nearest / bilinear implementations. The bilinear code is (almost) identical to the original aarch32 implementation, but it still needs some modifications to avoid register conflicts.

Created attachment 123008
Benchmark result of proposal patch v4

Almost identical to the original result, with some improvements.
Created attachment 123009
Patches containing bavison's optimization

I also tested Ben's series of optimizations on aarch64, and the results are impressively good. Please also check the following benchmark result.

Created attachment 123010
benchmark result of v4 + Ben's optimizations
Hi Mizuki,

Thanks for the patches but you don't need to file a bz on it. I monitor the mailing list and I can see your patches :)

I'm closing the bug, and let's continue this in the pixman mailing list.

Thanks,
Oded