I have managed to run my kernel on the iGPU using Beignet under Linux, great job. At the same time I tested the performance of the same kernel under Windows. Below are the performance results for my kernel (a deblocking filter in HEVC). The times (in seconds) were not obtained by binding an event to the kernel launch in OpenCL, since that also depends on the OpenCL runtime implementation under Windows and Linux; instead they were obtained with host-side CPU profiling utilities.

         H2D    Kernel   D2H
Linux    1.95   3.89     1.56
Windows  6.74   0.85     1.44

I am not sure whether you use the same compiler as the Windows OpenCL compiler, but the kernel performance differs too much between these two operating systems (on the same hardware). Also, the host-to-device copy takes much more time on Windows; I cannot figure out why. Any hints?

My testbed configuration:
Hardware: CPU i5-4570R, iGPU (HD5200)
Windows: Win8.1, iGPU driver version 10.18.10.3960, latest INDE, Visual Studio 2013
Linux: 14.04, kernel 3.13, Beignet Release v1.0, gcc 4.8.3
Functionally the kernel works, but I am curious why there is so much difference between Windows and Linux.
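For reference, here is a minimal sketch of the kind of host-side phase timing described above. The buffer handles, sizes, and kernel are placeholders, not the actual HEVC deblocking code, and POSIX clock_gettime() is used for the Linux side; on Windows an equivalent such as QueryPerformanceCounter would replace now_sec().

/* Sketch only: placeholder buffers/kernel, not the real deblocking filter.
 * clFinish() is called between phases so each wall-clock interval covers
 * only that phase (H2D copy, kernel execution, D2H copy). */
#include <CL/cl.h>
#include <stdio.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* POSIX; use QueryPerformanceCounter on Windows */
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

void profile_phases(cl_command_queue q, cl_kernel k,
                    cl_mem d_in, cl_mem d_out,
                    void *h_in, void *h_out, size_t bytes, size_t gws)
{
    double t0, t1;

    t0 = now_sec();
    clEnqueueWriteBuffer(q, d_in, CL_FALSE, 0, bytes, h_in, 0, NULL, NULL);
    clFinish(q);
    t1 = now_sec();
    printf("H2D:    %.3f s\n", t1 - t0);

    t0 = now_sec();
    clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, NULL, 0, NULL, NULL);
    clFinish(q);
    t1 = now_sec();
    printf("Kernel: %.3f s\n", t1 - t0);

    t0 = now_sec();
    clEnqueueReadBuffer(q, d_out, CL_TRUE, 0, bytes, h_out, 0, NULL, NULL);
    clFinish(q);
    t1 = now_sec();
    printf("D2H:    %.3f s\n", t1 - t0);
}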
Could you share your kernel here? Or you can share the LLVM IR. To get the LLVM IR and GEN IR, set the following environment variables before running your application:

# export OCL_OUTPUT_LLVM_AFTER_GEN=1
# export OCL_OUTPUT_GEN_IR=1

Then run your application:

# ./test_app > ir.log

Then you can paste the ir.log here.
Could you try to add the following code right before the for() loops in the kernel?

#pragma unroll 1024
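For clarity, the pragma goes directly above each loop header, roughly as in this sketch (the kernel name, arguments, and loop body below are illustrative placeholders, not the actual deblocking-filter loop):

__kernel void deblock_example(__global int *data, int n)
{
    /* Hint to the compiler to unroll this loop; loop body is a placeholder. */
    #pragma unroll 1024
    for (int i = 0; i < n; i++) {
        data[i] += 1;
    }
}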
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/72.