Bug 89307 - Same kernel but huge performance difference under linux and windows
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: x86-64 (AMD64), OS: All
Importance: medium normal
Assignee: Zhigang Gong
Reported: 2015-02-24 22:50 UTC by wangbiaouestc
Modified: 2018-10-12 21:27 UTC


Description wangbiaouestc 2015-02-24 22:50:14 UTC
I have managed to run my kernel on the iGPU using Beignet under Linux. Great job!
I also tested the performance of the same kernel under Windows.
Below are the results for my kernel (a deblocking filter for HEVC), as times in seconds; H2D is the host-to-device copy and D2H the device-to-host copy. The times were not obtained by attaching OpenCL events to the kernel launches, because those results also depend on each OpenCL runtime implementation under Windows and Linux; instead, they were measured with host-side CPU profiling utilities.

Time (s)    H2D     Kernel    D2H
Linux       1.95    3.89      1.56
Windows     6.74    0.85      1.44
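The timings in the table were taken with host-side timers rather than OpenCL profiling events. A minimal sketch of that style of measurement follows, assuming a POSIX system (`clock_gettime` with `CLOCK_MONOTONIC`; a Windows build would need a different timer, such as `QueryPerformanceCounter`). `time_stage`, `stage_fn`, `sum_job` and `sum_stage` are hypothetical names, not the reporter's code. Because OpenCL commands are enqueued asynchronously, the timed stage has to wait for completion, for example by ending with `clFinish`, or the measurement only covers the enqueue.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* One timed pipeline stage, e.g. an H2D copy, a kernel launch or a D2H copy.
   For OpenCL work the stage must block until the device has finished
   (for example by ending with clFinish()); otherwise the timer only
   captures the cost of enqueueing the command. */
typedef void (*stage_fn)(void *ctx);

/* Convert a timespec to seconds. */
static double to_seconds(struct timespec t)
{
    return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}

/* Run one stage and return its wall-clock duration in seconds,
   measured on the host with a monotonic clock. */
double time_stage(stage_fn stage, void *ctx)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    stage(ctx);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return to_seconds(end) - to_seconds(start);
}

/* Hypothetical CPU-only stage used for illustration: sums 0..n-1. */
struct sum_job {
    long long n;
    long long result;
};

void sum_stage(void *ctx)
{
    struct sum_job *job = ctx;
    long long s = 0;
    for (long long i = 0; i < job->n; i++)
        s += i;
    job->result = s;
}
```

In a real OpenCL host program, each of the three table columns would correspond to one `time_stage` call wrapping the buffer write, the kernel enqueue, or the buffer read, each followed by a blocking wait.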

I am not sure whether Beignet uses the same compiler as the Windows OpenCL driver, but the kernel performance differs far too much between the two operating systems on the same hardware: the kernel runs about 4.6 times slower on Linux. On the other hand, the host-to-device copy takes much longer on Windows, and I cannot figure out why.
Any hints?
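To put numbers on the differences described above, the per-stage ratios and end-to-end totals can be computed directly from the reported timings. This is a minimal C sketch; the struct and function names are illustrative only, and summing the stages assumes they run one after another without overlapping.

```c
#include <stdio.h>

/* Host-side timings from the table in the report, in seconds. */
struct stage_times {
    double h2d;    /* host-to-device copy */
    double kernel; /* kernel execution    */
    double d2h;    /* device-to-host copy */
};

static const struct stage_times linux_times   = { 1.95, 3.89, 1.56 };
static const struct stage_times windows_times = { 6.74, 0.85, 1.44 };

/* End-to-end time for one run, assuming the three stages do not overlap. */
static double total_time(struct stage_times t)
{
    return t.h2d + t.kernel + t.d2h;
}

/* Print Linux/Windows ratios per stage (>1 means slower on Linux) and the totals. */
static void print_comparison(void)
{
    printf("H2D    Linux/Windows: %.2f\n", linux_times.h2d / windows_times.h2d);
    printf("Kernel Linux/Windows: %.2f\n", linux_times.kernel / windows_times.kernel);
    printf("D2H    Linux/Windows: %.2f\n", linux_times.d2h / windows_times.d2h);
    printf("Total  Linux: %.2f s, Windows: %.2f s\n",
           total_time(linux_times), total_time(windows_times));
}
```

With the reported figures, the kernel is about 4.6 times slower on Linux while the host-to-device copy is about 3.5 times slower on Windows, so the end-to-end totals come to 7.40 s on Linux and 9.03 s on Windows.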

My testbed configuration:

    CPU: i5-4570R, iGPU: HD5200

    Windows:
        OS: Windows 8.1
        Latest iGPU driver and latest Intel INDE, Visual Studio 2013

    Linux:
        OS: Linux 14.04, kernel 3.13
        Beignet release v1.0
        gcc 4.8.3
Comment 1 wangbiaouestc 2015-02-24 22:52:02 UTC
Functionally, the kernel works, but I am curious why there is such a large difference between Windows and Linux.
Comment 2 Zhigang Gong 2015-02-25 06:29:14 UTC
Could you share your kernel here? Alternatively, could you share the LLVM IR?

To get the LLVM IR and GEN IR, you can set the following environment variable before running your application:

# export OCL_OUTPUT_GEN_IR=1

Then run your application:
# ./test_app > ir.log

Then you can paste ir.log here.
Comment 3 Zhigang Gong 2015-02-28 02:58:43 UTC
Could you try adding the following line right before the for() loops in the kernel?
#pragma unroll 1024
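The suggestion above places `#pragma unroll 1024` directly before each `for` loop, presumably so the compiler unrolls those loops more aggressively. The placement can be sketched in plain C as below; `sum_abs_diff` is a hypothetical loop standing in for the reporter's deblocking kernel, which was not posted. In an OpenCL C kernel the pragma goes in the same position. Clang-based compilers, such as the front end Beignet builds on, accept `#pragma unroll N`; a compiler that does not recognize it, such as GCC, typically ignores it (possibly with a warning).

```c
/* Hypothetical stand-in for a kernel loop: sum of absolute differences
   between two sample rows, as a deblocking filter might compute. */
int sum_abs_diff(const int *a, const int *b, int n)
{
    int acc = 0;
    /* Unroll hint placed immediately before the loop it applies to. */
#pragma unroll 1024
    for (int i = 0; i < n; i++) {
        int d = a[i] - b[i];
        acc += d < 0 ? -d : d;
    }
    return acc;
}
```

The pragma only changes how the loop is compiled, not what it computes, so it is a low-risk experiment for checking whether loop code generation explains the slower kernel on Linux.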
Comment 4 GitLab Migration User 2018-10-12 21:27:13 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/72.
