89307 – Same kernel but huge performance difference under linux and windows

Status:	RESOLVED MOVED

Alias:	None

Product:	Beignet
Classification:	Unclassified
Component:	Beignet
Version:	unspecified
Hardware:	x86-64 (AMD64) All

Importance:	medium normal
Assignee:	Zhigang Gong

Reported:	2015-02-24 22:50 UTC by wangbiaouestc
Modified:	2018-10-12 21:27 UTC
CC List:	0 users

Description wangbiaouestc 2015-02-24 22:50:14 UTC

I have managed to run my kernel on iGPU using beignet under Linux, great job.
I also tested the performance of the same kernel under Windows.
Below are the performance results for my kernel (the deblocking filter in HEVC). The times (in seconds) were not obtained by binding events to the kernel launches in OpenCL, because that also depends on the OpenCL runtime implementation on Windows and Linux. Instead, they were measured with host-side CPU profiling utilities.

Time (s)     H2D     Kernel    D2H
Linux        1.95    3.89      1.56
Windows      6.74    0.85      1.44
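
For readers who want to reproduce this kind of measurement, here is a minimal sketch of host-side timing in C. It is not the reporter's actual profiling code, which was not posted. The names `wall_seconds`, `time_phase`, `busy_ctx` and `busy_phase` are hypothetical, and `busy_phase` is only a stand-in workload. The sketch assumes each timed phase blocks until the device has finished (for example by calling `clFinish()` on the command queue before returning); otherwise only the enqueue cost is measured.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds via C11 timespec_get (not a monotonic clock). */
static double wall_seconds(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC); /* timespec_get returns the base on success */
    (void)base;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: run one phase (H2D copy, kernel, or D2H copy)
 * and return the elapsed host time in seconds.
 * Assumption: `phase` returns only after the device work has completed. */
double time_phase(void (*phase)(void *), void *ctx)
{
    double start = wall_seconds();
    phase(ctx);
    return wall_seconds() - start;
}

/* Stand-in workload used only to exercise the timer. */
struct busy_ctx {
    long iterations;
    long result;
};

void busy_phase(void *ctx)
{
    struct busy_ctx *c = ctx;
    long acc = 0;
    for (long i = 0; i < c->iterations; ++i) {
        acc += i;
    }
    c->result = acc;
}
```

Each of the three columns above (H2D, Kernel, D2H) would be one call to `time_phase` with a different phase function.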

I am not sure whether you use the same compiler as the Windows OpenCL compiler, but the kernel performance differs greatly between the two operating systems on the same hardware (3.89 s on Linux vs. 0.85 s on Windows). Also, the host-to-device copy takes much longer on Windows, and I cannot figure out why.
Any hints?

My testbed configuration:

Hardware:

    CPU: i5-4570R, iGPU (HD5200)

OS: Win8.1

    iGPU driver version 10.18.10.3960, latest INDE, Visual Studio 2013

Linux: 14.04

    kernel 3.13
    Beignet Release v1.0
    gcc 4.8.3

Comment 1 wangbiaouestc 2015-02-24 22:52:02 UTC

Functionally the kernel works, but I am curious why there is such a large difference between Windows and Linux.

Comment 2 Zhigang Gong 2015-02-25 06:29:14 UTC

Could you share your kernel here? Or you can share the LLVM IR?

To get the LLVM IR and GEN IR, you can set the following environment variables before running your application:

# export OCL_OUTPUT_LLVM_AFTER_GEN=1
# export OCL_OUTPUT_GEN_IR=1

Then run your application
# ./test_app > ir.log

Then you can paste the ir.log here.

Comment 3 Zhigang Gong 2015-02-28 02:58:43 UTC

Could you try to add the following code right before the for() loops in the kernel?
#pragma unroll 1024
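
To illustrate the placement only, here is a small sketch in plain C. The reporter's kernel was not posted, so `sum_taps` and `TAPS` are hypothetical stand-ins for a loop inside the kernel body. In an OpenCL kernel the pragma would sit in the same position, directly before the `for` statement. Compilers that do not recognize the pragma ignore it, so the result is the same either way.

```c
#include <stddef.h>

/* Hypothetical loop length; the real kernel's loop bounds are unknown. */
#define TAPS 8

/* Hypothetical stand-in for a kernel loop: sums TAPS consecutive samples. */
int sum_taps(const int *src, size_t base)
{
    int acc = 0;
    /* The unroll hint goes on the line directly before the loop it targets. */
#pragma unroll 1024
    for (int i = 0; i < TAPS; ++i) {
        acc += src[base + i];
    }
    return acc;
}
```

Whether the hint changes the generated code depends on the compiler, so the IR dump from Comment 2 would be the way to check its effect.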

Comment 4 GitLab Migration User 2018-10-12 21:27:13 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/72.