98185 – Following code (from Eigen) crashes on compilation

Status:	RESOLVED MOVED

Alias:	None

Product:	Beignet
Classification:	Unclassified
Component:	Beignet
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium blocker
Assignee:	Zhigang Gong

Reported:	2016-10-10 14:15 UTC by Hugh Perkins
Modified:	2018-10-12 21:22 UTC
CC List:	2 users

Description Hugh Perkins 2016-10-10 14:15:33 UTC

The following code, which comes from running Eigen in OpenCL (https://github.com/tensorflow/tensorflow/issues/22#issuecomment-252608921 ), crashes on compilation:

struct Eigen__TensorEvaluator {
    global float* f0;
    float f1;
};

kernel void foo(global struct Eigen__TensorEvaluator* pstruct, global float* floats0) {
    pstruct[0].f0 = floats0;
    pstruct[0].f0[0] = pstruct[0].f1;
}

Result:

context <pyopencl.Context at 0x3330740 on <pyopencl.Device 'Intel(R) HD Graphics 5500 BroadWell U-Processor GT2' on 'Intel Gen OCL Driver' at 0x7f32bde1dbe0>>
make: 'test/eigen/generated/test_cuda_nullary-device.cl' is up to date.

compiling cl for test/eigen/generated/test_cuda_nullary-device.cl ...
ASSERTION FAILED: (isa<AllocaInst>(ptr) || ptrCandidate.empty()) && "storing/loading pointers only support private array"
  at file /home/ubuntu/git/beignet/backend/src/llvm/llvm_gen_backend.cpp, function void gbe::GenWriter::findPointerEscape(llvm::Value*, std::set<llvm::Value*>&, bool, std::vector<llvm::Value*, std::allocator<llvm::Value*> >&), line 953
Trace/breakpoint trap

This is a blocker for porting Eigen to OpenCL.  Porting Eigen to OpenCL is a prerequisite for porting TensorFlow to OpenCL.  TensorFlow is currently a very popular machine learning library.

Comment 1 ruiling 2016-10-11 05:23:13 UTC

I saw that you have worked around the issue, right?
It is hard to support passing a pointer to a pointer as a kernel argument.
This is a restriction in OpenCL 1.2, documented in section 6.9 of the OpenCL spec.
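
For illustration only (this is not from the thread): one way to stay within this restriction is to store an element offset in the struct instead of a pointer, so that no address is ever written to global memory. The sketch below is an assumption. The names `Eigen__TensorEvaluatorIdx`, `f0_offset` and `foo_idx` are hypothetical, and it has not been tried against Beignet. The `#ifndef __OPENCL_VERSION__` shim only lets the same text compile as host C for a quick check.

```c
/* Host-side shim (assumption): makes the OpenCL C qualifiers vanish
   when this text is compiled as plain C instead of OpenCL C. */
#ifndef __OPENCL_VERSION__
#define kernel
#define global
#endif

/* Hypothetical variant of the struct: an element offset replaces the pointer. */
struct Eigen__TensorEvaluatorIdx {
    int f0_offset;   /* index into the floats0 buffer, not an address */
    float f1;
};

kernel void foo_idx(global struct Eigen__TensorEvaluatorIdx* pstruct, global float* floats0) {
    /* Record where the data lives as an offset into floats0. */
    pstruct[0].f0_offset = 0;
    /* Resolve the offset against the buffer passed as a normal kernel argument. */
    floats0[pstruct[0].f0_offset] = pstruct[0].f1;
}
```

The trade-off is that every access has to name the base buffer explicitly, which may not fit code generated from pointer-based IR.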

Comment 2 Hugh Perkins 2016-10-11 08:52:32 UTC

> This is a restriction in OpenCL 1.2, documented in section 6.9 of the OpenCL spec

That's fine.  I can rewrite the kernel to not pass any structs as kernel parameters:

struct Eigen__TensorEvaluator {
    global float* f0;
    float f1;
};

kernel void foo(global float *floats0, global float* floats1) {
    global struct Eigen__TensorEvaluator *pstruct = (global struct Eigen__TensorEvaluator *)floats0;
    pstruct[0].f0 = floats1;
    pstruct[0].f0[0] = pstruct[0].f1;
}

This crashes with the same error as before:

context <pyopencl.Context at 0x1a726b0 on <pyopencl.Device 'Intel(R) HD Graphics 5500 BroadWell U-Processor GT2' on 'Intel Gen OCL Driver' at 0x7fc1d0e61be0>>
ASSERTION FAILED: (isa<AllocaInst>(ptr) || ptrCandidate.empty()) && "storing/loading pointers only support private array"
  at file /home/ubuntu/git/beignet/backend/src/llvm/llvm_gen_backend.cpp, function void gbe::GenWriter::findPointerEscape(llvm::Value*, std::set<llvm::Value*>&, bool, std::vector<llvm::Value*, std::allocator<llvm::Value*> >&), line 953
Trace/breakpoint trap

Comment 3 ruiling 2016-10-11 09:07:02 UTC

From my understanding, the rewritten code is semantically the same as "pointer to pointer", because you still interpret the pointer as a "pointer to pointer". Could you tell me why you are storing the pointer to global memory? This is very strange, because the pointer you store to global memory will be invalid after this NDRange. If you only use the stored pointer in the same kernel, can "storing the pointer in a private array" solve your problem?
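
A minimal sketch of the "private" variant asked about here, assuming a per-work-item copy of the struct is enough for the kernel. The names `foo_private` and `priv` are hypothetical, and this has not been checked against Beignet, so whether it avoids the assertion is an open question. The `#ifndef __OPENCL_VERSION__` shim only lets the same text compile as host C for a quick check.

```c
/* Host-side shim (assumption): makes the OpenCL C qualifiers vanish
   when this text is compiled as plain C instead of OpenCL C. */
#ifndef __OPENCL_VERSION__
#define kernel
#define global
#endif

struct Eigen__TensorEvaluator {
    global float* f0;
    float f1;
};

kernel void foo_private(global struct Eigen__TensorEvaluator* pstruct, global float* floats0) {
    /* Private (per-work-item) copy of the struct. */
    struct Eigen__TensorEvaluator priv;
    /* Only the float field is read from global memory. */
    priv.f1 = pstruct[0].f1;
    /* The pointer is stored in private memory, never in the global struct. */
    priv.f0 = floats0;
    priv.f0[0] = priv.f1;
}
```

The stored pointer then lives only for the duration of the work-item, which matches the concern that a pointer written to global memory is not meaningful after the NDRange completes.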

Comment 4 Hugh Perkins 2016-10-11 09:14:07 UTC

The pointer I'm storing is a valid pointer.  It's a global float *, passed in normally to the kernel, as a standard kernel parameter.

> This is very strange, because the pointer you store to global memory will be invalid after this NDRange.

This is just the start of the kernel.  The full kernel looks like this: https://github.com/hughperkins/cuda-on-cl/blob/9a7d9175629e9f267b05d63295308d2515929b82/test/eigen/generated/test_cuda_nullary-device.cl#L74

Comment 5 Hugh Perkins 2016-10-11 09:15:16 UTC

(well, like this: https://github.com/hughperkins/cuda-on-cl/blob/9a7d9175629e9f267b05d63295308d2515929b82/test/eigen/generated/test_cuda_nullary-device.cl#L164 )

Comment 6 Zhigang Gong 2016-10-11 09:22:20 UTC

(In reply to ruiling from comment #3)
> From my understanding, the rewritten code is semantically the same as
> "pointer to pointer". because you still interpret the pointer as a "pointer
> to pointer". Could you tell me why you are storing the pointer to global
> memory? This is very strange because the pointer you store to global memory
> will be invalid after this NDRange. If you will use the stored pointer in
> the same kernel, can "storing the pointer in private array" solve your
> problem?

@Ruiling, I believe what Perkins wants to do is translate CUDA to OpenCL at the IR level. He may only want to use those pointers within that kernel and will not access them after this kernel enqueue. @Perkins, correct me if I am wrong. Although I haven't found the related statement in the OpenCL spec, it is indeed doable based on our current implementation. But it's up to the Beignet team whether to add this support.

Comment 7 Hugh Perkins 2016-10-11 09:23:56 UTC

Zhigang wrote:
> I believe what Perkins wants to do is translate CUDA to OpenCL at the IR level. He may only want to use those pointers within that kernel and will not access them after this kernel enqueue.

Yes, this is correct.

Comment 8 ruiling 2016-10-12 05:05:28 UTC

(In reply to Hugh Perkins from comment #5)
> (well, like this:
> https://github.com/hughperkins/cuda-on-cl/blob/
> 9a7d9175629e9f267b05d63295308d2515929b82/test/eigen/generated/
> test_cuda_nullary-device.cl#L164 )

I took a look at the kernel. This kind of usage is supported by OpenCL 2.0.
Currently the support for this feature is in the OCL20 branch. Are you interested in trying the OCL20 branch? You need Linux kernel >= 4.4 and libdrm >= 2.4.66.

Comment 9 Hugh Perkins 2016-10-15 16:11:32 UTC

Hi Ruiling,

Using the 2.0 branch sounds great!  It looks like e.g. Ubuntu 16.04 has the requirements you mention out of the box.  So, I can continue to use the OpenCL 1.2 API, but use the 2.0 version of the Beignet driver, and all should work approximately in the way I'm looking for?

Hugh

Comment 10 ruiling 2016-10-24 02:10:33 UTC

(In reply to Hugh Perkins from comment #9)
> Hi Ruiling,
> 
> Using 2.0 branch sounds great!  It looks like eg ubuntu 16.04 has the
> requirements you mention out of the box.  So, I can continue to use OpenCL
> 1.2 api, but use the 2.0 version of the Beignet driver, and all should work
> approximately in the way I'm looking for?
> 
> Hugh

Yes, it should work using the OpenCL 1.2 API. There may be some small differences, but mostly they should not cause any issues for you.

Comment 11 GitLab Migration User 2018-10-12 21:22:31 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/5.
