TensorFlow allocates one massive block of memory and carves individual tensors out of it. The unary/binary Eigen kernels are then passed these tensors as arguments.
In my OpenCL implementation for TensorFlow, the huge block of memory becomes a single cl_mem object. I pass this cl_mem object into the kernel multiple times, once for each tensor, along with the appropriate offset.
A unit test for this approach fails on beignet: https://github.com/hughperkins/cuda-on-cl/blob/f240ad6c7d339f3244d8ce6acc4253f7c6a515ad/test/test_singlebuffer.py#L74-L85 The results tensor comes back all zeros.
I tried working around the problem by passing each cl_mem in just once and then connecting it to the appropriate tensors, but this crashes the beignet OpenCL compiler at runtime with an LLVM error inside gbe.
I can think of various ways to work around the issue, but I'd welcome your thoughts on approaches that would work around it reliably.
*** Bug 98660 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/6.