86326 – clEnqueueNDRangeKernel global_work_offset ignored

Bug 86326 - clEnqueueNDRangeKernel global_work_offset ignored

Summary: clEnqueueNDRangeKernel global_work_offset ignored

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Other (show other bugs)
Version:	10.3
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	mesa-dev
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-11-16 00:22 UTC by Luke-Jr
Modified:	2016-07-22 21:49 UTC (History)
CC List:	3 users (show)

See Also:
i915 platform:
i915 features:

Attachments
LLVM AMDGPU get_global_offset() patch (3.07 KB, patch) 2015-03-16 07:42 UTC, Ronie Salgado	Details \| Splinter Review
libclc get_global_offset() and get_global_id() patch (3.18 KB, patch) 2015-03-16 07:42 UTC, Ronie Salgado	Details \| Splinter Review
LLVM intrinsics patch (21.42 KB, patch) 2015-03-17 04:47 UTC, Ronie Salgado	Details \| Splinter Review
Updated libclc patch (10.45 KB, patch) 2015-03-17 04:50 UTC, Ronie Salgado	Details \| Splinter Review
Clang patch (4.86 KB, patch) 2015-03-17 04:51 UTC, Ronie Salgado	Details \| Splinter Review
LLVM intrinsics patch (v2) (22.56 KB, patch) 2015-03-17 04:58 UTC, Ronie Salgado	Details \| Splinter Review
Radeon patch for the other patches (740 bytes, patch) 2015-03-17 05:22 UTC, Ronie Salgado	Details \| Splinter Review
attachment-29014-0.html (1.50 KB, text/html) 2015-12-14 22:46 UTC, Ronie Salgado	Details
attachment-7779-0.html (2.10 KB, text/html) 2015-12-14 23:23 UTC, Ronie Salgado	Details
Show Obsolete (3) View All

Description Luke-Jr 2014-11-16 00:22:46 UTC

"global_work_offset can be used to specify an array of work_dim unsigned values that describe the offset used to calculate the global ID of a work-item. If global_work_offset is NULL, the global IDs start at offset (0, 0, ... 0)."
From: https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clEnqueueNDRangeKernel.html

However, Mesa passes this into clover/core/kernel.cpp kernel::launch, which then simply ignores it entirely.

Note that OpenCL 1.0 required global_work_offset to be NULL, but Mesa claims OpenCL 1.1, and if it was only OpenCL 1.0 it would still need to fail if global_work_offset was non-NULL.

As a result of this bug, software tries to use global_work_offset and ends up with kernels executing with the wrong values for get_global_id(0)

Comment 1 Ronie Salgado 2015-03-16 07:42:14 UTC

Created attachment 114336 [details] [review]
LLVM AMDGPU get_global_offset() patch

Comment 2 Ronie Salgado 2015-03-16 07:42:58 UTC

Created attachment 114337 [details] [review]
libclc get_global_offset() and get_global_id() patch

Comment 3 Ronie Salgado 2015-03-16 07:46:11 UTC

I checked the latest version of the clover codebase and it pushes the global offsets into the end of the input buffer.

I made it work with AMD capeverde(HD7770) by adding the intrinsics needed to implement OpenCL 1.1 get_global_offset(). Then I implemented get_global_offset() in libclc and I modified get_global_id() to use the offset.

I also stubbed get_global_offset() in the ptx version.

Those changes are in the attached patches.

Comment 4 Tom Stellard 2015-03-16 14:24:50 UTC

Comment on attachment 114336 [details] [review]
LLVM AMDGPU get_global_offset() patch

Review of attachment 114336 [details] [review]:
-----------------------------------------------------------------

Hi, Thanks for the patch.  After making the changes below, could you send the new patch to llvm-commits@cs.uiuc.edu?

::: lib/Target/R600/SIInstrInfo.h
@@ -371,4 +371,4 @@
> >    GLOBAL_SIZE_Z = 20,
> >    LOCAL_SIZE_X = 24,
> >    LOCAL_SIZE_Y = 28,
> > -  LOCAL_SIZE_Z = 32
> > +  LOCAL_SIZE_Z = 32,

> > +  LOCAL_SIZE_Z = 32,

This additional comma is unnecessary.

We should add another enum here for KernelABIInputOffsets and then add entries for WORK_DIM and GLOBAL_OFFSET_{X,Y,Z}, and use these instead of constants in SIISelLowering.cp

Comment 5 Tom Stellard 2015-03-16 14:36:41 UTC

Comment on attachment 114337 [details] [review]
libclc get_global_offset() and get_global_id() patch

Review of attachment 114337 [details] [review]:
-----------------------------------------------------------------

Hi,

Thanks for the patch.  Would you be able to send your updated version to: libclc-dev@pcc.me.uk

::: ptx-nvidiacl/lib/workitem/get_global_offset.cl
@@ +6,5 @@
> +  case 1:  return __builtin_ptx_read_global_offset_y();
> +  case 2:  return __builtin_ptx_read_global_offset_z();*/
> +  default: return 0;
> +  }
> +}

Why is this commented out?  Also the function name is wrong: get_local_offset().

::: r600/lib/workitem/get_global_offset.ll
@@ +14,5 @@
> +  %z = call i32 @llvm.AMDGPU.read.global.offset.z() nounwind readnone
> +  ret i32 %z
> +default:
> +  ret i32 0
> +}

This should be implemented in OpenCL C rather than LLVM IR.  Now that DataLayouts are mandatory, it makes implementing common code in LLVM IR much more difficult.

You can use the builtins defined in your previous patch for this:

__builtin_amdgpu_read_global_offset_x ...

Comment 6 Ronie Salgado 2015-03-17 04:47:24 UTC

Created attachment 114363 [details] [review]
LLVM intrinsics patch

I tried to use the builtin function, but it did not work. Clang uses the triple target prefix when trying to map GGC builtins into LLVM intrinsics.

In the case of the R600, the triple target prefix is "amdgpu", but IntrinsicsR600.td declares some intrinsics in the separate "r600" and "AMDGPU" namespaces.

So I decided to put all of them in the "amdgpu" namespace. Then table gen complained that they have to start "amdgpu". So I made the subsequent changes until I made it work, and the tests in opencl-example are all passing in capeverde.

I'am leaving those updated patches here for reviewing and documentation before sending them to the respective mailing lists.

Comment 7 Ronie Salgado 2015-03-17 04:50:48 UTC

Created attachment 114364 [details] [review]
Updated libclc patch

Updated libclc patch. I am also taking the opportunity to use C builtin functions for the r600/lib/workitem/*.ll implementation.

Removed the comment from the ptx version of get_global_offset(). The comment was there because I had not implemented those builtin intrinsics. Now I am just leaving some potential stub in LLVM for its implementation.

Comment 8 Ronie Salgado 2015-03-17 04:51:51 UTC

Created attachment 114365 [details] [review]
Clang patch

Clang patch to be able to use the new/old builtin functions.

Comment 9 Ronie Salgado 2015-03-17 04:58:57 UTC

Created attachment 114366 [details] [review]
LLVM intrinsics patch (v2)

Added a missing file, whith an enum for the kernel abi offsets.

Comment 10 Ronie Salgado 2015-03-17 05:22:52 UTC

Created attachment 114367 [details] [review]
Radeon patch for the other patches

It seems that the radeon driver depends in some of the llvm intrinsics. Perphaps some refactoring is required. This patch makes glxgears working again.

Comment 11 Tom Stellard 2015-03-23 20:34:23 UTC

Are these these same patches you sent to the mailing list?

Comment 12 Tom Stellard 2015-04-01 21:13:04 UTC

(In reply to Tom Stellard from comment #11)
> Are these these same patches you sent to the mailing list?

Do you have any interest in fixing up these patches?  If not, I may give it a try.

Comment 13 Ronie Salgado 2015-04-02 20:12:53 UTC

Sorry, I have been busy lately. I will do it today.

Comment 14 Vedran Miletić 2015-12-14 15:06:32 UTC

Ronie, was this fixed?

Comment 15 Ronie Salgado 2015-12-14 22:46:32 UTC

Created attachment 120505 [details]
attachment-29014-0.html

>
> Ronie, was this fixed?
>
> I did not fix my patch. Sorry, but I do not have time to work on this.

Best regards,
Ronie

2015-12-14 12:06 GMT-03:00 <bugzilla-daemon@freedesktop.org>:

> *Comment # 14 <https://bugs.freedesktop.org/show_bug.cgi?id=86326#c14> on
> bug 86326 <https://bugs.freedesktop.org/show_bug.cgi?id=86326> from Vedran
> Miletić <rivanvx@gmail.com> *
>
> Ronie, was this fixed?
>
> ------------------------------
> You are receiving this mail because:
>
>    - You are on the CC list for the bug.
>
>

Comment 16 Vedran Miletić 2015-12-14 23:06:06 UTC

(In reply to Ronie Salgado from comment #15)
> I did not fix my patch. Sorry, but I do not have time to work on this.

In case this turns out to be necessary to make GROMACS OpenCL work, do you mind if I finish your patch and make it suitable for merging (with proper credit, of course)?

Comment 17 Ronie Salgado 2015-12-14 23:23:43 UTC

Created attachment 120506 [details]
attachment-7779-0.html

>
> In case this turns out to be necessary to make GROMACS OpenCL work, do you mind
> if I finish your patch and make it suitable for merging (with proper credit, of
> course)?
>
> I don't mind. Just go ahead. You should only need to split the patch so
that it can be accepted by the different projects that it affects.

Best regards,
Ronie

2015-12-14 20:06 GMT-03:00 <bugzilla-daemon@freedesktop.org>:

> *Comment # 16 <https://bugs.freedesktop.org/show_bug.cgi?id=86326#c16> on
> bug 86326 <https://bugs.freedesktop.org/show_bug.cgi?id=86326> from Vedran
> Miletić <rivanvx@gmail.com> *
>
> (In reply to Ronie Salgado from comment #15 <https://bugs.freedesktop.org/show_bug.cgi?id=86326#c15>)> I did not fix my patch. Sorry, but I do not have time to work on this.
>
> In case this turns out to be necessary to make GROMACS OpenCL work, do you mind
> if I finish your patch and make it suitable for merging (with proper credit, of
> course)?
>
> ------------------------------
> You are receiving this mail because:
>
>    - You are on the CC list for the bug.
>
>

Comment 18 Jan Vesely 2016-07-22 21:49:04 UTC

Fixed with latest llvm/libclc.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.