"global_work_offset can be used to specify an array of work_dim unsigned values that describe the offset used to calculate the global ID of a work-item. If global_work_offset is NULL, the global IDs start at offset (0, 0, ... 0)." From: https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clEnqueueNDRangeKernel.html However, Mesa passes this into clover/core/kernel.cpp kernel::launch, which then simply ignores it entirely. Note that OpenCL 1.0 required global_work_offset to be NULL, but Mesa claims OpenCL 1.1, and if it was only OpenCL 1.0 it would still need to fail if global_work_offset was non-NULL. As a result of this bug, software tries to use global_work_offset and ends up with kernels executing with the wrong values for get_global_id(0)
Created attachment 114336 [details] [review] LLVM AMDGPU get_global_offset() patch
Created attachment 114337 [details] [review] libclc get_global_offset() and get_global_id() patch
I checked the latest version of the clover codebase and it pushes the global offsets into the end of the input buffer. I made it work with AMD capeverde(HD7770) by adding the intrinsics needed to implement OpenCL 1.1 get_global_offset(). Then I implemented get_global_offset() in libclc and I modified get_global_id() to use the offset. I also stubbed get_global_offset() in the ptx version. Those changes are in the attached patches.
Comment on attachment 114336 [details] [review] LLVM AMDGPU get_global_offset() patch Review of attachment 114336 [details] [review]: ----------------------------------------------------------------- Hi, Thanks for the patch. After making the changes below, could you send the new patch to llvm-commits@cs.uiuc.edu? ::: lib/Target/R600/SIInstrInfo.h @@ -371,4 +371,4 @@ > > GLOBAL_SIZE_Z = 20, > > LOCAL_SIZE_X = 24, > > LOCAL_SIZE_Y = 28, > > - LOCAL_SIZE_Z = 32 > > + LOCAL_SIZE_Z = 32, > > + LOCAL_SIZE_Z = 32, This additional comma is unnecessary. We should add another enum here for KernelABIInputOffsets and then add entries for WORK_DIM and GLOBAL_OFFSET_{X,Y,Z}, and use these instead of constants in SIISelLowering.cp
Comment on attachment 114337 [details] [review] libclc get_global_offset() and get_global_id() patch Review of attachment 114337 [details] [review]: ----------------------------------------------------------------- Hi, Thanks for the patch. Would you be able to send your updated version to: libclc-dev@pcc.me.uk ::: ptx-nvidiacl/lib/workitem/get_global_offset.cl @@ +6,5 @@ > + case 1: return __builtin_ptx_read_global_offset_y(); > + case 2: return __builtin_ptx_read_global_offset_z();*/ > + default: return 0; > + } > +} Why is this commented out? Also the function name is wrong: get_local_offset(). ::: r600/lib/workitem/get_global_offset.ll @@ +14,5 @@ > + %z = call i32 @llvm.AMDGPU.read.global.offset.z() nounwind readnone > + ret i32 %z > +default: > + ret i32 0 > +} This should be implemented in OpenCL C rather than LLVM IR. Now that DataLayouts are mandatory, it makes implementing common code in LLVM IR much more difficult. You can use the builtins defined in your previous patch for this: __builtin_amdgpu_read_global_offset_x ...
Created attachment 114363 [details] [review] LLVM intrinsics patch I tried to use the builtin function, but it did not work. Clang uses the triple target prefix when trying to map GGC builtins into LLVM intrinsics. In the case of the R600, the triple target prefix is "amdgpu", but IntrinsicsR600.td declares some intrinsics in the separate "r600" and "AMDGPU" namespaces. So I decided to put all of them in the "amdgpu" namespace. Then table gen complained that they have to start "amdgpu". So I made the subsequent changes until I made it work, and the tests in opencl-example are all passing in capeverde. I'am leaving those updated patches here for reviewing and documentation before sending them to the respective mailing lists.
Created attachment 114364 [details] [review] Updated libclc patch Updated libclc patch. I am also taking the opportunity to use C builtin functions for the r600/lib/workitem/*.ll implementation. Removed the comment from the ptx version of get_global_offset(). The comment was there because I had not implemented those builtin intrinsics. Now I am just leaving some potential stub in LLVM for its implementation.
Created attachment 114365 [details] [review] Clang patch Clang patch to be able to use the new/old builtin functions.
Created attachment 114366 [details] [review] LLVM intrinsics patch (v2) Added a missing file, whith an enum for the kernel abi offsets.
Created attachment 114367 [details] [review] Radeon patch for the other patches It seems that the radeon driver depends in some of the llvm intrinsics. Perphaps some refactoring is required. This patch makes glxgears working again.
Are these these same patches you sent to the mailing list?
(In reply to Tom Stellard from comment #11) > Are these these same patches you sent to the mailing list? Do you have any interest in fixing up these patches? If not, I may give it a try.
Sorry, I have been busy lately. I will do it today.
Ronie, was this fixed?
Created attachment 120505 [details] attachment-29014-0.html > > Ronie, was this fixed? > > I did not fix my patch. Sorry, but I do not have time to work on this. Best regards, Ronie 2015-12-14 12:06 GMT-03:00 <bugzilla-daemon@freedesktop.org>: > *Comment # 14 <https://bugs.freedesktop.org/show_bug.cgi?id=86326#c14> on > bug 86326 <https://bugs.freedesktop.org/show_bug.cgi?id=86326> from Vedran > Miletić <rivanvx@gmail.com> * > > Ronie, was this fixed? > > ------------------------------ > You are receiving this mail because: > > - You are on the CC list for the bug. > >
(In reply to Ronie Salgado from comment #15) > I did not fix my patch. Sorry, but I do not have time to work on this. In case this turns out to be necessary to make GROMACS OpenCL work, do you mind if I finish your patch and make it suitable for merging (with proper credit, of course)?
Created attachment 120506 [details] attachment-7779-0.html > > In case this turns out to be necessary to make GROMACS OpenCL work, do you mind > if I finish your patch and make it suitable for merging (with proper credit, of > course)? > > I don't mind. Just go ahead. You should only need to split the patch so that it can be accepted by the different projects that it affects. Best regards, Ronie 2015-12-14 20:06 GMT-03:00 <bugzilla-daemon@freedesktop.org>: > *Comment # 16 <https://bugs.freedesktop.org/show_bug.cgi?id=86326#c16> on > bug 86326 <https://bugs.freedesktop.org/show_bug.cgi?id=86326> from Vedran > Miletić <rivanvx@gmail.com> * > > (In reply to Ronie Salgado from comment #15 <https://bugs.freedesktop.org/show_bug.cgi?id=86326#c15>)> I did not fix my patch. Sorry, but I do not have time to work on this. > > In case this turns out to be necessary to make GROMACS OpenCL work, do you mind > if I finish your patch and make it suitable for merging (with proper credit, of > course)? > > ------------------------------ > You are receiving this mail because: > > - You are on the CC list for the bug. > >
Fixed with latest llvm/libclc.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.