Created attachment 80970 [details] Piglit test that triggers the error on SI Running the attached piglit CL test, I'm getting the following on R600/SI (Pitcairn, 7850): LLVM ERROR: Cannot select: 0x2add8b0: ch = store 0x2add7b0, 0x2adcfb0, 0x2adc640, 0x2adbd40<ST4[%out+4], trunc to i32> [ORD=4] [ID=17] 0x2adcfb0: i64,ch = load 0x20468b8, 0x2add1b0, 0x2adbd40<LD4[%mem+4], zext from i32> [ORD=1] [ID=14] 0x2add1b0: i64 = add 0x2adc140, 0x2adc540 [ORD=1] [ID=11] 0x2adc140: i64,ch = load 0x20468b8, 0x2adc040, 0x2adbd40<LD8[undef]> [ID=9] 0x2adc040: i64 = add 0x2adba40, 0x2adbf40 [ID=7] 0x2adba40: i64,ch = CopyFromReg 0x20468b8, 0x2adb940 [ID=6] 0x2adb940: i64 = Register %vreg0 [ID=1] 0x2adbf40: i64 = Constant<44> [ID=4] 0x2adbd40: i64 = undef [ID=3] 0x2adc540: i64 = Constant<4> [ID=5] 0x2adbd40: i64 = undef [ID=3] 0x2adc640: i64 = add 0x2adbe40, 0x2adc540 [ORD=4] [ID=13] 0x2adbe40: i64,ch = load 0x20468b8, 0x2adbc40, 0x2adbd40<LD8[undef]> [ID=10] 0x2adbc40: i64 = add 0x2adba40, 0x2adbb40 [ID=8] 0x2adba40: i64,ch = CopyFromReg 0x20468b8, 0x2adb940 [ID=6] 0x2adb940: i64 = Register %vreg0 [ID=1] 0x2adbb40: i64 = Constant<36> [ID=2] 0x2adbd40: i64 = undef [ID=3] 0x2adc540: i64 = Constant<4> [ID=5] 0x2adbd40: i64 = undef [ID=3] I'll attach a dump from llc -debug-only=isel in a minute.
Created attachment 80971 [details] LLVM assembly that is produced from the vload-int.cl test case
Created attachment 80972 [details] Output from llc -debug-only=isel --march=r600 --mcpu=verde < vload-int.ll
These patches should fix the issue: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130617/178156.html The first patch is unrelated to the bug, but it is required in order to prevent the test case in the second patch from crashing on Cayman. I've tested this on llc, can you verify it works on real hardware.
The good news is that the store error is gone... Now I get a load error :/ LLVM ERROR: Cannot select: 0x2c92af0: i64,ch = load 0x21fe0c8, 0x2c924f0, 0x2c921f0<LD4[undef](align=8), zext from i32> [ID=15] 0x2c924f0: i64 = add 0x2c91ef0, 0x2c923f0 [ID=13] 0x2c91ef0: i64,ch = CopyFromReg 0x21fe0c8, 0x2c91df0 [ID=12] 0x2c91df0: i64 = Register %vreg0 [ID=1] 0x2c923f0: i64 = Constant<44> [ID=4] 0x2c921f0: i64 = undef [ID=3] I'll attach the llvm assembly in a follow-up
Created attachment 81037 [details] LLVM Assembly that is produced from vload-int.cl (after first patch)
Created attachment 81038 [details] Output from llc -debug-only=isel --march=r600 --mcpu=verde < vload-int.ll
The SI backend needs to be fixed so it can handle zext i64 loads, but there is also a bug in libclc. The vload implementation assumes pointer types are 32-bit: define <2 x i32> @__clc_vload2_impl_i32__global(i32 %offset, i32 addrspace(1)* nocapture %addr) nounwind readonly alwaysinline { %1 = ptrtoint i32 addrspace(1)* %addr to i32 %2 = add i32 %1, %offset %3 = inttoptr i32 %2 to <2 x i32> addrspace(1)* %4 = load <2 x i32> addrspace(1)* %3, align 4, !tbaa !3 ret <2 x i32> %4 } Pointers on SI are 64-bit. This ptrtoint, add, inttoptr sequence should be replaced with a getelementptr and a bitcast.
"Pointers on SI are 64-bit." Talk about a "duh" moment. I'll see what I can do to fix and test the vload/vstore implementation pointer generation since this is broken on SI, maybe 64-bit PTX, and would also be broken on x86_64 (and others) if we ever get around to supporting CPU targets in libclc.
The piglit test should work with this patch: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130624/179364.html But we should still try to fix the bug in libclc.
I've got a draft fix in my libclc repo on fdo.o. I've tested for regressions on Cedar (successfully), and I will test on SI with your 64-bit load patch asap (hopefully tonight, but we'll see).
(In reply to comment #10) > I've got a draft fix in my libclc repo on fdo.o. I've tested for > regressions on Cedar (successfully), and I will test on SI with your 64-bit > load patch asap (hopefully tonight, but we'll see). I took a look at your libclc fix, and I realized that we can't use address space qualifiers in common code, because addrspace(1) may not mean global address space on all targets. I think you might find that the optimizers are good enough to turn a .cl implementation into optimal code, but if not we may be have to move the LLVM IR implementations into target specific code.
No problem. I'll just revert the vload/vstore code to pure CLC implementations and then override with assembly in the R600 back-end.
(In reply to comment #9) > The piglit test should work with this patch: > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130624/179364. > html > > But we should still try to fix the bug in libclc. I've just pushed this patch. What is the status of this bug with your vstore/vload fixes?
I still get errors with the v8i32/v16i32 loads, but those aren't currently enabled in the R600 or SI back-ends as far as I know. If I comment out the vload8 and vload16 tests everything runs correctly for me on my 7850 with git master llvm/clang and my vload/vstore changes for libclc.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.