65873 – R600/SI: Cannot select store with truncate to 32-bit

Bug 65873 - R600/SI: Cannot select store with truncate to 32-bit

Summary: R600/SI: Cannot select store with truncate to 32-bit

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Gallium/radeonsi (show other bugs)
Version:	git
Hardware:	Other Linux (All)

Importance:	medium normal
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-06-18 00:05 UTC by Aaron Watry
Modified:	2013-07-15 23:43 UTC (History)
CC List:	0 users

See Also:
i915 platform:
i915 features:

Attachments
Piglit test that triggers the error on SI (1.74 KB, text/plain) 2013-06-18 00:05 UTC, Aaron Watry	Details
LLVM assembly that is produced from the vload-int.cl test case (1.37 KB, text/plain) 2013-06-18 00:06 UTC, Aaron Watry	Details
Output from llc -debug-only=isel --march=r600 --mcpu=verde < vload-int.ll (10.74 KB, text/plain) 2013-06-18 00:08 UTC, Aaron Watry	Details
LLVM Assembly that is produced from vload-int.cl (after first patch) (1.44 KB, text/plain) 2013-06-18 20:19 UTC, Aaron Watry	Details
Output from llc -debug-only=isel --march=r600 --mcpu=verde < vload-int.ll (14.05 KB, text/plain) 2013-06-18 20:20 UTC, Aaron Watry	Details
Show Obsolete (2) View All

Description Aaron Watry 2013-06-18 00:05:46 UTC

Created attachment 80970 [details]
Piglit test that triggers the error on SI

Running the attached piglit CL test, I'm getting the following on R600/SI (Pitcairn, 7850):

LLVM ERROR: Cannot select: 0x2add8b0: ch = store 0x2add7b0, 0x2adcfb0, 0x2adc640, 0x2adbd40<ST4[%out+4], trunc to i32> [ORD=4] [ID=17]
  0x2adcfb0: i64,ch = load 0x20468b8, 0x2add1b0, 0x2adbd40<LD4[%mem+4], zext from i32> [ORD=1] [ID=14]
    0x2add1b0: i64 = add 0x2adc140, 0x2adc540 [ORD=1] [ID=11]
      0x2adc140: i64,ch = load 0x20468b8, 0x2adc040, 0x2adbd40<LD8[undef]> [ID=9]
        0x2adc040: i64 = add 0x2adba40, 0x2adbf40 [ID=7]
          0x2adba40: i64,ch = CopyFromReg 0x20468b8, 0x2adb940 [ID=6]
            0x2adb940: i64 = Register %vreg0 [ID=1]
          0x2adbf40: i64 = Constant<44> [ID=4]
        0x2adbd40: i64 = undef [ID=3]
      0x2adc540: i64 = Constant<4> [ID=5]
    0x2adbd40: i64 = undef [ID=3]
  0x2adc640: i64 = add 0x2adbe40, 0x2adc540 [ORD=4] [ID=13]
    0x2adbe40: i64,ch = load 0x20468b8, 0x2adbc40, 0x2adbd40<LD8[undef]> [ID=10]
      0x2adbc40: i64 = add 0x2adba40, 0x2adbb40 [ID=8]
        0x2adba40: i64,ch = CopyFromReg 0x20468b8, 0x2adb940 [ID=6]
          0x2adb940: i64 = Register %vreg0 [ID=1]
        0x2adbb40: i64 = Constant<36> [ID=2]
      0x2adbd40: i64 = undef [ID=3]
    0x2adc540: i64 = Constant<4> [ID=5]
  0x2adbd40: i64 = undef [ID=3]


I'll attach a dump from llc -debug-only=isel in a minute.

Comment 1 Aaron Watry 2013-06-18 00:06:48 UTC

Created attachment 80971 [details]
LLVM assembly that is produced from the vload-int.cl test case

Comment 2 Aaron Watry 2013-06-18 00:08:14 UTC

Created attachment 80972 [details]
Output from llc -debug-only=isel --march=r600 --mcpu=verde < vload-int.ll

Comment 3 Tom Stellard 2013-06-18 16:56:27 UTC

These patches should fix the issue:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130617/178156.html

The first patch is unrelated to the bug, but it is required in order to prevent the test case in the second patch from crashing on Cayman.  I've tested this on llc, can you verify it works on real hardware.

Comment 4 Aaron Watry 2013-06-18 20:17:46 UTC

The good news is that the store error is gone...

Now I get a load error :/

LLVM ERROR: Cannot select: 0x2c92af0: i64,ch = load 0x21fe0c8, 0x2c924f0, 0x2c921f0<LD4[undef](align=8), zext from i32> [ID=15]
  0x2c924f0: i64 = add 0x2c91ef0, 0x2c923f0 [ID=13]
    0x2c91ef0: i64,ch = CopyFromReg 0x21fe0c8, 0x2c91df0 [ID=12]
      0x2c91df0: i64 = Register %vreg0 [ID=1]
    0x2c923f0: i64 = Constant<44> [ID=4]
  0x2c921f0: i64 = undef [ID=3]

I'll attach the llvm assembly in a follow-up

Comment 5 Aaron Watry 2013-06-18 20:19:25 UTC

Created attachment 81037 [details]
LLVM Assembly that is produced from vload-int.cl (after first patch)

Comment 6 Aaron Watry 2013-06-18 20:20:15 UTC

Created attachment 81038 [details]
Output from llc -debug-only=isel --march=r600 --mcpu=verde < vload-int.ll

Comment 7 Tom Stellard 2013-06-27 17:26:53 UTC

The SI backend needs to be fixed so it can handle zext i64 loads, but there is also a bug in libclc.  The vload implementation assumes pointer types are 32-bit:

define <2 x i32> @__clc_vload2_impl_i32__global(i32 %offset,  i32 addrspace(1)* nocapture %addr) nounwind readonly alwaysinline {
  %1 = ptrtoint i32 addrspace(1)* %addr to i32
  %2 = add i32 %1, %offset
  %3 = inttoptr i32 %2 to <2 x i32> addrspace(1)*
  %4 = load <2 x i32> addrspace(1)* %3, align 4, !tbaa !3
  ret <2 x i32> %4
}

Pointers on SI are 64-bit.  This ptrtoint, add, inttoptr sequence should be replaced with a getelementptr and a bitcast.

Comment 8 Aaron Watry 2013-06-27 18:18:59 UTC

"Pointers on SI are 64-bit."

Talk about a "duh" moment. I'll see what I can do to fix and test the vload/vstore implementation pointer generation since this is broken on SI, maybe 64-bit PTX, and would also be broken on x86_64 (and others) if we ever get around to supporting CPU targets in libclc.

Comment 9 Tom Stellard 2013-06-27 20:54:13 UTC

The piglit test should work with this patch: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130624/179364.html

But we should still try to fix the bug in libclc.

Comment 10 Aaron Watry 2013-06-27 22:19:00 UTC

I've got a draft fix in my libclc repo on fdo.o.  I've tested for regressions on Cedar (successfully), and I will test on SI with your 64-bit load patch asap (hopefully tonight, but we'll see).

Comment 11 Tom Stellard 2013-06-28 04:27:55 UTC

(In reply to comment #10)
> I've got a draft fix in my libclc repo on fdo.o.  I've tested for
> regressions on Cedar (successfully), and I will test on SI with your 64-bit
> load patch asap (hopefully tonight, but we'll see).

I took a look at your libclc fix, and I realized that we can't use address space qualifiers in common code, because addrspace(1) may not mean global address space on all targets.

I think you might find that the optimizers are good enough to turn a .cl implementation into optimal code, but if not we may be have to move the LLVM IR implementations into target specific code.

Comment 12 Aaron Watry 2013-06-28 14:45:38 UTC

No problem.  I'll just revert the vload/vstore code to pure CLC implementations and then override with assembly in the R600 back-end.

Comment 13 Tom Stellard 2013-07-15 19:04:20 UTC

(In reply to comment #9)
> The piglit test should work with this patch:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130624/179364.
> html
> 
> But we should still try to fix the bug in libclc.

I've just pushed this patch.  What is the status of this bug with your vstore/vload fixes?

Comment 14 Aaron Watry 2013-07-15 20:26:35 UTC

I still get errors with the v8i32/v16i32 loads, but those aren't currently enabled in the R600 or SI back-ends as far as I know.

If I comment out the vload8 and vload16 tests everything runs correctly for me on my 7850 with git master llvm/clang and my vload/vstore changes for libclc.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.