3469715a8a171512cf9b528702e70393f01c6041 is the first bad commit
commit 3469715a8a171512cf9b528702e70393f01c6041
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jul 13 18:09:30 2012 +0100

    gallivm,draw,llvmpipe: Support wider native registers.
Could you describe the failure and environment a bit more? I'm not seeing it here.
mesa: 761131ce4591e5f55f38d13f2c4d2194bc9cb0fd (master) llvm: 2.9+dfsg-3ubuntu4 Linux distribution: Ubuntu 12.04 amd64 $ ./build/linux-x86_64-debug/bin/lp_test_format Testing PIPE_FORMAT_B8G8R8A8_UNORM (float) ... Testing PIPE_FORMAT_B8G8R8A8_UNORM (unorm8) ... Testing PIPE_FORMAT_B8G8R8X8_UNORM (float) ... Testing PIPE_FORMAT_B8G8R8X8_UNORM (unorm8) ... Testing PIPE_FORMAT_A8R8G8B8_UNORM (float) ... Testing PIPE_FORMAT_A8R8G8B8_UNORM (unorm8) ... Testing PIPE_FORMAT_X8R8G8B8_UNORM (float) ... Testing PIPE_FORMAT_X8R8G8B8_UNORM (unorm8) ... Testing PIPE_FORMAT_B5G5R5A1_UNORM (float) ... Testing PIPE_FORMAT_B5G5R5A1_UNORM (unorm8) ... Testing PIPE_FORMAT_B4G4R4A4_UNORM (float) ... Testing PIPE_FORMAT_B4G4R4A4_UNORM (unorm8) ... Testing PIPE_FORMAT_B5G6R5_UNORM (float) ... Testing PIPE_FORMAT_B5G6R5_UNORM (unorm8) ... Testing PIPE_FORMAT_R10G10B10A2_UNORM (float) ... Testing PIPE_FORMAT_R10G10B10A2_UNORM (unorm8) ... Testing PIPE_FORMAT_L8_UNORM (float) ... Testing PIPE_FORMAT_L8_UNORM (unorm8) ... Testing PIPE_FORMAT_A8_UNORM (float) ... Testing PIPE_FORMAT_A8_UNORM (unorm8) ... Testing PIPE_FORMAT_I8_UNORM (float) ... Testing PIPE_FORMAT_I8_UNORM (unorm8) ... Testing PIPE_FORMAT_L8A8_UNORM (float) ... Testing PIPE_FORMAT_L8A8_UNORM (unorm8) ... Testing PIPE_FORMAT_L16_UNORM (float) ... Testing PIPE_FORMAT_L16_UNORM (unorm8) ... Testing PIPE_FORMAT_UYVY (float) ... Testing PIPE_FORMAT_UYVY (unorm8) ... Testing PIPE_FORMAT_YUYV (float) ... Testing PIPE_FORMAT_YUYV (unorm8) ... Testing PIPE_FORMAT_R32_FLOAT (float) ... Testing PIPE_FORMAT_R32_FLOAT (unorm8) ... Testing PIPE_FORMAT_R32G32_FLOAT (float) ... Testing PIPE_FORMAT_R32G32_FLOAT (unorm8) ... Testing PIPE_FORMAT_R32G32B32_FLOAT (float) ... Testing PIPE_FORMAT_R32G32B32_FLOAT (unorm8) ... Testing PIPE_FORMAT_R32G32B32A32_FLOAT (float) ... Testing PIPE_FORMAT_R32G32B32A32_FLOAT (unorm8) ... Testing PIPE_FORMAT_R32_UNORM (float) ... Testing PIPE_FORMAT_R32_UNORM (unorm8) ... 
Testing PIPE_FORMAT_R32G32_UNORM (float) ... Testing PIPE_FORMAT_R32G32_UNORM (unorm8) ... Testing PIPE_FORMAT_R32G32B32_UNORM (float) ... Testing PIPE_FORMAT_R32G32B32_UNORM (unorm8) ... Testing PIPE_FORMAT_R32G32B32A32_UNORM (float) ... Testing PIPE_FORMAT_R32G32B32A32_UNORM (unorm8) ... Testing PIPE_FORMAT_R32_USCALED (float) ... Testing PIPE_FORMAT_R32_USCALED (unorm8) ... Testing PIPE_FORMAT_R32G32_USCALED (float) ... Testing PIPE_FORMAT_R32G32_USCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff ff 00 ff expected Testing PIPE_FORMAT_R32G32B32_USCALED (float) ... Testing PIPE_FORMAT_R32G32B32_USCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 00 ff ff expected FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff ff ff ff expected Testing PIPE_FORMAT_R32G32B32A32_USCALED (float) ... Testing PIPE_FORMAT_R32G32B32A32_USCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 00 obtained ff 00 00 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 ff 00 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 ff 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 00 ff expected FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 00 obtained ff ff ff ff expected Testing PIPE_FORMAT_R32_SNORM (float) ... Testing PIPE_FORMAT_R32_SNORM (unorm8) ... Testing PIPE_FORMAT_R32G32_SNORM (float) ... Testing PIPE_FORMAT_R32G32_SNORM (unorm8) ... Testing PIPE_FORMAT_R32G32B32_SNORM (float) ... Testing PIPE_FORMAT_R32G32B32_SNORM (unorm8) ... 
Testing PIPE_FORMAT_R32G32B32A32_SNORM (float) ... Testing PIPE_FORMAT_R32G32B32A32_SNORM (unorm8) ... Testing PIPE_FORMAT_R32_SSCALED (float) ... Testing PIPE_FORMAT_R32_SSCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected Testing PIPE_FORMAT_R32G32_SSCALED (float) ... Testing PIPE_FORMAT_R32G32_SSCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected Testing PIPE_FORMAT_R32G32B32_SSCALED (float) ... Testing PIPE_FORMAT_R32G32B32_SSCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 00 ff ff expected Testing PIPE_FORMAT_R32G32B32A32_SSCALED (float) ... Testing PIPE_FORMAT_R32G32B32A32_SSCALED (unorm8) ... FAILED Packed: 00 00 00 01 Unpacked (0,0): 00 00 00 00 obtained ff 00 00 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 ff 00 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 ff 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 00 ff expected Testing PIPE_FORMAT_R16_UNORM (float) ... Testing PIPE_FORMAT_R16_UNORM (unorm8) ... Testing PIPE_FORMAT_R16G16_UNORM (float) ... Testing PIPE_FORMAT_R16G16_UNORM (unorm8) ... Testing PIPE_FORMAT_R16G16B16_UNORM (float) ... Testing PIPE_FORMAT_R16G16B16_UNORM (unorm8) ... Testing PIPE_FORMAT_R16G16B16A16_UNORM (float) ... Testing PIPE_FORMAT_R16G16B16A16_UNORM (unorm8) ... Testing PIPE_FORMAT_R16_USCALED (float) ... Testing PIPE_FORMAT_R16_USCALED (unorm8) ... Testing PIPE_FORMAT_R16G16_USCALED (float) ... Testing PIPE_FORMAT_R16G16_USCALED (unorm8) ... Testing PIPE_FORMAT_R16G16B16_USCALED (float) ... 
Testing PIPE_FORMAT_R16G16B16_USCALED (unorm8) ... FAILED Packed: ff ff 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 ff ff Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 00 ff ff expected FAILED Packed: ff ff ff ff Unpacked (0,0): 00 00 00 ff obtained ff ff ff ff expected Testing PIPE_FORMAT_R16G16B16A16_USCALED (float) ... Testing PIPE_FORMAT_R16G16B16A16_USCALED (unorm8) ... FAILED Packed: ff ff 00 00 Unpacked (0,0): 00 00 00 00 obtained ff 00 00 00 expected FAILED Packed: 00 00 ff ff Unpacked (0,0): 00 00 00 00 obtained 00 ff 00 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 ff 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 00 ff expected FAILED Packed: ff ff ff ff Unpacked (0,0): 00 00 00 00 obtained ff ff ff ff expected Testing PIPE_FORMAT_R16_SNORM (float) ... Testing PIPE_FORMAT_R16_SNORM (unorm8) ... Testing PIPE_FORMAT_R16G16_SNORM (float) ... Testing PIPE_FORMAT_R16G16_SNORM (unorm8) ... Testing PIPE_FORMAT_R16G16B16_SNORM (float) ... Testing PIPE_FORMAT_R16G16B16_SNORM (unorm8) ... Testing PIPE_FORMAT_R16G16B16A16_SNORM (float) ... Testing PIPE_FORMAT_R16G16B16A16_SNORM (unorm8) ... Testing PIPE_FORMAT_R16_SSCALED (float) ... Testing PIPE_FORMAT_R16_SSCALED (unorm8) ... FAILED Packed: ff 7f 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected Testing PIPE_FORMAT_R16G16_SSCALED (float) ... Testing PIPE_FORMAT_R16G16_SSCALED (unorm8) ... FAILED Packed: ff 7f 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 ff 7f Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected Testing PIPE_FORMAT_R16G16B16_SSCALED (float) ... Testing PIPE_FORMAT_R16G16B16_SSCALED (unorm8) ... 
FAILED Packed: ff 7f 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 00 ff 7f Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 00 ff ff expected Testing PIPE_FORMAT_R16G16B16A16_SSCALED (float) ... Testing PIPE_FORMAT_R16G16B16A16_SSCALED (unorm8) ... FAILED Packed: ff 7f 00 00 Unpacked (0,0): 00 00 00 00 obtained ff 00 00 00 expected FAILED Packed: 00 00 ff 7f Unpacked (0,0): 00 00 00 00 obtained 00 ff 00 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 ff 00 expected FAILED Packed: 00 00 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 00 ff expected Testing PIPE_FORMAT_R8_UNORM (float) ... Testing PIPE_FORMAT_R8_UNORM (unorm8) ... Testing PIPE_FORMAT_R8G8_UNORM (float) ... Testing PIPE_FORMAT_R8G8_UNORM (unorm8) ... Testing PIPE_FORMAT_R8G8B8_UNORM (float) ... Testing PIPE_FORMAT_R8G8B8_UNORM (unorm8) ... Testing PIPE_FORMAT_R8G8B8A8_UNORM (float) ... Testing PIPE_FORMAT_R8G8B8A8_UNORM (unorm8) ... Testing PIPE_FORMAT_X8B8G8R8_UNORM (float) ... Testing PIPE_FORMAT_X8B8G8R8_UNORM (unorm8) ... Testing PIPE_FORMAT_R8_USCALED (float) ... Testing PIPE_FORMAT_R8_USCALED (unorm8) ... Testing PIPE_FORMAT_R8G8_USCALED (float) ... Testing PIPE_FORMAT_R8G8_USCALED (unorm8) ... Testing PIPE_FORMAT_R8G8B8_USCALED (float) ... Testing PIPE_FORMAT_R8G8B8_USCALED (unorm8) ... FAILED Packed: ff 00 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 ff 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 ff 00 Unpacked (0,0): 00 00 00 ff obtained 00 00 ff ff expected FAILED Packed: ff ff ff 00 Unpacked (0,0): 00 00 00 ff obtained ff ff ff ff expected Testing PIPE_FORMAT_R8G8B8A8_USCALED (float) ... Testing PIPE_FORMAT_R8G8B8A8_USCALED (unorm8) ... Testing PIPE_FORMAT_R8_SNORM (float) ... Testing PIPE_FORMAT_R8_SNORM (unorm8) ... Testing PIPE_FORMAT_R8G8_SNORM (float) ... 
Testing PIPE_FORMAT_R8G8_SNORM (unorm8) ... Testing PIPE_FORMAT_R8G8B8_SNORM (float) ... Testing PIPE_FORMAT_R8G8B8_SNORM (unorm8) ... Testing PIPE_FORMAT_R8G8B8A8_SNORM (float) ... Testing PIPE_FORMAT_R8G8B8A8_SNORM (unorm8) ... Testing PIPE_FORMAT_R8_SSCALED (float) ... Testing PIPE_FORMAT_R8_SSCALED (unorm8) ... FAILED Packed: 7f 00 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected Testing PIPE_FORMAT_R8G8_SSCALED (float) ... Testing PIPE_FORMAT_R8G8_SSCALED (unorm8) ... FAILED Packed: 7f 00 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 7f 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected Testing PIPE_FORMAT_R8G8B8_SSCALED (float) ... Testing PIPE_FORMAT_R8G8B8_SSCALED (unorm8) ... FAILED Packed: 7f 00 00 00 Unpacked (0,0): 00 00 00 ff obtained ff 00 00 ff expected FAILED Packed: 00 7f 00 00 Unpacked (0,0): 00 00 00 ff obtained 00 ff 00 ff expected FAILED Packed: 00 00 7f 00 Unpacked (0,0): 00 00 00 ff obtained 00 00 ff ff expected Testing PIPE_FORMAT_R8G8B8A8_SSCALED (float) ... Testing PIPE_FORMAT_R8G8B8A8_SSCALED (unorm8) ... FAILED Packed: 7f 00 00 00 Unpacked (0,0): 00 00 00 00 obtained ff 00 00 00 expected FAILED Packed: 00 7f 00 00 Unpacked (0,0): 00 00 00 00 obtained 00 ff 00 00 expected FAILED Packed: 00 00 7f 00 Unpacked (0,0): 00 00 00 00 obtained 00 00 ff 00 expected FAILED Packed: 00 00 00 7f Unpacked (0,0): 00 00 00 00 obtained 00 00 00 ff expected Testing PIPE_FORMAT_R32_FIXED (float) ... Testing PIPE_FORMAT_R32_FIXED (unorm8) ... Testing PIPE_FORMAT_R32G32_FIXED (float) ... Testing PIPE_FORMAT_R32G32_FIXED (unorm8) ... Testing PIPE_FORMAT_R32G32B32_FIXED (float) ... Testing PIPE_FORMAT_R32G32B32_FIXED (unorm8) ... Testing PIPE_FORMAT_R32G32B32A32_FIXED (float) ... Testing PIPE_FORMAT_R32G32B32A32_FIXED (unorm8) ... Testing PIPE_FORMAT_R16_FLOAT (float) ... Testing PIPE_FORMAT_R16_FLOAT (unorm8) ... Testing PIPE_FORMAT_R16G16_FLOAT (float) ... 
Testing PIPE_FORMAT_R16G16_FLOAT (unorm8) ... Testing PIPE_FORMAT_R16G16B16_FLOAT (float) ... Testing PIPE_FORMAT_R16G16B16_FLOAT (unorm8) ... Testing PIPE_FORMAT_R16G16B16A16_FLOAT (float) ... Testing PIPE_FORMAT_R16G16B16A16_FLOAT (unorm8) ... Testing PIPE_FORMAT_L8_SRGB (float) ... Testing PIPE_FORMAT_L8_SRGB (unorm8) ... Testing PIPE_FORMAT_L8A8_SRGB (float) ... Testing PIPE_FORMAT_L8A8_SRGB (unorm8) ... Testing PIPE_FORMAT_R8G8B8_SRGB (float) ... Testing PIPE_FORMAT_R8G8B8_SRGB (unorm8) ... Testing PIPE_FORMAT_A8B8G8R8_SRGB (float) ... Testing PIPE_FORMAT_A8B8G8R8_SRGB (unorm8) ... Testing PIPE_FORMAT_X8B8G8R8_SRGB (float) ... Testing PIPE_FORMAT_X8B8G8R8_SRGB (unorm8) ... Testing PIPE_FORMAT_B8G8R8A8_SRGB (float) ... Testing PIPE_FORMAT_B8G8R8A8_SRGB (unorm8) ... Testing PIPE_FORMAT_B8G8R8X8_SRGB (float) ... Testing PIPE_FORMAT_B8G8R8X8_SRGB (unorm8) ... Testing PIPE_FORMAT_A8R8G8B8_SRGB (float) ... Testing PIPE_FORMAT_A8R8G8B8_SRGB (unorm8) ... Testing PIPE_FORMAT_X8R8G8B8_SRGB (float) ... Testing PIPE_FORMAT_X8R8G8B8_SRGB (unorm8) ... Testing PIPE_FORMAT_R8G8B8A8_SRGB (float) ... Testing PIPE_FORMAT_R8G8B8A8_SRGB (unorm8) ... Testing PIPE_FORMAT_DXT1_RGB (float) ... Testing PIPE_FORMAT_DXT1_RGB (unorm8) ... Testing PIPE_FORMAT_DXT1_RGBA (float) ... Testing PIPE_FORMAT_DXT1_RGBA (unorm8) ... Testing PIPE_FORMAT_DXT3_RGBA (float) ... Testing PIPE_FORMAT_DXT3_RGBA (unorm8) ... Testing PIPE_FORMAT_DXT5_RGBA (float) ... Testing PIPE_FORMAT_DXT5_RGBA (unorm8) ... Testing PIPE_FORMAT_R8G8_B8G8_UNORM (float) ... Testing PIPE_FORMAT_R8G8_B8G8_UNORM (unorm8) ... Testing PIPE_FORMAT_G8R8_G8B8_UNORM (float) ... Testing PIPE_FORMAT_G8R8_G8B8_UNORM (unorm8) ... Testing PIPE_FORMAT_R8SG8SB8UX8U_NORM (float) ... Testing PIPE_FORMAT_R8SG8SB8UX8U_NORM (unorm8) ... Testing PIPE_FORMAT_R5SG5SB6U_NORM (float) ... Testing PIPE_FORMAT_R5SG5SB6U_NORM (unorm8) ... Testing PIPE_FORMAT_A8B8G8R8_UNORM (float) ... Testing PIPE_FORMAT_A8B8G8R8_UNORM (unorm8) ... 
Testing PIPE_FORMAT_B5G5R5X1_UNORM (float) ... Testing PIPE_FORMAT_B5G5R5X1_UNORM (unorm8) ... Testing PIPE_FORMAT_R10G10B10X2_USCALED (float) ... Testing PIPE_FORMAT_R10G10B10X2_USCALED (unorm8) ... Testing PIPE_FORMAT_R10G10B10X2_SNORM (float) ... Testing PIPE_FORMAT_R10G10B10X2_SNORM (unorm8) ... Testing PIPE_FORMAT_L4A4_UNORM (float) ... Testing PIPE_FORMAT_L4A4_UNORM (unorm8) ... Testing PIPE_FORMAT_B10G10R10A2_UNORM (float) ... Testing PIPE_FORMAT_B10G10R10A2_UNORM (unorm8) ... Testing PIPE_FORMAT_R10SG10SB10SA2U_NORM (float) ... Testing PIPE_FORMAT_R10SG10SB10SA2U_NORM (unorm8) ... Testing PIPE_FORMAT_R8G8Bx_SNORM (float) ... Testing PIPE_FORMAT_R8G8Bx_SNORM (unorm8) ... Testing PIPE_FORMAT_R8G8B8X8_UNORM (float) ... Testing PIPE_FORMAT_R8G8B8X8_UNORM (unorm8) ... Testing PIPE_FORMAT_B4G4R4X4_UNORM (float) ... Testing PIPE_FORMAT_B4G4R4X4_UNORM (unorm8) ...
Thanks. And what's your CPU, and LLVM version? Does setting LP_NATIVE_VECTOR_WIDTH=128 help? If not, please run the test as GALLIVM_DEBUG=tgsi,ir,asm /path/to/lp_test_format -v -v and attach the output, both before and after the faulty commit above.
Created attachment 64334 [details] lp_test_format.log
(In reply to comment #3)
> Thanks. And what's your CPU, and LLVM version?

Intel Westmere
llvm-2.9

> Does setting
>
> LP_NATIVE_VECTOR_WIDTH=128
>
> help?

No.

> If not, please run the test as
>
> GALLIVM_DEBUG=tgsi,ir,asm /path/to/lp_test_format -v -v
>
> and attach the output. Before and after the faulty commit above.

See attachment #64334 [details] for output after faulty commit.
lp_test_format only passes on llvm-3.1 on my machine.

llvm-2.6: fail
llvm-2.7: fail
llvm-2.8: fail
llvm-2.9: fail
llvm-3.0: fail
llvm-3.1: pass
llvm-3.2svn: fail
Since the test doesn't use any vectors whose size depends on CPU caps, LP_NATIVE_VECTOR_WIDTH shouldn't affect anything. Here's the IR of a test which fails:

define void @fetch_r32g32_sscaled_unorm8(<4 x i8>*, i8*, i32, i32) {
entry:
  %4 = bitcast i8* %1 to <2 x i32>*
  %5 = load <2 x i32>* %4, align 4
  %6 = shufflevector <2 x i32> %5, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
  %7 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %6, <4 x i32> zeroinitializer)
  %8 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %7, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
  %9 = ashr <4 x i32> %8, <i32 -1, i32 -1, i32 -1, i32 -1>
  %10 = sub <4 x i32> %8, %9
  %11 = extractelement <4 x i32> %10, i32 0
  %12 = extractelement <4 x i32> %10, i32 1
  %13 = extractelement <4 x i32> %10, i32 2
  %14 = extractelement <4 x i32> %10, i32 3
  %15 = bitcast i32 %11 to <2 x i16>
  %16 = bitcast i32 %12 to <2 x i16>
  %17 = shufflevector <2 x i16> %15, <2 x i16> %16, <2 x i32> <i32 0, i32 2>
  %18 = bitcast i32 %13 to <2 x i16>
  %19 = bitcast i32 %14 to <2 x i16>
  %20 = shufflevector <2 x i16> %18, <2 x i16> %19, <2 x i32> <i32 0, i32 2>
  %21 = bitcast <2 x i16> %17 to <4 x i8>
  %22 = bitcast <2 x i16> %20 to <4 x i8>
  %23 = shufflevector <4 x i8> %21, <4 x i8> %22, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %24 = shl <4 x i8> %23, <i8 8, i8 8, i8 8, i8 8>
  %25 = sub <4 x i8> %24, %23
  %26 = bitcast <4 x i8> %25 to i32
  %27 = and i32 %26, 65535
  %28 = or i32 bitcast (<4 x i8> <i8 0, i8 0, i8 0, i8 -1> to i32), %27
  %29 = bitcast i32 %28 to <4 x i8>
  store <4 x i8> %29, <4 x i8>* %0
  ret void
}

With llvm 3.1 it passes, but not with 2.9/3.0. But there's more to it: with 2.9 AND a CPU which isn't sse41-capable it also passes (and on top of that the generated code is way _better_, even though it can't use the pminsd/pmaxsd intrinsics, but those aren't the issue). So with an sse41- or avx-capable CPU, llvm 3.1 generates correct but crappy code, whereas the code is crappy and wrong with 2.9/3.0.
Only with a CPU that isn't sse41-capable does it produce correct and good code... I believe the issue here is the use of non-native vectors toward the end (2x16, 4x8), since llvm uses padded vector elements for them (a 4xi8 vector looks like 4xi32), so it has to do lots of weird shuffles (those harmless-looking bitcasts cause lots of unpacks, shuffles etc.). Well, that's the explanation for the crappy code (probably some optimization wasn't available without sse41, which turned out to be much better in the end). Fortunately it shouldn't happen with llvmpipe since we don't generally use such vectors (we always fetch a multiple of 4 values). This doesn't explain why it isn't correct though. Maybe we're relying somewhere on some properties of those values when resizing which don't hold if the vector elements aren't packed but padded.

There's another issue with this code, which may or may not be related to this bug:

%9 = ashr <4 x i32> %8, <i32 -1, i32 -1, i32 -1, i32 -1>

(the uscaled formats will have a lshr instead). This shift is illegal, since shifts with a count larger than or equal to the element bit width (which this is) are undefined in llvm (ok, not illegal, just the result is undefined). However, llvm itself doesn't care and with sse2 it just happily issues a psrad 255 instruction, which has defined (and reasonable) behavior (in the non-vector domain the hardware will just use the low bits of the count, which would still work). This comes from lp_build_conv(), line 594 (since src_shift is zero, src_offset is 0 and dst_offset is 1). So something seems wrong with this calculation; maybe we'd need to do something different if the destination is a normalized format instead.
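To make the shift-count point concrete, here is a small Python model. It is only a sketch, not Mesa or LLVM code, and every function name in it is made up for illustration. It assumes the usual x86 behaviour: scalar 32-bit shifts use only the low 5 bits of the count, while psrad with a count above 31 fills each lane with the sign bit.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement i32."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def llvm_shift_count_is_defined(count, bits=32):
    """LLVM only defines shl/lshr/ashr when the unsigned count is < bits."""
    return 0 <= count < bits


def x86_scalar_sar32(value, count):
    """Model of scalar SAR on a 32-bit operand: the count is masked to 5 bits."""
    return to_signed32(value) >> (count & 31)


def sse_psrad_lane(value, count):
    """Model of one psrad lane: counts above 31 behave like a shift by 31."""
    return to_signed32(value) >> min(count, 31)


# The IR shifts by i32 -1, which is 0xFFFFFFFF when read as an unsigned count.
COUNT = 0xFFFFFFFF

# After the pmaxsd/pminsd clamp the lanes can only hold 0 or 1.
results = {v: (x86_scalar_sar32(v, COUNT), sse_psrad_lane(v, COUNT)) for v in (0, 1)}
```

In this model both hardware flavours return 0 for the clamped inputs 0 and 1, so the following sub leaves the value unchanged, even though the count is outside the range LLVM defines.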
Ah I think I got it.

%24 = shl <4 x i8> %23, <i8 8, i8 8, i8 8, i8 8>

This is another extremely questionable shift (the count must be smaller than the number of bits), coming from lp_build_conv(), line 690. This shift is probably what causes most of the crazy code generated (because there is no native vector byte shift). Most likely it also causes the wrong results: llvm does some really crazy things to emulate this shift, and I suspect those are only guaranteed to work when the shift count is legal (the generated code is too unreadable to be sure). Note that this format conversion is rather crazy anyway: all values get mapped to either 0 or 1, and the shift above (together with the sub following it) is what gets those 0/1 values into unorm format.

In any case I don't think this is really a regression. Most likely we just got lucky before. I think the old code would have used extract/scalar truncation/insert before that shift, hence llvm probably would have just used scalar shifts instead of the complicated emulation (which in this case most likely was actually better). But technically the shift was still undefined. To fix this properly I think we must move those shifts around a bit (before/after lp_build_resize()), but that's going to depend on what the src/dst format is.
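The arithmetic behind that shl/sub pair can be sketched in Python. This is a model under my own assumptions, not the lp_build_conv() code, and all names are hypothetical. Values clamped to 0/1 are meant to become 0x00/0xff via (x << 8) - x = 255 * x, which is fine in a lane wider than 8 bits but needs a shift count of 8 in an i8 lane, where only counts 0..7 are defined.

```python
def clamp01(v):
    """Clamp as the pmaxsd(…, 0) / pminsd(…, 1) pair in the IR does."""
    return max(0, min(v, 1))


def scale_01_to_unorm8_wide(x):
    """(x << 8) - x == 255 * x, evaluated in a lane wider than 8 bits."""
    return ((x << 8) - x) & 0xFF


def shl_i8_lane(x, count):
    """shl on an i8 lane; raises where LLVM leaves the result undefined."""
    if not 0 <= count < 8:
        raise ValueError("shift count %d is out of range for i8" % count)
    return (x << count) & 0xFF


def i8_shift_is_rejected(count):
    """True if this model refuses the given count for an i8 lane."""
    try:
        shl_i8_lane(1, count)
    except ValueError:
        return True
    return False


def expected_r32g32_sscaled_unorm8(r, g):
    """Reference result for the failing test: R and G converted, B = 0, A = 0xff."""
    return bytes([
        scale_01_to_unorm8_wide(clamp01(r)),
        scale_01_to_unorm8_wide(clamp01(g)),
        0x00,
        0xFF,
    ])
```

With inputs (1, 0) and (0, 1) the reference function yields ff 00 00 ff and 00 ff 00 ff, the values the test log lists as expected for PIPE_FORMAT_R32G32_SSCALED. Doing the shift while the lanes are still 32 bits wide, before narrowing, is one way to keep the count legal, which is roughly what moving the shifts relative to lp_build_resize() would achieve.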
This now works, probably due to 56335b44417bc3d49625f9637e2b95457f522ad2.