Bug 94972

Summary: blend failures on llvmpipe with llvm 3.7 due to vector selects
Product: Mesa Reporter: Roland Scheidegger <sroland>
Component: Mesa core Assignee: mesa-dev
Status: RESOLVED FIXED QA Contact: mesa-dev
Severity: normal    
Priority: medium    
Version: git   
Hardware: All   
OS: All   

Description Roland Scheidegger 2016-04-16 21:16:49 UTC
Using vector selects (cdf7c6b83dad7eb6a7600af61403315b02dcf13f) caused some regressions with llvm 3.7: a large number of blend tests in deqp (also mentioned in bug 94957), piglit gl-1.0-blend-func, and dolphin, which was mentioned on IRC.

After some digging, the problem turns out to be entirely llvm's fault: it very obviously miscompiles rather simple vector selects. (Initially I wasn't sure whether we were relying on some undefined behavior, hence I tracked this down.) I suspect this only affects per-byte selects, possibly only when constant values are involved.
It seems to affect only 3.7; I've tried 3.3 through 3.8 and everything else worked.

Mostly filing a bug so I've got something to refer to when working around it in mesa.

define <16 x i8> @novsel(<16 x i8> %val1, <16 x i8> %val2) {
entry:
  %val1a = and <16 x i8> %val1, <i8 0, i8 0, i8 0, i8 -1, i8 0, i8 0, i8 0, i8 -1, i8 0, i8 0, i8 0, i8 -1, i8 0, i8 0, i8 0, i8 -1>
  %val2rgb = and <16 x i8> %val2, <i8 -1, i8 -1, i8 -1, i8 0, i8 -1, i8 -1, i8 -1, i8 0, i8 -1, i8 -1, i8 -1, i8 0, i8 -1, i8 -1, i8 -1, i8 0>
  %res = or <16 x i8> %val1a, %val2rgb
  ret <16 x i8> %res
}

define <16 x i8> @vsel(<16 x i8> %val1, <16 x i8> %val2) {
entry:
  %res = select <16 x i1> <i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true>, <16 x i8> %val1, <16 x i8> %val2
  ret <16 x i8> %res
}

The former (@novsel) gets compiled to:
        vmovdqa .LCPI0_0(%rip), %xmm2   # xmm2 = [255,255,255,0,255,255,255,0,255,255,255,0,255,255,255,0]
        vpblendvb       %xmm2, %xmm1, %xmm0, %xmm0
        retq

But the latter (@vsel) gets compiled to:
        vmovdqa .LCPI1_0(%rip), %xmm2   # xmm2 = [0,0,0,255,0,0,0,255,255,255,255,255,255,255,255,255]
        vpblendvb       %xmm2, %xmm0, %xmm1, %xmm0
        retq

So the mask is only correct for the first 8 of the 16 bytes: the upper 8 are all 255 instead of repeating the 0,0,0,255 pattern (and yes, the actual .LCPI1_0 byte values match what the comment indicates).
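
To make the effect of the wrong mask concrete, here is a small Python model of the two instruction sequences. It is only a sketch: the `vpblendvb` helper is a hypothetical stand-in that models the per-byte behavior of the instruction (take the byte from the second source when the mask byte's high bit is set), and the input byte values are arbitrary test data, not taken from any failing test.

```python
def vpblendvb(mask, src_if_set, src_if_clear):
    """Hypothetical model of vpblendvb: for each byte, pick src_if_set
    when the high bit of the mask byte is set, else src_if_clear."""
    return [s if m & 0x80 else c
            for m, s, c in zip(mask, src_if_set, src_if_clear)]

def select_reference(cond, a, b):
    """What the IR 'select <16 x i1> cond, a, b' is supposed to produce."""
    return [x if c else y for c, x, y in zip(cond, a, b)]

# Arbitrary, easily distinguishable inputs (assumed test data).
val1 = list(range(0x10, 0x20))   # plays the role of %val1 / %xmm0
val2 = list(range(0xA0, 0xB0))   # plays the role of %val2 / %xmm1

# Condition from @vsel: take val1 only in every fourth byte.
cond = [False, False, False, True] * 4
expected = select_reference(cond, val1, val2)

# @novsel as compiled: mask selects val2 for the first three bytes of each group.
good_mask = [255, 255, 255, 0] * 4
novsel = vpblendvb(good_mask, val2, val1)

# @vsel as compiled by llvm 3.7: upper 8 mask bytes are all 255.
bad_mask = [0, 0, 0, 255, 0, 0, 0, 255] + [255] * 8
vsel = vpblendvb(bad_mask, val1, val2)

# Byte positions where the miscompiled select differs from the reference.
wrong = [i for i in range(16) if vsel[i] != expected[i]]

print("novsel matches reference:", novsel == expected)
print("vsel wrong at bytes:", wrong)
```

In this model the and/or variant matches the reference select, while the miscompiled one differs in six of the upper eight bytes; bytes 11 and 15 come out right only because the correct mask also has 255 in those positions.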
Comment 1 Roland Scheidegger 2016-04-17 22:26:41 UTC
Fixed by d11111a5510815afb73f3a863330ddf51d5021df.

(For reference, the actual llvm issue was https://llvm.org/bugs/show_bug.cgi?id=24532)
