94972 – blend failures on llvmpipe with llvm 3.7 due to vector selects

Summary: blend failures on llvmpipe with llvm 3.7 due to vector selects

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Mesa core
Version:	git
Hardware:	All All

Importance:	medium normal
Assignee:	mesa-dev
QA Contact:	mesa-dev

Reported:	2016-04-16 21:16 UTC by Roland Scheidegger
Modified:	2016-04-17 22:26 UTC
CC List:	0 users

Description Roland Scheidegger 2016-04-16 21:16:49 UTC

Switching to vector selects (cdf7c6b83dad7eb6a7600af61403315b02dcf13f) caused some regressions with llvm 3.7: a large number of blend tests in deqp, also mentioned in bug 94957; piglit gl-1.0-blend-func; and dolphin, as mentioned on IRC.

After some digging, the problem turns out to be entirely llvm's fault (initially I wasn't sure whether we were relying on some undefined behavior, hence I tracked this down): llvm 3.7 very obviously miscompiles rather simple vector selects. I suspect this only affects per-byte selects, possibly only when constant values are involved.
It only seems to affect 3.7; I've tried 3.3 through 3.8 and every other version worked.

Mostly filing a bug so I've got something to refer to when working around it in mesa.

define <16 x i8> @novsel(<16 x i8> %val1, <16 x i8> %val2) {
entry:
  %val1a = and <16 x i8> %val1, <i8 0, i8 0, i8 0, i8 -1, i8 0, i8 0, i8 0, i8 -1, i8 0, i8 0, i8 0, i8 -1, i8 0, i8 0, i8 0, i8 -1>
  %val2rgb = and <16 x i8> %val2, <i8 -1, i8 -1, i8 -1, i8 0, i8 -1, i8 -1, i8 -1, i8 0, i8 -1, i8 -1, i8 -1, i8 0, i8 -1, i8 -1, i8 -1, i8 0>
  %res = or <16 x i8> %val1a, %val2rgb
  ret <16 x i8> %res
}

define <16 x i8> @vsel(<16 x i8> %val1, <16 x i8> %val2) {
entry:
  %res = select <16 x i1> <i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true>, <16 x i8> %val1, <16 x i8> %val2
  ret <16 x i8> %res
}

The former gets compiled to:
        vmovdqa .LCPI0_0(%rip), %xmm2   # xmm2 = [255,255,255,0,255,255,255,0,255,255,255,0,255,255,255,0]
        vpblendvb       %xmm2, %xmm1, %xmm0, %xmm0
        retq

But the latter to:
        vmovdqa .LCPI1_0(%rip), %xmm2   # xmm2 = [0,0,0,255,0,0,0,255,255,255,255,255,255,255,255,255]
        vpblendvb       %xmm2, %xmm0, %xmm1, %xmm0
        retq

So only the first 8 of the 16 mask bytes follow the select pattern; the upper 8 are all 255, so those lanes always take %val1 (and yes, the .LCPI1_0 byte values in the object code match what the comment indicates).
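
To make the effect of the bad constant concrete, here is a minimal scalar sketch in C. It is not Mesa or llvm code: the helper names (blend_expected, blend_with_mask, mismatched_lanes) are made up for illustration, and it assumes the usual vpblendvb behavior where the top bit of each mask byte picks the source. It emulates the blend with the mask from .LCPI1_0 and reports which byte lanes differ from what @vsel should produce.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16

/* Intended result of both @novsel and @vsel: byte i comes from val1
 * when i % 4 == 3 (the alpha byte of each pixel), else from val2. */
static void blend_expected(const uint8_t *val1, const uint8_t *val2,
                           uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (i % 4 == 3) ? val1[i] : val2[i];
}

/* Scalar emulation of a vpblendvb-style blend: the top bit of each
 * mask byte selects a (bit set) or b (bit clear). */
static void blend_with_mask(const uint8_t *mask, const uint8_t *a,
                            const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (mask[i] & 0x80) ? a[i] : b[i];
}

/* Mask constant quoted above for the miscompiled @vsel (.LCPI1_0). */
static const uint8_t bad_mask[LANES] = {
    0, 0, 0, 255, 0, 0, 0, 255,
    255, 255, 255, 255, 255, 255, 255, 255
};

/* Returns a bitmask with bit i set when lane i of the emulated
 * miscompiled blend differs from the intended result. */
static unsigned mismatched_lanes(const uint8_t *val1, const uint8_t *val2)
{
    uint8_t want[LANES], got[LANES];
    unsigned diff = 0;

    blend_expected(val1, val2, want);
    blend_with_mask(bad_mask, val1, val2, got);
    for (size_t i = 0; i < LANES; i++)
        if (want[i] != got[i])
            diff |= 1u << i;
    return diff;
}
```

With inputs whose bytes differ in every lane (say val1 all 0xAA and val2 all 0x55), mismatched_lanes returns 0x7700: lanes 8-10 and 12-14 wrongly take val1, while lanes 11 and 15 happen to come out right because they were supposed to take val1 anyway.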

Comment 1 Roland Scheidegger 2016-04-17 22:26:41 UTC

Fixed by d11111a5510815afb73f3a863330ddf51d5021df.

(For reference, the actual llvm issue was https://llvm.org/bugs/show_bug.cgi?id=24532)