Bug 29407

Summary:	llvm 2.8 asserts: "Tried to create an integer operation on a non-integer type!"
Product:	Mesa	Reporter:	nobled <nobled>
Component:	Mesa core	Assignee:	mesa-dev
Status:	RESOLVED FIXED	QA Contact:
Severity:	normal
Priority:	medium	CC:	steckdenis
Version:	git
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Attachments:	gallivm: Always use floating point operators for floating point values (v1) gallivm: Always use floating point operators for floating point values (v2) llvmpipe: Use floating-point operators for floating-point types Output of glxgears when using llvmpipe Compressed output of neverball when running with llvmpipe (4,7 MB uncompressed) draw: Use the correct type gallivm: Always use floating point operators for floating point values (v3) llvmpipe: Always use floating-point operators for floating-point types draw: Use the correct type for integers Use fneg instead of neg when inverting the sign of floats gallivm: Always use floating point operators for floating point values (v4) gallivm: Use the correct context for integers gallivm: Always use floating point operators for floating point values (v5) gallivm: Even more type checking gallivm: Even more type checking gallivm: Fix bitwise ops for floats, division for integers

Description nobled 2010-08-05 07:12:59 UTC

mesa git: 48268e0f2a5e65b63586398db3a58523a8c7a7a0
LLVM svn: r110299

When linking with the latest LLVM trunk compiled in Debug mode, Mesa triggers this assert:
lp_test_blend: ../llvm-2.8svn/lib/VMCore/Instructions.cpp:1609: void llvm::BinaryOperator::init(llvm::Instruction::BinaryOps): Assertion `getType()->isIntOrIntVectorTy() && "Tried to create an integer operation on a non-integer type!"' failed.

lp_test_blend, lp_test_conv, and lp_test_format all trigger it from completely different places in gallivm. The problem is apparently using stuff like LLVMBuildMul (an integer-only function) when it should be LLVMBuildFMul.

FMul, FSub, and FAdd were added in LLVM 2.6, btw: http://llvm.org/releases/2.6/docs/ReleaseNotes.html

Comment 1 steckdenis 2010-08-05 07:28:35 UTC

I can confirm this bug. When compiled in debug mode, I get many lines like that :

Integer arithmetic operators only work with integral types!
  %239 = mul <4 x float> %163, %238               ; <<4 x float>> [#uses=1]

I posted a bug here ( #29404 ) and in the LLVM bugzilla : http://llvm.org/bugs/show_bug.cgi?id=7813 (this last bug report contains the output of Gallium and LLVM when trying to launch neverball).

Comment 2 Brian Paul 2010-08-05 08:46:15 UTC

We haven't been testing with LLVM 2.8 / SVN much.  You shouldn't have trouble with LLVM 2.6 or 2.7.

Comment 3 nobled 2010-08-05 09:31:49 UTC

Created attachment 37590 [details] [review]
gallivm: Always use floating point operators for floating point values (v1)

(In reply to comment #2)
> We haven't been testing with LLVM 2.8 / SVN much.  You shouldn't have trouble
> with LLVM 2.6 or 2.7.
Or I could just turn asserts off, yeah, but I was hoping to help patch the bug.

Unfortunately fixing this(or at least in the three files I looked at) introduces regressions; with this patch the asserts are gone as far as I tested but lp_test_format fails on PIPE_FORMAT_R8G8Bx_SNORM.

Output:
---
Testing PIPE_FORMAT_R10SG10SB10SA2U_NORM (float) ...
Testing PIPE_FORMAT_R10SG10SB10SA2U_NORM (unorm8) ...
Testing PIPE_FORMAT_R8G8Bx_SNORM (float) ...
FAILED
  Packed: 7f 00 00 00
  Unpacked (0,0): 1.000000 0.000000 0.000086 1.000000 obtained
                  1.000000 0.000000 0.000000 1.000000 expected
FAILED
  Packed: 81 00 00 00
  Unpacked (0,0): -1.000000 0.000000 0.000086 1.000000 obtained
                  -1.000000 0.000000 0.000000 1.000000 expected
FAILED
  Packed: 00 7f 00 00
  Unpacked (0,0): 0.000000 1.000000 0.000086 1.000000 obtained
                  0.000000 1.000000 0.000000 1.000000 expected
FAILED
  Packed: 00 81 00 00
  Unpacked (0,0): 0.000000 -1.000000 0.000086 1.000000 obtained
                  0.000000 -1.000000 0.000000 1.000000 expected

define void @fetch_r8g8bx_snorm_float(<4 x float>*, i8*, i32, i32) {
entry:
  %4 = alloca <4 x float>                         ; <<4 x float>*> [#uses=2]
  %5 = bitcast <4 x float>* %4 to float*          ; <float*> [#uses=1]
  %6 = bitcast i8* %1 to i8*                      ; <i8*> [#uses=1]
  call void @util_format_r8g8bx_snorm_fetch_rgba_float(float* %5, i8* %6, i32 %2, i32 %3)
  %7 = load <4 x float>* %4                       ; <<4 x float>> [#uses=1]
  store <4 x float> %7, <4 x float>* %0
  ret void
}

Testing PIPE_FORMAT_R8G8Bx_SNORM (unorm8) ...
Testing PIPE_FORMAT_R8G8B8X8_UNORM (float) ...
--

Comment 4 nobled 2010-08-06 04:46:42 UTC

Created attachment 37630 [details] [review]
gallivm: Always use floating point operators for floating point values (v2)

Comment 5 nobled 2010-08-06 04:49:01 UTC

Created attachment 37631 [details] [review]
llvmpipe: Use floating-point operators for floating-point types

These two patches together fix all the places that seemed to be going wrong.

Comment 6 steckdenis 2010-08-06 05:53:16 UTC

Created attachment 37636 [details]
Output of glxgears when using llvmpipe

Hello,

Thank you for your patch, the situation improved, but one small error message is still there : it seems you have replaced too many i32 instructions with their f32 counterpart.

I attached the output of glxgears when running with llvmpipe to this bug report. I have applied you two patches, and I run Mesa Master ("draw: Avoid mixed declarations and code.").

Comment 7 steckdenis 2010-08-06 08:14:55 UTC

Created attachment 37640 [details]
Compressed output of neverball when running with llvmpipe (4,7 MB uncompressed)

Hello,

I looked at the code and found the source of the bug I reported in my previous comment. The line 231 in gallivm/lp_bld_arit.c uses the lp_type found in the lp_build_context struct, without taking in account the ones of a and b.

When this function is called by draw_llvm_generate (line 737 of draw/draw_llvm.c), this type is set as .floating=1 and .fixed=0, by lp_build_context_init() I think.

This is wrong for this first addition, made just after the prolog of the LLVM function. I made a quick and highly dirty hack to work around this issue :

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index de99b00..8e40c35 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -734,7 +734,11 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant)
 
    lp_build_context_init(&bld, builder, vs_type);
 
+   bld.type.fixed = 1;
+   bld.type.floating = 0;
    end = lp_build_add(&bld, start, count);
+   bld.type.fixed = 0;
+   bld.type.floating = 1;
 
    step = LLVMConstInt(LLVMInt32Type(), max_vertices, 0);

It's awful, but it works. I was able to see Glxgears running at 97 fps entirely software rendered, on my AMD Athlon 64 L110 at 1,2Ghz, single core, without HT, with 512Kb of cache. Good job, it is nearly seven times as fast as softpipe !

Unfortunately, I was not able to test llvmpipe with Neverball because some errors are still there. I looked at your patch, and it is full of "if (type.floating)", where type is the lp_type of the current lp_build_context.

Is it possible to use the LLVMValueRefs passed to all these functions to get their type, and to choose the correct function for them ? It will be more robust than assuming that all variables used in a shader are of the same type.

Note:

I found in the Core.h file of LLVM some interesting things : 

LLVMTypeRef LLVMTypeOf(LLVMValueRef Val);  // You get the type of the variable
LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty); // Now you have informations for the type

// And here are the types :
typedef enum {
  LLVMVoidTypeKind,
  LLVMFloatTypeKind,
  LLVMDoubleTypeKind,
  LLVMX86_FP80TypeKind,
  LLVMFP128TypeKind,
  LLVMPPC_FP128TypeKind,
  LLVMLabelTypeKind,
  LLVMIntegerTypeKind,     // The interesting one
  LLVMFunctionTypeKind,
  LLVMStructTypeKind,
  LLVMArrayTypeKind,
  LLVMPointerTypeKind,
  LLVMOpaqueTypeKind,
  LLVMVectorTypeKind,      // Here we don't known what is contained in the vector
  LLVMMetadataTypeKind,
  LLVMUnionTypeKind
} LLVMTypeKind;

But I don't know if it's possible, and if it will be fast enough.

Comment 8 nobled 2010-08-06 09:17:33 UTC

Created attachment 37642 [details] [review]
draw: Use the correct type

(In reply to comment #7)
> I looked at the code and found the source of the bug I reported in my previous
> comment. The line 231 in gallivm/lp_bld_arit.c uses the lp_type found in the
> lp_build_context struct, without taking in account the ones of a and b.
I'm pretty sure a and b *have* to both be the same type and be used with a context that mentions what type they are. If they aren't then it's a bug in draw_llvm_generate, not the patch...

*checks* Okay, it says 'start' and 'count' are both supposed to be LLVMInt32Type, but it sets vs_type to lp_type_float_vec(32). Patch attached.

Comment 9 Jose Fonseca 2010-08-06 11:37:20 UTC

I like the direction of the nobled patches.

But that said, I can only accept the patches when they provoke no regressions for llvm-2.6, regardless where the fault lies.

It appears the draw module needs two lp_bld_contexts: one for floating point operations, other for the integer indices operations. It was indeed a coincidence that things worked so far.

Making the lp_bld_add() automatically detect the incoming variables is *not* the solution. That would only hide the problems and make it much harder to debug later, especially because the lp_type is richer than LLVM native vector types (e.g., fixed point, normalized values etc). Instead, we should add more asserts like

  assert(lp_check_vec_type(bld->type, a));

to ensure the callers are using consistent types.

Comment 10 nobled 2010-08-06 14:37:55 UTC

Created attachment 37651 [details] [review]
gallivm: Always use floating point operators for floating point values (v3)

Comment 11 nobled 2010-08-06 14:39:16 UTC

Created attachment 37652 [details] [review]
llvmpipe: Always use floating-point operators for floating-point types

Comment 12 nobled 2010-08-06 14:42:19 UTC

Created attachment 37653 [details] [review]
draw: Use the correct type for integers

With all three patches applied, I can't find any regressions using the lp_test_blend/conv/format tests. Could other people test? (the patch for the draw module at least could go in right now though.)

Comment 13 nobled 2010-08-06 14:47:54 UTC

(In reply to comment #12)
> Created an attachment (id=37653) [details]
> draw: Use the correct type for integers
> 
> With all three patches applied, I can't find any regressions using the
> lp_test_blend/conv/format tests. Could other people test? (the patch for the
> draw module at least could go in right now though.)

Actually, even the regression I thought I found in comment #3 with just the first patch wasn't a real regression; that format was already failing with llvm 2.7 (didn't test llvm 2.6).

BTW, upgrading to llvm 2.8 fixed every single other format test failure but that one.

Comment 14 steckdenis 2010-08-08 07:44:55 UTC

Created attachment 37702 [details]
Use fneg instead of neg when inverting the sign of floats

Hello,

I have very good news for you !

I recompiled my LLVM with all of its debug capabilities and asserts, and I ran Neverball on it. I used GDB to catch the precise moment when a wrong instruction is emitted, and I fixed the problem.

The fact is that more instructions than add, sub, mul and div need their float counterpart. My patch fixes two "neg" instructions in lp_bld_tgsi_soa.c. It was the fix I needed to be able to run Neverball. I will now see if there are more places where an instruction needs to be replaced with its float counterpart, but I need to find GL programs that use them. Maybe piglit can help me.

The second good news is that Neverball runs really fine with LLVM. I have an AMD Athlon 64 X2 processor at 1,2Ghz, and Neverball ran at nearly 5 fps (800x600), with its graphics settings set to the maximum. My processor was also busy rendering a really big 3D scene, so only the half of the processing power was available to llvmpipe. They weren't any glitches on the screen, even the beautiful reflections of Neverball worked perfectly.

Thanks for all :) .

Comment 15 nobled 2010-08-08 13:19:54 UTC

Created attachment 37704 [details] [review]
gallivm: Always use floating point operators for floating point values (v4)

(In reply to comment #14)
> Created an attachment (id=37702) [details]
> Use fneg instead of neg when inverting the sign of floats
[snip]
> The fact is that more instructions than add, sub, mul and div need their float
> counterpart. My patch fixes two "neg" instructions in lp_bld_tgsi_soa.c. It was
> the fix I needed to be able to run Neverball. I will now see if there are more
> places where an instruction needs to be replaced with its float counterpart,
> but I need to find GL programs that use them. Maybe piglit can help me.
> 
Awesome, thanks! (damn barely-mentioning-fneg-at-all release notes...)

I grep'd for all the other instances of LLVMBuildNeg-- updated patch attached (includes yours, with credit and some modifications).

Comment 16 nobled 2010-08-08 13:20:52 UTC

Created attachment 37705 [details] [review]
gallivm: Use the correct context for integers

Fixes another potential mis-match of types.

Comment 17 Jose Fonseca 2010-08-08 14:00:21 UTC

(In reply to comment #12)
> Created an attachment (id=37653) [details]
> draw: Use the correct type for integers
> 
> With all three patches applied, I can't find any regressions using the
> lp_test_blend/conv/format tests. Could other people test? (the patch for the
> draw module at least could go in right now though.)

I've commited the draw module patch. Thanks nobled.

I'll need more time to test the remainder.

Comment 18 Jose Fonseca 2010-08-08 14:22:07 UTC

(In reply to comment #16)
> Created an attachment (id=37705) [details]
> gallivm: Use the correct context for integers
> 
> Fixes another potential mis-match of types.

Applied thanks. Fixed a few more.

Comment 19 nobled 2010-08-08 21:39:32 UTC

Created attachment 37712 [details] [review]
gallivm: Always use floating point operators for floating point values (v5)

(In reply to comment #15)
> Created an attachment (id=37704) [details]
> gallivm: Always use floating point operators for floating point values (v4)
Okay, this version actually compiles this time.

Comment 20 Jose Fonseca 2010-08-09 09:39:22 UTC

(In reply to comment #19)
> Created an attachment (id=37712) [details]
> gallivm: Always use floating point operators for floating point values (v5)
> 
> Okay, this version actually compiles this time.

Commited with a few modifications. Thanks.

Comment 21 Jose Fonseca 2010-08-09 09:43:37 UTC

*** Bug 29404 has been marked as a duplicate of this bug. ***

Comment 22 nobled 2010-08-09 10:44:21 UTC

(In reply to comment #11)
> Created an attachment (id=37652) [details]
> llvmpipe: Always use floating-point operators for floating-point types

Can you commit this one, too?

Comment 23 nobled 2010-08-09 14:17:10 UTC

Created attachment 37746 [details] [review]
gallivm: Even more type checking

Comment 24 nobled 2010-08-09 14:23:54 UTC

Created attachment 37747 [details] [review]
gallivm: Even more type checking

Missed one.

Comment 25 nobled 2010-08-09 14:30:55 UTC

Created attachment 37748 [details] [review]
gallivm: Fix bitwise ops for floats, division for integers

And also these ones?

Comment 26 Jose Fonseca 2010-08-10 02:26:23 UTC

I've applied the three additional patches. Thanks.

(In reply to comment #25)
> Created an attachment (id=37748) [details]
> gallivm: Fix bitwise ops for floats, division for integers
> 
> And also these ones?

This one was full of compile errors...

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.