mesa: 13862812dc910a4ef57cb72cb9fe777ce3c14515 (master 10.4.0-devel) $ ./bin/gl-1.0-dlist-beginend -auto Segmentation fault (core dumped) (gdb) bt full #0 0x00007fa115586288 in ?? () No symbol table info available. #1 0x0000000000000000 in ?? () No symbol table info available.
This is introduced with llvm-3.6.0svn r220138. 9b2d091a9c2509a766a233a64453462faa2bdffc is the first bad commit commit 9b2d091a9c2509a766a233a64453462faa2bdffc Author: Chandler Carruth <chandlerc@gmail.com> Date: Sat Oct 18 06:36:22 2014 +0000 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former. Historically, we've tried to load using the type of the *pointer*, and tried to match that type as closely as possible removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong. Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA. There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is *how that value is used*! Again, this minimizes casts. And in fact, the *only* thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible. I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended. The only fallout I've found so far from this change was SROA and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220138 91177308-0d34-0410-b5e6-96231b3b80d8 :040000 040000 982ecfc2706221c9731e5e8f54346370d2fe6283 667a68ea436b53840beda4b034cceee0e87dc6eb M lib :040000 040000 a6f61cd09e99c468c8fdb8dd6bda10060b4f751f cad7ad5d09cf0f43b3f47e36050c833b2debc9fc M test bisect run success
This is probably along the same lines as https://bugs.freedesktop.org/show_bug.cgi?id=85377 However it seems that, this time around, its not in an unit test, but that we're actually making some invalida assumptions about data alignment.
$ gdb --args ./bin/gl-1.0-dlist-beginend -auto [...] Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7eb8288 in ?? () (gdb) disassemble 0x00007ffff7eb8240,0x00007ffff7eb82ff Dump of assembler code from 0x7ffff7eb8240 to 0x7ffff7eb82ff: 0x00007ffff7eb8240: mov (%r15),%eax 0x00007ffff7eb8243: mul %edi 0x00007ffff7eb8245: seto %dl 0x00007ffff7eb8248: add 0x4(%r15),%eax 0x00007ffff7eb824c: setb %bl 0x00007ffff7eb824f: mov %eax,%edi 0x00007ffff7eb8251: add $0x10,%edi 0x00007ffff7eb8254: setb %r8b 0x00007ffff7eb8258: vxorpd %xmm1,%xmm1,%xmm1 0x00007ffff7eb825c: vxorps %xmm13,%xmm13,%xmm13 0x00007ffff7eb8261: cmp 0x8(%r9),%edi 0x00007ffff7eb8265: ja 0x7ffff7eb828d 0x00007ffff7eb8267: vxorps %xmm13,%xmm13,%xmm13 0x00007ffff7eb826c: test %dl,%dl 0x00007ffff7eb826e: jne 0x7ffff7eb828d 0x00007ffff7eb8270: vxorps %xmm13,%xmm13,%xmm13 0x00007ffff7eb8275: test %bl,%bl 0x00007ffff7eb8277: jne 0x7ffff7eb828d 0x00007ffff7eb8279: vxorps %xmm13,%xmm13,%xmm13 0x00007ffff7eb827e: test %r8b,%r8b 0x00007ffff7eb8281: jne 0x7ffff7eb828d 0x00007ffff7eb8283: mov (%r9),%rdx 0x00007ffff7eb8286: cltq => 0x00007ffff7eb8288: vmovaps (%rdx,%rax,1),%xmm13 [...] (gdb) info registers rax 0x18 24 <======= not 16-byte aligned!! rbx 0x0 0 rcx 0x0 0 rdx 0x6d29c0 7154112 rsi 0x73c920 7588128 rdi 0x28 40 rbp 0x7fffffffd330 0x7fffffffd330 rsp 0x7fffffffd160 0x7fffffffd160 r8 0x0 0 r9 0x620e78 6426232 r10 0x621e98 6430360 r11 0x3 3 r12 0x0 0 r13 0x7 7 r14 0x0 0 r15 0x620948 6424904 rip 0x7ffff7eb8288 0x7ffff7eb8288 eflags 0x10246 [ PF ZF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 (gdb) bt #0 0x00007ffff7eb8288 in ?? () #1 0x00007fff00000000 in ?? () #2 0x00007fff00000000 in ?? () #3 0x000000000077cb99 in ?? () #4 0x000000000062fad8 in ?? () #5 0xffff4000ffff4000 in ?? () #6 0xffff4000ffff4000 in ?? () [...] (gdb) source ~/work/vmware/scripts/unwind-x64.gdb frame[0]: rip = 0xf7eb8288, rbp = 0xffffd330 No symbol matches $function. arg[0] = 0x0 arg[1] = 0x0 arg[2] = 0x0 arg[3] = 0xffffd420 frame[1]: rip = 0xf63abe28, rbp = 0xffffd4a0 llvm_pipeline_generic + 662 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0x73ecd0 arg[1] = 0x4 arg[2] = 0x6328f0 arg[3] = 0x732940 frame[2]: rip = 0xf63ac321, rbp = 0xffffd530 llvm_middle_end_linear_run + 140 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0xffffd588 arg[1] = 0x0 arg[2] = 0x62faa0 arg[3] = 0xffffd5d0 frame[3]: rip = 0xf62e357e, rbp = 0xffffd560 vsplit_segment_simple_linear + 95 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0x62faa0 arg[1] = 0xf6e6de6f arg[2] = 0x4 arg[3] = 0x6328f0 frame[4]: rip = 0xf62e3875, rbp = 0xffffd5d0 vsplit_run_linear + 233 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0x2 arg[1] = 0x0 arg[2] = 0x620860 arg[3] = 0x4 frame[5]: rip = 0xf62d74fe, rbp = 0xffffd620 draw_pt_arrays + 703 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0x620860 arg[1] = 0xffffd680 arg[2] = 0x61ec48 arg[3] = 0x1fa0 frame[6]: rip = 0xf62d8363, rbp = 0xffffd6b0 draw_vbo + 644 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0x61dba0 arg[1] = 0xffffd740 arg[2] = 0x0 arg[3] = 0x0 frame[7]: rip = 0xf5ea9dfd, rbp = 0xffffd710 llvmpipe_draw_vbo + 924 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1 arg[0] = 0x6b9e40 arg[1] = 0x0 arg[2] = 0x61dba0 arg[3] = 0xffffd830
$ GALLIUM_TRACE=gl-1.0-dlist-beginend.gtrace ./bin/gl-1.0-dlist-beginend -auto $ vim gl-1.0-dlist-beginend.gtrace # manually add the </call> and </trace> tags $ python mesa/src/gallium/tools/trace/dump_state.py -d 1 gl-1.0-dlist-beginend.gtrace [...] "vertex_buffers": [ { "buffer": { "array_size": 1, "bind": 16, "data": [ "000000000000803f000000000000803f", "000080bf000080bf000000000000803f", "000000000000803f0000803f000080bf", "000000000000803f000000000000803f", "0000803f0000803f000000000000803f", "000000000000803f000080bf0000803f", [...] ], "depth": 1, "flags": 0, "height": 1, "last_level": 0, "nr_samples": 1, "target": 0, "usage": 3, "width": 65536 }, "buffer_offset": 0, "stride": 24, <=========================== "user_buffer": null } ], "vertex_elements": [ { "src_format": "PIPE_FORMAT_R32G32_FLOAT", "src_offset": 16, "vertex_buffer_index": 0 }, { "src_format": "PIPE_FORMAT_R32G32B32A32_FLOAT", <===================== but llvmpipe/draw/gallivm is probably generating code that assumes this attribute is 16-byte aligned... "src_offset": 0, "vertex_buffer_index": 0 } ], In short: - vertex stride is not multiple of 16bytes - but llvmpipe/draw/gallivm is probably generating code that assumes that the R32G32B32A32_FLOAT attribute is 16-byte aligned... I think the bug is draw llvm code -- it can't assume that vertex elements are 16byte aligned. In face, unless it advertises TRUE to PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY it shouldn't even assume it's 4-byte aligned. (And BTW, llvpipe should probably advertise TRUE for these.)
This change fixes it diff --git a/src/gallium/auxiliary/gallivm/lp_bld_gather.c b/src/gallium/auxiliary/gallivm/lp_bld_gather.c index 9155d81..d69bb4a 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_gather.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_gather.c @@ -93,6 +93,12 @@ lp_build_gather_elem(struct gallivm_state *gallivm, ptr = LLVMBuildBitCast(gallivm->builder, ptr, src_ptr_type, ""); res = LLVMBuildLoad(gallivm->builder, ptr, ""); + // Enforce maximum 4-byte alignment + if (src_width % 32 == 0 && + src_width > 32) { + lp_set_load_alignment(res, 4); + } + assert(src_width <= dst_width); if (src_width > dst_width) { res = LLVMBuildTrunc(gallivm->builder, res, dst_elem_type, ""); but it is overkill. The right fix is to pass through the expected aligment as a new argument everywhere, such that only the code path generate_fetch -> lp_build_fetch_rgba_aos -> ... -> lp_build_gather -> lp_build_gather_elem changes. Over to you Roland..
The analysis is spot on, I'll fix this. We can only really guarantee alignment for textures. I am not sure what exactly to do with the potential non-4byte alignments. Setting the cap bits should force all elements to be 4 byte aligned, meaning for instance ordinary vbos with 3-byte rgb colors in them will have to be converted somewhere, which seems not desirable at least on x86, but other archs might require it.
Fixed by 9ff688a01efd230be7a405c789b1a4d17e7be35d.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.