Bug 85467

Summary: [llvmpipe] piglit gl-1.0-dlist-beginend failure with llvm-3.6.0svn
Product: Mesa Reporter: Vinson Lee <vlee>
Component: Mesa coreAssignee: Roland Scheidegger <sroland>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: brianp, jfonseca, sroland
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Vinson Lee 2014-10-26 08:18:21 UTC
mesa: 13862812dc910a4ef57cb72cb9fe777ce3c14515 (master 10.4.0-devel)

$ ./bin/gl-1.0-dlist-beginend -auto
Segmentation fault (core dumped)

(gdb) bt full
#0  0x00007fa115586288 in ?? ()
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.
Comment 1 Vinson Lee 2014-10-27 23:01:45 UTC
This is introduced with llvm-3.6.0svn r220138.

9b2d091a9c2509a766a233a64453462faa2bdffc is the first bad commit
commit 9b2d091a9c2509a766a233a64453462faa2bdffc
Author: Chandler Carruth <chandlerc@gmail.com>
Date:   Sat Oct 18 06:36:22 2014 +0000

    [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
    ...)) and (load (cast ...)): canonicalize toward the former.
    
    Historically, we've tried to load using the type of the *pointer*, and
    tried to match that type as closely as possible removing as many pointer
    casts as we could and trading them for bitcasts of the loaded value.
    This is deeply and fundamentally wrong.
    
    Repeat after me: memory does not have a type! This was a hard lesson for
    me to learn working on SROA.
    
    There is only one thing that should actually drive the type used for
    a pointer, and that is the type which we need to use to load from that
    pointer. Matching up pointer types to the loaded value types is very
    useful because it minimizes the physical size of the IR required for
    no-op casts. Similarly, the only thing that should drive the type used
    for a loaded value is *how that value is used*! Again, this minimizes
    casts. And in fact, the *only* thing motivating types in any part of
    LLVM's IR are the types used by the operations in the IR. We should
    match them as closely as possible.
    
    I've ended up removing some tests here as they were testing bugs or
    behavior that is no longer present. Mostly though, this is just cleanup
    to let the tests continue to function as intended.
    
    The only fallout I've found so far from this change was SROA and I have
    fixed it to not be impeded by the different type of load. If you find
    more places where this change causes optimizations not to fire, those
    too are likely bugs where we are assuming that the type of pointers is
    "significant" for optimization purposes.
    
    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220138 91177308-0d34-0410-b5e6-96231b3b80d8

:040000 040000 982ecfc2706221c9731e5e8f54346370d2fe6283 667a68ea436b53840beda4b034cceee0e87dc6eb M	lib
:040000 040000 a6f61cd09e99c468c8fdb8dd6bda10060b4f751f cad7ad5d09cf0f43b3f47e36050c833b2debc9fc M	test
bisect run success
Comment 2 Jose Fonseca 2014-10-28 13:43:13 UTC
This is probably along the same lines as https://bugs.freedesktop.org/show_bug.cgi?id=85377

However it seems that, this time around, its not in an unit test, but that we're actually making some invalida assumptions about data alignment.
Comment 3 Jose Fonseca 2014-10-28 13:56:48 UTC
$ gdb --args ./bin/gl-1.0-dlist-beginend -auto
[...]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7eb8288 in ?? ()
(gdb) disassemble 0x00007ffff7eb8240,0x00007ffff7eb82ff
Dump of assembler code from 0x7ffff7eb8240 to 0x7ffff7eb82ff:
   0x00007ffff7eb8240:	mov    (%r15),%eax
   0x00007ffff7eb8243:	mul    %edi
   0x00007ffff7eb8245:	seto   %dl
   0x00007ffff7eb8248:	add    0x4(%r15),%eax
   0x00007ffff7eb824c:	setb   %bl
   0x00007ffff7eb824f:	mov    %eax,%edi
   0x00007ffff7eb8251:	add    $0x10,%edi
   0x00007ffff7eb8254:	setb   %r8b
   0x00007ffff7eb8258:	vxorpd %xmm1,%xmm1,%xmm1
   0x00007ffff7eb825c:	vxorps %xmm13,%xmm13,%xmm13
   0x00007ffff7eb8261:	cmp    0x8(%r9),%edi
   0x00007ffff7eb8265:	ja     0x7ffff7eb828d
   0x00007ffff7eb8267:	vxorps %xmm13,%xmm13,%xmm13
   0x00007ffff7eb826c:	test   %dl,%dl
   0x00007ffff7eb826e:	jne    0x7ffff7eb828d
   0x00007ffff7eb8270:	vxorps %xmm13,%xmm13,%xmm13
   0x00007ffff7eb8275:	test   %bl,%bl
   0x00007ffff7eb8277:	jne    0x7ffff7eb828d
   0x00007ffff7eb8279:	vxorps %xmm13,%xmm13,%xmm13
   0x00007ffff7eb827e:	test   %r8b,%r8b
   0x00007ffff7eb8281:	jne    0x7ffff7eb828d
   0x00007ffff7eb8283:	mov    (%r9),%rdx
   0x00007ffff7eb8286:	cltq   
=> 0x00007ffff7eb8288:	vmovaps (%rdx,%rax,1),%xmm13
[...]
(gdb) info registers
rax            0x18	24                            <======= not 16-byte aligned!!
rbx            0x0	0
rcx            0x0	0
rdx            0x6d29c0	7154112
rsi            0x73c920	7588128
rdi            0x28	40
rbp            0x7fffffffd330	0x7fffffffd330
rsp            0x7fffffffd160	0x7fffffffd160
r8             0x0	0
r9             0x620e78	6426232
r10            0x621e98	6430360
r11            0x3	3
r12            0x0	0
r13            0x7	7
r14            0x0	0
r15            0x620948	6424904
rip            0x7ffff7eb8288	0x7ffff7eb8288
eflags         0x10246	[ PF ZF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
(gdb) bt
#0  0x00007ffff7eb8288 in ?? ()
#1  0x00007fff00000000 in ?? ()
#2  0x00007fff00000000 in ?? ()
#3  0x000000000077cb99 in ?? ()
#4  0x000000000062fad8 in ?? ()
#5  0xffff4000ffff4000 in ?? ()
#6  0xffff4000ffff4000 in ?? ()
[...]
(gdb) source ~/work/vmware/scripts/unwind-x64.gdb 
frame[0]: rip = 0xf7eb8288, rbp = 0xffffd330
No symbol matches $function.
  arg[0] = 0x0
  arg[1] = 0x0
  arg[2] = 0x0
  arg[3] = 0xffffd420
frame[1]: rip = 0xf63abe28, rbp = 0xffffd4a0
llvm_pipeline_generic + 662 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0x73ecd0
  arg[1] = 0x4
  arg[2] = 0x6328f0
  arg[3] = 0x732940
frame[2]: rip = 0xf63ac321, rbp = 0xffffd530
llvm_middle_end_linear_run + 140 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0xffffd588
  arg[1] = 0x0
  arg[2] = 0x62faa0
  arg[3] = 0xffffd5d0
frame[3]: rip = 0xf62e357e, rbp = 0xffffd560
vsplit_segment_simple_linear + 95 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0x62faa0
  arg[1] = 0xf6e6de6f
  arg[2] = 0x4
  arg[3] = 0x6328f0
frame[4]: rip = 0xf62e3875, rbp = 0xffffd5d0
vsplit_run_linear + 233 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0x2
  arg[1] = 0x0
  arg[2] = 0x620860
  arg[3] = 0x4
frame[5]: rip = 0xf62d74fe, rbp = 0xffffd620
draw_pt_arrays + 703 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0x620860
  arg[1] = 0xffffd680
  arg[2] = 0x61ec48
  arg[3] = 0x1fa0
frame[6]: rip = 0xf62d8363, rbp = 0xffffd6b0
draw_vbo + 644 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0x61dba0
  arg[1] = 0xffffd740
  arg[2] = 0x0
  arg[3] = 0x0
frame[7]: rip = 0xf5ea9dfd, rbp = 0xffffd710
llvmpipe_draw_vbo + 924 in section .text of /home/jfonseca/work/vmware/opengl/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
  arg[0] = 0x6b9e40
  arg[1] = 0x0
  arg[2] = 0x61dba0
  arg[3] = 0xffffd830
Comment 4 Jose Fonseca 2014-10-28 14:10:10 UTC
$ GALLIUM_TRACE=gl-1.0-dlist-beginend.gtrace ./bin/gl-1.0-dlist-beginend -auto
$ vim gl-1.0-dlist-beginend.gtrace # manually add the </call> and </trace> tags
$ python mesa/src/gallium/tools/trace/dump_state.py -d 1 gl-1.0-dlist-beginend.gtrace
[...]
    "vertex_buffers": [
        {
            "buffer": {
                "array_size": 1,
                "bind": 16,
                "data": [
                    "000000000000803f000000000000803f",
                    "000080bf000080bf000000000000803f",
                    "000000000000803f0000803f000080bf",
                    "000000000000803f000000000000803f",
                    "0000803f0000803f000000000000803f",
                    "000000000000803f000080bf0000803f",
    [...]
                ],
                "depth": 1,
                "flags": 0,
                "height": 1,
                "last_level": 0,
                "nr_samples": 1,
                "target": 0,
                "usage": 3,
                "width": 65536
            },
            "buffer_offset": 0,
            "stride": 24,                     <===========================
            "user_buffer": null
        }
    ],
    "vertex_elements": [
        {
            "src_format": "PIPE_FORMAT_R32G32_FLOAT",
            "src_offset": 16,
            "vertex_buffer_index": 0
        },
        {
            "src_format": "PIPE_FORMAT_R32G32B32A32_FLOAT", <===================== but llvmpipe/draw/gallivm is probably generating code that assumes this attribute is 16-byte aligned...
            "src_offset": 0,
            "vertex_buffer_index": 0
        }
    ],


In short:
- vertex stride is not multiple of 16bytes
- but llvmpipe/draw/gallivm is probably generating code that assumes that the R32G32B32A32_FLOAT attribute is 16-byte aligned...


I think the bug is draw llvm code -- it can't assume that vertex elements are 16byte aligned.

In face, unless it advertises TRUE to
  PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY
  PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY
  PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY
it shouldn't even assume it's 4-byte aligned. (And BTW, llvpipe should probably advertise TRUE for these.)
Comment 5 Jose Fonseca 2014-10-28 14:29:07 UTC
This change fixes it

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_gather.c b/src/gallium/auxiliary/gallivm/lp_bld_gather.c
index 9155d81..d69bb4a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_gather.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_gather.c
@@ -93,6 +93,12 @@ lp_build_gather_elem(struct gallivm_state *gallivm,
    ptr = LLVMBuildBitCast(gallivm->builder, ptr, src_ptr_type, "");
    res = LLVMBuildLoad(gallivm->builder, ptr, "");
 
+   // Enforce maximum 4-byte alignment
+   if (src_width % 32 == 0 &&
+       src_width > 32) {
+      lp_set_load_alignment(res, 4); 
+   }
+
    assert(src_width <= dst_width);
    if (src_width > dst_width) {
       res = LLVMBuildTrunc(gallivm->builder, res, dst_elem_type, "");


but it is overkill.  The right fix is to pass through the expected aligment as a new argument everywhere, such that only the code path

 generate_fetch -> lp_build_fetch_rgba_aos -> ... -> lp_build_gather -> lp_build_gather_elem 

changes.


Over to you Roland..
Comment 6 Roland Scheidegger 2014-10-28 17:44:55 UTC
The analysis is spot on, I'll fix this. We can only really guarantee alignment for textures.
I am not sure what exactly to do with the potential non-4byte alignments. Setting the cap bits should force all elements to be 4 byte aligned, meaning for instance ordinary vbos with 3-byte rgb colors in them will have to be converted somewhere, which seems not desirable at least on x86, but other archs might require it.
Comment 7 Roland Scheidegger 2014-11-18 14:28:54 UTC
Fixed by 9ff688a01efd230be7a405c789b1a4d17e7be35d.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.