Bug 94503

Summary: OpenCL kernel segfaults during compilation on Clover RadeonSI with Pitcairn GPU
Product: Mesa
Reporter: Tyson Whitehead <twhitehead>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Bug Blocks: 99553    
Attachments: Simplified kernel that causes compiler segfault
Simplified kernel that causes other (different) compiler segfault

Description Tyson Whitehead 2016-03-11 20:38:43 UTC
Created attachment 122238
Simplified kernel that causes compiler segfault

I'm running Debian unstable with the mesa 11.1.2 packages installed and ran into an issue where the OpenCL compiler segfaults.

I've reduced my kernel as far as I could and have attached it.  Here is an example run and backtrace, using clcc (https://github.com/twhitehead/clcc):

$ clcc -l
Platform 0: Clover
  Device 0: AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.1)
      Type = [ GPU, Accelerator, Custom ]
      Maximum compute units = 20
      Maximum work item dimensions = 3
      Maximum work item sizes = [ 256, 256, 256 ]
      Maximum work group size = 256
      Image support = False
      Global memory size = 1073741824
      Local memory size = 32768
Platform 1: Intel Gen OCL Driver
  Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
      Type = [ GPU, Accelerator, Custom ]
      Maximum compute units = 20
      Maximum work item dimensions = 3
      Maximum work item sizes = [ 512, 512, 512 ]
      Maximum work group size = 512
      Image support = True
        Image2D maximum width = 8192
        Image2D maximum height = 8192
        Image3D maximum width = 8192
        Image3D maximum height = 8192
        Image3D maximum depth = 2048
      Global memory size = 2147483648
      Local memory size = 65536

$ clcc -p "Clover" test.c
Segmentation fault

With the dbg packages installed, gdb gives the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff48a77de in llvm::SlotIndex::getIndex (this=<synthetic pointer>)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/include/llvm/CodeGen/SlotIndexes.h:134

#0  0x00007ffff48a77de in llvm::SlotIndex::getIndex (this=<synthetic pointer>)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/include/llvm/CodeGen/SlotIndexes.h:134
#1  llvm::SlotIndex::operator>= (other=..., this=<synthetic pointer>)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/include/llvm/CodeGen/SlotIndexes.h:202
#2  llvm::LiveRange::find (this=this@entry=0x1df48570, Pos=..., Pos@entry=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/LiveInterval.cpp:307
#3  0x00007ffff499e477 in llvm::LiveRange::find (Pos=..., this=0x1df48570)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/include/llvm/CodeGen/LiveInterval.h:272
#4  llvm::LiveRange::liveAt (index=..., this=0x1df48570)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/include/llvm/CodeGen/LiveInterval.h:373
#5  (anonymous namespace)::RegisterCoalescer::updateRegDefsUses (this=this@entry=0x1f7e3f80, SrcReg=2147485471, 
    DstReg=2147485589, SubIdx=17)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/RegisterCoalescer.cpp:1199
#6  0x00007ffff49a3e62 in (anonymous namespace)::RegisterCoalescer::joinCopy (Again=<synthetic pointer>, 
    CopyMI=0x18fa720, this=0x1f7e3f80)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/RegisterCoalescer.cpp:1440
#7  (anonymous namespace)::RegisterCoalescer::copyCoalesceWorkList (this=this@entry=0x1f7e3f80, CurrList=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/RegisterCoalescer.cpp:2767
#8  0x00007ffff49a5ecb in (anonymous namespace)::RegisterCoalescer::coalesceLocals (this=this@entry=0x1f7e3f80)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/RegisterCoalescer.cpp:2892
#9  0x00007ffff49a6646 in (anonymous namespace)::RegisterCoalescer::joinAllIntervals (this=0x1f7e3f80)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/RegisterCoalescer.cpp:2923
#10 (anonymous namespace)::RegisterCoalescer::runOnMachineFunction (this=0x1f7e3f80, fn=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/CodeGen/RegisterCoalescer.cpp:2968
#11 0x00007ffff41ff037 in llvm::FPPassManager::runOnFunction (this=0x1f7382a0, F=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/IR/LegacyPassManager.cpp:1520
#12 0x00007ffff41ff28b in llvm::FPPassManager::runOnModule (this=0x1f7382a0, M=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/IR/LegacyPassManager.cpp:1540
#13 0x00007ffff41fecc4 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x1685c760)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/IR/LegacyPassManager.cpp:1596
#14 llvm::legacy::PassManagerImpl::run (this=0x1dad02a0, M=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/IR/LegacyPassManager.cpp:1698
#15 0x00007ffff41fee59 in llvm::legacy::PassManager::run (this=this@entry=0x7fffffffc380, M=...)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/IR/LegacyPassManager.cpp:1729
#16 0x00007ffff4c737a7 in LLVMTargetMachineEmit (T=T@entry=0x1948d20, M=M@entry=0xa40dc0, OS=..., 
    codegen=codegen@entry=LLVMObjectFile, ErrorMessage=ErrorMessage@entry=0x7fffffffc498)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/Target/TargetMachineC.cpp:217
#17 0x00007ffff4c73950 in LLVMTargetMachineEmitToMemoryBuffer (T=T@entry=0x1948d20, M=M@entry=0xa40dc0, 
    codegen=codegen@entry=LLVMObjectFile, ErrorMessage=ErrorMessage@entry=0x7fffffffc498, 
    OutMemBuf=OutMemBuf@entry=0x7fffffffc628)
    at /build/llvm-toolchain-3.7-dRkmpB/llvm-toolchain-3.7-3.7.1/lib/Target/TargetMachineC.cpp:241
#18 0x00007ffff6733c14 in (anonymous namespace)::emit_code (tm=tm@entry=0x1948d20, mod=mod@entry=0xa40dc0, 
    file_type=file_type@entry=LLVMObjectFile, out_buffer=out_buffer@entry=0x7fffffffc628, 
    r_log="test.c:3:21: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest.c:3:32: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest"...)
    at ../../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:621
#19 0x00007ffff6738977 in (anonymous namespace)::compile_native (
    r_log="test.c:3:21: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest.c:3:32: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest"..., 
    dump_asm=<optimized out>, processor="pitcairn", triple="amdgcn--", mod=0xa40dc0)
    at ../../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:675
#20 clover::compile_program_llvm (
    source="#line 1 \"test.c\"\n//", '-' <repeats 111 times>, "//\n__constant const float16 mg_lbT =\n  (float16)(        1.,        0."..., headers=..., ir=<optimized out>, target="pitcairn-amdgcn--", opts="", 
    r_log="test.c:3:21: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest.c:3:32: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest"...)
    at ../../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:886
#21 0x00007ffff672e9b0 in clover::program::build (this=this@entry=0xabab90, devs=..., opts=opts@entry=0x736ab0 "", 
    headers=std::vector of length 0, capacity 0)
    at ../../../../../../src/gallium/state_trackers/clover/core/program.cpp:63
#22 0x00007ffff6710278 in clBuildProgram (d_prog=0xabab98, num_devs=1, d_devs=0x7fffffffdcc0, 
    p_opts=<optimized out>, pfn_notify=0x0, user_data=0x0)
    at ../../../../../../src/gallium/state_trackers/clover/api/program.cpp:184
#23 0x000000000040460f in CL_programCreate (context=0x702be8, device=0x644fb8, codes=..., options=...)
    at clcc.c:1466
#24 0x0000000000406651 in Action_compile (settings=...) at clcc.c:1716
#25 0x0000000000406344 in main (argc=2, argv=0x7fffffffdfd8) at clcc.c:1658

I also tried the Debian mesa 11.2.0~rc3 packages without success (there is no dbg package for them though so I can't provide a backtrace).

Thanks!  -Tyson
Comment 1 Matt Arsenault 2016-03-11 20:59:47 UTC
This compiles fine with trunk LLVM. The backtrace looks sort of familiar, and I think it was fixed within the last 3 months.
Comment 2 Tyson Whitehead 2016-03-13 04:17:52 UTC
Thanks for the heads-up Matt.

I rebuilt the Debian package of mesa 11.2.0-rc3 against the Debian package of llvm 3.9~svn262954 and am pleased to say the simplified kernel I provided also now compiles for me.

Unfortunately, my full set of OpenCL code is still causing a segfault.  Pruning the code reveals that it is a different kernel this time, and the backtrace is entirely different too, so progress is being made!
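
The kind of pruning described above can in principle be automated. The following is a minimal sketch, not the procedure actually used in this report: it greedily removes chunks of source lines for as long as the compiler keeps crashing. The names `prune_lines` and `clcc_segfaults` are hypothetical, and the predicate assumes a POSIX system with `clcc` on the PATH, invoked as in the runs shown in this report.

```python
import os
import signal
import subprocess
import tempfile
from typing import Callable, List


def prune_lines(lines: List[str],
                still_crashes: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the predicate still reports a crash."""
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_crashes(candidate):
                lines = candidate   # this chunk was irrelevant, keep it removed
            else:
                i += chunk          # this chunk is needed, move past it
        chunk //= 2
    return lines


def clcc_segfaults(lines: List[str], platform: str = "Clover") -> bool:
    """Hypothetical predicate: True if `clcc -p <platform> <file>` dies with SIGSEGV.

    Assumes POSIX, where subprocess reports death by signal N as return code -N.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write("\n".join(lines) + "\n")
    try:
        result = subprocess.run(["clcc", "-p", platform, f.name],
                                capture_output=True)
        return result.returncode == -signal.SIGSEGV
    finally:
        os.unlink(f.name)
```

A call such as `prune_lines(open("test2.c").read().splitlines(), clcc_segfaults)` would return a smaller set of lines that still triggers the crash. The result is only locally minimal and may no longer be a meaningful kernel, so it still needs a manual check before being attached.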

I'm attaching a simplified version of this next kernel function.  I would appreciate it if you could give it a go on your setup and see if it is segfaulting for you as well.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3bad2a0 in (anonymous namespace)::JoinVals::pruneValues (this=this@entry=0x7fffffffb8a0, 
    Other=..., EndPoints=..., changeInstrs=changeInstrs@entry=false)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2388

#0  0x00007ffff3bad2a0 in (anonymous namespace)::JoinVals::pruneValues (this=this@entry=0x7fffffffb8a0, 
    Other=..., EndPoints=..., changeInstrs=changeInstrs@entry=false)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2388
#1  0x00007ffff3bb38da in (anonymous namespace)::RegisterCoalescer::joinSubRegRanges (this=0x2386a50, 
    this=0x2386a50, CP=..., LaneMask=8, RRange=..., LRange=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2569
#2  (anonymous namespace)::RegisterCoalescer::mergeSubRangeInto (this=this@entry=0x2386a50, LI=..., 
    ToMerge=..., LaneMask=8, CP=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2622
#3  0x00007ffff3bb4a31 in (anonymous namespace)::RegisterCoalescer::joinVirtRegs (
    this=this@entry=0x2386a50, CP=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2688
#4  0x00007ffff3bb54a0 in (anonymous namespace)::RegisterCoalescer::joinIntervals (CP=..., 
    this=0x2386a50)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2734
#5  (anonymous namespace)::RegisterCoalescer::joinCopy (Again=<synthetic pointer>, CopyMI=0xb991b0, 
    this=0x2386a50)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:1449
#6  (anonymous namespace)::RegisterCoalescer::copyCoalesceWorkList (this=this@entry=0x2386a50, 
    CurrList=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2805
#7  0x00007ffff3bb70bb in (anonymous namespace)::RegisterCoalescer::coalesceLocals (
    this=this@entry=0x2386a50)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2930
#8  0x00007ffff3bb7da8 in (anonymous namespace)::RegisterCoalescer::joinAllIntervals (this=0x2386a50)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:2956
#9  (anonymous namespace)::RegisterCoalescer::runOnMachineFunction (this=0x2386a50, fn=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/CodeGen/RegisterCoalescer.cpp:3006
#10 0x00007ffff39cc752 in llvm::FPPassManager::runOnFunction (this=0x238d260, F=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/IR/LegacyPassManager.cpp:1550
#11 0x00007ffff39cca8b in llvm::FPPassManager::runOnModule (this=0x238d260, M=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/IR/LegacyPassManager.cpp:1571
#12 0x00007ffff39cc3cf in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x238cfd0)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/IR/LegacyPassManager.cpp:1627
#13 llvm::legacy::PassManagerImpl::run (this=0xa96580, M=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/IR/LegacyPassManager.cpp:1730
#14 0x00007ffff39cc569 in llvm::legacy::PassManager::run (this=this@entry=0x7fffffffc6a0, M=...)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/IR/LegacyPassManager.cpp:1761
#15 0x00007ffff4507ef7 in LLVMTargetMachineEmit (T=T@entry=0x239f8a0, M=M@entry=0xb42a60, OS=..., 
    codegen=codegen@entry=LLVMObjectFile, ErrorMessage=ErrorMessage@entry=0x7fffffffc948)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/Target/TargetMachineC.cpp:206
#16 0x00007ffff4508219 in LLVMTargetMachineEmitToMemoryBuffer (T=T@entry=0x239f8a0, M=M@entry=0xb42a60, 
    codegen=codegen@entry=LLVMObjectFile, ErrorMessage=ErrorMessage@entry=0x7fffffffc948, 
    OutMemBuf=OutMemBuf@entry=0x7fffffffcae8)
    at /tmp/buildd/llvm-toolchain-snapshot-3.9~svn262954/lib/Target/TargetMachineC.cpp:230
#17 0x00007ffff6605584 in (anonymous namespace)::emit_code (tm=tm@entry=0x239f8a0, 
    mod=mod@entry=0xb42a60, file_type=file_type@entry=LLVMObjectFile, 
    out_buffer=out_buffer@entry=0x7fffffffcae8, 
    r_log="test2.c:36:31: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest2.c:44:45: warning: double precision constant requires cl_khr_fp64, casting to single precision\n"...) at ../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:621
#18 0x00007ffff660a34e in (anonymous namespace)::compile_native (
    r_log="test2.c:36:31: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest2.c:44:45: warning: double precision constant requires cl_khr_fp64, casting to single precision\n"..., dump_asm=<optimized out>, processor="pitcairn", triple="amdgcn--", mod=0xb42a60)
    at ../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:679
#19 clover::compile_program_llvm (
    source="#line 1 \"test2.c\"\n//", '-' <repeats 111 times>, "//\nfloat foldf3_mul(const float3 a) {\n  return a.s0*a.s1*a.s2;\n}\n\nint"..., headers=..., ir=<optimized out>, target="pitcairn-amdgcn--", 
    opts="", 
    r_log="test2.c:36:31: warning: double precision constant requires cl_khr_fp64, casting to single precision\ntest2.c:44:45: warning: double precision constant requires cl_khr_fp64, casting to single precision\n"...) at ../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:890
#20 0x00007ffff6600290 in clover::program::build (this=this@entry=0xb4a500, devs=..., 
    opts=opts@entry=0xb506d0 "", headers=std::vector of length 0, capacity 0)
    at ../../../../../src/gallium/state_trackers/clover/core/program.cpp:63
#21 0x00007ffff65e1a98 in clBuildProgram (d_prog=0xb4a508, num_devs=1, d_devs=0x7fffffffe1f0, 
    p_opts=<optimized out>, pfn_notify=0x0, user_data=0x0)
    at ../../../../../src/gallium/state_trackers/clover/api/program.cpp:184
#22 0x000000000040460f in CL_programCreate (context=0xb38018, device=0x6486b8, codes=..., options=...)
    at clcc.c:1466
#23 0x0000000000406651 in Action_compile (settings=...) at clcc.c:1716
#24 0x0000000000406344 in main (argc=4, argv=0x7fffffffe508) at clcc.c:1658

I've got a good feeling that if this one can be resolved as well, the whole thing might just compile.

Thanks!  -Tyson
Comment 3 Tyson Whitehead 2016-03-13 04:19:49 UTC
Created attachment 122262
Simplified kernel that causes other (different) compiler segfault
Comment 4 Vedran Miletić 2017-03-22 16:26:13 UTC
Neither kernel crashes anymore here with Mesa git and LLVM git, but it's possible that stable versions also work. Please reopen if this is still an issue.