Bug 105869

Summary: [amdgcn/llvm-5] clang crashes when compiling OpenCL kernel
Product: Mesa Reporter: Vedran Miletić <vedran>
Component: Drivers/Gallium/radeonsiAssignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium CC: lyberta
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 99553    
Attachments: OpenCL dump.cl
OpenCL dump.link-0.ll
OpenCL dump.ll

Description Vedran Miletić 2018-04-03 21:01:13 UTC
(from #radeon and https://paste.debian.net/1018363/)

When compiling the following OpenCL kernel

kernel void PulseWave(global float* buffer, size_t num_harmonics,
	float duty_cycle)
{
	size_t index = get_global_id(0);
	float phase = buffer[index];
	float sample = duty_cycle;
	float precompute1 = (phase - duty_cycle / 2.0) * 2.0 * M_PI;
	for (size_t i = 1; i <= num_harmonics; ++i)
	{
		float harmonic = 2.0 / (i * M_PI) * sin(M_PI * i * duty_cycle) *
			cos(i * precompute1);
		sample += harmonic;
	}
	buffer[index] = sample * 2.0 - 1.0;
}

Clang will crash with

Thread 1 "ftz_chiptune_pu" received signal SIGSEGV, Segmentation fault.
0x00007fffe6501ba1 in llvm::LiveRange::find(llvm::SlotIndex) () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
(gdb) bt
#0  0x00007fffe6501ba1 in llvm::LiveRange::find(llvm::SlotIndex) () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#1  0x00007fffe66674a1 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#2  0x00007fffe66676c1 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#3  0x00007fffe66697e5 in llvm::RegPressureTracker::getLiveThroughAt(unsigned int, llvm::SlotIndex) const () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#4  0x00007fffe6669af1 in llvm::RegPressureTracker::recede(llvm::RegisterOperands const&, llvm::SmallVectorImpl<llvm::RegisterMaskPair>*) ()
   from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#5  0x00007fffe66a0229 in llvm::ScheduleDAGInstrs::buildSchedGraph(llvm::AAResults*, llvm::RegPressureTracker*, llvm::PressureDiffs*, llvm::LiveIntervals*, bool) () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#6  0x00007fffe65e7653 in llvm::ScheduleDAGMILive::buildDAGWithRegPressure() () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#7  0x00007fffe65e76b9 in llvm::ScheduleDAGMILive::schedule() () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#8  0x00007fffe758e501 in llvm::GCNScheduleDAGMILive::schedule() () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#9  0x00007fffe758e26e in llvm::GCNScheduleDAGMILive::finalizeSchedule() () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#10 0x00007fffe65e6fa7 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#11 0x00007fffe657cae0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#12 0x00007fffe63deac8 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#13 0x00007fffe6fb7b50 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#14 0x00007fffe63de3af in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /usr/lib/x86_64-linux-gnu/libLLVM-5.0.so.1
#15 0x00007ffff42227ce in ?? () from /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#16 0x00007ffff4222ed0 in ?? () from /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#17 0x00007ffff421e6a7 in ?? () from /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#18 0x00007ffff420fea1 in ?? () from /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#19 0x00007ffff41efe02 in ?? () from /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#20 0x0000555555561129 in cl::Program::build (this=0x7fffffffde10, options=0x0, notifyFptr=0x0, data=0x0) at /usr/include/CL/cl2.hpp:6321
Comment 1 Jan Vesely 2018-04-04 20:15:57 UTC
What is the clang/llvm version? I cannot reproduce using clang-5.0.1:
$ cat foo.cl 
#pragma OPENCL EXTENSION cl_khr_fp64: enable

kernel void PulseWave(global float* buffer, size_t num_harmonics,
	float duty_cycle)
{
	size_t index = get_global_id(0);
	float phase = buffer[index];
	float sample = duty_cycle;
	float precompute1 = (phase - duty_cycle / 2.0) * 2.0 * M_PI;
	for (size_t i = 1; i <= num_harmonics; ++i)
	{
		float harmonic = 2.0 / (i * M_PI) * sin(M_PI * i * duty_cycle) *
			cos(i * precompute1);
		sample += harmonic;
	}
	buffer[index] = sample * 2.0 - 1.0;
}

$ clang-5.0 -target amdgcn-mesa-mesa3d -Xclang -mlink-bitcode-file -Xclang /usr/lib64/clc/carrizo-amdgcn-mesa-mesa3d.bc  -include clc/clc.h -S foo.cl

$ clang-5.0 --version
clang version 5.0.1 (tags/RELEASE_501/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Comment 2 Pierre-Loup A. Griffais 2018-04-05 01:20:56 UTC
[18:11:28] <Lyberta> can someone reply here https://bugs.freedesktop.org/show_bug.cgi?id=105869 that version is 5.0.1-4 from Debian Testing, if I ever find another bug I'll register myself
Comment 3 Jan Vesely 2018-04-05 12:40:10 UTC
Can you run clinfo and append the output?
Comment 4 Lyberta 2018-04-05 15:52:25 UTC
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.3.7
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD Radeon (TM) R9 380 Series (TONGA / DRM 3.23.0 / 4.15.0-2-amd64, LLVM 5.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 17.3.7
  Driver Version                                  17.3.7
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               32
  Max clock frequency                             1040MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4292071424 (3.997GiB)
  Error Correction support                        No
  Max memory allocation                           3004449996 (2.798GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon (TM) R9 380 Series (TONGA / DRM 3.23.0 / 4.15.0-2-amd64, LLVM 5.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon (TM) R9 380 Series (TONGA / DRM 3.23.0 / 4.15.0-2-amd64, LLVM 5.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon (TM) R9 380 Series (TONGA / DRM 3.23.0 / 4.15.0-2-amd64, LLVM 5.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
Comment 5 Jan Vesely 2018-04-05 22:00:07 UTC
OK, I tried adding -mcpu=tonga to the command line to match the asic. Still no crash.
Are you sure it's the PulseWave kernel that's crashing?

At any rate, you should check llvm/clang-6 (that one can still be fixed if the problem persists).
Comment 6 Lyberta 2018-04-06 01:59:15 UTC
I'm 100% sure it is PulseWave because that's the only kernel I use to one of my programs and it still crashes at cl::Program::build.

How to upgrade to llvm/clang 6?
Comment 7 Jan Vesely 2018-04-06 18:29:37 UTC
(In reply to Lyberta from comment #6)
> I'm 100% sure it is PulseWave because that's the only kernel I use to one of
> my programs and it still crashes at cl::Program::build.

Is the posted snippet all that is compiled?
can you run with CLOVER_DEBUG=clc,llvm CLOVER_DEBUG_FILE=dump and attached the created dump.{cl,ll} files?

> How to upgrade to llvm/clang 6?

either there is a distro specific way (for your distro) to try testing packages. 
Packages for popular distros are also available here:
http://releases.llvm.org/download.html

You can also build from source. Note that you'll need to rebuild mesa and libclc after the upgrade.
Comment 8 Lyberta 2018-04-06 20:35:54 UTC
Created attachment 138663 [details]
OpenCL dump.cl
Comment 9 Lyberta 2018-04-06 20:36:24 UTC
Created attachment 138664 [details]
OpenCL dump.link-0.ll
Comment 10 Lyberta 2018-04-06 20:36:44 UTC
Created attachment 138665 [details]
OpenCL dump.ll
Comment 11 Jan Vesely 2018-05-08 00:31:27 UTC
Hi,

sorry for the delay. I thought I replied weeks ago.

I can reproduce the segfault on llvm-5 using the dump.link-0.ll:
/usr/lib/llvm/5/bin/llc -march=amdgcn < ../dump.link-0.ll

Since llvm-5 will not see any further changes you might try updating your libclc (plenty of changes recently), to see if it hides the bug.

Linking the old compiled code with newer libclc did not seem to help:
cd libclc;
/usr/lib/llvm/5/bin/llvm-link ../dump.ll built_libs/tahiti-amdgcn-mesa-mesa3d.bc | /usr/lib/llvm/5/bin/llc
still hits segfault.

running clang-5.0 directly does not hit the issue:
However, I cannot reproduce the error using clang-5.0:
cd libclc;
clang-5.0 -target amdgcn-mesa-mesa3d -Igeneric/include -Xclang -mlink-bitcode-file -Xclang built_libs/tahiti-amdgcn-mesa-mesa3d.bc -include clc/clc.h ../dump.cl -S -o -
works OK!

Unless I add '-g' to the command line, in which case it hits the same segfault.

I'd say this is a variant of
https://bugs.freedesktop.org/show_bug.cgi?id=100218

I've also tried running the same command using llvm-6:
clang-6.0 -g -target amdgcn-mesa-mesa3d -Igeneric/include -Xclang -mlink-bitcode-file -Xclang built_libs/tahiti-amdgcn-mesa-mesa3d.bc -include clc/clc.h ../dump.cl -S -o -

and it does not segfault.

I think the correct solution for you would be to upgrade to llvm-6.
I'd consider this fixed in that case.
Comment 12 Lyberta 2018-05-09 09:26:24 UTC
Alright, I've installed Mesa ICD that uses LLVM 6 and the program no longer crashes.
Comment 13 Jan Vesely 2018-05-09 09:29:36 UTC
closing per confirmation in comment #12.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.