When running the program https://github.com/CNugteren/myGEMM/blob/master/extra/minimal.cpp, it crashes. The valgrind output is:

valgrind ./minimal
==3618== Memcheck, a memory error detector
==3618== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3618== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==3618== Command: ./minimal
==3618==
>>> Initializing OpenCL...
% Device: AMD TONGA (DRM 3.8.0 / 4.9.11-1-ARCH, LLVM 3.9.1), 7253.7 MiB memory, max allocation 1813.4 MiB, driver 17.0.1
==3618== Invalid read of size 1
==3618==    at 0xAACAB18: llvm::SIInstrInfo::getInstSizeInBytes(llvm::MachineInstr const&) const (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0xAA5F550: llvm::AMDGPUAsmPrinter::getSIProgramInfo(llvm::AMDGPUAsmPrinter::SIProgramInfo&, llvm::MachineFunction const&) const (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0xAA625B9: llvm::AMDGPUAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9D591D0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9BF17C1: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9BF1B4A: llvm::FPPassManager::runOnModule(llvm::Module&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9BF1E73: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x7CB9742: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7CB9D5F: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7CB611D: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7CA7AF8: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7C859CB: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==  Address 0x400c07ab25 is not stack'd, malloc'd or (recently) free'd
==3618==
==3618==
==3618== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==3618==  Access not within mapped region at address 0x400C07AB25
==3618==    at 0xAACAB18: llvm::SIInstrInfo::getInstSizeInBytes(llvm::MachineInstr const&) const (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0xAA5F550: llvm::AMDGPUAsmPrinter::getSIProgramInfo(llvm::AMDGPUAsmPrinter::SIProgramInfo&, llvm::MachineFunction const&) const (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0xAA625B9: llvm::AMDGPUAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9D591D0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9BF17C1: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9BF1B4A: llvm::FPPassManager::runOnModule(llvm::Module&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x9BF1E73: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /usr/lib/libLLVM-3.9.so)
==3618==    by 0x7CB9742: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7CB9D5F: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7CB611D: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7CA7AF8: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==    by 0x7C859CB: ??? (in /usr/lib/libMesaOpenCL.so.1.0.0)
==3618==  If you believe this happened as a result of a stack
==3618==  overflow in your program's main thread (unlikely but
==3618==  possible), you can try to increase the size of the
==3618==  main thread stack using the --main-stacksize= flag.
==3618==  The main thread stack size used in this run was 8388608.
==3618==
==3618== HEAP SUMMARY:
==3618==     in use at exit: 27,389,789 bytes in 3,698 blocks
==3618==   total heap usage: 82,326 allocs, 78,628 frees, 58,257,510 bytes allocated
==3618==
==3618== LEAK SUMMARY:
==3618==    definitely lost: 16 bytes in 2 blocks
==3618==    indirectly lost: 0 bytes in 0 blocks
==3618==      possibly lost: 245,866 bytes in 375 blocks
==3618==    still reachable: 27,143,907 bytes in 3,321 blocks
==3618==                       of which reachable via heuristic:
==3618==                         newarray           : 340,712 bytes in 7 blocks
==3618==                         multipleinheritance: 632 bytes in 2 blocks
==3618==         suppressed: 0 bytes in 0 blocks
==3618== Rerun with --leak-check=full to see details of leaked memory
==3618==
==3618== For counts of detected and suppressed errors, rerun with: -v
==3618== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

It does not crash when the int arguments in the OpenCL kernel are changed to unsigned int.
The same problem occurs when running the mat-mul example taken from https://cgit.freedesktop.org/~tstellar/opencl-example:

[mig@antergos-mig opencl-example]$ ./mat-mul
There are 1 platforms.
There are 1 GPU devices.
clCreateProgramWithSource() suceeded.
Segmentation fault (core dumped)

When int is replaced with unsigned int, the program runs fine:

[mig@antergos-mig opencl-example]$ ./mat-mul
There are 1 platforms.
There are 1 GPU devices.
clCreateProgramWithSource() suceeded.
clBuildProgram() suceeded.
clCreateKernel() suceeded.
clSetKernelArg() succeeded.
clSetKernelArg() succeeded.
clSetKernelArg() succeeded.
clSetKernelArg() succeeded.
clSetKernelArg() succeeded.
clSetKernelArg() succeeded.
clSetKernelArg() succeeded.
50 94 178 60 120 220

Maybe it is related to the fact that the platform reports a 64-bit address space (clinfo incorrectly reports the "Global memory size" as 7GB, whereas it should report 2GB):

[mig@antergos-mig ~]$ clinfo
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.0.1
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD TONGA (DRM 3.8.0 / 4.9.11-1-ARCH, LLVM 3.9.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 17.0.1
  Driver Version                                  17.0.1
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               28
  Max clock frequency                             990MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          0 / 0 (n/a)
    float                                         4 / 4
    double                                        2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              7606060644 (7.084GiB)
  Error Correction support                        No
  Max memory allocation                           1901515161 (1.771GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        1901515161 (1.771GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD TONGA (DRM 3.8.0 / 4.9.11-1-ARCH, LLVM 3.9.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD TONGA (DRM 3.8.0 / 4.9.11-1-ARCH, LLVM 3.9.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.10
  ICD loader Profile                              OpenCL 2.1
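The workaround mentioned in the report (changing the kernel's int arguments to unsigned int) can be tried without editing the kernel by hand. What follows is only a sketch under stated assumptions: use_unsigned_scalar_args is a hypothetical helper, not part of minimal.cpp, mat-mul, or Mesa. It assumes the kernel source is held in a string before being handed to clCreateProgramWithSource(), that the scalar arguments are spelled exactly "const int ", and that the first parenthesis after __kernel opens the parameter list. It rewrites the parameter list only and leaves the kernel body alone.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption for illustration, not from the report):
// turns "const int" scalar parameters into "const unsigned int" inside the
// parameter list of the first __kernel function found in `src`.
std::string use_unsigned_scalar_args(std::string src) {
    const std::string from = "const int ";
    const std::string to = "const unsigned int ";

    // Locate the parameter list: first '(' after "__kernel" up to the next ')'.
    const std::size_t kernel = src.find("__kernel");
    if (kernel == std::string::npos) return src;
    const std::size_t open = src.find('(', kernel);
    if (open == std::string::npos) return src;
    std::size_t close = src.find(')', open);
    if (close == std::string::npos) return src;

    // Replace each occurrence that lies before the closing parenthesis.
    std::size_t pos = src.find(from, open);
    while (pos != std::string::npos && pos < close) {
        src.replace(pos, from.size(), to);
        close += to.size() - from.size();  // the ')' moved right by the growth
        pos = src.find(from, pos + to.size());
    }
    return src;
}
```

Applied to a source string that begins with `__kernel void myGEMM1(const int M,`, the helper would produce `__kernel void myGEMM1(const unsigned int M,`. This only sidesteps the crash for testing; it does not address the underlying compiler problem.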
Nice find, confirmed on Fiji with everything latest git. Let's figure out what is going on.
Created attachment 130386
Kernel from minimal.cpp

Sorry, that was a false positive caused by a local installation issue. I can't reproduce it on Fiji with all current git, or on Hawaii with Mesa 17.0.2/LLVM 3.9.1.

Anyhow, I extracted the OpenCL kernel and tried compiling it:

$ clang -x cl -target amdgcn-- -mcpu=tonga -Dcl_clang_storage_class_specifiers=1 -Xclang -mlink-bitcode-file -Xclang /usr/local/lib64/clc/tonga-amdgcn--.bc -I/usr/local/include/clc -include /usr/local/include/clc/clc.h minimal.cl

It doesn't crash in LLVM, and running Valgrind on the same command reports no errors.
Can you provide GDB backtraces of both minimal and mat-mul?
minimal.cpp compiled with

gcc -c minimal.cpp -g -o minimal.o

[miguel@antergos-mig extra]$ gdb
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file ./minimal
Reading symbols from ./minimal...done.
(gdb) b 20
Breakpoint 1 at 0x400cee: file minimal.cpp, line 20.
(gdb) run
Starting program: /home/miguel/Dokumente/OpenCLExamples/myGEMM-master/extra/minimal
>>> Initializing OpenCL...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffef7f8700 (LWP 3517)]
[New Thread 0x7fffeedf2700 (LWP 3518)]
[New Thread 0x7fffee5f1700 (LWP 3519)]
[New Thread 0x7fffeddf0700 (LWP 3520)]
[New Thread 0x7fffed5ef700 (LWP 3521)]
% Device: AMD TONGA (DRM 3.9.0 / 4.10.4-1-ARCH, LLVM 3.9.1), 2046.3 MiB memory, max allocation 1432.4 MiB, driver 17.0.1

Thread 1 "minimal" received signal SIGSEGV, Segmentation fault.
0x00007ffff2b9eb18 in llvm::SIInstrInfo::getInstSizeInBytes(llvm::MachineInstr const&) const () from /usr/lib/libLLVM-3.9.so
(gdb)

Also when I compile minimal.cl with

clang -x cl -target amdgcn-- -mcpu=tonga -Dcl_clang_storage_class_specifiers=1 -Xclang -mlink-bitcode-file -Xclang /usr/local/lib/clc/tonga-amdgcn--.bc -I/usr/local/include/clc -include /usr/local/include/clc/clc.h minimal.cl
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
minimal.cl:2:15: error: unsupported call to function get_global_id.3
__kernel void myGEMM1(const int M,
              ^
minimal.cl:2:15: error: unsupported call to function get_global_id.3
clang-3.9: /home/miguel/Downloads/llvm-3.9.1.src/lib/Target/AMDGPU/SIInstrInfo.cpp:2428: void llvm::SIInstrInfo::legalizeOperands(llvm::MachineInstr&) const: Assertion `MBB.getParent()->getSubtarget<SISubtarget>().getGeneration() < SISubtarget::VOLCANIC_ISLANDS && "FIXME: Need to emit flat atomics here"' failed.
#0 0x00007efe68ec1662 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/miguel/Downloads/llvm-3.9.1.src/lib/Support/Unix/Signals.inc:402:0

...

I get errors.
(In reply to Mig from comment #5)
> minimal.cpp compiled with
>
> gcc -c minimal.cpp -g -o minimal.o
>
> [miguel@antergos-mig extra]$ gdb
> GNU gdb (GDB) 7.12.1
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-pc-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> (gdb) file ./minimal
> Reading symbols from ./minimal...done.
> (gdb) b 20
> Breakpoint 1 at 0x400cee: file minimal.cpp, line 20.
> (gdb) run
> Starting program:
> /home/miguel/Dokumente/OpenCLExamples/myGEMM-master/extra/minimal
> >>> Initializing OpenCL...
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib/libthread_db.so.1".
> [New Thread 0x7fffef7f8700 (LWP 3517)]
> [New Thread 0x7fffeedf2700 (LWP 3518)]
> [New Thread 0x7fffee5f1700 (LWP 3519)]
> [New Thread 0x7fffeddf0700 (LWP 3520)]
> [New Thread 0x7fffed5ef700 (LWP 3521)]
> % Device: AMD TONGA (DRM 3.9.0 / 4.10.4-1-ARCH, LLVM 3.9.1), 2046.3 MiB
> memory, max allocation 1432.4 MiB, driver 17.0.1
>
> Thread 1 "minimal" received signal SIGSEGV, Segmentation fault.
> 0x00007ffff2b9eb18 in
> llvm::SIInstrInfo::getInstSizeInBytes(llvm::MachineInstr const&) const ()
> from /usr/lib/libLLVM-3.9.so
> (gdb)
>
>

I need backtrace here (command bt).

> Also when I compile minimal.cl with
> clang -x cl -target amdgcn-- -mcpu=tonga
> -Dcl_clang_storage_class_specifiers=1 -Xclang -mlink-bitcode-file -Xclang
> /usr/local/lib/clc/tonga-amdgcn--.bc -I/usr/local/include/clc -include
> /usr/local/include/clc/clc.h minimal.cl
> '+fp64-fp16-denormals' is not a recognized feature for this target (ignoring
> feature)
> '+fp64-fp16-denormals' is not a recognized feature for this target (ignoring
> feature)

Not really a problem, it's being ignored.

> minimal.cl:2:15: error: unsupported call to function get_global_id.3
> __kernel void myGEMM1(const int M,
>               ^
> minimal.cl:2:15: error: unsupported call to function get_global_id.3

Could it be the same thing as bug 99856?

> clang-3.9:
> /home/miguel/Downloads/llvm-3.9.1.src/lib/Target/AMDGPU/SIInstrInfo.cpp:2428:
> void llvm::SIInstrInfo::legalizeOperands(llvm::MachineInstr&) const:
> Assertion `MBB.getParent()->getSubtarget<SISubtarget>().getGeneration() <
> SISubtarget::VOLCANIC_ISLANDS && "FIXME: Need to emit flat atomics here"'
> failed.

This should be fixed (worked around, to be precise) in LLVM 4.0.

> #0 0x00007efe68ec1662 llvm::sys::PrintStackTrace(llvm::raw_ostream&)
> /home/miguel/Downloads/llvm-3.9.1.src/lib/Support/Unix/Signals.inc:402:0
>
> ...
>
> I get errors.
Backtrace:

(gdb) run
Starting program: /home/miguel/Dokumente/OpenCLExamples/myGEMM-master/extra/minimal
>>> Initializing OpenCL...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffef7f8700 (LWP 3383)]
[New Thread 0x7fffeedf2700 (LWP 3384)]
[New Thread 0x7fffee5f1700 (LWP 3385)]
[New Thread 0x7fffeddf0700 (LWP 3386)]
[New Thread 0x7fffed5ef700 (LWP 3387)]
% Device: AMD TONGA (DRM 3.9.0 / 4.10.4-1-ARCH, LLVM 3.9.1), 2046.3 MiB memory, max allocation 1432.4 MiB, driver 17.0.1

Thread 1 "minimal" received signal SIGSEGV, Segmentation fault.
0x00007ffff2b9eb18 in llvm::SIInstrInfo::getInstSizeInBytes(llvm::MachineInstr const&) const () from /usr/lib/libLLVM-3.9.so
(gdb) bt
#0  0x00007ffff2b9eb18 in llvm::SIInstrInfo::getInstSizeInBytes(llvm::MachineInstr const&) const () from /usr/lib/libLLVM-3.9.so
#1  0x00007ffff2b33551 in llvm::AMDGPUAsmPrinter::getSIProgramInfo(llvm::AMDGPUAsmPrinter::SIProgramInfo&, llvm::MachineFunction const&) const () from /usr/lib/libLLVM-3.9.so
#2  0x00007ffff2b365ba in llvm::AMDGPUAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) () from /usr/lib/libLLVM-3.9.so
#3  0x00007ffff1e2d1d1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /usr/lib/libLLVM-3.9.so
#4  0x00007ffff1cc57c2 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /usr/lib/libLLVM-3.9.so
#5  0x00007ffff1cc5b4b in llvm::FPPassManager::runOnModule(llvm::Module&) () from /usr/lib/libLLVM-3.9.so
#6  0x00007ffff1cc5e74 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /usr/lib/libLLVM-3.9.so
#7  0x00007ffff4cc6743 in ?? () from /usr/lib/libMesaOpenCL.so.1
#8  0x00007ffff4cc6d60 in ?? () from /usr/lib/libMesaOpenCL.so.1
#9  0x00007ffff4cc311e in ?? () from /usr/lib/libMesaOpenCL.so.1
#10 0x00007ffff4cb4af9 in ?? () from /usr/lib/libMesaOpenCL.so.1
#11 0x00007ffff4c929cc in ?? () from /usr/lib/libMesaOpenCL.so.1
#12 0x00007ffff7bc650b in clBuildProgram () from /usr/lib/libOpenCL.so.1
#13 0x0000000000401190 in main (argc=1, argv=0x7fffffffe618) at minimal.cpp:113
(gdb)
BTW: I applied the patch from bug 99856 to clc but apparently the bug described above is not related to that.
After updating Mesa and updating LLVM to version 4.0.0, I can compile and run the programs without any errors.

[mig@antergos ~]$ clinfo
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.0.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD TONGA (DRM 3.9.0 / 4.10.10-1-ARCH, LLVM 4.0.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 17.0.4
  Driver Version                                  17.0.4
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               28
  Max clock frequency                             990MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Compiler Available                              Yes
  Preferred work group size multiple              64
Thank you for reporting back!