Summary: | OpenCL kernel build crash using Blender Cycles | ||
---|---|---|---|
Product: | Beignet | Reporter: | Russell Palmer <russell.palmer> |
Component: | Beignet | Assignee: | ruiling <ruiling.song> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | anarsoul, hb9tmc, ruiling.song |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | pcBx3E.cl |
Description
Russell Palmer
2015-03-01 17:28:03 UTC
Hi Russell,

I tried Blender 2.73a with LLVM 3.6, LLVM 3.5.2, and LLVM 3.5.1 (all built from SVN), but I still cannot reproduce the crash. Could you try a different LLVM version, such as 3.6, or try Blender 2.73a?

Thanks!
Ruiling

(In reply to ruiling from comment #1)

I'm hitting the same issue with LuxMark on Arch Linux. It crashes a few moments after start with:

```
%1576 = load float addrspace(1)* %1575, align 4, !tbaa !45
Illegal pointer which is not from a valid memory space.
```

LLVM is 3.5.1; Beignet is from git, commit 75690361b4014c0b877309d5c5a73167dbc21c3d.

(In reply to Vasily Khoruzhick from comment #2)

Could you try LLVM 3.6 with master Beignet?

(In reply to Zhigang Gong from comment #3)
> Could you try LLVM 3.6 with master beignet?

It hasn't made it into the Arch Linux "extra" repository yet. I'll try as soon as it does.

(In reply to Zhigang Gong from comment #3)
> Could you try LLVM 3.6 with master beignet?
LuxMark crashes with:

```
%1549 = load float addrspace(1)* %1548, align 4, !tbaa !48
Illegal pointer which is not from a valid memory space.
Aborting...
```

LLVM 3.6, Beignet a153aba.

(In reply to Vasily Khoruzhick from comment #5)

First, could you check whether there is only one Beignet-related ICD file in the /etc/OpenCL/vendors directory? If you find two beignet*.icd files there, remove all of them, reinstall Beignet, and try LuxMark again. If there is just one, we need to dig a little further.

Could you apply the following patch to the Beignet master branch, rebuild Beignet, and rerun LuxMark? Before the assertion triggers, it should print something like:

"xxxx /tmp/yyyy.cl"

Here xxxx is the build options (possibly empty) and yyyy.cl is a temporary .cl file. Could you share the xxxx and the yyyy.cl file with us? That should help us reproduce and fix the issue. Thanks.
```diff
diff --git a/backend/src/backend/program.cpp b/backend/src/backend/program.cpp
index eee7c3c..b24c19c 100644
--- a/backend/src/backend/program.cpp
+++ b/backend/src/backend/program.cpp
@@ -801,13 +801,14 @@ namespace gbe {
       *errSize += clangErrSize;
       if (OCL_OUTPUT_BUILD_LOG && options)
         llvm::errs() << options;
+      llvm::errs() << clName.c_str();
     } else
       p = NULL;
 
     if (!llvm::llvm_is_multithreaded())
       llvm_mutex.unlock();
-    remove(clName.c_str());
+    //remove(clName.c_str());
     return p;
   }
 #endif
@@ -848,9 +849,10 @@ namespace gbe {
 
       if (OCL_OUTPUT_BUILD_LOG && options)
         llvm::errs() << options;
+      llvm::errs() << clName.c_str();
     } else
       p = NULL;
-    remove(clName.c_str());
+    //remove(clName.c_str());
     releaseLLVMContextLock();
     return p;
   }
```

Created attachment 114346: pcBx3E.cl

(In reply to Zhigang Gong from comment #6)

There's just one: intel-beignet-.icd. I build and install it as a package every time, so it's very unlikely that anything is left over from a previous package.

The options are empty; the file is attached.
(In reply to Vasily Khoruzhick from comment #7)

Thanks, now we can reproduce this bug and will fix it soon.

Ruiling found that this bug is caused by storing pointers to memory and loading them back. When a pointer is stored to memory, the information about where it came from is lost. So when the pointer is loaded back, Beignet cannot determine the correct address space and BTI (binding table index) for it. Ruiling is now working on a fix.

I was trying to run a BOINC application with Beignet and got a similar error message. Is that problem related?

```
store float %345, float addrspace(1)* %379, align 4, !tbaa !58
Illegal pointer which is not from a valid memory space.
Aborting...
```

Yes, it looks like the same problem; I am still working on it. As the problem is somewhat complex, I still need about a week to produce a clean fix.

Status?

I am really sorry for the long delay. I have finally added the support in Beignet. One patch is still under review on the mailing list:
http://lists.freedesktop.org/archives/beignet/2015-May/005730.html

We will merge it as soon as it gets positive review comments. You can apply it to the latest master and give it a try.

What have I broken?

```
~$ clinfo
Number of platforms                     1
  Platform Name                         Intel Gen OCL Driver
  Platform Vendor                       Intel
  Platform Version                      OpenCL 1.2 beignet 1.1 (git-38cf31f)
  Platform Profile                      FULL_PROFILE
  Platform Extensions                   cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
  Platform Extensions function suffix   Intel
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: disabling non-working device
```

Even without the patch, current master fails with the above message on my system (Debian jessie, LLVM 3.5, HD 4600). Do I need to open a new bug report?

You are running into a known issue on the HSW (Haswell) platform; README.md has the full details. You need:

```
# echo 0 > /sys/module/i915/parameters/enable_cmd_parser
```

and a kernel patch: https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support

The `# echo 0 > /sys/module/i915/parameters/enable_cmd_parser` step is not needed with Beignet master, but the kernel patch is still needed for Linux kernels before 4.0.

Kernel 4.0.4; enable_cmd_parser is 0. Still the same problem.

Sorry, I didn't describe the issue exactly: for Linux 4.0 you still need the kernel patch.

* "Beignet: self-test failed" and 15-30 unit tests fail on 4th Generation (Haswell) hardware. On Haswell, shared local memory (\_\_local) does not work at all on Linux <= 4.0, and requires the i915.enable_ppgtt=2 [boot parameter](https://wiki.ubuntu.com/Kernel/KernelBootParameters) on Linux 4.1. This will be fixed in Linux 4.2; older versions can be fixed with [this patch](https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support).

Thank you.
It's all running now.

It is fixed by "GBE: Support storing/loading pointers to/from private array", which has been merged into the master branch.