Bug 89378 - OpenCL kernel build crash using Blender Cycles
Summary: OpenCL kernel build crash using Blender Cycles
Status: RESOLVED FIXED
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: ruiling
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-01 17:28 UTC by Russell Palmer
Modified: 2015-06-03 06:23 UTC
CC List: 3 users

See Also:
i915 platform:
i915 features:


Attachments
pcBx3E.cl (31.40 KB, text/plain)
2015-03-16 10:56 UTC, Vasily Khoruzhick

Description Russell Palmer 2015-03-01 17:28:03 UTC
Following on from bug 89325, a second unrelated crash happens when trying to use Blender Cycles renderer with Beignet:

>> following is the output I see on the command line when I try to render the
>> default blender scene (just a cube) using the Cycles Renderer :
>> 
>> [arpie@max build]$ CYCLES_OPENCL_TEST=true blender
>> Read new prefs: /home/arpie/.config/blender/2.73/config/userpref.blend
>> Device init succes
>> Compiling OpenCL kernel ...
>>   %132 = load i32 addrspace(2)* %131, align 4, !tbaa !54
>> Illegal pointer which is not from a valid memory space.
>> Aborting...
>
>I can't reproduce this. Could you tell me the LLVM/Clang version you are using?
>You can get it by executing:
>llvm-config --version

Here you go:
[arpie@max ~]$ llvm-config --version
3.5.1


>> Equally, do you think it is too ambitious to be trying to get this to work
>> at all?  I suspect the speed up I will get will be minimal (if any at all),
>> as I am not exactly using a high-spec GPU.  In fact, that might be the cause
>> of the crash - not enough memory available on the GPU?  But that is just a
>> naive guess!  I should stop speculating and leave it to the experts...
>
>Supporting Blender's Cycles engine is a little ambitious because of its very
>large compute kernel. But we will continue to improve beignet to support it
>eventually.
>
>I don't know which GPU you are using. If it is a HSW GT3 and we can get
>Blender working with beignet, it should give a noticeable performance boost.
>If it is an IVB GT1 or HSW GT1, enabling GPU acceleration may not be
>worthwhile.

I'm using some sort of integrated graphics, in a laptop.  The only info I can give you is this :

GPU : 00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06)

That series/model id looks like a (from a google search) :
 Haswell Integrated Graphics Controller
or
 4th Gen Core Processor Integrated Graphics Controller

The motherboard is a W54_55SU1 SUW, if that helps at all.

Is there a command I can run to get you more useful info on the GPU?


Other useful info includes :
OS: Arch Linux 3.18.6-1-ARCH #1 SMP PREEMPT Sat Feb 7 08:44:05 CET 2015 x86_64 GNU/Linux
Blender version 2.73
Beignet version 1.0.1

Good luck,
Russell
Comment 1 ruiling 2015-03-06 06:02:58 UTC
Hi Russell,

I tried Blender 2.73a with LLVM 3.6, 3.5.2, and 3.5.1 (all SVN builds), but I still cannot reproduce it. Could you try a different LLVM version such as 3.6, or try Blender 2.73a?

Thanks! Ruiling
Comment 2 Vasily Khoruzhick 2015-03-13 10:18:01 UTC
(In reply to ruiling from comment #1)
> Hi Russell,
> 
> I tried Blender 2.73a with LLVM 3.6, 3.5.2, and 3.5.1 (all SVN builds),
> but I still cannot reproduce it. Could you try a different LLVM version
> such as 3.6, or try Blender 2.73a?
> 
> Thanks! Ruiling

I'm hitting the same issue with LuxMark on Arch Linux. It crashes a few moments after starting, with:

  %1576 = load float addrspace(1)* %1575, align 4, !tbaa !45
Illegal pointer which is not from a valid memory space.

llvm is 3.5.1, beignet is from git, commit 75690361b4014c0b877309d5c5a73167dbc21c3d
Comment 3 Zhigang Gong 2015-03-13 10:32:22 UTC
(In reply to Vasily Khoruzhick from comment #2)
> (In reply to ruiling from comment #1)
> > Hi Russell,
> > 
> > I tried Blender 2.73a with LLVM 3.6, 3.5.2, and 3.5.1 (all SVN builds),
> > but I still cannot reproduce it. Could you try a different LLVM version
> > such as 3.6, or try Blender 2.73a?
> > 
> > Thanks! Ruiling
> 
> I'm hitting the same issue with LuxMark on Arch Linux. It crashes a few
> moments after starting, with:
> 
>   %1576 = load float addrspace(1)* %1575, align 4, !tbaa !45
> Illegal pointer which is not from a valid memory space.
> 
> llvm is 3.5.1, beignet is from git, commit
> 75690361b4014c0b877309d5c5a73167dbc21c3d

Could you try LLVM 3.6 with master beignet?
Comment 4 Vasily Khoruzhick 2015-03-13 13:31:38 UTC
(In reply to Zhigang Gong from comment #3)

> Could you try LLVM 3.6 with master beignet?

LLVM 3.6 hasn't reached the Arch Linux "extra" repo yet. I'll try it as soon as it does.
Comment 5 Vasily Khoruzhick 2015-03-16 09:09:58 UTC
(In reply to Zhigang Gong from comment #3)
 
> Could you try LLVM 3.6 with master beignet?

luxmark crashes with:

  %1549 = load float addrspace(1)* %1548, align 4, !tbaa !48
Illegal pointer which is not from a valid memory space.
Aborting...

llvm-3.6
beignet a153aba
Comment 6 Zhigang Gong 2015-03-16 10:03:15 UTC
(In reply to Vasily Khoruzhick from comment #5)
> (In reply to Zhigang Gong from comment #3)
>  
> > Could you try LLVM 3.6 with master beignet?
> 
> luxmark crashes with:
> 
>   %1549 = load float addrspace(1)* %1548, align 4, !tbaa !48
> Illegal pointer which is not from a valid memory space.
> Aborting...
> 
> llvm-3.6
> beignet a153aba

Could you first check whether there is only one beignet-related ICD file in the /etc/OpenCL/vendors directory? If you find two beignet*.icd files there, please remove all of them, reinstall beignet, and try luxmark again. If there is just one, then we need to dig a little further.

Could you apply the following patch to the beignet master branch, rebuild beignet, and rerun luxmark? Before it triggers the assertion, it should print something like:
"xxxx /tmp/yyyy.cl"

The xxxx is the build options, which may be empty, and yyyy.cl is a temporary .cl file. Could you share the xxxx and the yyyy.cl file with us? That would help us reproduce and fix the issue. Thanks.

diff --git a/backend/src/backend/program.cpp b/backend/src/backend/program.cpp
index eee7c3c..b24c19c 100644
--- a/backend/src/backend/program.cpp
+++ b/backend/src/backend/program.cpp
@@ -801,13 +801,14 @@ namespace gbe {
         *errSize += clangErrSize;
       if (OCL_OUTPUT_BUILD_LOG && options)
         llvm::errs() << options;
+      llvm::errs() << clName.c_str();
     } else
       p = NULL;

     if (!llvm::llvm_is_multithreaded())
       llvm_mutex.unlock();

-    remove(clName.c_str());
+    //remove(clName.c_str());
     return p;
   }
 #endif
@@ -848,9 +849,10 @@ namespace gbe {

       if (OCL_OUTPUT_BUILD_LOG && options)
         llvm::errs() << options;
+      llvm::errs() << clName.c_str();
     } else
       p = NULL;
-    remove(clName.c_str());
+    //remove(clName.c_str());
     releaseLLVMContextLock();
     return p;
   }
Comment 7 Vasily Khoruzhick 2015-03-16 10:56:48 UTC
Created attachment 114346
pcBx3E.cl

(In reply to Zhigang Gong from comment #6)
> Could you first check whether there is only one beignet-related ICD file in
> the /etc/OpenCL/vendors directory? If you find two beignet*.icd files there,
> please remove all of them, reinstall beignet, and try luxmark again. If
> there is just one, then we need to dig a little further.

There's just one intel-beignet-.icd
I always build and install beignet as a package, so leftovers from a previous package are very unlikely.

> Could you apply the following patch to the beignet master branch, rebuild
> beignet, and rerun luxmark? Before it triggers the assertion, it should
> print something like:
> "xxxx /tmp/yyyy.cl"
> 
> The xxxx is the build options, which may be empty, and yyyy.cl is a temporary
> .cl file. Could you share the xxxx and the yyyy.cl file with us? That would
> help us reproduce and fix the issue. Thanks.

Options are empty, file is attached.
Comment 8 Zhigang Gong 2015-03-17 03:01:24 UTC
(In reply to Vasily Khoruzhick from comment #7)
> Created attachment 114346
> pcBx3E.cl
> 
> (In reply to Zhigang Gong from comment #6)
> > Could you first check whether there is only one beignet-related ICD file in
> > the /etc/OpenCL/vendors directory? If you find two beignet*.icd files there,
> > please remove all of them, reinstall beignet, and try luxmark again. If
> > there is just one, then we need to dig a little further.
> 
> There's just one intel-beignet-.icd
> I always build and install beignet as a package, so leftovers from a
> previous package are very unlikely.
> 
> > Could you apply the following patch to the beignet master branch, rebuild
> > beignet, and rerun luxmark? Before it triggers the assertion, it should
> > print something like:
> > "xxxx /tmp/yyyy.cl"
> > 
> > The xxxx is the build options, which may be empty, and yyyy.cl is a temporary
> > .cl file. Could you share the xxxx and the yyyy.cl file with us? That would
> > help us reproduce and fix the issue. Thanks.
> 
> Options are empty, file is attached.

Thanks, we can now reproduce this bug and will fix it soon.
Comment 9 Zhigang Gong 2015-03-18 06:57:01 UTC
Ruiling found that this bug is caused by storing pointers to memory and loading them back. When a pointer is stored to memory, we lose the information about where it came from, so when the pointer is loaded back, beignet cannot determine the correct address space and BTI (binding table index) for it. Ruiling is now working on a fix.
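
The failure mode described here can be illustrated with a toy model. The sketch below is not beignet's code: `PointerTracker`, `Origin`, and every method name are hypothetical, and the address-space numbering (1 = global, 2 = constant) is only assumed to match the `addrspace(N)` values in the logs above. It shows a tracker that remembers which buffer each pointer value came from, loses that information when the pointer is written to a memory slot, and recovers it when stores are tracked as well.

```cpp
#include <map>
#include <optional>
#include <string>

// Assumed numbering, mirroring the addrspace(N) annotations in the logs.
enum class AddrSpace { Private = 0, Global = 1, Constant = 2, Local = 3 };

// Where a pointer originally came from: its address space and a
// binding table index (BTI) for the underlying buffer.
struct Origin {
    AddrSpace space;
    int bti;
};

// Toy provenance tracker. With trackStores == false it models the bug:
// a pointer that round-trips through memory comes back with no origin.
class PointerTracker {
public:
    explicit PointerTracker(bool trackStores) : trackStores_(trackStores) {}

    // A kernel argument has a known origin.
    void defineArg(const std::string& value, Origin origin) {
        origin_[value] = origin;
    }

    // Pointer arithmetic (like getelementptr) keeps the base's origin.
    void derive(const std::string& result, const std::string& base) {
        if (auto o = originOf(base)) origin_[result] = *o;
    }

    // Store a pointer value into a memory slot (e.g. a private array).
    void storePointer(const std::string& value, const std::string& slot) {
        if (!trackStores_) return;  // the bug: the origin is dropped here
        if (auto o = originOf(value)) slotContents_[slot] = *o;
    }

    // Load a pointer back from a memory slot.
    void loadPointer(const std::string& result, const std::string& slot) {
        auto it = slotContents_.find(slot);
        if (it != slotContents_.end()) origin_[result] = it->second;
    }

    // Empty result corresponds to "Illegal pointer which is not from a
    // valid memory space" in this toy model.
    std::optional<Origin> originOf(const std::string& value) const {
        auto it = origin_.find(value);
        if (it == origin_.end()) return std::nullopt;
        return it->second;
    }

private:
    bool trackStores_;
    std::map<std::string, Origin> origin_;
    std::map<std::string, Origin> slotContents_;
};
```

With `trackStores` set to false, a pointer that is derived from a kernel argument, stored to a slot, and loaded back has no known origin. With it set to true, the loaded pointer keeps the address space and BTI of the original buffer, which is the kind of information the eventual fix has to preserve.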
Comment 10 heliosh 2015-04-05 16:31:17 UTC
I tried to run a BOINC application with beignet and got a similar error message. Is this problem related?


store float %345, float addrspace(1)* %379, align 4, !tbaa !58
Illegal pointer which is not from a valid memory space.
Aborting...
Comment 11 ruiling 2015-04-07 01:49:55 UTC
Yes, it looks like the same problem, and I am still working on it. Because the problem is somewhat complex, I need about one more week to produce a clean fix.
Comment 12 heliosh 2015-05-24 09:03:14 UTC
Status?
Comment 13 ruiling 2015-05-25 01:39:09 UTC
I am really sorry for the long delay. I have finally added the support in beignet. One patch is still under review on the list:
http://lists.freedesktop.org/archives/beignet/2015-May/005730.html
We will merge it as soon as it gets positive comments.
You can apply it to the latest master and give it a try.
Comment 14 heliosh 2015-05-25 14:54:58 UTC
What have I broken?

~$ clinfo 
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 1.2 beignet 1.1 (git-38cf31f)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
  Platform Extensions function suffix             Intel
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: disabling non-working device
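
Read literally, the failure message says that adding (3, 7, 5) and (5, 7, 3) returned (3, 7, 5), that is, the first operand unchanged. The sketch below only spells out that arithmetic. It assumes the self-test compares against an element-wise sum, which the message suggests but does not state, and the names `add3` and `self_test_result_ok` are hypothetical.

```cpp
#include <array>

using Vec3 = std::array<int, 3>;

// Element-wise sum of two three-element vectors.
Vec3 add3(const Vec3& a, const Vec3& b) {
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2]};
}

// True when the value returned by the device matches the element-wise sum.
bool self_test_result_ok(const Vec3& a, const Vec3& b, const Vec3& returned) {
    return returned == add3(a, b);
}
```

Under that assumption the expected result is (8, 14, 8), so a returned value of (3, 7, 5) fails the check and the device is disabled.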
Comment 15 heliosh 2015-05-25 19:23:38 UTC
Even without the patch, current master fails with the above message on my system (Debian jessie, LLVM 3.5, HD4600).
Do I need to open a new bug report?
Comment 16 ruiling 2015-05-26 01:46:19 UTC
You are running into a known issue on the HSW platform; the full details are in README.md. You need:
# echo 0 > /sys/module/i915/parameters/enable_cmd_parser
and a kernel patch:
https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support
Comment 17 ruiling 2015-05-26 01:52:52 UTC
# echo 0 > /sys/module/i915/parameters/enable_cmd_parser
is not needed with beignet master, but the kernel patch is still needed for Linux kernels before 4.0.
Comment 18 heliosh 2015-05-26 06:15:32 UTC
Kernel 4.0.4
enable_cmd_parser is 0

Still the same problem.
Comment 19 ruiling 2015-05-26 07:31:15 UTC
I am sorry, I didn't give you an exact description of the issue:
for Linux 4.0, you still need the kernel patch.

* "Beignet: self-test failed" and 15-30 unit tests fail on 4th Generation (Haswell) hardware.
  On Haswell, shared local memory (\_\_local) does not work at all on
  Linux <= 4.0, and requires the i915.enable_ppgtt=2 [boot parameter](https://wiki.ubuntu.com/Kernel/KernelBootParameters)
  on Linux 4.1.
  
  This will be fixed in Linux 4.2; older versions can be fixed with
  [this patch](https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support).
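
The version rules in the README excerpt above can be written down as a small decision function. This is a sketch based only on that excerpt: the names `HswWorkaround`, `hsw_workaround_for`, and `describe` are hypothetical, and the "fixed in 4.2" branch reflects what the README expected at the time, not a verified fact about any particular kernel build.

```cpp
#include <string>

// Workaround categories taken from the README excerpt quoted above.
enum class HswWorkaround {
    KernelPatch,     // Linux <= 4.0: __local does not work without the patch
    PpgttBootParam,  // Linux 4.1: boot with i915.enable_ppgtt=2
    None             // Linux >= 4.2: expected by the README to be fixed
};

// Maps a Linux kernel version (major.minor) to the Haswell workaround
// that the README excerpt calls for.
HswWorkaround hsw_workaround_for(int major, int minor) {
    if (major < 4 || (major == 4 && minor == 0)) {
        return HswWorkaround::KernelPatch;
    }
    if (major == 4 && minor == 1) {
        return HswWorkaround::PpgttBootParam;
    }
    return HswWorkaround::None;
}

// Human-readable summary of each category.
std::string describe(HswWorkaround w) {
    switch (w) {
        case HswWorkaround::KernelPatch:
            return "apply the HSW kernel patch";
        case HswWorkaround::PpgttBootParam:
            return "boot with i915.enable_ppgtt=2";
        case HswWorkaround::None:
            return "no workaround expected";
    }
    return "unknown";
}
```

For the kernel reported in comment 18 (4.0.4), `hsw_workaround_for(4, 0)` falls into the `KernelPatch` case, which matches the advice given here.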
Comment 20 heliosh 2015-05-26 07:59:12 UTC
Thank you. It's all running now.
Comment 21 ruiling 2015-06-03 06:23:21 UTC
This is fixed by "GBE: Support storing/loading pointers to/from private array", which has been merged into the master branch.

