Bug 94636 - Beignet doesn't work on Skylake GT2(clang + libc++)
Summary: Beignet doesn't work on Skylake GT2(clang + libc++)
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Xiuli Pan
QA Contact:
Depends on:
Reported: 2016-03-20 13:39 UTC by Armin K
Modified: 2016-11-04 02:35 UTC

See Also:
i915 platform:
i915 features:

benchmark log (63.28 KB, text/plain)
2016-05-06 10:29 UTC, Armin K
Patch for clang build (2.71 KB, patch)
2016-05-09 03:13 UTC, Xiuli Pan
benchmark log (5.09 KB, text/plain)
2016-05-10 08:15 UTC, Armin K
utests log (34.91 KB, text/plain)
2016-05-10 08:40 UTC, Armin K
Latest benchmark log (16.99 KB, text/plain)
2016-08-16 21:58 UTC, Armin K
Latest utests log (40.29 KB, text/plain)
2016-08-16 21:58 UTC, Armin K
Fix cmrt detection (1.07 KB, patch)
2016-08-17 11:43 UTC, Armin K
utests log after applying cmrt detection fix (40.11 KB, text/plain)
2016-08-17 11:43 UTC, Armin K

Description Armin K 2016-03-20 13:39:06 UTC
I'm trying to use Beignet on a Skylake GT2 GPU but it fails to initialize.

clinfo says the following:

Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (-1356494504, -448270753, -443389048)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: disabling non-working device
Beignet: disabling non-working device
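
For reference, the message reports an element-wise addition whose expected result is (8, 14, 8), so the returned values are garbage rather than an off-by-one error. A minimal host-side sketch of that check follows. The names `Vec3`, `add` and `self_test_passes` are hypothetical, and this is not Beignet's actual self-test code, which runs the addition on the GPU.

```cpp
#include <array>

using Vec3 = std::array<int, 3>;

// Element-wise sum of two 3-component integer vectors.
Vec3 add(const Vec3& a, const Vec3& b) {
    return Vec3{{a[0] + b[0], a[1] + b[1], a[2] + b[2]}};
}

// Hypothetical check: compares a result read back from the device with the
// sum computed on the host for the inputs named in the error message.
bool self_test_passes(const Vec3& device_result) {
    const Vec3 expected = add(Vec3{{3, 7, 5}}, Vec3{{5, 7, 3}});
    return device_result == expected;
}
```

With the values from the log above, `self_test_passes` would return false, which matches the device being disabled.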

I am using beignet git master from a couple of weeks ago (at most 3 commits behind the current master), coupled with the LLVM-3.8.0 release. I've tried using Linux-4.5 as well as drm-intel-nightly: 2016y-03m-19d-10h-09m-53s UTC. The issue is the same on both of them.

At some point during the 4.5-rc releases it worked intermittently, but as soon as a GPU hang/recovery or an oops in i915 happened, it would stop working. The oopses disappeared in the final rc, so that is probably when all the fixes went in and somehow broke OpenCL (while fixing the oopses).
Comment 1 Xiuli Pan 2016-04-28 08:03:48 UTC
Hi Armin,

Is there still a problem with the beignet self-test on a newer kernel?
Could you give us the PCI ID of the GPU, the Linux version and the drm version so we can try to reproduce the bug?

Comment 2 Armin K 2016-04-28 13:22:49 UTC
Yes, it's still problematic. A program as simple as clinfo fails to run due to this issue.

Beignet 4e7d5a0 is used (git master from a bit more than a month). It's coupled with llvm-3.8.0 and mesa 79b3616 (git master from less than two weeks ago).

I am running linux-4.5.2 and using libdrm-2.4.67. The lspci output shows the following:

00:02.0 VGA compatible controller [0300]: Intel Corporation Sky Lake Integrated Graphics [8086:1916] (rev 07) (prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company Skylake Integrated Graphics [103c:8102]

Let me know if you need more info.
Comment 3 Xiuli Pan 2016-04-29 04:57:13 UTC
Hi Armin,

Are you using clang to compile beignet?
I hit the same problem when I built beignet with clang 3.8; maybe you can try gcc as the compiler first.
I will look into this problem with clang.

Comment 4 Armin K 2016-04-29 18:46:15 UTC

Yes, I'm using clang/clang++ to compile beignet.

It isn't trivial to switch to gcc because I have libc++ as my default C++ library.

That means that beignet and llvm are linked to it, and linking programs built against libstdc++ with programs linked with libc++ doesn't really work (lots of undefined references to libc++ symbols (different namespacing I guess) when building with libstdc++ and vice versa).

I'd be happy to provide whatever info you need to narrow down the issue in the code itself so it works when built with clang, too.
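
The undefined references mentioned above are consistent with libc++ placing its symbols in an inline namespace (`std::__1` by default), so its mangled names differ from those of libstdc++. As a rough sketch, a translation unit can report which standard library it was compiled against by checking the vendor macros `_LIBCPP_VERSION` and `__GLIBCXX__`. The function name `stdlib_name` is made up for this illustration.

```cpp
#include <string>  // any standard library header defines the vendor macros

// Returns the name of the C++ standard library this file was built against.
const char* stdlib_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";      // LLVM's libc++ (symbols live in std::__1)
#elif defined(__GLIBCXX__)
    return "libstdc++";   // GNU libstdc++
#else
    return "unknown";
#endif
}
```

Printing this from both a library and the program that loads it is a quick way to spot a mixed libc++/libstdc++ setup.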
Comment 5 Xiuli Pan 2016-05-02 05:02:45 UTC
Hi Armin,

OK, I have sent a patch here:
Could you try it and see whether the problem is still there?

Comment 6 Armin K 2016-05-02 09:47:19 UTC

Beignet git master + the patch linked from Comment 5 seem to work (at least for clinfo).
Comment 7 Xiuli Pan 2016-05-05 03:05:46 UTC
(In reply to Armin K from comment #6)
> Hi,
> Beignet git master + the patch linked from Comment 5 seem to work (at least
> for clinfo).

What about utest and other test cases?
If all is fine I'd like to mark this bug as fixed.

Comment 8 Armin K 2016-05-05 17:12:41 UTC
Sadly, it stopped working.

I still see:

Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (1562535031, -162421138, -1119602453)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: disabling non-working device
Comment 9 Xiuli Pan 2016-05-06 02:40:47 UTC
Hi Armin,

Could you export OCL_OUTPUT_LLVM_AFTER_GEN=1, export OCL_OUTPUT_GEN_IR=1, 
export OCL_OUTPUT_ASM=1 and upload these logs here?

And could you upload the full log of all tests you are running?

Comment 10 Armin K 2016-05-06 10:29:33 UTC
It doesn't work at all anymore, not even for clinfo.

As far as I know, the only updates I did to my system were kernel to 4.5.3. I see that mesa was updated to a newer git snapshot too, so that might have caused the issue, but I'm not sure if that was before I said it was working or after (I didn't properly log when it was installed).

export OCL_KERNEL_PATH=$HOME/src/Beignet-1.1.1-Source/kernels
./benchmark_run 2>&1 | tee benchmark.txt

(ignore the Beignet-1.1.1-Source above, it's still git master, just renamed so I didn't have to modify the build script).

benchmark.txt is attached.
Comment 11 Armin K 2016-05-06 10:29:53 UTC
Created attachment 123519 [details]
benchmark log
Comment 12 Xiuli Pan 2016-05-09 03:12:43 UTC
Hi Armin,

I have looked at the log and found the same problem that I solved in https://lists.freedesktop.org/archives/beignet/2016-April/007489.html.
The load and store instructions are wrongly generated by backend/src/llvm/llvm_gen_backend.cpp. The patch should have fixed that bug, but I still see the same wrong Gen IR in the benchmark log you uploaded.
Could you recheck whether you have applied the patch? I will upload one that you can apply directly with git am.

Comment 13 Xiuli Pan 2016-05-09 03:13:47 UTC
Created attachment 123557 [details] [review]
Patch for clang build
Comment 14 Armin K 2016-05-09 14:23:49 UTC
Hi, it appears that only a rebuild was needed for some reason. The patch was applied before.

However, while clinfo now works, benchmark_run doesn't. I only see the following:

$ ./benchmark_run 
platform number 1
platform_profile "FULL_PROFILE"
platform_name "Intel Gen OCL Driver"
platform_vendor "Intel"
platform_version "OpenCL 1.2 beignet 1.2"
platform_extensions "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_motion_estimation"
device_profile "FULL_PROFILE"
device_name "Intel(R) HD Graphics Skylake ULT GT2"
device_vendor "Intel"
device_version "OpenCL 1.2 beignet 1.2"
device_extensions "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_motion_estimation cl_khr_fp16"
device_opencl_c_version "OpenCL C 1.2 beignet 1.2"
23 image formats are supported
  Vector size 2:
        Offset 0 :    Interrupt signal (SIGSEGV) received.
  total: 21
  run: 1
  pass: 0
  fail: 1
  pass rate: 0.000000
Comment 15 Armin K 2016-05-09 14:30:47 UTC
#0  std::__1::__tree_next<std::__1::__tree_node_base<void*>*> (__x=<optimized out>) at /usr/bin/../include/c++/v1/__tree:151
#1  std::__1::__tree_const_iterator<gbe::ir::BasicBlock const*, std::__1::__tree_node<gbe::ir::BasicBlock const*, void*>*, long>::operator++ (this=<optimized out>)
    at /usr/bin/../include/c++/v1/__tree:703
#2  gbe::GenRegAllocator::Opaque::allocateFlags (this=<optimized out>, selection=...) at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/gen_reg_allocation.cpp:704
#3  0x00007ffff10b4a27 in gbe::GenRegAllocator::Opaque::allocate (this=<optimized out>, selection=...)
    at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/gen_reg_allocation.cpp:1241
#4  gbe::GenRegAllocator::allocate (this=<optimized out>, selection=...) at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/gen_reg_allocation.cpp:1350
#5  0x00007ffff10d39f9 in gbe::GenContext::emitCode (this=0x8b97b0) at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/gen_context.cpp:3275
#6  0x00007ffff0fff2ce in gbe::Context::compileKernel (this=<optimized out>) at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/context.cpp:360
#7  0x00007ffff10de05b in gbe::GenProgram::compileKernel (this=<optimized out>, unit=..., name=..., relaxMath=<optimized out>, profiling=<optimized out>)
    at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/gen_program.cpp:194
#8  0x00007ffff100211e in gbe::Program::buildFromUnit (this=<optimized out>, unit=..., error=...) at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/program.cpp:178
#9  0x00007ffff1001f88 in gbe::Program::buildFromLLVMFile (this=0x6ad640, fileName=0x0, module=0x6c0ff0, error=..., optLevel=1)
    at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/program.cpp:156
#10 0x00007ffff10df798 in gbe::genProgramNewFromLLVM (deviceID=6422, fileName=<optimized out>, module=0x6c0ff0, llvm_ctx=0xb647d0, asm_file_name=0x0, 
    stringSize=<optimized out>, err=0xffffffff <error: Cannot access memory at address 0xffffffff>, errSize=0x1916, optLevel=9628880, options=<optimized out>)
    at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/gen_program.cpp:421
#11 0x00007ffff1004188 in gbe::programNewFromSource (deviceID=<optimized out>, source=<optimized out>, stringSize=1048576, options=<optimized out>, err=<optimized out>, 
    errSize=0xb51b70) at /home/armin/src/Beignet-1.1.1-Source/backend/src/backend/program.cpp:944
#12 0x00007ffff7962dea in cl_program_build (p=0xb51ae0, options=0x0) at /home/armin/src/Beignet-1.1.1-Source/src/cl_program.c:577
#13 0x00007ffff795acdd in clBuildProgram (program=0xb51ae0, num_devices=<optimized out>, device_list=<optimized out>, options=0x92ec00 "@4\241", pfn_notify=0x0, user_data=0x0)
    at /home/armin/src/Beignet-1.1.1-Source/src/cl_api.c:957
#14 0x00007ffff7bbe350 in cl_kernel_init (file_name=0x7ffff7bd1d15 "vload_bench.cl", kernel_name=0xb56a10 "vload_bench_10000uchar2", format=<optimized out>, build_opt=0x0)
    at /home/armin/src/Beignet-1.1.1-Source/utests/utest_helper.cpp:261
#15 0x00007ffff7bc27c4 in vload_bench<unsigned char> (benchMode=255, kernelFunc=<optimized out>, N=<optimized out>, offset=<optimized out>)
    at /home/armin/src/Beignet-1.1.1-Source/utests/vload_bench.cpp:15
#16 vload_bench_uchar () at /home/armin/src/Beignet-1.1.1-Source/utests/vload_bench.cpp:95
#17 __ANON__vload_bench_uchar__ () at /home/armin/src/Beignet-1.1.1-Source/utests/vload_bench.cpp:95
#18 0x00007ffff7bbd4c4 in UTest::do_run (utest=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
    at /home/armin/src/Beignet-1.1.1-Source/utests/utest.cpp:165
#19 UTest::runAllBenchMark () at /home/armin/src/Beignet-1.1.1-Source/utests/utest.cpp:241
#20 0x0000000000401ac7 in main (argc=<optimized out>, argv=<optimized out>) at /home/armin/src/Beignet-1.1.1-Source/benchmark/benchmark_run.cpp:101
Comment 16 Xiuli Pan 2016-05-10 07:38:18 UTC
Hi Armin,

I have tried to build beignet with clang + libc++ and it seems that this bug could rename to
Comment 17 Xiuli Pan 2016-05-10 07:42:19 UTC
Hi Armin,

I have tried to build beignet with clang + libc++, and it seems this bug could be renamed to "beignet does not work with clang + libc++".

The problem can be fixed by these two patches used for Android; they seem to share some requirements for the STL.

Comment 18 Armin K 2016-05-10 08:15:23 UTC
Created attachment 123601 [details]
benchmark log


The two mentioned patches seem to fix beignet here. At least 15 benchmarks pass normally before an assertion is hit in the 16th (but I don't think that is related to this bug). New benchmark log attached.
Comment 19 Xiuli Pan 2016-05-10 08:19:39 UTC
If you are using the newest master branch, then this is a benchmark bug, try this patch:

Comment 20 Armin K 2016-05-10 08:24:20 UTC
Now all is well

  total: 21
  run: 21
  pass: 21
  fail: 0
  pass rate: 1.000000
Comment 21 Xiuli Pan 2016-05-10 08:26:56 UTC
Hi Armin,

What about the utests?
If everything is fine now I will close this bug, and these patches will later be in the master branch.

Comment 22 Armin K 2016-05-10 08:40:47 UTC
Created attachment 123604 [details]
utests log


There are 3 failures in 803 tests. See for yourself.
Comment 23 Xiuli Pan 2016-05-10 08:57:03 UTC
Hi Armin,

It seems something is wrong with the math lib: the CPU and GPU results for builtin_tgamma are the same, but the test still fails.
As for the CMRT test, it seems that beignet is not compiled with this test if cmrt is not installed. If the test fails, maybe something is wrong with cmrt.

Could you export OCL_OUTPUT_BUILD_LOG=1 for the cmrt test?

Comment 24 Armin K 2016-05-10 09:00:15 UTC

The math lib is the one from glibc-2.23. glibc-2.23 was compiled with gcc-5.3.0.

I have this cmrt, it was needed for vp9 support in libva-intel-driver:

Comment 25 Armin K 2016-05-10 09:03:41 UTC
I believe I may know why the cmrt test fails.

The thing is, cmrt can't be built with clang++ / libc++ due to it containing a binary file linking to libstdc++.

For some reason, libstdc++ and libc++ don't like sharing the same program space and the program will crash when it uses libc++ but some library it uses is using libstdc++.
Comment 26 Xiuli Pan 2016-05-10 09:13:18 UTC
Maybe that is a bug in cmrt.

I have looked into those failing builtin cases. Something seems wrong at
line 49:  else if (fabsf(cpu - dst[i]) >= cl_FLT_ULP(cpu) * ULPSIZE_FACTOR) {
cpu and dst[i] are the same here, as printed in the log, so if the result is wrong there may be something wrong with fabsf or cl_FLT_ULP. The same applies to the failure of builtin_pow()    [FAILED]
    Error: (fabs(gpu_data[index_cur] - cpu_data[index_cur]) < cl_FLT_ULP(cpu_data[index_cur]) * ULPSIZE_FACTOR) || (!denormals_supported && gpu_data[index_cur]==0 && std::fpclassify(cpu_data[index_cur])==FP_SUBNORMAL)
  at file /home/armin/src/Beignet-1.1.1-Source/utests/builtin_pow.cpp, function builtin_pow, line 95

It seems cl_FLT_ULP are more likely broken. Could you try to add some log in the test case just print out both fabsf(cpu - dst[i]) and cl_FLT_ULP(cpu) * ULPSIZE_FACTOR?

Comment 27 Armin K 2016-05-10 09:24:12 UTC
(In reply to Xiuli Pan from comment #26)
> It seems cl_FLT_ULP are more likely broken. Could you try to add some log in
> the test case just print out both fabsf(cpu - dst[i]) and cl_FLT_ULP(cpu) *

DBG: 0.000000
DBG: 0.000000

The first one is fabsf(cpu - dst[i]), the second one is cl_FLT_ULP(cpu) * ULPSIZE_FACTOR.

I've added the following printf at line 50:

printf("DBG: %f\nDBG: %f\n", fabsf(cpu - dst[i]), cl_FLT_ULP(cpu) * ULPSIZE_FACTOR);
Comment 28 Xiuli Pan 2016-05-27 02:35:26 UTC
Hi Armin,

Sorry for the late reply. I ran the test on a machine where it passes and got the result below for the same CPU and GPU data you showed above.

printf("%f %a %a %a %a %d\n", src[i], cpu, dst[i], fabsf(cpu - dst[i]), cl_FLT_ULP(cpu), fabsf(cpu - dst[i]) >= cl_FLT_ULP(cpu) * ULPSIZE_FACTOR);

-63.999001 0x0p+0 0x0p+0 0x0p+0 0x1p-149 0

It seems that cl_FLT_ULP(cpu) should be nonzero; here it is a denormal number.

You can try the %a format to see whether we get the same result for the same src. If not, then the problem has something to do with the test cases.
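
To illustrate why `%a` is suggested here: the smallest denormal float prints as 0.000000 with `%f`, so the earlier DBG output cannot distinguish a true zero from a tiny nonzero ULP. The sketch below also shows that a `>=` comparison reports a failure for identical values if the ULP term is exactly zero. `ulp_of` is a hypothetical stand-in built on `std::nextafter`, not Beignet's `cl_FLT_ULP`, and the tolerance factor is passed in because the real `ULPSIZE_FACTOR` value is not given in this thread.

```cpp
#include <cmath>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical ULP helper: gap between |x| and the next float above it.
float ulp_of(float x) {
    const float ax = std::fabs(x);
    return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// Formats a float with %f (fixed notation, six decimals).
std::string as_fixed(float v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%f", v);
    return std::string(buf);
}

// Formats a float with %a (hexadecimal floating point, keeps denormals visible).
std::string as_hexfloat(float v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", v);
    return std::string(buf);
}

// Same shape as the check quoted above: true means the test reports a failure.
bool exceeds_tolerance(float cpu, float gpu, float ulp, float factor) {
    return std::fabs(cpu - gpu) >= ulp * factor;
}
```

For cpu = 0, `ulp_of` returns the smallest denormal, which `as_fixed` renders as 0.000000 while `as_hexfloat` shows a nonzero value. Whether a zero ULP is the actual cause of the failures in this report is not established here; the sketch only shows that the `%f` output is inconclusive.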

Comment 29 Armin K 2016-05-30 21:51:06 UTC

I've updated to llvm/clang 3.9.0 svn and beignet doesn't build with that version, so I'm unable to test anything until beignet gains llvm/clang 3.9.0 support.
Comment 30 Armin K 2016-08-16 21:58:15 UTC
Created attachment 125829 [details]
Latest benchmark log
Comment 31 Armin K 2016-08-16 21:58:30 UTC
Created attachment 125830 [details]
Latest utests log
Comment 32 Armin K 2016-08-16 22:01:27 UTC
Running beignet git master (https://cgit.freedesktop.org/beignet/commit/?id=855b094669fdd243a5108e16d6abd14b8a2880fe) with 5 LLVM 3.9 patches applied (https://lists.freedesktop.org/archives/beignet/2016-August/007843.html).

It was compiled with clang version 3.9.0 (branches/release_39 278597) (libc++ / libc++abi same svn revision).

LLVM 3.9 patches can be "Tested-by: Armin K. <krejzi@email.com>" if required.

The CMRT test still fails. I still suspect the binary blob it contains, as mixing libc++ and libstdc++ programs often leads to trouble.

So far, it looks fine here. Kernel 4.8-rc2 / Mesa 12.1.0 devel git f9f4629.
Comment 33 Xiuli Pan 2016-08-17 08:51:57 UTC
Hi Armin,

I will ask my colleague about the CMRT problem.

Hi yejun,

Is CMRT extension related with libc++ and libstdc++ libs?

Comment 34 Armin K 2016-08-17 10:39:26 UTC
Apparently, cmrt has a binary blob, in jitter/ directory which is linked to libstdc++:

$ ldd jitter/igfxcmjit64.so 
        linux-vdso.so.1 (0x00007fffb97bc000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fa37c558000)
        libm.so.6 => /lib/libm.so.6 (0x00007fa37c257000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fa37c041000)
        libc.so.6 => /lib/libc.so.6 (0x00007fa37bc7c000)
        /lib/ld-linux-x86-64.so.2 (0x0000555fc21b0000)

However, that blob doesn't seem to be linked into libcmrt, which beignet uses, so no libstdc++ references there:

$ ldd /usr/lib/libcmrt.so
        linux-vdso.so.1 (0x00007ffe796d0000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00007f9cf8f8f000)
        libdl.so.2 => /lib/libdl.so.2 (0x00007f9cf8d8b000)
        libdrm.so.2 => /usr/lib/libdrm.so.2 (0x00007f9cf8b7b000)
        libdrm_intel.so.1 => /usr/lib/libdrm_intel.so.1 (0x00007f9cf8959000)
        libva.so.1 => /usr/lib/libva.so.1 (0x00007f9cf8738000)
        libc++.so.1 => /usr/lib/libc++.so.1 (0x00007f9cf847a000)
        libc++abi.so.1 => /usr/lib/libc++abi.so.1 (0x00007f9cf8232000)
        libm.so.6 => /lib/libm.so.6 (0x00007f9cf7f31000)
        libc.so.6 => /lib/libc.so.6 (0x00007f9cf7b6c000)
        /lib/ld-linux-x86-64.so.2 (0x000055cb7acc9000)
        libpciaccess.so.0 => /usr/lib/libpciaccess.so.0 (0x00007f9cf7964000)
        librt.so.1 => /lib/librt.so.1 (0x00007f9cf775c000)

Something else must be in play here.
Comment 35 Armin K 2016-08-17 10:57:54 UTC
Found the issue. beignet seems to dlopen() the lib. However, cmake sets the path to the library incorrectly. Instead of setting it to /usr/lib/libcmrt.so, it sets it to /libcmrt.so.

If I create a temporary symlink, so I have /libcmrt.so, utest now fails with the following error:

runtime_cmrt()    Interrupt signal (SIGABRT) received.
Comment 36 Armin K 2016-08-17 11:08:10 UTC
Found out where SIGABRT is coming from. libcmrt.so was trying to dlopen() igfxcmjit64.so, which I didn't install. After installing it to /usr/lib, the test seems to pass.

So, the main issue is beignet misdetecting location of cmrt.


CMRT_LIBRARY_DIRS doesn't seem to be defined.
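
A small sketch of how an undefined (empty) directory variable produces the /libcmrt.so path seen above. The helper names are hypothetical and this is not Beignet's code; it only mimics a "directory + / + file name" template. The second helper shows one possible defensive variant, relying on the documented dlopen() behaviour that a name without a slash is looked up in the default library search paths.

```cpp
#include <string>

// Joins a directory and a file name the way a "<dir>/<file>" template would.
// With an empty directory this yields a path at the filesystem root.
std::string library_path(const std::string& dir, const std::string& file) {
    return dir + "/" + file;
}

// Defensive variant (an assumption, not the fix from this report): if no
// directory is known, return the bare file name so that dlopen() searches
// the default library paths instead of looking in "/".
std::string library_path_or_name(const std::string& dir, const std::string& file) {
    return dir.empty() ? file : dir + "/" + file;
}
```

With an empty directory, `library_path` returns "/libcmrt.so", matching the wrong path described in comment 35, while `library_path_or_name` returns "libcmrt.so".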
Comment 37 Armin K 2016-08-17 11:43:25 UTC
Created attachment 125844 [details] [review]
Fix cmrt detection

After applying this patch and rebuilding beignet, all utests now pass (well, all that are being run anyway).
Comment 38 Armin K 2016-08-17 11:43:52 UTC
Created attachment 125845 [details]
utests log after applying cmrt detection fix
Comment 39 Guo Yejun 2016-08-18 00:47:56 UTC
yes, cmrt jitter is needed to run runtime_cmrt, see https://lists.freedesktop.org/archives/beignet/2015-November/006690.html and https://github.com/01org/cmrt/blob/master/jitter/readme.txt.

thanks for your patch in attachment 125844 [details] [review], please send your patch to beignet@lists.freedesktop.org for upstream, thanks.
Comment 40 Armin K 2016-11-03 11:53:43 UTC
Everything seems to be fixed now.
