Bug 98647

Summary: drm_intel_gem_bo_context_exec() failed: No space left on device
Product: Beignet
Component: Beignet
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Reporter: kenneth johansson <ken>
Assignee: rongyang <rong.r.yang>
CC: aklhfex, evangelos, intelfx, ismo.puustinen, johann_frei, jolan, jylo06g, nroof, sauron, victzhang
URL: https://patchwork.freedesktop.org/patch/121347/

Description kenneth johansson 2016-11-08 23:16:49 UTC
On a MacBook Pro running Ubuntu 16.04.

I have never used this software before. This is after following the README.
Everything works until I try to run the test program.

-----------------------
./utest_run some_unit_test
platform number 1
platform_profile "FULL_PROFILE"
platform_name "Intel Gen OCL Driver"
platform_vendor "Intel"
platform_version "OpenCL 1.2 beignet 1.3 (git-75b6f38)"
platform_extensions "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short"
drm_intel_gem_bo_context_exec() failed: No space left on device
    Interrupt signal (SIGSEGV) received.
summary:
----------
  total: 982
  run: 0
  pass: 0
  fail: 1
  pass rate: 0.000000
Comment 1 Chris Wilson 2016-11-14 13:35:53 UTC
https://patchwork.freedesktop.org/patch/121347/
Comment 2 Chris Wilson 2016-11-14 13:36:36 UTC
Hmm, wrong errno - possibly a different bug...
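
For context on the errno remark: the text after "failed:" in the report appears to be the C library's description of an errno value, so it can be mapped back to a symbolic name. The following is a minimal Python sketch of that reverse lookup. The helper name `errno_name_for_message` is made up for illustration, and the exact description strings depend on the C library in use.

```python
import errno
import os


def errno_name_for_message(message):
    """Return the symbolic errno name whose description equals `message`, or None."""
    for code, name in sorted(errno.errorcode.items()):
        if os.strerror(code) == message:
            return name
    return None


if __name__ == "__main__":
    # Message text taken from the description of this bug.
    print(errno_name_for_message("No space left on device"))
```

Comparing the symbolic names rather than the message text makes it easier to tell whether two reports really hit the same kernel error path.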
Comment 3 Xiuli Pan 2016-11-16 01:46:08 UTC
(In reply to kenneth johansson from comment #0)

Hi Kenneth,

Could you clear your dmesg first and then run the utest again to see whether dmesg shows any DRM-related messages?
Also, could you run clinfo (you can install it with apt-get install clinfo) to get some platform info? Please also provide the LLVM version you are using.

Thanks
Xiuli
Comment 4 Xiuli Pan 2016-12-01 03:00:30 UTC
*** Bug 98882 has been marked as a duplicate of this bug. ***
Comment 5 Ivan Shapovalov 2016-12-27 22:51:48 UTC
(In reply to Chris Wilson from comment #2)
> Hmm, wrong errno - possibly a different bug...

Well, at least my bug 98882 was marked as a duplicate of *this*.

BTW, any plans to actually fix this? Beignet has been broken since exactly that commit...
Comment 6 Xiuli Pan 2016-12-28 04:42:43 UTC
Hi,

We have tried to reproduce this bug but failed, and we are not sure whether it is related to PPGTT or something else. The commit you bisected is our preparatory work for OpenCL 2.0, which may need PPGTT support. Could you check whether PPGTT is enabled on your device and provide the dmesg output with DRM debugging turned on?

Thanks
Xiuli
Comment 7 Ivan Shapovalov 2016-12-28 19:55:42 UTC
(In reply to Xiuli Pan from comment #6)
> We have tried to reproduce this bug but failed, and we are not sure whether
> it is related to PPGTT or something else. The commit you bisected is our
> preparatory work for OpenCL 2.0, which may need PPGTT support. Could you
> check whether PPGTT is enabled on your device and provide the dmesg output
> with DRM debugging turned on?

How do I check that? Not that I have any familiarity with Intel's driver internals...
Comment 8 Thomas DEBESSE 2017-01-15 19:21:18 UTC
I got the same error message as described in #98882 which was marked as a duplicate of this one:

```
$ clinfo
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 1.2 beignet 1.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short
  Platform Extensions function suffix             Intel
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Segmentation fault
```

PPGTT seems to be enabled on my end:

```
# cat /sys/module/i915/parameters/enable_ppgtt
1
```
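
The same check can be scripted, which also copes with systems where the file does not exist. This is a minimal sketch: the function name `read_ppgtt_mode` is made up here, and the value meanings are an assumption based on the i915 module parameter description in kernels of this era (newer kernels may not expose the parameter at all).

```python
from pathlib import Path

# Meanings as described for the i915 `enable_ppgtt` module parameter in
# kernels of this era (assumption: newer kernels may not expose it).
PPGTT_MODES = {
    -1: "auto",
    0: "disabled",
    1: "aliasing PPGTT",
    2: "full PPGTT",
    3: "full PPGTT with extended address space",
}


def read_ppgtt_mode(path="/sys/module/i915/parameters/enable_ppgtt"):
    """Describe the PPGTT mode, or return None if the file cannot be read."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return None
    try:
        value = int(raw)
    except ValueError:
        return f"unrecognised value {raw!r}"
    return PPGTT_MODES.get(value, f"unknown mode {value}")


if __name__ == "__main__":
    print(read_ppgtt_mode())
```

Under that reading, the value `1` shown above would mean aliasing PPGTT rather than full PPGTT.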

I'm running the Haswell platform.

```
$ lspci -s 00:02.0 -nn 
00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06)
$ glxinfo | grep 'OpenGL renderer'
OpenGL renderer string: Mesa DRI Intel(R) Haswell Mobile 
```
Comment 9 Giuseppe Bilotta 2017-01-30 14:54:57 UTC
I am also affected by this bug, with current beignet master (8efa803f2f93e377b30ff957a74c5d69beec7744), on a Dell XPS 15 from 2013

/proc/cpuinfo reads:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel(R) Core(TM) i7-4712HQ CPU @ 2.30GHz
stepping        : 3
microcode       : 0x20
cpu MHz         : 2300.421
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_t
bugs            :
bogomips        : 4589.58
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:


A workaround I've found is to disable the detection of HAVE_DRM_INTEL_BO_SET_SOFTPIN. Commenting out the corresponding line in CMakeLists.txt ensures that the HAS_BO_SET_SOFTPIN define does not get set, and the device becomes usable again.

I also have PPGTT enabled. I'm running Linux kernel 4.9.2 (from Debian unstable) with libdrm 2.4.74.
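
The edit described in this workaround can be sketched as a small text transformation. This is only an illustration: the helper name `comment_out_lines` is hypothetical, the sample input is a stand-in because the actual CMakeLists.txt line is not quoted in this thread, and the approach is naive (commenting out one line of a multi-line command, or an `if()` without its `endif()`, would break the file, so the result needs to be reviewed by hand).

```python
def comment_out_lines(cmake_text, needle):
    """Prefix every not-yet-commented line containing `needle` with '# '."""
    result = []
    for line in cmake_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if needle in stripped and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            result.append(f"{indent}# {stripped}")
        else:
            result.append(line)
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical stand-in for the real file contents; Beignet's actual
    # CMakeLists.txt line is not quoted in this thread.
    sample = (
        "set(OTHER_OPTION ON)\n"
        "  some_check(... HAVE_DRM_INTEL_BO_SET_SOFTPIN)\n"
    )
    print(comment_out_lines(sample, "HAVE_DRM_INTEL_BO_SET_SOFTPIN"), end="")
```

Because CMake caches the results of such checks, it is probably safest to reconfigure in a clean build directory after making the change.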
Comment 10 Ivan Shapovalov 2017-02-02 04:12:44 UTC
(In reply to Xiuli Pan from comment #6)

So:

1. PPGTT is on

----
# cat /sys/module/i915/parameters/enable_ppgtt
1
----

2. This is Haswell, integrated graphics of Intel Core i7-4700MQ

3. I'm not sure which logs you want. The kernel log with `drm.debug=0x3f` is very big. Which parts of it do you need?
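
To see what a given `drm.debug` mask turns on, the bits can be decoded individually. The sketch below assumes the debug categories defined in the kernel's DRM headers around Linux 4.9; later kernels add further bits, and the function name `decode_drm_debug` is made up for illustration.

```python
# Debug categories as defined in the kernel's DRM headers around Linux 4.9
# (assumption: later kernels define additional bits).
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vblank",
}


def decode_drm_debug(mask):
    """Return the category names enabled by `mask`, lowest bit first."""
    names = [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
    unknown = mask & ~sum(DRM_DEBUG_BITS)
    if unknown:
        names.append(f"unknown bits {unknown:#x}")
    return names


if __name__ == "__main__":
    for mask in (0x3F, 0x1F):
        print(hex(mask), decode_drm_debug(mask))
```

Under these assumptions, dropping the vblank bit (`0x1f` instead of `0x3f`) is one way to cut down the volume of the log.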
Comment 11 Ivan Shapovalov 2017-02-02 04:36:03 UTC
(In reply to Xiuli Pan from comment #6)

OK, here is the kernel log with `drm.debug=0x1f` corresponding (roughly) to the time of running a sample OpenCL workload (CLBlast unit test): https://intelfx.name/files/2017-02-02%20beignet%20debug.log
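
For logs of this size, one option is to keep only the graphics-related lines before sharing them. This is a minimal sketch; the keyword list is an assumption about how DRM and i915 messages are tagged in the kernel log, and the developers may well need surrounding lines too.

```python
import sys


def filter_drm_lines(lines, keywords=("[drm", "i915")):
    """Yield only the lines that contain at least one of `keywords`."""
    for line in lines:
        if any(keyword in line for keyword in keywords):
            yield line


if __name__ == "__main__" and not sys.stdin.isatty():
    # Usage: dmesg | python3 filter_drm.py > drm-only.log
    # (filter_drm.py is just a placeholder name for this script.)
    sys.stdout.writelines(filter_drm_lines(sys.stdin))
```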
Comment 12 Rebecca Palmer 2017-02-02 22:34:59 UTC
That link doesn't work - did you misspell it?
Comment 13 Ivan Shapovalov 2017-02-03 01:58:49 UTC
(In reply to Rebecca Palmer from comment #12)
> That link doesn't work - did you misspell it?

Sorry, my server went down unexpectedly (I could not find a pastebin that would accommodate a 6 MiB file). I'll bring it back later today.
Comment 14 Ivan Shapovalov 2017-02-03 07:24:58 UTC
(In reply to Rebecca Palmer from comment #12)
> That link doesn't work - did you misspell it?

Fixed. The link did wrap in that comment, but it should not matter.
Comment 15 Adrian Siemieniak 2017-02-04 14:32:40 UTC
I can confirm this error (as reported in bug 98882, a duplicate of this one) on my system:

Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
Linux (source compiled) 4.9.0 #1 SMP Sat Dec 31 11:03:07 CET 2016 x86_64 GNU/Linux
OS: debian sid
ii  beignet-dev:amd64                             1.3.0-1
ii  beignet-opencl-icd:amd64                      1.3.0-1
Application is source compiled GIMP 2.9.5
PPGTT is on

Error:
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x80efe190 error, type is 4592, error staus is -5"
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x81711200 error, type is 4597, error staus is -5"

Cheers - Adrian
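
The numbers in the Beignet messages above can be decoded with the constants from the Khronos OpenCL headers. The sketch below assumes that "type" is an OpenCL command type and that the status is an OpenCL error code, which is how the message reads but is not confirmed in this thread. The function name `decode_event_error` is made up for illustration.

```python
import re

# Values from the Khronos OpenCL headers (CL/cl.h).
CL_COMMAND_TYPES = {
    0x11F0: "CL_COMMAND_NDRANGE_KERNEL",  # 4592
    0x11F3: "CL_COMMAND_READ_BUFFER",
    0x11F4: "CL_COMMAND_WRITE_BUFFER",
    0x11F5: "CL_COMMAND_COPY_BUFFER",  # 4597
}
CL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -7: "CL_PROFILING_INFO_NOT_AVAILABLE",
}

# Accepts both the "staus" and "status" spellings seen in Beignet's output.
EVENT_RE = re.compile(r"type is (\d+), error stat?us is (-?\d+)")


def decode_event_error(line):
    """Return (command name, error name) for a Beignet event error line, or None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    command, status = int(match.group(1)), int(match.group(2))
    return (
        CL_COMMAND_TYPES.get(command, f"unknown command {command:#x}"),
        CL_ERRORS.get(status, f"unknown status {status}"),
    )


if __name__ == "__main__":
    # Line copied from the report above.
    print(decode_event_error(
        'Beignet: "Exec event 0x80efe190 error, type is 4592, error staus is -5"'
    ))
```

Read this way, the two messages above would correspond to a kernel launch and a buffer copy that both failed with an out-of-resources error.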
Comment 16 CUI Hao 2017-02-21 02:11:26 UTC
Arch Linux with stock 4.9.8-1 kernel, libdrm 2.4.75.
CPU is i3-3110M (Ivy Bridge).

Neither Beignet 1.3.0 nor the latest git version (cb4f2adc) works, although there is no SIGSEGV.

For example (with cb4f2adc commit):

------
$ ./utest_run test_load_program_from_bin_file
platform number 1
platform_profile "FULL_PROFILE"
platform_name "Intel Gen OCL Driver"
platform_vendor "Intel"
platform_version "OpenCL 2.0 beignet 1.4 (git-cb4f2adc)"
platform_extensions "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing"
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x55b9cbc4cc00 error, type is 4592, error status is -5"
device_profile "FULL_PROFILE"
device_name "Intel(R) HD Graphics IvyBridge M GT2"
device_vendor "Intel"
device_version "OpenCL 1.2 beignet 1.4 (git-cb4f2adc)"
device_extensions "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_intel_motion_estimation"
device_opencl_c_version "OpenCL C 1.2 beignet 1.4 (git-cb4f2adc)"
27 image formats are supported
......
test_load_program_from_bin_file()drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x55b9cb927fe0 error, type is 4592, error status is -5"
    [FAILED]
    Error: ((float *)buf_data[1])[i] == cpu_dst[i]
  at file /mnt/Lilar/userdata/Downloads/beignet/utests/load_program_from_bin_file.cpp, function test_load_program_from_bin_file, line 76
......
------
Comment 17 Ismo Puustinen 2017-02-21 10:48:58 UTC
I'm also hit by the 'Beignet: "Exec event 0x55b9cb927fe0 error, type is 4592, error status is -5"' bug. I'm running Beignet 1.4.0 (git cb4f2adcb78c71fae4) on a Minnowboard Turbot (Atom E3826) with kernel 4.9.6.
Comment 18 Jerome Kieffer 2017-02-26 09:34:15 UTC
I am not certain this is the same bug, but it looks like it.

I am running Debian 9 (stretch, kernel 4.9) with Beignet 1.3 on a 13-inch MacBook Pro from late 2014 (Haswell processor). Earlier versions of Beignet used to work.

This is the output of clinfo (short version):
root@mac13:/home/kieffer/bin# clinfo -l
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x15603c0 error, type is 4592, error staus is -5"
Platform #0: Intel Gen OCL Driver
 `-- Device #0: Intel(R) HD Graphics Haswell Ultrabook GT3 reserved
Platform #1: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i5-4308U CPU @ 2.80GHz
Platform #2: AMD Accelerated Parallel Processing
 `-- Device #0: Intel(R) Core(TM) i5-4308U CPU @ 2.80GHz

Running any kernel fails. For example, clpeak, a simple OpenCL benchmarking tool that used to work with Beignet 1.2:

kieffer@mac13:~$ clpeak -p 0 --kernel-latency 
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0xf2f050 error, type is 4592, error staus is -5"

Platform: Intel Gen OCL Driver
  Device: Intel(R) HD Graphics Haswell Ultrabook GT3 reserved
    Driver version  : 1.3 (Linux x64)
    Compute units   : 40
    Clock frequency : 1000 MHz

    Kernel launch latency : drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0xb68d20 error, type is 4592, error staus is -5"
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x1ac60a0 error, type is 4592, error staus is -5"
drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0xb68d20 error, type is 4592, error staus is -5"
clGetEventProfileInfo (-7)
      Tests skipped

Kernel:
kieffer@mac13:~$ uname -a
Linux mac13 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28) x86_64 GNU/Linux

Version installed:
kieffer@mac13:~$ dpkg -l |grep beignet
ii  beignet                                   1.3.0-1                              amd64        OpenCL library for Intel GPUs - transitional dummy package
ii  beignet-dev:amd64                         1.3.0-1                              amd64        OpenCL for Intel GPUs (development files and documentation)
ii  beignet-opencl-icd:amd64                  1.3.0-1                              amd64        OpenCL library for Intel GPUs

kieffer@mac13:~$ dpkg -l |grep intel
ii  intel-microcode                           3.20161104.1                         amd64        Processor microcode firmware for Intel CPUs
ii  intel-opencl-icd                          5.0.0.57-2                           amd64        OpenCL™ runtime for Intel® CPU device
ii  libdrm-intel1:amd64                       2.4.74-1                             amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  libdrm-intel1:i386                        2.4.74-1                             i386         Userspace interface to intel-specific kernel DRM services -- runtime
ii  xserver-xorg-video-intel                  2:2.99.917+git20161206-1             amd64        X.Org X server -- Intel i8xx, i9xx display driver
Comment 19 Victor 2017-04-14 09:46:15 UTC
I have the same issue on Ubuntu 17.04 with kernel 4.10.0-19-generic and Beignet 1.3.
My GPU is Intel(R) HD Graphics IvyBridge M GT2.

I get the following output when running clinfo:

drm_intel_gem_bo_context_exec() failed: Device or resource busy
Beignet: "Exec event 0x249e1d0 error, type is 4592, error staus is -5"
Comment 20 nRoof 2017-06-19 22:21:19 UTC
I'm getting the same error as in the Description when Compatibility Support Module (CSM) is disabled and/or VT-d is enabled in my motherboard BIOS (UEFI).

Also the same issue is reproducible with LuxRender.

When CSM is enabled and VT-d is disabled, the issue is not reproducible. For example, "utest_run compiler_box_blur_float" passes, and LuxRender renders without errors.
Comment 21 nRoof 2017-06-19 22:26:41 UTC
Addition to my previous comment:
The system is freshly updated Arch Linux with linux-libre 4.11.4_gnu-1. MB is AsRock Z97-Extreme6. CPU is i7-4970K with HD 4600 iGPU.
Comment 22 nRoof 2017-06-19 22:30:24 UTC
(In reply to nRoof from comment #21)
> Addition to my previous comment:
> The system is freshly updated Arch Linux with linux-libre 4.11.4_gnu-1. MB
> is AsRock Z97-Extreme6. CPU is i7-4970K with HD 4600 iGPU.

Typo: CPU is i7-4790K
Comment 23 rongyang 2017-06-20 05:40:25 UTC
Which version of Beignet do you use?
Comment 24 nRoof 2017-06-20 17:50:28 UTC
(In reply to rongyang from comment #23)
> which version beignet do you use?

1.3.1
Comment 25 Ivan Shapovalov 2017-06-27 14:18:33 UTC
OK, this now works for me (apart from a type mix-up in 8d3e93fa, but that's unrelated).
Comment 26 Joseph Thommes 2017-11-19 02:40:21 UTC
I can confirm this error using convert (imagemagick), beignet version 1.3.1-4.
Processor: Intel Core i7-3517U
arch-linux system 4.13.12-1-ARCH
Comment 27 Benjamin Hodgetts 2017-12-11 16:44:58 UTC
Also seeing this on a Xeon E3-1246 v3 (HD P4600 GPU) using Beignet 1.3.1 and Kernel 4.14.4.
Comment 28 GitLab Migration User 2018-10-12 21:26:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/54.
