Bug 88804 - Beignet ratGPU benchmark bug
Summary: Beignet ratGPU benchmark bug
Status: RESOLVED MOVED
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Zhigang Gong
 
Reported: 2015-01-26 16:21 UTC by gnn
Modified: 2018-10-12 21:23 UTC

Attachments
full dmesg, unit test results and clinfo output (26.68 KB, text/plain)
2015-01-26 16:21 UTC, gnn
Screenshot with Image corruption (525.27 KB, image/png)
2015-01-28 08:57 UTC, gnn

Description gnn 2015-01-26 16:21:06 UTC
Created attachment 112835
full dmesg, unit test results and clinfo output

There is a bug with Beignet when running the ratGPU OpenCL renderer benchmark (http://www.ratgpu.com) on an Intel Ivy Bridge i5-3470 (GT1) GPU.
After I start the benchmark, the system hangs for a few seconds and the app shows an empty result image (although it still reports that it is making progress).

In dmesg I get output like:
[47797.300442] [drm] stuck on render ring
[47797.300446] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[47797.300447] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[47797.300447] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[47797.300448] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[47797.300449] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[47797.304207] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x622ee000 ctx 1) at 0x622ee0ec
[47804.311172] [drm] stuck on render ring
[47804.311211] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x622ee000 ctx 1) at 0x622ee0ec
[47804.311213] [drm:i915_context_is_banned] *ERROR* context hanging too fast, declaring banned!
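
The hang messages above follow a regular pattern, so they can be pulled out of a longer dmesg log mechanically. The following is a minimal sketch in Python, not part of Beignet or the kernel: the function name `summarize_hangs` and the result keys are made up for illustration, and it assumes the two hexadecimal values in the "hung inside bo" line are the buffer object's base address and the hang address, so that their difference is the offset into the buffer.

```python
import re

# Patterns modelled on the dmesg lines quoted in this report.
STUCK = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[drm\] stuck on (\w+) ring")
HUNG = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:\w+\] \*ERROR\* (\w+) ring hung inside bo "
    r"\((0x[0-9a-f]+) ctx (\d+)\) at (0x[0-9a-f]+)"
)
BANNED = "context hanging too fast, declaring banned"


def summarize_hangs(dmesg_text):
    """Collect GPU hang events from dmesg text (hypothetical helper)."""
    stuck_times = []   # timestamps of "stuck on ... ring" messages
    hung = []          # details of "ring hung inside bo" messages
    banned = False     # True if the context was declared banned
    for line in dmesg_text.splitlines():
        m = STUCK.match(line)
        if m:
            stuck_times.append(float(m.group(1)))
            continue
        m = HUNG.match(line)
        if m:
            base = int(m.group(3), 16)
            addr = int(m.group(5), 16)
            hung.append({
                "time": float(m.group(1)),
                "ring": m.group(2),
                "ctx": int(m.group(4)),
                # Assumption: hang address minus bo base is the offset in the bo.
                "offset": addr - base,
            })
            continue
        if BANNED in line:
            banned = True
    return {"stuck": stuck_times, "hung": hung, "banned": banned}
```

Applied to the excerpt above, this reports two "stuck" events about seven seconds apart, both hanging at the same offset (0xec) in the same buffer object, followed by the context ban.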


Software: Ubuntu 14.04 x64 + stock kernel (3.13.0-44) and mesa (10.1.3)
Beignet:  1.0.1 built from sources with llvm 3.4.

Unit tests:

compiler_fill_image_1d_array()    [FAILED]
    Error: dst[j*w + i] == 0

summary:
----------
  total: 699
  run: 698
  pass: 697
  fail: 1
  pass rate: 0.998567

compiler_fill_image_1d_array: this test fails sometimes, but not always. It usually fails when I work with other applications, such as Firefox, during the test.
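
The reported pass rate matches passed tests divided by tests actually run (697 / 698), not divided by the total of 699. A minimal Python sketch that checks this from the summary text follows; the helper names `parse_summary` and `pass_rate` are hypothetical and not part of the Beignet test runner.

```python
import re


def parse_summary(text):
    """Read the integer fields of a utest summary block into a dict."""
    fields = {}
    for line in text.splitlines():
        # Matches lines such as "  total: 699"; "pass rate: ..." is skipped.
        m = re.match(r"^\s*(total|run|pass|fail):\s*(\d+)\s*$", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def pass_rate(fields):
    """Pass rate as passed tests over tests that were run."""
    return fields["pass"] / fields["run"]


summary = """
  total: 699
  run: 698
  pass: 697
  fail: 1
  pass rate: 0.998567
"""
fields = parse_summary(summary)
print(f"{pass_rate(fields):.6f}")      # 0.998567
print(fields["total"] - fields["run"])  # 1 test was not run
```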

LuxMark 2.0 x64 with the default "sala" scene works and shows 149 points (CPU: 370 points).

I also get graphics glitches and image corruption (usually not severe) while OpenCL applications are running.

I also tried building Beignet with llvm-3.5 from the Ubuntu repos (and hit some build problems in the process), but it doesn't seem to make any difference.

I have attached the full dmesg, unit test results and clinfo output.

I also tried ratGPU and LuxMark with the same software on a HSW (GT2) i7 laptop, but both apps produced "stuck on render ring" in dmesg and blank results.
Comment 1 Zhigang Gong 2015-01-27 06:39:49 UTC
I can reproduce the ratGPU hang issue on my IVB machine, but I can't reproduce the compiler_fill_image_1d_array problem. Regarding that problem, I have two questions for you.

1. If you don't run ratGPU after a clean reboot, do you still trigger the image1d_array problem? Or does it occur only after you have run the ratGPU benchmark?

2. Did you check dmesg before and after you ran image1d_array and got a failure from it? Did you find that each image1d_array failure triggers a new GPU hang event in dmesg?
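
One way to answer the second question is to save dmesg output before and after a test run and keep only the hang messages that are newer than the first snapshot. The Python sketch below illustrates the idea under stated assumptions: the helper names are hypothetical, and the marker strings are taken from the dmesg excerpt in the description rather than from any kernel documentation.

```python
import re

TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")
# Substrings taken from the hang messages quoted in the description.
HANG_MARKERS = ("stuck on render ring", "ring hung inside bo")


def last_timestamp(dmesg_text):
    """Return the largest kernel timestamp in a dmesg snapshot (0.0 if none)."""
    last = 0.0
    for line in dmesg_text.splitlines():
        m = TIMESTAMP.match(line)
        if m:
            last = max(last, float(m.group(1)))
    return last


def new_hang_lines(before, after):
    """Hang lines in `after` that are newer than everything in `before`."""
    cutoff = last_timestamp(before)
    fresh = []
    for line in after.splitlines():
        m = TIMESTAMP.match(line)
        if not m or float(m.group(1)) <= cutoff:
            continue
        if any(marker in line for marker in HANG_MARKERS):
            fresh.append(line)
    return fresh
```

An empty result after a failing image1d_array run would suggest the failure is not accompanied by a GPU hang.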

As for the issues you met on the HSW (GT2) machine, did you run the utests first? If so, what was the utest summary? LuxMark should work well on HSW if the configuration is correct; please check the known issues section of the wiki page about HSW platform support. You need to patch the kernel to enable HSW support.

Now, I would like to get back to this bug, which is mainly about the ratGPU hang. I took a quick look at the hanging kernel's assembly code and didn't find anything interesting (no atomics, no barriers). As we don't have the source code of ratGPU, we can't do much for this case. My best guess for now is that their code may be written for NVIDIA/AMD only, so on Beignet their kernel code may be incomplete and cause infinite loops. If we can get the source code, we can investigate further. Otherwise, there isn't much we can do with this bug.

BTW, if you still hit LuxMark/utest issues on HSW after you correct the configuration and kernel version, you are welcome to file another bug to track that issue.
Comment 2 Zhigang Gong 2015-01-27 06:41:48 UTC
> 
> LuxMark 2.0 x64 with the default "sala" scene works and shows 149 points
> (CPU: 370 points).
> 

Forgot to mention: the score is relatively low because your IVB machine is a GT1 part, which only has 6 EUs.
Comment 3 gnn 2015-01-27 09:32:24 UTC
Additional information:
1) Unit tests don't produce any dmesg errors. Only the ratGPU app does.
2) I tried running utests straight after reboot and login. The compiler_fill_image_1d_array() test fails once in 7 - 10 runs of utests, but I get graphics flickering or small corruptions of some sort during every run. BTW, I use the Ubuntu Unity environment, which uses OpenGL 2.0 and runs compiz. Could this fact help?

About the HSW (GT2) laptop: yes, utests had 19 failures, and I hadn't patched the kernel. That must be the reason. Can that kernel patch (or some other workaround for HSW) be expected in future kernels or in bug fixes for current ones?

BTW, I tried ratGPU on an Ivy Bridge Mac mini (with a mobile i5 chip) under OS X 10.10 and it worked, but, of course, without the sources one can't be sure what the difference is.
Comment 4 gnn 2015-01-27 13:12:03 UTC
I am ready to run some additional applications or tests, if necessary.
Comment 5 Zhigang Gong 2015-01-27 14:49:16 UTC
(In reply to gnn from comment #4)
> I am ready to run some additional applications or tests, if necessary

You are welcome to do more tests with additional applications, especially open source applications. Another known issue is that the Cycles engine for Blender is not currently supported. There should be no other known issues with applications. If you find any issues, please feel free to submit bugs and we will investigate soon.

Thanks.
Comment 6 gnn 2015-01-28 07:27:20 UTC
I ran utests after reboot and login in a pure xterm session (xinit /usr/bin/xterm). The compiler_fill_image_1d_array() test still fails once in 7 - 10 runs of utests, but there is no graphics flickering or corruption during the tests. So using Beignet OpenCL on my system affects the Ubuntu Unity session because it uses OpenGL.

I am also interested in the future of HSW support: is that Linux kernel patch for HSW support going to be integrated into future kernels (like 3.21), or maybe committed as a bug fix to current kernels?
Comment 7 gnn 2015-01-28 08:57:06 UTC
Created attachment 112908
Screenshot with Image corruption

Fragment of screenshot, where background window (top part of the picture) got corruption during active usage of OpenCL.
Comment 8 Zhigang Gong 2015-01-28 15:19:00 UTC
(In reply to gnn from comment #6)
> I ran utests after reboot and login in a pure xterm session (xinit
> /usr/bin/xterm). The compiler_fill_image_1d_array() test still fails once in
> 7 - 10 runs of utests, but there is no graphics flickering or corruption
> during the tests. So using Beignet OpenCL on my system affects the Ubuntu
> Unity session because it uses OpenGL.
Could you give more detailed instructions on how to reproduce the screen corruption issue? Which OpenGL application do you run in the background? Is it a random symptom or not? We haven't run into this type of issue since version 0.9.0.

> 
> I am also interested in the future of HSW support: is that Linux kernel
> patch for HSW support going to be integrated into future kernels (like
> 3.21), or maybe committed as a bug fix to current kernels?

Kernel 3.20 still doesn't officially support Beignet on the HSW platform. The issue is still under discussion with the kernel team and there is no official solution yet. Once an official kernel supports the HSW platform, I will announce it on the mailing list and update the wiki page as well.
Comment 9 gnn 2015-01-28 16:17:53 UTC
Flickering doesn't happen when I use Beignet OpenCL apps under a simple xterm session (without any desktop environment).
Flickering does happen when I use Beignet OpenCL apps under an Ubuntu Unity desktop session, which also runs compiz. Both the Unity panel/menu and compiz use the GPU and OpenGL, so I think it's connected somehow: Beignet affects them.

The compiler_fill_image_1d_array() test fails periodically either way.

Most of the glitches appear and disappear almost instantly, but sometimes they become persistent (e.g. see the screenshot) or can make window content blank.
To get rid of such a glitch, you must bring the window to the foreground and trigger a full window repaint (e.g. resize it to the minimal possible size and then resize it back). So persistent glitches can affect window buffers/pixmaps, not just the main framebuffer.

How I reproduce it:
1) Computer with Intel i5 3470 and single FullHD display.
2) Ubuntu 14.04 x64 with updates (AFAIK, this Ubuntu uses a patched kernel, so maybe there's something in it too)
3) Beignet 1.0.1 from sources
4) Ubuntu Unity login session
5) Open several non-maximized windows (like Nautilus, terminal, etc.)
6) Run OpenCL programs with Beignet like LuxMark 2.1 x64 or Beignet unit tests.
7) I see graphics corruptions.
Comment 10 meng 2015-01-29 04:53:56 UTC
(In reply to gnn from comment #9)

> How I reproduce it:
> 1) Computer with Intel i5 3470 and single FullHD display.
> 2) Ubuntu 14.04 x64 with updates (AFAIK, this Ubuntu uses a patched kernel,
> so maybe there's something in it too)
> 3) Beignet 1.0.1 from sources
> 4) Ubuntu Unity login session
> 5) Open several non-maximized windows (like Nautilus, terminal, etc.)
> 6) Run OpenCL programs with Beignet like LuxMark 2.1 x64 or Beignet unit
> tests.
> 7) I see graphics corruptions.

Hi, I can't reproduce the "screen corruption" issue on my IVB.
The environment: 
IVB: i3-3220
kernel:3.18.1
libdrm: 2.4.58
Other: Ubuntu 14.04 

So could you update your kernel/libdrm, then check it again?
Comment 11 gnn 2015-01-29 12:45:49 UTC
I've upgraded the kernel and libdrm from the stock Ubuntu 14.04 versions to kernel 3.18.1 and libdrm 2.4.59 (from the xorg-edgers PPA).

The graphics corruptions and the compiler_fill_image_1d_array() test failure still happen, but less frequently.

To make the graphics corruptions more visible, during OpenCL/Beignet activity I switch between windows, move and resize them, browse files in Nautilus, etc.

So, a slightly more specific case:
1) Ubuntu 14.04 
2) Ubuntu Unity desktop session (with compiz)
3) Open the standard GNOME terminal. 
4) Run the ./utest_run command (several times). While it outputs the results in the GNOME terminal, try browsing in Nautilus, resizing its window, moving it around, or running something else (e.g. System Monitor or gedit).  
I usually see some corruption in the gnome-terminal output itself, but working with other windows during Beignet activity usually gives extra glitches.

Another interesting thing is that the compiler_fill_image_1d_array() test never fails by itself, i.e. when I run it as ./utest_run -c compiler_fill_image_1d_array. I tried many times, and it always succeeded. It fails only when run together with the other tests.
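
Because the failure is intermittent and only shows up in full-suite runs, counting failures over many runs gives a firmer number than "once in 7 - 10 runs". The following Python sketch shows one way to do that; it is an illustration only. The helper names are hypothetical, the `[FAILED]` line format is taken from the unit test output quoted in the description, and `run_full_suite` assumes `./utest_run` is in the current directory and prints its results to stdout or stderr.

```python
import re
import subprocess
from collections import Counter

# Matches result lines such as "compiler_fill_image_1d_array()    [FAILED]".
FAILED_LINE = re.compile(r"^(\w+)\(\)\s+\[FAILED\]")


def failed_tests(output):
    """Names of tests marked [FAILED] in one utest run's output."""
    names = []
    for line in output.splitlines():
        m = FAILED_LINE.match(line.strip())
        if m:
            names.append(m.group(1))
    return names


def tally_failures(run_once, runs):
    """Call run_once() `runs` times; count in how many runs each test failed."""
    counts = Counter()
    for _ in range(runs):
        for name in set(failed_tests(run_once())):
            counts[name] += 1
    return counts


def run_full_suite():
    """Assumption: ./utest_run exists in the current directory."""
    result = subprocess.run(["./utest_run"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, `tally_failures(run_full_suite, 20)` would return a counter of failing runs per test name over 20 full-suite runs, which can then be compared between the Unity session and the plain xterm session.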


Some cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
stepping	: 9
microcode	: 0x17
cpu MHz		: 3584.125
cache size	: 6144 KB

My motherboard is a P8Z77-V LX with 4x4 GB of 1333 MHz RAM.
Comment 12 GitLab Migration User 2018-10-12 21:23:41 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/19.

