Bug 96486 - [i915] [skl] GPU HANG: ecode 9:0:0x8ed9fff2 running GEGL pixelize test
Summary: [i915] [skl] GPU HANG: ecode 9:0:0x8ed9fff2 running GEGL pixelize test
Status: RESOLVED FIXED
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Xiuli Pan
Reported: 2016-06-10 20:44 UTC by Jan Vesely
Modified: 2016-08-08 12:47 UTC

Attachments
/sys/class/drm/card0/error (379.16 KB, text/plain)
2016-06-10 20:44 UTC, Jan Vesely

Description Jan Vesely 2016-06-10 20:44:55 UTC
Created attachment 124456
/sys/class/drm/card0/error

Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Skylake Halo GT2
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.2 (git-309afa7)


[ 5706.847167] [drm] stuck on render ring
[ 5706.847921] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in gegl [6584], reason: Ring hung, action: reset
[ 5706.847947] ------------[ cut here ]------------
[ 5706.847957] WARNING: CPU: 0 PID: 5733 at /usr/src/linux-4.6.1-gentoo/drivers/gpu/drm/i915/intel_display.c:11384 intel_mmio_flip_work_func+0x44d/0x480
[ 5706.847960] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, ((void *)0), &mmio_flip->i915->rps.mmioflips))
[ 5706.847962] Modules linked in:
[ 5706.847964]  ctr ccm bnep xfs dm_crypt algif_skcipher af_alg uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev iwlmvm kvm_intel kvm mac80211 btusb btrtl btbcm btintel efi_pstore bluetooth irqbypass hp_wmi iwlwifi efivars sparse_keymap nouveau mxm_wmi ttm wmi dm_mod efivarfs ipv6
[ 5706.848000] CPU: 0 PID: 5733 Comm: kworker/0:3 Tainted: G        W       4.6.1-gentoo #1
[ 5706.848002] Hardware name: HP HP ENVY Notebook/80EC, BIOS F.23 10/26/2015
[ 5706.848006] Workqueue: events intel_mmio_flip_work_func
[ 5706.848009]  0000000000000000 ffffffff8133aa9d ffff880369bebd98 0000000000000000
[ 5706.848015]  ffffffff8108102f ffff880205874d00 ffff880369bebde8 ffff880072493600
[ 5706.848019]  0000000000000000 ffff880382419600 0000000000000000 ffffffff8108109a
[ 5706.848023] Call Trace:
[ 5706.848032]  [<ffffffff8133aa9d>] ? dump_stack+0x46/0x59
[ 5706.848037]  [<ffffffff8108102f>] ? __warn+0xbf/0xe0
[ 5706.848041]  [<ffffffff8108109a>] ? warn_slowpath_fmt+0x4a/0x50
[ 5706.848045]  [<ffffffff814bac4d>] ? intel_mmio_flip_work_func+0x44d/0x480
[ 5706.848050]  [<ffffffff81098484>] ? process_one_work+0x144/0x410
[ 5706.848055]  [<ffffffff81098aa5>] ? worker_thread+0x45/0x460
[ 5706.848059]  [<ffffffff81098a60>] ? rescuer_thread+0x310/0x310
[ 5706.848063]  [<ffffffff81098a60>] ? rescuer_thread+0x310/0x310
[ 5706.848066]  [<ffffffff8109d778>] ? kthread+0xb8/0xd0
[ 5706.848073]  [<ffffffff817a4692>] ? ret_from_fork+0x22/0x40
[ 5706.848076]  [<ffffffff8109d6c0>] ? kthread_create_on_node+0x170/0x170
[ 5706.848078] ---[ end trace 9a775e286a6649ac ]---
[ 5706.850242] drm/i915: Resetting chip after gpu hang
[ 5708.835453] [drm] RC6 on
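
The hang line in the kernel log above has a regular shape, so it can be picked out of `dmesg` output mechanically. The following is a minimal Python sketch, assuming the message format is exactly as quoted in this report. The regular expression, the function name `parse_gpu_hang`, and the field names `gen` and `ring` are illustrative assumptions, not part of i915 or Beignet.

```python
import re

# Pattern modelled on the single "GPU HANG" line quoted in this report.
# Other kernel versions may word the message differently.
# The names "gen" and "ring" for the first two ecode fields are assumptions.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<hash>0x[0-9a-fA-F]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("gen", "ring", "pid"):
        info[key] = int(info[key])
    return info


# The line is copied from the log in this report.
SAMPLE = (
    "[ 5706.847921] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in gegl [6584], "
    "reason: Ring hung, action: reset"
)

if __name__ == "__main__":
    print(parse_gpu_hang(SAMPLE))
```

Feeding each line of `dmesg` through `parse_gpu_hang` and keeping the non-`None` results gives the hanging process name and PID without reading the full trace.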
Comment 1 Xiuli Pan 2016-06-12 05:33:49 UTC
Hi Jan,

Could you provide your kernel or the code to reproduce the hang?

Thanks
Xiuli
Comment 2 Jan Vesely 2016-06-12 06:51:52 UTC
(In reply to Xiuli Pan from comment #1)
> Hi Jan,
> 
> Could you provide your kernel or the code to reproduce the hang?
> 
> Thanks
> Xiuli

Hi,

the pixelize kernel is here:
https://git.gnome.org/browse/gegl/tree/opencl/pixelize.cl

compiling gegl and running:

OCL_ICD_VENDORS=/etc/OpenCL/vendors/intel-beignet.icd GEGL_DEBUG=opencl GEGL_PATH="./operations" LD_LIBRARY_PATH="$LD_LIBRARY_PATH:./gegl/.libs/" ./bin/.libs/gegl tests/compositions/pixelize.xml
triggers the hang

the pixelize test:
https://git.gnome.org/browse/gegl/tree/tests/compositions/pixelize.xml

removing the crop operation still triggers GPU HANG
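
The one-line reproduction command above can be easier to read when split into its environment and its argument list. The following is a minimal Python sketch, assuming it is used from the top of a built gegl source tree. The function name `build_repro` is hypothetical; the variable values and paths are copied from the command above.

```python
import os


def build_repro(base_env=None):
    """Return (argv, env) matching the reproduction command in this comment."""
    env = dict(os.environ if base_env is None else base_env)
    # Select the Beignet ICD and enable gegl's OpenCL debug output.
    env["OCL_ICD_VENDORS"] = "/etc/OpenCL/vendors/intel-beignet.icd"
    env["GEGL_DEBUG"] = "opencl"
    env["GEGL_PATH"] = "./operations"
    # Append the in-tree library directory, as the original command does.
    env["LD_LIBRARY_PATH"] = env.get("LD_LIBRARY_PATH", "") + ":./gegl/.libs/"
    argv = ["./bin/.libs/gegl", "tests/compositions/pixelize.xml"]
    return argv, env


if __name__ == "__main__":
    argv, env = build_repro()
    # Only print the command here; actually running it may hang the GPU.
    print(" ".join(argv))
    for name in ("OCL_ICD_VENDORS", "GEGL_DEBUG", "GEGL_PATH", "LD_LIBRARY_PATH"):
        print(name + "=" + env[name])
```

To actually run the test, the pair can be passed to `subprocess.run(argv, env=env)` from the gegl build directory.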
Comment 3 Xiuli Pan 2016-06-12 09:37:53 UTC
Hi Jan,

I ran into some problems building gegl. Could you provide some information about your beignet build, such as the LLVM version? Also, does the gegl build process itself build and run some kernels?

Thanks
Xiuli
Comment 4 Jan Vesely 2016-06-12 16:23:23 UTC
(In reply to Xiuli Pan from comment #3)
> Hi Jan,
> 
> I ran into some problems building gegl. Could you provide some information
> about your beignet build, such as the LLVM version? Also, does the gegl
> build process itself build and run some kernels?
> 
> Thanks
> Xiuli

Hi,

beignet is currently built against llvm-3.5, I can try rebuilding against llvm-3.9 (ToT) if it helps.

not sure I understood the gegl part.
gegl is an image manipulation library, some of its operations include accelerated paths using OpenCL. It includes a test suite and pixelize is one of the operations in the test suite (you can try "make check" to run the entire suite)

note that I also use babl git (babl is one of gegl's dependencies [0])

Jan

[0] https://git.gnome.org/browse/babl/
Comment 5 Xiuli Pan 2016-08-08 05:21:41 UTC
Hi Jan,

I can now run the pixelize test with make check, with run-compositions.py, and with the reproduction command you gave me.

I have updated to Ubuntu 16.04 and the build problem seems solved.
I tried LLVM 3.6 and LLVM 3.8 with the beignet master branch (dff184f0f99d635622e2cb7b74279220852ae4f4).
I could not find the git-309afa7 commit; could you provide the beignet you are using? Also, LLVM 3.5 has some bugs, so you could try a newer LLVM now.

The log is here:
./run-compositions.py pixelize.xml --build-dir ../../build/
/home/pxl/lab/gegl/build/bin/gegl /home/pxl/lab/gegl/tests/compositions/pixelize.xml -o /home/pxl/lab/gegl/build/tests/compositions/output/pixelize.png

(lt-gegl:15046): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead

(lt-gegl:15046): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead
/home/pxl/lab/gegl/build/tools/gegl-imgcmp /home/pxl/lab/gegl/tests/compositions/reference/pixelize.png /home/pxl/lab/gegl/build/tests/compositions/output/pixelize.png

(lt-gegl-imgcmp:15064): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead

(lt-gegl-imgcmp:15064): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead
/home/pxl/lab/gegl/tests/compositions/reference/pixelize.png and /home/pxl/lab/gegl/build/tests/compositions/output/pixelize.png differ
  wrong pixels   : 13068/31680 (41.25%)
  max Δe         : 0.108
  avg Δe (wrong) : 0.039(wrong) 0.016(total)
because the error is smaller than 1.50 we'll say /home/pxl/lab/gegl/tests/compositions/reference/pixelize.png and /home/pxl/lab/gegl/build/tests/compositions/output/pixelize.png are identical
PASS pixelize.xml
/home/pxl/lab/gegl/build/bin/gegl /home/pxl/lab/gegl/tests/compositions/pixelize.xml -o /home/pxl/lab/gegl/build/tests/compositions/output/opencl-pixelize.png

(lt-gegl:15081): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead

(lt-gegl:15081): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead
Beignet: "unable to find good values for local_work_size[i], please provide local_work_size[] explicitly, you can find good values with trial-and-error method."
/home/pxl/lab/gegl/build/tools/gegl-imgcmp /home/pxl/lab/gegl/tests/compositions/reference/pixelize.png /home/pxl/lab/gegl/build/tests/compositions/output/opencl-pixelize.png

(lt-gegl-imgcmp:15101): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead

(lt-gegl-imgcmp:15101): GEGL-../../../gegl/graph/gegl-node.c-WARNING **: Failed to set operation type gegl:text, using a passthrough op instead
/home/pxl/lab/gegl/tests/compositions/reference/pixelize.png and /home/pxl/lab/gegl/build/tests/compositions/output/opencl-pixelize.png differ
  wrong pixels   : 13068/31680 (41.25%)
  max Δe         : 0.108
  avg Δe (wrong) : 0.039(wrong) 0.016(total)
because the error is smaller than 1.50 we'll say /home/pxl/lab/gegl/tests/compositions/reference/pixelize.png and /home/pxl/lab/gegl/build/tests/compositions/output/opencl-pixelize.png are identical
PASS pixelize.xml (OpenCL)
=== Test Results ===
 tests passed:  2
 tests skipped: 0
 tests failed:  0
======  PASS  ======
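
The gegl-imgcmp summaries in the log above report 41.25% differing pixels yet still pass, because the comparison tolerates small errors. The following is a minimal Python sketch of reading such a summary and applying the 1.50 limit mentioned in the log. It assumes the limit is applied to the "max" error value, which the log does not state explicitly; the function names `parse_imgcmp` and `passes` are hypothetical.

```python
import re


def parse_imgcmp(text):
    """Extract pixel counts and the maximum error from a gegl-imgcmp summary."""
    wrong = re.search(r"wrong pixels\s*:\s*(\d+)/(\d+)", text)
    # The delta symbol may show up as "?" in mangled logs, so match any token.
    max_err = re.search(r"max \S+\s*:\s*([0-9.]+)", text)
    if wrong is None or max_err is None:
        return None
    return {
        "wrong": int(wrong.group(1)),
        "total": int(wrong.group(2)),
        "max_error": float(max_err.group(1)),
    }


def passes(summary, threshold=1.5):
    """Assumption: the test passes when the maximum error is below the limit."""
    return summary["max_error"] < threshold


# Values copied from the log in this comment.
SAMPLE = """\
  wrong pixels   : 13068/31680 (41.25%)
  max Δe         : 0.108
  avg Δe (wrong) : 0.039(wrong) 0.016(total)
"""

if __name__ == "__main__":
    summary = parse_imgcmp(SAMPLE)
    print(summary, passes(summary))
```

Under that assumption, both the CPU and the OpenCL runs pass with a wide margin, since 0.108 is far below 1.50.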
Comment 6 Jan Vesely 2016-08-08 12:47:03 UTC
Hi,

thank you for looking into this. I can confirm that using:
Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Skylake Halo GT2
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.2 (git-77d78d5)

with llvm-3.7.1, I can no longer reproduce the issue. I'd assume that the llvm upgrade fixed it.

git-77d78d5 is one local patch on top of 
dff184f0f (my local patch allows NULL device param for cl_get_kernel_workgroup_info)

thank you,
Jan