Bug 91525 - ASSERTION FAILED in backend/src/backend/gen_context.cpp, line 438
Status: RESOLVED MOVED
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Luo Xionghu
Reported: 2015-08-01 17:59 UTC by Frank Dittrich
Modified: 2018-10-12 21:23 UTC

Attachments
contents of /sys/class/drm/card0/error (3.01 MB, text/plain)
2015-11-19 21:36 UTC, Frank Dittrich
contents of /sys/class/drm/card0/error with latest commits and newer linux kernel (3.01 MB, text/plain)
2015-12-23 00:36 UTC, Frank Dittrich

Description Frank Dittrich 2015-08-01 17:59:30 UTC
This is with beignet's latest git commit, commit 228775e829ce996e4be7856de821c3540af1b24d, on a Fedora 22 system, but with a vanilla kernel:
$ uname -a
Linux f22b.localdomain 4.2.0-0.rc4.git4.1.vanilla.mainline.knurd.1.fc22.x86_64 #1 SMP Sat Aug 1 06:31:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

CPU is Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz

When I build the latest John the Ripper version https://github.com/magnumripper/JohnTheRipper/commit/8ebe17a69745dd3f6735c7d1f65884a40c98162e or any other recent commit of the bleeding-jumbo branch


(bleeding-jumbo)src $ make -s distclean; ./configure && make -s clean && make -s -j 16

and then test the office2013-opencl format, I get

(bleeding-jumbo)src $ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap

When I added a printf statement immediately before the ASSERT, I saw that src.type was 8.


Just in case it helps, here's the backtrace from gdb:

(gdb) bt
#0  gbe::onFailedAssertion (msg=<optimized out>, file=<optimized out>, fn=<optimized out>, line=<optimized out>) at /home/fd/git/beignet/backend/src/sys/assert.cpp:76
#1  0x00007fffedae591f in gbe::GenContext::emitUnaryWithTempInstruction (this=0x1ae0700, insn=...) at /home/fd/git/beignet/backend/src/backend/gen_context.cpp:438
#2  0x00007fffedafd7be in gbe::GenContext::emitInstructionStream (this=this@entry=0x1ae0700) at /home/fd/git/beignet/backend/src/./backend/gen_insn_selection.hxx:81
#3  0x00007fffedaff2a5 in gbe::GenContext::emitCode (this=0x1ae0700) at /home/fd/git/beignet/backend/src/backend/gen_context.cpp:2288
#4  0x00007fffeda02caf in gbe::Context::compileKernel (this=this@entry=0x1ae0700) at /home/fd/git/beignet/backend/src/backend/context.cpp:365
#5  0x00007fffedb073c7 in gbe::GenProgram::compileKernel (this=<optimized out>, unit=..., name="Generate2013key", relaxMath=<optimized out>)
    at /home/fd/git/beignet/backend/src/backend/gen_program.cpp:184
#6  0x00007fffeda071b7 in gbe::Program::buildFromUnit (this=this@entry=0x16c67c0, unit=..., error="") at /home/fd/git/beignet/backend/src/backend/program.cpp:160
#7  0x00007fffeda07730 in gbe::Program::buildFromLLVMFile (this=this@entry=0x16c67c0, fileName=fileName@entry=0x0, module=module@entry=0x13b9bd0, error="", optLevel=optLevel@entry=1)
    at /home/fd/git/beignet/backend/src/backend/program.cpp:144
#8  0x00007fffedb07a5a in gbe::genProgramNewFromLLVM (deviceID=1042, fileName=0x0, module=0x13b9bd0, llvm_ctx=<optimized out>, stringSize=1000, err=0x16fc550 "", errSize=0x13b5320, 
    optLevel=1) at /home/fd/git/beignet/backend/src/backend/gen_program.cpp:365
#9  0x00007fffeda1c546 in gbe::programNewFromSource (deviceID=1042, source=<optimized out>, stringSize=1000, 
    options=0xf5b860 <include> "-I /home/fd/git/JtR/run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", err=0x16fc550 "", errSize=0x13b5320) at /home/fd/git/beignet/backend/src/backend/program.cpp:808
#10 0x00007ffff23362ff in cl_program_build (p=p@entry=0x13b5290, options=<optimized out>) at /home/fd/git/beignet/src/cl_program.c:535
#11 0x00007ffff232e126 in clBuildProgram (program=0x13b5290, num_devices=<optimized out>, device_list=<optimized out>, options=<optimized out>, pfn_notify=0x0, user_data=0x0)
    at /home/fd/git/beignet/src/cl_api.c:946
#12 0x00007ffff70dcafb in clBuildProgram () from /lib64/libOpenCL.so.1
#13 0x00000000006c9d46 in opencl_build (sequential_id=sequential_id@entry=0, opts=opts@entry=0x7fffffffb2c0 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", save=save@entry=0, 
    file_name=file_name@entry=0x0) at common-opencl.c:955
#14 0x00000000006ca1d1 in opencl_build_kernel_opt (opts=0x7fffffffb2c0 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", sequential_id=0, 
    kernel_filename=0x789ac8 "$JOHN/kernels/office2013_kernel.cl") at common-opencl.c:1871
#15 opencl_build_kernel (kernel_filename=kernel_filename@entry=0x789ac8 "$JOHN/kernels/office2013_kernel.cl", sequential_id=0, 
    opts=opts@entry=0x7fffffffb2c0 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", warn=warn@entry=0) at common-opencl.c:1887
#16 0x00000000006ca5e2 in opencl_init (kernel_filename=kernel_filename@entry=0x789ac8 "$JOHN/kernels/office2013_kernel.cl", sequential_id=<optimized out>, 
    opts=opts@entry=0x7fffffffb2c0 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2") at common-opencl.c:1966
#17 0x00000000005d66b1 in reset (db=<optimized out>) at opencl_office2013_fmt_plug.c:300
#18 0x000000000067d9cb in fmt_self_test_body (db=0x0, salt_copy=0x13b5b51, binary_copy=0x1388951, format=0xa82c00 <fmt_opencl_office2013>) at formats.c:295
#19 fmt_self_test (format=format@entry=0xa82c00 <fmt_opencl_office2013>, db=db@entry=0x0) at formats.c:719
#20 0x0000000000672709 in benchmark_format (format=0xa82c00 <fmt_opencl_office2013>, salts=256, results=0x7fffffffd5d0) at bench.c:235
#21 0x000000000067359d in benchmark_all () at bench.c:652
#22 0x0000000000685d72 in john_run () at john.c:1368
#23 0x0000000000686846 in main (argc=3, argv=0x7fffffffe088) at john.c:1741
Comment 1 Luo Xionghu 2015-08-18 08:00:34 UTC
This should be fixed in this patchset; please give it a try:
http://lists.freedesktop.org/archives/beignet/2015-August/006021.html
Comment 2 Frank Dittrich 2015-08-21 13:39:52 UTC
Unfortunately, I still get the same error.
Just the source code line changed from 438 to 443:

$ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 443
Trace/breakpoint trap (core dumped)

(gdb) bt
#0  gbe::onFailedAssertion (msg=<optimized out>, file=<optimized out>, fn=<optimized out>, line=<optimized out>) at /home/fd/git/beignet/backend/src/sys/assert.cpp:76
#1  0x00007fffedae3e3f in gbe::GenContext::emitUnaryWithTempInstruction (this=0x5183350, insn=...) at /home/fd/git/beignet/backend/src/backend/gen_context.cpp:443
#2  0x00007fffedafbfde in gbe::GenContext::emitInstructionStream (this=this@entry=0x5183350) at /home/fd/git/beignet/backend/src/./backend/gen_insn_selection.hxx:81
#3  0x00007fffedafda6a in gbe::GenContext::emitCode (this=0x5183350) at /home/fd/git/beignet/backend/src/backend/gen_context.cpp:2285
#4  0x00007fffeda005ef in gbe::Context::compileKernel (this=this@entry=0x5183350) at /home/fd/git/beignet/backend/src/backend/context.cpp:365
#5  0x00007fffedb05583 in gbe::GenProgram::compileKernel (this=<optimized out>, unit=..., name="Generate2013key", relaxMath=<optimized out>)
    at /home/fd/git/beignet/backend/src/backend/gen_program.cpp:185
#6  0x00007fffeda051a7 in gbe::Program::buildFromUnit (this=this@entry=0x51c6a20, unit=..., error="") at /home/fd/git/beignet/backend/src/backend/program.cpp:160
#7  0x00007fffeda05720 in gbe::Program::buildFromLLVMFile (this=this@entry=0x51c6a20, fileName=fileName@entry=0x0, module=module@entry=0x5230730, error="", optLevel=optLevel@entry=1)
    at /home/fd/git/beignet/backend/src/backend/program.cpp:144
#8  0x00007fffedb05ca1 in gbe::genProgramNewFromLLVM (deviceID=1042, fileName=0x0, module=0x5230730, llvm_ctx=0x51c6a90, asm_file_name=<optimized out>, stringSize=1000, 
    err=0x52a23a0 "", errSize=0x53d10d0, optLevel=1) at /home/fd/git/beignet/backend/src/backend/gen_program.cpp:367
#9  0x00007fffeda1aa86 in gbe::programNewFromSource (deviceID=1042, source=<optimized out>, stringSize=1000, 
    options=0xf6e4a0 <include> "-I /home/fd/git/JtR/run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", err=0x52a23a0 "", errSize=0x53d10d0) at /home/fd/git/beignet/backend/src/backend/program.cpp:853
#10 0x00007ffff233632f in cl_program_build (p=p@entry=0x53d1040, options=<optimized out>) at /home/fd/git/beignet/src/cl_program.c:535
#11 0x00007ffff232e126 in clBuildProgram (program=0x53d1040, num_devices=<optimized out>, device_list=<optimized out>, options=<optimized out>, pfn_notify=0x0, user_data=0x0)
    at /home/fd/git/beignet/src/cl_api.c:946
#12 0x00007ffff70dcafb in clBuildProgram () from /lib64/libOpenCL.so.1
#13 0x00000000006d1086 in opencl_build (sequential_id=sequential_id@entry=0, opts=opts@entry=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", save=save@entry=0, 
    file_name=file_name@entry=0x0) at common-opencl.c:959
#14 0x00000000006d1511 in opencl_build_kernel_opt (opts=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", sequential_id=0, 
    kernel_filename=0x7959c8 "$JOHN/kernels/office2013_kernel.cl") at common-opencl.c:1875
#15 opencl_build_kernel (kernel_filename=kernel_filename@entry=0x7959c8 "$JOHN/kernels/office2013_kernel.cl", sequential_id=0, 
    opts=opts@entry=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", warn=warn@entry=0) at common-opencl.c:1891
#16 0x00000000006d1922 in opencl_init (kernel_filename=kernel_filename@entry=0x7959c8 "$JOHN/kernels/office2013_kernel.cl", sequential_id=<optimized out>, 
    opts=opts@entry=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2") at common-opencl.c:1970
#17 0x00000000005d9701 in reset (db=<optimized out>) at opencl_office2013_fmt_plug.c:300
#18 0x0000000000683bfb in fmt_self_test_body (db=0x0, salt_copy=0x51c69b1, binary_copy=0x4e79251, format=0xa92fa0 <fmt_opencl_office2013>) at formats.c:392
#19 fmt_self_test (format=format@entry=0xa92fa0 <fmt_opencl_office2013>, db=db@entry=0x0) at formats.c:1320
#20 0x0000000000678719 in benchmark_format (format=0xa92fa0 <fmt_opencl_office2013>, salts=256, results=0x7fffffffd530) at bench.c:237
#21 0x00000000006795cd in benchmark_all () at bench.c:658
#22 0x000000000068c9d3 in john_run () at john.c:1375
#23 0x000000000068d4a7 in main (argc=4, argv=0x7fffffffdfe8) at john.c:1749
Comment 3 Luo Xionghu 2015-08-24 00:28:02 UTC
It seems your code is not synchronized: line 443 of the latest code is not an assert. Could you please provide your commit ID? Thanks.

http://cgit.freedesktop.org/beignet/tree/backend/src/backend/gen_context.cpp?id=5428b40a0df7bee14975f67423b8912a0b5c5537
Comment 4 Frank Dittrich 2015-08-24 08:31:15 UTC
I was testing commit 7b151ad6c47ba169b0971e9660023cff2a6de10f, which was the latest commit in the master branch at the time of testing. In that commit, gen_context.cpp line 443 is GBE_ASSERT(0);

In commit 5428b40a0df7bee14975f67423b8912a0b5c5537, which you mentioned, and in commit 228775e829ce996e4be7856de821c3540af1b24d (the one I tested when reporting the bug), that GBE_ASSERT(0) is on line 438.

Unfortunately, I cannot test commit 5428b40a0df7bee14975f67423b8912a0b5c5537 right now.
Please let me know which commit I should test, and I'll do it when I find the time.
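
One way to settle which line the assertion sits on in a given commit is to ask git for the file as of that commit and search it. The following is only a minimal sketch, assuming a local beignet clone; the helper names assert_lines and assert_lines_at_commit are hypothetical and not part of beignet.

```python
import subprocess


def assert_lines(source_text, needle="GBE_ASSERT(0)"):
    """Return the 1-based numbers of the lines in source_text containing needle."""
    return [
        number
        for number, line in enumerate(source_text.splitlines(), start=1)
        if needle in line
    ]


def assert_lines_at_commit(repo_dir, commit,
                           path="backend/src/backend/gen_context.cpp"):
    """Read path as of commit with `git show` and locate the assertion lines.

    repo_dir is assumed to be a local clone that contains the commit.
    """
    text = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{commit}:{path}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return assert_lines(text)
```

Calling assert_lines_at_commit with the clone directory and each of the commit IDs above would list every line holding a GBE_ASSERT(0), which can then be compared with the line number in the failure message. The file may contain more than one such assertion, so the result is a list.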
Comment 5 blaffablaffa 2015-09-08 12:39:34 UTC
I'm getting this on beignet 1.1.0, libdrm 2.61, kernel 4.1.6, Fedora 22 x86_64 for basically any simple OpenCL operation (clinfo spams it 5 times).
Comment 6 Luo Xionghu 2015-09-22 08:45:13 UTC
Please try master again.
The patch is upstream now.
http://cgit.freedesktop.org/beignet/commit/?id=18a52ffc966027a3004b85c7c03c9416e1a84c3a
Comment 7 Frank Dittrich 2015-09-22 21:11:51 UTC
I retried with latest master.
The ASSERTION FAILED problem is gone, but now I get:

$ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
Local worksize (LWS) 7, global worksize (GWS) 49
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel
$ echo $?
1

Is this 


drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel


a beignet problem or a driver problem? If it is Linux kernel related, what kernel version should fix it?


Currently I am still running 4.2.0-1.vanilla.mainline.knurd.1.fc22.x86_64. I intend to upgrade to 4.2.0-0.rc7.git4.1.vanilla.mainline.knurd.1.fc22, but I can't reboot right now.
Comment 8 Frank Dittrich 2015-09-24 11:59:35 UTC
I just tested a newer kernel:

$ uname -a
Linux f22b.localdomain 4.3.0-0.rc2.git0.1.vanilla.mainline.knurd.1.fc22.x86_64 #1 SMP Tue Sep 22 06:13:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The error is still there:

$ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
Local worksize (LWS) 7, global worksize (GWS) 49
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel


BTW: Do you happen to know when these annoying "Failed to release test userptr object!" messages will be addressed by a driver update?
Comment 9 Luo Xionghu 2015-09-25 01:23:35 UTC
This is a GPU hang caused by the kernel; you can check dmesg to see it. Can you run it successfully with other OpenCL packages?
We will also continue to investigate the root cause.
Comment 10 Luo Xionghu 2015-09-25 06:43:09 UTC
This kernel takes longer than several seconds to run, which triggers timeout detection and recovery. You can try echo -n 0 > /sys/module/i915/parameters/enable_hangcheck to disable the hang check, and then try again.
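
For repeated experiments it can help to save the previous value of that parameter so it can be restored afterwards. Below is a minimal sketch, assuming root privileges and the sysfs path quoted above; the helper names read_param and write_param are hypothetical.

```python
from pathlib import Path

# Path taken from the suggestion above; writing to it normally requires root.
HANGCHECK = Path("/sys/module/i915/parameters/enable_hangcheck")


def read_param(path=HANGCHECK):
    """Return the current value of the parameter, without the trailing newline."""
    return path.read_text().strip()


def write_param(value, path=HANGCHECK):
    """Write value to the parameter and return the previous value."""
    previous = read_param(path)
    path.write_text(str(value))
    return previous
```

A typical use would be previous = write_param(0), then running the test, then write_param(previous) to re-enable the check. Note that with the hang check disabled, a kernel that never finishes will not be reset automatically.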
Comment 11 Frank Dittrich 2015-11-19 21:36:17 UTC
Created attachment 119954 [details]
contents of /sys/class/drm/card0/error
Comment 12 Frank Dittrich 2015-11-19 22:35:03 UTC
Sorry, I didn't have time earlier to check this.

I repeated the test with beignet commit 4c96950745270afc29b1981ec451bd800c173d2a, Linux kernel 4.3.0-1.vanilla.mainline.knurd.1.fc22.x86_64, and John the Ripper commit https://github.com/magnumripper/JohnTheRipper/commit/c25192c23138db92b903c1b14bcf8a76963876ec (bleeding-jumbo branch).

While the "Failed to release test userptr object! (9) i915 kernel driver may not be sane!" messages are gone now (I think due to the Linux kernel update to 4.3), the GPU hang still occurs.

Even reducing the global and local work size doesn't help:

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

This is from dmesg output:

[  149.302616] [drm] stuck on render ring
[  149.303861] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2110], reason: Ring hung, action: reset
[  149.303865] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  149.303867] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  149.303870] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  149.303872] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  149.303874] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  149.306033] drm/i915: Resetting chip after gpu hang
[  155.301738] [drm] stuck on render ring
[  155.302955] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2110], reason: Ring hung, action: reset
[  155.305137] drm/i915: Resetting chip after gpu hang

I attached the contents of /sys/class/drm/card0/error.

Repeated tests of that format just result in these dmesg lines:
[  520.187119] [drm] stuck on render ring
[  520.188347] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[  520.190142] drm/i915: Resetting chip after gpu hang
[  526.179121] [drm] stuck on render ring
[  526.180357] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[  526.182550] drm/i915: Resetting chip after gpu hang
[ 3991.026791] [drm] stuck on render ring
[ 3991.027969] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2733], reason: Ring hung, action: reset
[ 3991.030133] drm/i915: Resetting chip after gpu hang
[ 3997.024799] [drm] stuck on render ring
[ 3997.026080] [drm] GPU HANG: ecode 7:0:0xf3cffffe, in john [2733], reason: Ring hung, action: reset
[ 3997.027766] drm/i915: Resetting chip after gpu hang
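
When comparing many such runs, the "GPU HANG" lines can be pulled out of the dmesg text mechanically, which makes it easier to spot when the ecode changes between hangs. This is a rough sketch that assumes the line layout shown above; the regular expression and the helper name parse_hangs are illustrative, not part of any i915 tooling.

```python
import re

# Matches lines such as:
# [  520.188347] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\], reason: (?P<reason>[^,]+), "
    r"action: (?P<action>\w+)"
)


def parse_hangs(dmesg_text):
    """Return one dict per 'GPU HANG' line found in dmesg_text."""
    hangs = []
    for match in HANG_RE.finditer(dmesg_text):
        hangs.append({
            "time": float(match["time"]),
            "ecode": match["ecode"],
            "proc": match["proc"],
            "pid": int(match["pid"]),
            "reason": match["reason"],
            "action": match["action"],
        })
    return hangs
```

Feeding the saved dmesg output to parse_hangs and collecting the distinct "ecode" values gives a quick overview of which hang signatures occurred and in which process.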

After trying your suggestion
echo -n 0 > /sys/module/i915/parameters/enable_hangcheck
I let ./john --test=0 --format=office2013-opencl --verbosity=5
run for half an hour before I used the power button to reboot the machine.

Since these kernels usually run for just 200 ms on other GPUs, I doubt that the kernel would legitimately need more than half an hour here.

What other opencl packages supporting the Haswell do you have in mind?
I think I would be able to do some more tests in the near future.
Comment 13 Frank Dittrich 2015-12-23 00:12:12 UTC
With latest John the Ripper (bleeding-jumbo) commit 8d4470ff9f2357fc10c8e5769dbb164eb5118f40 and latest beignet commit 032b606f8c5baa53e52b1f55c4f7c0bafdd6ff37 as well as a newer Linux kernel (4.4.0-0.rc6.git0.1.vanilla.knurd.1.fc22.x86_64), I now get

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
FAILED (cmp_all(1))


So, these messages

drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

are gone.

Instead, I get

FAILED (cmp_all(1))

That output is from John the Ripper.
So it looks like this bug has been fixed (but there's an unrelated John the Ripper bug).

If you want to know which beignet commit made the problem go away, I can try to git bisect. Otherwise this bug can be closed.
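
Such a bisect could be partly automated with a small check script for git bisect run. The sketch below is hypothetical and makes several assumptions: beignet is rebuilt and installed separately at each bisect step, the script is started from John the Ripper's run directory, and the bisect was started with git bisect start --term-old=broken --term-new=fixed, so that exit code 0 marks a commit as "broken" and exit code 1 marks it as "fixed". Since the kernel was updated as well, a beignet-only bisect might not find the change at all.

```python
import os
import subprocess

# Message printed by the failing runs quoted in this report.
HANG_MARKER = "drm_intel_gem_bo_context_exec() failed"
# Same command line as in the test above (GWS and LWS are set via the environment).
JOHN_CMD = ["./john", "--test=0", "--format=office2013-opencl", "--verbosity=5"]


def exit_code_for(output):
    """Map john's output to an exit code for `git bisect run`.

    0 -> old term ("broken": the hang message is still present)
    1 -> new term ("fixed": the hang message is gone)
    """
    return 0 if HANG_MARKER in output else 1


def main(runs=3):
    """Run the test a few times, in case the failure does not show on every run."""
    env = dict(os.environ, GWS="1", LWS="1")
    for _ in range(runs):
        proc = subprocess.run(JOHN_CMD, env=env, capture_output=True, text=True)
        if exit_code_for(proc.stdout + proc.stderr) == 0:
            return 0
    return 1
```

A wrapper would end with sys.exit(main()) so that git bisect run sees the exit code. The mapping is inverted compared to a normal bisect because the search is for the commit that made the problem go away, not the one that introduced it.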
Comment 14 Frank Dittrich 2015-12-23 00:36:29 UTC
Created attachment 120657 [details]
contents of /sys/class/drm/card0/error with latest commits and newer linux kernel
Comment 15 Frank Dittrich 2015-12-23 00:38:14 UTC
Unfortunately, with repeated runs of the same command, I now got

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel


And dmesg is showing
[ 2236.597358] [drm] stuck on render ring
[ 2236.598523] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2894], reason: Ring hung, action: reset
[ 2236.598527] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2236.598529] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2236.598531] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2236.598533] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2236.598536] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2236.600685] drm/i915: Resetting chip after gpu hang
[ 2242.597091] [drm] stuck on render ring
[ 2242.597661] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2894], reason: Ring hung, action: reset
[ 2242.599757] drm/i915: Resetting chip after gpu hang


I attached the contents of /sys/class/drm/card0/error.
Comment 16 GitLab Migration User 2018-10-12 21:23:21 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/15.