Summary: | ASSERTION FAILED in backend/src/backend/gen_context.cpp, line 438 | | |
---|---|---|---|
Product: | Beignet | Reporter: | Frank Dittrich <frank.dittrich> |
Component: | Beignet | Assignee: | Luo Xionghu <xionghu.luo> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | | |
Priority: | medium | CC: | xionghu.luo |
Version: | unspecified | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Attachments: | contents of /sys/class/drm/card0/error; contents of /sys/class/drm/card0/error with latest commits and newer linux kernel | | |
Description
Frank Dittrich
2015-08-01 17:59:30 UTC
fix in this patchset, please have a try: http://lists.freedesktop.org/archives/beignet/2015-August/006021.html

Unfortunately, I still get the same error. Just the source code line changed from 438 to 443:

$ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP)
Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
ASSERTION FAILED: 0 at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 443
Trace/breakpoint trap (core dumped)

(gdb) bt
#0 gbe::onFailedAssertion (msg=<optimized out>, file=<optimized out>, fn=<optimized out>, line=<optimized out>) at /home/fd/git/beignet/backend/src/sys/assert.cpp:76
#1 0x00007fffedae3e3f in gbe::GenContext::emitUnaryWithTempInstruction (this=0x5183350, insn=...) at /home/fd/git/beignet/backend/src/backend/gen_context.cpp:443
#2 0x00007fffedafbfde in gbe::GenContext::emitInstructionStream (this=this@entry=0x5183350) at /home/fd/git/beignet/backend/src/./backend/gen_insn_selection.hxx:81
#3 0x00007fffedafda6a in gbe::GenContext::emitCode (this=0x5183350) at /home/fd/git/beignet/backend/src/backend/gen_context.cpp:2285
#4 0x00007fffeda005ef in gbe::Context::compileKernel (this=this@entry=0x5183350) at /home/fd/git/beignet/backend/src/backend/context.cpp:365
#5 0x00007fffedb05583 in gbe::GenProgram::compileKernel (this=<optimized out>, unit=..., name="Generate2013key", relaxMath=<optimized out>) at /home/fd/git/beignet/backend/src/backend/gen_program.cpp:185
#6 0x00007fffeda051a7 in gbe::Program::buildFromUnit (this=this@entry=0x51c6a20, unit=..., error="") at /home/fd/git/beignet/backend/src/backend/program.cpp:160
#7 0x00007fffeda05720 in gbe::Program::buildFromLLVMFile (this=this@entry=0x51c6a20, fileName=fileName@entry=0x0, module=module@entry=0x5230730, error="", optLevel=optLevel@entry=1) at /home/fd/git/beignet/backend/src/backend/program.cpp:144
#8 0x00007fffedb05ca1 in gbe::genProgramNewFromLLVM (deviceID=1042, fileName=0x0, module=0x5230730, llvm_ctx=0x51c6a90, asm_file_name=<optimized out>, stringSize=1000, err=0x52a23a0 "", errSize=0x53d10d0, optLevel=1) at /home/fd/git/beignet/backend/src/backend/gen_program.cpp:367
#9 0x00007fffeda1aa86 in gbe::programNewFromSource (deviceID=1042, source=<optimized out>, stringSize=1000, options=0xf6e4a0 <include> "-I /home/fd/git/JtR/run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", err=0x52a23a0 "", errSize=0x53d10d0) at /home/fd/git/beignet/backend/src/backend/program.cpp:853
#10 0x00007ffff233632f in cl_program_build (p=p@entry=0x53d1040, options=<optimized out>) at /home/fd/git/beignet/src/cl_program.c:535
#11 0x00007ffff232e126 in clBuildProgram (program=0x53d1040, num_devices=<optimized out>, device_list=<optimized out>, options=<optimized out>, pfn_notify=0x0, user_data=0x0) at /home/fd/git/beignet/src/cl_api.c:946
#12 0x00007ffff70dcafb in clBuildProgram () from /lib64/libOpenCL.so.1
#13 0x00000000006d1086 in opencl_build (sequential_id=sequential_id@entry=0, opts=opts@entry=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", save=save@entry=0, file_name=file_name@entry=0x0) at common-opencl.c:959
#14 0x00000000006d1511 in opencl_build_kernel_opt (opts=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", sequential_id=0, kernel_filename=0x7959c8 "$JOHN/kernels/office2013_kernel.cl") at common-opencl.c:1875
#15 opencl_build_kernel (kernel_filename=kernel_filename@entry=0x7959c8 "$JOHN/kernels/office2013_kernel.cl", sequential_id=0, opts=opts@entry=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2", warn=warn@entry=0) at common-opencl.c:1891
#16 0x00000000006d1922 in opencl_init (kernel_filename=kernel_filename@entry=0x7959c8 "$JOHN/kernels/office2013_kernel.cl", sequential_id=<optimized out>, opts=opts@entry=0x7fffffffb220 "-DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2") at common-opencl.c:1970
#17 0x00000000005d9701 in reset (db=<optimized out>) at opencl_office2013_fmt_plug.c:300
#18 0x0000000000683bfb in fmt_self_test_body (db=0x0, salt_copy=0x51c69b1, binary_copy=0x4e79251, format=0xa92fa0 <fmt_opencl_office2013>) at formats.c:392
#19 fmt_self_test (format=format@entry=0xa92fa0 <fmt_opencl_office2013>, db=db@entry=0x0) at formats.c:1320
#20 0x0000000000678719 in benchmark_format (format=0xa92fa0 <fmt_opencl_office2013>, salts=256, results=0x7fffffffd530) at bench.c:237
#21 0x00000000006795cd in benchmark_all () at bench.c:658
#22 0x000000000068c9d3 in john_run () at john.c:1375
#23 0x000000000068d4a7 in main (argc=4, argv=0x7fffffffdfe8) at john.c:1749

seems your code is not synchronized, the latest code at line 443 is not an assert. could you please provide your commit id? thanks.
http://cgit.freedesktop.org/beignet/tree/backend/src/backend/gen_context.cpp?id=5428b40a0df7bee14975f67423b8912a0b5c5537

I was testing commit 7b151ad6c47ba169b0971e9660023cff2a6de10f, which was the latest commit in the master branch at the time of testing, and gen_context.cpp line 443 has
GBE_ASSERT(0);
In the commit 5428b40a0df7bee14975f67423b8912a0b5c5537 you mentioned and in commit 228775e829ce996e4be7856de821c3540af1b24d (the one I tested when reporting the bug), that GBE_ASSERT(0) is in line 438.
Unfortunately, I cannot test commit 5428b40a0df7bee14975f67423b8912a0b5c5537 right now. Please let me know which commit I should test, and I'll do it when I find the time.

I'm getting this on beignet 1.1.0 libdrm 2.61 kernel 4.1.6 fedora 22 x86_64 for basically any simple OpenCL operation (clinfo spams it 5 times).

please try the master again. the patch is upstream now.
http://cgit.freedesktop.org/beignet/commit/?id=18a52ffc966027a3004b85c7c03c9416e1a84c3a

I retried with latest master. The ASSERTION FAILED problem is gone, but now I get:

$ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP)
Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
Local worksize (LWS) 7, global worksize (GWS) 49
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel
$ echo $?
1

Is this

drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

a beignet problem or a driver problem? If it is Linux kernel related, what kernel version should fix it?
Currently I still run 4.2.0-1.vanilla.mainline.knurd.1.fc22.x86_64, but I intend to upgrade to 4.2.0-0.rc7.git4.1.vanilla.mainline.knurd.1.fc22, but I can't reboot right now.

I just tested a newer kernel:

$ uname -a
Linux f22b.localdomain 4.3.0-0.rc2.git0.1.vanilla.mainline.knurd.1.fc22.x86_64 #1 SMP Tue Sep 22 06:13:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The error is still there:

$ ../run/john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP)
Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
Local worksize (LWS) 7, global worksize (GWS) 49
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

BTW: Do you happen to know when these annoying "Failed to release test userptr object!" messages will be addressed by a driver update?

this is a GPU hang caused by the kernel; you can check dmesg to see it. does it run successfully with other opencl packages? we will also continue to investigate the root cause.

this kernel takes longer than several seconds, which triggers timeout detection and recovery. you can try
echo -n 0 > /sys/module/i915/parameters/enable_hangcheck
to disable the hang check and try again.

Created attachment 119954 [details]
contents of /sys/class/drm/card0/error
Sorry, I didn't have time earlier to check this.
I repeated the test with beignet commit 4c96950745270afc29b1981ec451bd800c173d2a, Linux kernel 4.3.0-1.vanilla.mainline.knurd.1.fc22.x86_64 and John the Ripper commit https://github.com/magnumripper/JohnTheRipper/commit/c25192c23138db92b903c1b14bcf8a76963876ec (bleeding-jumbo branch).

While the "Failed to release test userptr object! (9) i915 kernel driver may not be sane!" messages are gone now (I think due to the Linux kernel update to 4.3), the GPU hang still occurs.
Even reducing global and local work size doesn't help:

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP)
Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

This is from dmesg output:

[ 149.302616] [drm] stuck on render ring
[ 149.303861] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2110], reason: Ring hung, action: reset
[ 149.303865] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 149.303867] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 149.303870] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 149.303872] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 149.303874] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 149.306033] drm/i915: Resetting chip after gpu hang
[ 155.301738] [drm] stuck on render ring
[ 155.302955] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2110], reason: Ring hung, action: reset
[ 155.305137] drm/i915: Resetting chip after gpu hang

I attached the contents of /sys/class/drm/card0/error.

Repeated tests of that format just result in these dmesg lines:

[ 520.187119] [drm] stuck on render ring
[ 520.188347] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[ 520.190142] drm/i915: Resetting chip after gpu hang
[ 526.179121] [drm] stuck on render ring
[ 526.180357] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[ 526.182550] drm/i915: Resetting chip after gpu hang
[ 3991.026791] [drm] stuck on render ring
[ 3991.027969] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2733], reason: Ring hung, action: reset
[ 3991.030133] drm/i915: Resetting chip after gpu hang
[ 3997.024799] [drm] stuck on render ring
[ 3997.026080] [drm] GPU HANG: ecode 7:0:0xf3cffffe, in john [2733], reason: Ring hung, action: reset
[ 3997.027766] drm/i915: Resetting chip after gpu hang

After trying your suggestion
echo -n 0 > /sys/module/i915/parameters/enable_hangcheck
I let
./john --test=0 --format=office2013-opencl --verbosity=5
run for half an hour before I used the power button to reboot the machine.
Since the kernels usually run just 200 ms on other GPUs, I doubt that it would run more than half an hour.

What other OpenCL packages supporting Haswell do you have in mind?
I think I would be able to do some more tests in the near future.
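
When comparing runs like the ones quoted above, it can help to pull the hang records out of dmesg mechanically instead of reading them by eye. The following is a minimal Python sketch, not part of beignet or the kernel tooling. The names `parse_hang` and `summarize` are hypothetical, and the regular expression is modelled only on the "[drm] GPU HANG" lines shown in this report, so other kernel versions may need a different pattern.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Assumption: the message layout matches the "[drm] GPU HANG" lines quoted
# in this report; it is not taken from any kernel documentation.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], reason: (?P<reason>[^,]+), "
    r"action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    timestamp: float  # seconds since boot, as printed by dmesg
    ecode: str        # error code string, e.g. "7:0:0x85ddfffc"
    process: str      # name of the process blamed for the hang
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[GpuHang]:
    """Return a GpuHang for a 'GPU HANG' dmesg line, or None for any other line."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        timestamp=float(m["ts"]),
        ecode=m["ecode"],
        process=m["comm"],
        pid=int(m["pid"]),
        reason=m["reason"],
        action=m["action"],
    )


def summarize(dmesg_text: str) -> Counter:
    """Count hang records per (pid, ecode) pair in a block of dmesg output."""
    counts: Counter = Counter()
    for line in dmesg_text.splitlines():
        hang = parse_hang(line)
        if hang is not None:
            counts[(hang.pid, hang.ecode)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage sketch: dmesg | python3 this_script.py
    for (pid, ecode), n in summarize(sys.stdin.read()).items():
        print(f"pid {pid}: ecode {ecode} seen {n} time(s)")
```

Grouping by (pid, ecode) makes it easy to spot a run whose error code differs from the others, such as the 0xf3cffffe record for pid 2733 above.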

With latest John the Ripper (bleeding-jumbo) commit 8d4470ff9f2357fc10c8e5769dbb164eb5118f40 and latest beignet commit 032b606f8c5baa53e52b1f55c4f7c0bafdd6ff37 as well as a newer Linux kernel (4.4.0-0.rc6.git0.1.vanilla.knurd.1.fc22.x86_64), I now get

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]...
Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
FAILED (cmp_all(1))

So, these messages

drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

are gone. Instead, I get

FAILED (cmp_all(1))

That output is from John the Ripper.
So it looks like this bug has been fixed (but there's an unrelated John the Ripper bug).
If you want to know which beignet commit made the problem go away, I can try to git bisect.
Otherwise this bug can be closed.

Created attachment 120657 [details]
contents of /sys/class/drm/card0/error with latest commits and newer linux kernel
Unfortunately, with repeated runs of the same command, I now got

GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]...
Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

And dmesg is showing

[ 2236.597358] [drm] stuck on render ring
[ 2236.598523] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2894], reason: Ring hung, action: reset
[ 2236.598527] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2236.598529] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2236.598531] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2236.598533] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2236.598536] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2236.600685] drm/i915: Resetting chip after gpu hang
[ 2242.597091] [drm] stuck on render ring
[ 2242.597661] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2894], reason: Ring hung, action: reset
[ 2242.599757] drm/i915: Resetting chip after gpu hang

I attached the contents of /sys/class/drm/card0/error.

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/15.