Summary: | [BSW]OpenCL/utests hang sporadically | |
---|---|---|---
Product: | DRI | Reporter: | meng <mengmeng.meng>
Component: | DRM/Intel | Assignee: | meng <mengmeng.meng>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | critical | |
Priority: | high | CC: | bingbingx.zhu, intel-gfx-bugs, rong.r.yang
Version: | DRI git | |
Hardware: | All | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | BSW/CHT | i915 features: | GEM/Other
Attachments: | dmesg (attachment 116682) | |

This blocks our OpenCL testing.

The issue is that test cases hang. Running "utests/utest_run" reproduces it. Note that the issue could not be reproduced when running the cases one by one (utests/utest_run -c "subcase").

So no GPU hang?
Does the problem happen with i915.enable_execlists=0 too?

(In reply to Ville Syrjala from comment #3)
> So no GPU hang?
>
> Does the problem happen with i915.enable_execlists=0 too?
With i915.execlist=0, the issue still exists.
For OpenCL testing, we need to disable the i915 hang check because an OCL kernel may take 6 seconds or even more.

(In reply to meng from comment #4)
When a case hangs, attaching gdb to it lets it finish. So it's not a GPU hang.

(In reply to meng from comment #4)
> (In reply to Ville Syrjala from comment #3)
> > So no GPU hang?
> >
> > Does the problem happen with i915.enable_execlists=0 too?
>
> With i915.execlist=0, the issue still exists.
> For OpenCL testing, we need to disable the i915 hang check because an OCL kernel
> may take 6 seconds or even more.
6 seconds of monopolizing the GPU sounds like a DoS worthy of being banned ;-) So not even the grace period given to looping kernels is enough to prevent hangcheck firing? I would strongly suggest you file a bug with the bare minimum required to reproduce (that is, an igt).

(In reply to meng from comment #5)
> (In reply to meng from comment #4)
> When a case hangs, attaching gdb to it lets it finish. So it's not a GPU
> hang.
No, that would be a "missed interrupt", which is normally detected by hangcheck.

Bug scrub: Assigned to Jani

Assigned to Mengmeng

Hi Mengmeng,
Is it still reproducible?

Timeout, closing. Please reopen if the problem persists on latest kernels.
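
Since the hang in this thread only shows up sporadically on full "utests/utest_run" runs, a small harness that repeats the run under a timeout may help put a number on the failure rate. The following is a minimal sketch, not something taken from the report: the function name `count_hangs` and the run count and timeout values are assumptions chosen for illustration, and it assumes a hung process can still be killed when the timeout expires.

```python
import subprocess


def count_hangs(cmd, runs, timeout_s):
    """Run cmd `runs` times and return how many runs exceeded timeout_s seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            # On timeout, subprocess.run kills the child and raises TimeoutExpired.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

From the beignet build directory, something like `count_hangs(["utests/utest_run"], runs=20, timeout_s=600)` would count the runs that failed to finish within ten minutes; both numbers are hypothetical and should be tuned to how long a healthy run takes on the machine under test.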
Created attachment 116682 (dmesg)

==Regression==
--------------------------
Regression: No.
Ubuntu: 14.04

==kernel==
--------------------------
drm-intel-next-queued: git-8c6cda

==Test cases==
Beignet: git://anongit.freedesktop.org/git/beignet (master git-e64445f)

==Bug detailed description==
-----------------------------
OpenCL/utests may hang sporadically on BSW (~20% of runs). The failing tests are not always the same. The issue doesn't exist on other platforms (IVB/HSW/BDW). Please see the attached dmesg.

(gdb) bt
==================
#0 0x00007f6fb5bb1337 in ioctl () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fb4cd6e74 in drmIoctl (fd=6, request=request@entry=1074553951, arg=arg@entry=0x7ffdca8d7840) at xf86drm.c:164
#2 0x00007f6fb4ee68f7 in drm_intel_gem_bo_map (bo=0x1505f50, write_enable=1) at intel_bufmgr_gem.c:1325
#3 0x00007f6fb5880446 in cl_mem_map (mem=0x14dc4a0, write=write@entry=1) at /home/OpenCL/beignet/src/cl_mem.c:1908
#4 0x00007f6fb586f223 in clMapBufferIntel (mem=<optimized out>, errcode_ret=0x7ffdca8d790c) at /home/OpenCL/beignet/src/cl_api.c:3215
#5 0x00007f6fb65af04e in test_copy_buf (sz=1024, cb=512, dst_off=0, src_off=<optimized out>) at /home/OpenCL/beignet/utests/enqueue_copy_buf.cpp:24
#6 enqueue_copy_buf () at /home/OpenCL/beignet/utests/enqueue_copy_buf.cpp:61
#7 0x00007f6fb65af4bd in __ANON__enqueue_copy_buf__ () at /home/OpenCL/beignet/utests/enqueue_copy_buf.cpp:66
#8 0x00007f6fb63b92df in UTest::runAllNoIssue () at /home/OpenCL/beignet/utests/utest.cpp:169
#9 0x0000000000401786 in main (argc=1, argv=0x7ffdca8d8308) at /home/OpenCL/beignet/utests/utest_run.cpp:104

==Reproduce steps==
----------------------------
1. utests/utest_run
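
The backtrace shows the process blocked in an ioctl with request=1074553951, but does not say which ioctl that is. The sketch below splits the number into its fields, assuming the generic Linux `_IOC` bit layout used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction). The helper name `decode_ioctl` is made up for this example. Subtracting 0x40 relies on `DRM_COMMAND_BASE` from drm.h; the interpretation given after the code is my reading of i915_drm.h and is not stated anywhere in the report.

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its _IOC fields.

    Assumes the asm-generic/ioctl.h layout used on x86:
    bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
    """
    directions = {0: "none", 1: "write", 2: "read", 3: "read/write"}
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": directions[(request >> 30) & 0x3],
    }


fields = decode_ioctl(1074553951)  # request value from frame #1 of the backtrace
# Driver-specific DRM ioctls are numbered from DRM_COMMAND_BASE (0x40).
driver_nr = fields["nr"] - 0x40
print(fields, hex(driver_nr))
```

The decoded fields are type 'd' (the DRM ioctl base), a 12-byte write-direction argument, and driver ioctl number 0x1f. As far as I can tell this matches DRM_IOCTL_I915_GEM_SET_DOMAIN, which would mean the test is waiting in the kernel for the buffer to become idle. That would fit the "missed interrupt" explanation given in the comments, but it remains an assumption until checked against the headers of the kernel in use.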