Summary: [BDW Bisected] igt/gem_dummy_reloc_loop/blt causes [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: ben, intel-gfx-bugs
Version: unspecified
Hardware: All
OS: Linux (All)
Attachments: attachment 95982 (dmesg)
Please bisect.

Bisect shows:

8232644ccf099548710843e97360a3fcd6d28e04 is the first bad commit
commit 8232644ccf099548710843e97360a3fcd6d28e04
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 5 12:00:39 2014 +0000

    drm/i915: Convert the forcewake worker into a timer func

    We don't want to suffer scheduling delay when turning off the GPU after
    waking it up to touch registers. Ideally, we only want to keep the GPU
    awake for the register access sequence, with a single forcewake dance on
    the first access and release immediately after the last. We set a timer
    on the first access so that we only dance once and on the next scheduler
    tick, we drop the forcewake again.

    This moves the cleanup routine from the common i915 workqueue to a timer
    func so that we don't anger powertop, and drop the forcewake again
    quicker.

    v2: Enable the deferred force_wake_put for regular register reads as well.

    v3: Beautification and make sure we disable forcewake when shutting down.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ben Widawsky <ben@bwidawsk.net>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Can you please attach the output (trace.dat) of:

$ trace-cmd record -e i915 ./gem_dummy_reloc_loop --run-subtest blt

(In reply to comment #3)
> Can you please attach the output (trace.dat) of:
>
> $ trace-cmd record -e i915 ./gem_dummy_reloc_loop --run-subtest blt

/sys/kernel/debug/tracing/events/i915/filter
/sys/kernel/debug/tracing/events/*/i915/filter
IGT-Version: 1.6-gcde058a (x86_64) (Linux: 3.14.0-rc7_drm-intel-nightly_2c0d38_2
running dummy loop on render
dummy loop run on render completed
Subtest render: SUCCESS
running dummy loop on bsd
dummy loop run on bsd completed
Subtest bsd: SUCCESS
running dummy loop on blt
dummy loop run on blt completed
Subtest blt: SUCCESS
running dummy loop on vebox
Connection to x-bdw02 closed by remote host.
Connection to x-bdw02 closed.
[root@x-pk2 ~]# ssh x-bdw02
Last login: Thu Dec 12 00:01:05 2013 from luhua.sh.intel.com
[root@x-bdw02 ~]# uname -a
Linux x-bdw02 3.13.0_drm-intel-next-queued_823264_20140314_debug+ #960 SMP Fri Mar 14 02:31:27 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@x-bdw02 ~]# cd /GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests/
[root@x-bdw02 tests]# trace-cmd record -e i915 ./gem_dummy_reloc_loop
/sys/kernel/debug/tracing/events/i915/filter
/sys/kernel/debug/tracing/events/*/i915/filter
IGT-Version: 1.6-gcde058a (x86_64) (Linux: 3.13.0_drm-intel-next-queued_823264_20140314_debug+ x86_64)
running dummy loop on render
dummy loop run on render completed
Subtest render: SUCCESS
running dummy loop on bsd
dummy loop run on bsd completed
Subtest bsd: SUCCESS
running dummy loop on blt
dummy loop run on blt completed
Subtest blt: SUCCESS
running dummy loop on vebox
dummy loop run on vebox completed
Subtest vebox: SUCCESS
running dummy loop on random rings
dummy loop run on random rings completed
Subtest mixed: SUCCESS
Kernel buffer statistics:
  Note: "entries" are the entries left in the kernel ring buffer and are not
        recorded in the trace data. They should all be zero.

CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 3868
oldest event ts: 522.073751
now ts: 522.154184
dropped events: 0
read events: 68516225

CPU: 1
entries: 0
overrun: 0
commit overrun: 0
bytes: 1800
oldest event ts: 521.657562
now ts: 522.154479
dropped events: 0
read events: 6934134

CPU: 2
entries: 0
overrun: 0
commit overrun: 0
bytes: 304
oldest event ts: 520.351796
now ts: 522.154750
dropped events: 0
read events: 15421150

CPU: 3
entries: 0
overrun: 0
commit overrun: 0
bytes: 780
oldest event ts: 506.852074
now ts: 522.155017
dropped events: 0
read events: 2452128

CPU0 data recorded at offset=0x33a000
    1883295744 bytes in size
CPU1 data recorded at offset=0x70747000
    190619648 bytes in size
CPU2 data recorded at offset=0x7bd11000
    423923712 bytes in size
CPU3 data recorded at offset=0x9515a000
    67411968 bytes in size

Attach the trace.dat

(In reply to comment #5)
> Attach the trace.dat

The file is too large. I will give you the file location via mail.

Chris, any update on this issue?

Nope. The debug perturbed the system enough to hide the issue. So far it looks like a genuine missed interrupt.

I can't reproduce this; is it still an issue?

Tested it on the latest -nightly kernel; it works well.

Verified. Fixed.

Closing old verified.
Created attachment 95982 (dmesg)

System Environment:
--------------------------
Platform: Broadwell
kernel: (drm-intel-nightly)b1859622badb7509586987af5269aa525a0c112f

Bug detailed description:
-------------------------
Running ./gem_dummy_reloc_loop --run-subtest blt reports:

<3>[ 410.576752] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle

It happens on Broadwell with both the -nightly and -queued kernels. igt/gem_dummy_reloc_loop/mixed also triggers this error.

The latest known good commit: b6ae3c7c60161a9b1e15b1ccd6412fad65b7d9cf
The latest known bad commit: b2040f6fed736ccd2319768bc59833abe74148b8

Output:
IGT-Version: 1.6-g10571b8 (x86_64) (Linux: 3.14.0-rc6_drm-intel-nightly_b18596_20140314+ x86_64)
running dummy loop on blt
dummy loop run on blt completed
Subtest blt: SUCCESS

Reproduce steps:
----------------------------
1. ./gem_dummy_reloc_loop --run-subtest blt