Summary: | [IVB bisected] igt/gem_storedw_batches_loop/normal causes [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle
---|---
Product: | DRI
Component: | DRM/Intel
Status: | CLOSED FIXED
Severity: | major
Priority: | high
Version: | unspecified
Hardware: | All
OS: | Linux (All)
Reporter: | lu hua <huax.lu>
Assignee: | Chris Wilson <chris>
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: | intel-gfx-bugs
Whiteboard: |
i915 platform: |
i915 features: |
Attachments: |
Can you please bisect where this regression has been introduced? Also please attach the error state.

Please retest with latest igt.

It still happens on latest igt. No error state is collected in debug/dri/0/i915_error_state.

Still waiting for the bisect. Also the output you've pasted indicates that the test worked, and now that we don't capture an error state any more I'm confused where the problem is. Please clarify.

Bisect shows: 094f9a54e35500739da185cdb78f2e92fc379458 is the first bad commit.

commit 094f9a54e35500739da185cdb78f2e92fc379458
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Wed Sep 25 17:34:55 2013 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu Oct 3 20:01:30 2013 +0200

    drm/i915: Fix __wait_seqno to use true infinite timeouts

    When we switched to always using a timeout in conjunction with wait_seqno, we lost the ability to detect missed interrupts. Since, we have had issues with interrupts on a number of generations, and they are required to be delivered in a timely fashion for a smooth UX, it is important that we do log errors found in the wild and prevent the display stalling for upwards of 1s every time the seqno interrupt is missed.

    Rather than continue to fix up the timeouts to work around the interface impedence in wait_event_*(), open code the combination of wait_event[_interruptible][_timeout], and use the exposed timer to poll for seqno should we detect a lost interrupt.

    v2: In order to satisfy the debug requirement of logging missed interrupts with the real world requirments of making machines work even if interrupts are hosed, we revert to polling after detecting a missed interrupt.

    v3: Throw in a debugfs interface to simulate broken hw not reporting interrupts.

I'm still confused how the test actually fails, please explain.

If there's a gpu hang also please attach the error state.

Your bisect is the messenger, not the cause.

(In reply to comment #7)
> I'm still confused how the test actually fails, please explain.
>
> If there's a gpu hang also please attach the error state.

It's a missed interrupt, not exactly a GPU hang. We don't dump the error state as hangcheck finds the GPU idle rather than stuck.

It works well on latest -nightly kernel. Close it.

Verified. Fixed.

Closing old verified.
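
To illustrate the distinction drawn above between a missed interrupt and a real GPU hang, here is a minimal userspace sketch of how a periodic hangcheck could tell the two apart. It is not the i915 code: the enum, the function names, and the parameters are all hypothetical, and the real driver tracks considerably more state. The idea assumed here is that if the ring's sequence number has stopped advancing but already covers everything that was submitted, the ring is idle, and any process still waiting must have missed its wake-up interrupt, so there is no stuck GPU state worth capturing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical verdicts for this sketch; these are not the driver's names. */
enum hangcheck_verdict {
    RING_OK,          /* seqno advanced, or ring idle with nobody waiting */
    MISSED_INTERRUPT, /* all work completed, yet a waiter is still asleep */
    RING_STUCK        /* work is outstanding and no progress was made */
};

/* Wrap-safe test of whether seq1 has reached seq2 (32-bit sequence numbers). */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * Classify one ring at a hangcheck tick.
 *   prev_seqno     - completed seqno sampled at the previous tick
 *   cur_seqno      - completed seqno sampled now
 *   last_submitted - seqno of the most recently submitted request
 *   has_waiters    - true if some process is sleeping on this ring
 */
static enum hangcheck_verdict
hangcheck_classify(uint32_t prev_seqno, uint32_t cur_seqno,
                   uint32_t last_submitted, bool has_waiters)
{
    /* Progress since the last tick: nothing to report. */
    if (cur_seqno != prev_seqno)
        return RING_OK;

    /* No progress, but nothing left to do: the ring is idle. */
    if (seqno_passed(cur_seqno, last_submitted))
        return has_waiters ? MISSED_INTERRUPT : RING_OK;

    /* No progress with work still queued: this is the real hang case. */
    return RING_STUCK;
}
```

In this sketch, `hangcheck_classify(10, 10, 10, true)` yields `MISSED_INTERRUPT`, which corresponds to the "blitter ring idle" message in this report, while `hangcheck_classify(10, 10, 12, true)` yields `RING_STUCK`, the case where an error state would be worth dumping.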

Created attachment 89125
dmesg

System Environment:
--------------------------
Platform: Ivybridge
Kernel: (drm-intel-nightly)8e88bd3a304ff70d23c0586be7531e24a56f3931

Bug detailed description:
-------------------------
Running ./gem_storedw_batches_loop --run-subtest normal causes:
<3>[ 28.713648] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle

It happens on Ivybridge with the -nightly kernel and the -queued kernel. It works well on the -fixes kernel and the debug kernel.

The latest known good commit: da66146425c3136943452988afd3d64cd551da58
The latest known bad commit: a94b013b91de055572183c6772865123fa955027

output:
running storedw loop with stall every 1 batch
completed 524288 writes successfully
running storedw loop with stall every 2 batch
completed 524288 writes successfully
running storedw loop with stall every 3 batch
completed 524288 writes successfully
running storedw loop with stall every 5 batch
completed 524288 writes successfully
Subtest normal: SUCCESS

Reproduce steps:
-------------------------
1. ./gem_storedw_batches_loop --run-subtest normal