Bug 70747

Summary: igt/ZZ_missed_irq causes [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                              Flags
dmesg                                                    none
handle fake missed interrupts as a simulated hang, too   none
patch v2                                                 none
dmesg with patch v2                                      none

Description lu hua 2013-10-22 05:59:51 UTC
Created attachment 87966
dmesg

System Environment:
--------------------------
Platform:  PNV/ILK/SNB/IVB/HSW
Kernel:	(drm-intel-nightly)d1b2b826f0969182f055d11c991f90fdc6a4924a

Bug detailed description:
---------------------------
Running igt/ZZ_missed_irq causes "[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle" on PNV/ILK/SNB/IVB/HSW with the -nightly and -queued kernels. The -fixes kernel doesn't support interrupt masking. This is a new test case.

output:
.
Interrupts masked
Interrupts unmasked
Cleared missed interrupts

No error state was collected in debug/dri/0/i915_error_state.

Reproduce steps:
----------------------------
1. ./ZZ_missed_irq
Comment 1 Daniel Vetter 2013-10-22 08:17:47 UTC
This is expected: this testcase exercises a special case in our hangcheck code. I guess we need to add a "simulated gpu hang" notice like for ZZ_hangman and friends ...
Comment 2 Daniel Vetter 2013-10-22 08:44:31 UTC
Created attachment 87973
handle fake missed interrupts as a simulated hang, too

With this patch we should properly mark the simulated hangs caused by ZZ_missed_irq as such with the usual "Simulated gpu hang, resetting stop_rings" dmesg output. Please test.
Comment 3 Chris Wilson 2013-10-22 08:58:43 UTC
We don't call reset; this error is from hangcheck itself. Part of this test is that we do indeed emit an *ERROR* for a missed irq. It would be easier if we could tell the test runners about expected *ERROR* messages.
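
A minimal sketch of how a test runner could treat some *ERROR* lines as expected, assuming the runner has the dmesg output as a list of lines. The `EXPECTED_ERRORS` table and the `unexpected_errors` helper are hypothetical names used for illustration only; they are not part of igt or of any existing runner.

```python
import re

# Hypothetical allowlist: for each test, regexes describing the *ERROR*
# lines that the test is expected to trigger in dmesg.
EXPECTED_ERRORS = {
    "ZZ_missed_irq": [
        re.compile(
            r"\[drm:i915_hangcheck_elapsed\] \*ERROR\* "
            r"Hangcheck timer elapsed\.\.\. \w+ ring idle"
        ),
    ],
}


def unexpected_errors(test_name, dmesg_lines):
    """Return the *ERROR* lines that no expected pattern for test_name matches."""
    allowed = EXPECTED_ERRORS.get(test_name, [])
    return [
        line
        for line in dmesg_lines
        if "*ERROR*" in line and not any(p.search(line) for p in allowed)
    ]


if __name__ == "__main__":
    # The message text is taken from this bug's summary.
    lines = [
        "[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle",
    ]
    # The runner would fail the test only if this list is non-empty.
    print(unexpected_errors("ZZ_missed_irq", lines))
```

With such a table the hangcheck message would still be emitted by the kernel, but a run of ZZ_missed_irq would not be flagged for it, while the same message from any other test would still be reported.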
Comment 4 lu hua 2013-10-25 05:04:24 UTC
(In reply to comment #2)
> Created attachment 87973
> handle fake missed interrupts as a simulated hang, too
> 
> With this patch we should properly mark the simulated hangs caused by
> ZZ_missed_irq as such with the usual "Simulated gpu hang, resetting
> stop_rings" dmesg output. Please test.

I tested this patch; "Simulated gpu hang, resetting stop_rings" doesn't appear in dmesg.

dmesg:
[   66.702045] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle
[   70.709623] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
Comment 5 Daniel Vetter 2013-10-27 18:48:08 UTC
Created attachment 88192
patch v2

This one here should work better at quieting the ERROR in dmesg for faked missed interrupts. Please test.
Comment 6 lu hua 2013-10-28 08:03:53 UTC
(In reply to comment #5)
> Created attachment 88192
> patch v2
> 
> This one here should work better at quieting the ERROR in dmesg for faked
> missed interrupts. Please test.

I tested this patch; the "ERROR" goes away.
Comment 7 lu hua 2013-10-28 08:04:19 UTC
Created attachment 88209
dmesg with patch v2
Comment 8 Daniel Vetter 2013-10-30 09:36:28 UTC
commit f4adcd247766e5b914f861ed143ff328f869bf80
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Oct 28 09:24:13 2013 +0100

    drm/i915: handle faked missed interrupts as simulated hangs, too
Comment 9 lu hua 2013-10-31 07:26:05 UTC
Verified. Fixed.
Comment 10 Elizabeth 2017-10-06 14:42:29 UTC
Closing old verified.
