Bug 89493

Summary:

[SKL] sporadic "missed interrupts"

Product:

DRI

Reporter:

lu hua <huax.lu>

Component:

DRM/Intel

Assignee:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Status:

CLOSED FIXED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

medium

CC:

christophe.prigent, intel-gfx-bugs

Version:

unspecified

Hardware:

All

OS:

Linux (All)

Whiteboard:

i915 platform:

SKL

i915 features:

GEM/Other

Attachments:

Description	Flags
dmesg	none
output	none

Description lu hua 2015-03-09 06:07:04 UTC

Created attachment 114141
dmesg

==System Environment==
--------------------------
Regression: not sure; SKL has always had GPU hang issues (reported in bug 89037)

Non-working platforms: SKL

==kernel==
--------------------------
drm-intel-nightly/c2f6e584b215dd0d7e5a8a02716a89e985366ec0
commit c2f6e584b215dd0d7e5a8a02716a89e985366ec0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Mar 6 18:30:27 2015 +0100

    drm-intel-nightly: 2015y-03m-06d-17h-29m-53s UTC integration manifest

==Bug detailed description==
-----------------------------
Running this case for 20 cycles causes a GPU hang.

root@x-skly05:/home/lh# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[   46.724398] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... blitter ring idle
<3>[   48.589274] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Reproduce steps:
-------------------------
1. xinit
2. bin/copyteximage 1D_ARRAY -samples=2 -auto

Comment 1 lu hua 2015-03-09 06:07:44 UTC

Created attachment 114142
output

No error state was collected in /sys/kernel/debug/dri/0/i915_error_state.

Comment 2 Damien Lespiau 2015-03-11 13:16:24 UTC

Neil reports that hangcheck is triggering "from time to time", not just on this piglit test. No better characterization yet.

Comment 3 Mika Kuoppala 2015-05-06 09:28:14 UTC

This hangcheck complaint is due to a race between the request->list addition and hangcheck inspecting that list.

This is not a real hang, and it may not even be a missed interrupt. Most likely, hangcheck sees the ring as idle even though it is not (due to that race).

Comment 4 Jani Nikula 2016-01-18 13:11:12 UTC

Please try kernel v4.4.

Comment 5 Chris Wilson 2016-01-18 16:55:49 UTC

Still known to be broken in -nightly.

Comment 6 Chris Wilson 2016-02-26 14:04:36 UTC

commit 7c17d377374ddbcfb7873366559fc4ed8b296e11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 20 15:43:35 2016 +0200

    drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists
    
    Broadwell and later currently use the same unordered command sequence to
    update the seqno in the HWS status page and then assert the user
    interrupt. We should apply the w/a from legacy (where we do an mmio
    read to delay the seqno read after the interrupt), but this is not
    enough to enforce coherent seqno visibility on Skylake. Rather than
    search for the proper post-interrupt seqno barrier, use a strongly
    ordered command sequence to write the seqno, then assert the user
    interrupt from the ring.
    
    v2: Move around the wa tail dwords to avoid adding duplicate code.
    
    v3: Add references, comments on workarounds and bit5 check.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=93693
    Testcase: igt/gem_ring_sync_loop #skl
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1453297415-17793-1-git-send-email-mika.kuoppala@intel.com
