93693 – [BAT SKL BDW] missed interrupt in gem_storedw_loop/basic-render with *ERROR* Hangcheck timer elapsed...

Summary: [BAT SKL BDW] missed interrupt in gem_storedw_loop/basic-render with *ERROR* ...

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other (platform), All (OS)

Importance:	highest (priority), normal (severity)
Assignee:	Mika Kuoppala
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2016-01-13 10:07 UTC by Daniel Vetter
Modified:	2017-07-24 22:43 UTC (History)
CC List:	3 users

See Also:
i915 platform:
i915 features:

Attachments
drm/i915: Force ordering on request submission and hangcheck (3.29 KB, patch) 2016-01-13 16:58 UTC, Mika Kuoppala, no flags

Description Daniel Vetter 2016-01-13 10:07:42 UTC

All bdw/skl machines have random gpu hangs when running gem_storedw_loop/basic-render.

Strangely, the other engines are all fine, and this testcase only uses CS instructions (so it doesn't even load a full render workload).

This is kind of a PO-exit-criteria fail, while bdw/skl are already PV ready :(

Long-term history to make this clear can be found on the CI server under /archive/results/CI_IGT_test/igt@gem_storedw_loop@basic-render.html

Comment 1 Chris Wilson 2016-01-13 15:25:15 UTC

Wow, it's telling that the render ring is so slow! :)

I can run this in a loop until I get bored (>10 minutes) on -nightly and haven't encountered an issue yet. I'd like to see the error state to check for any clues there.

Comment 2 Mika Kuoppala 2016-01-13 16:56:32 UTC

I suspect Daniel got confused by the error message. From what I can see,
gem_storedw_loop triggers the "Hangcheck timer elapsed... render ring idle" errors.

Comment 3 Mika Kuoppala 2016-01-13 16:58:36 UTC

Created attachment 121002
drm/i915: Force ordering on request submission and hangcheck

Comment 4 Chris Wilson 2016-01-13 17:34:03 UTC

(In reply to Mika Kuoppala from comment #3)
> Created attachment 121002
> drm/i915: Force ordering on request submission and hangcheck

You can't move the list manipulation just like that! It's time we eliminated that list_empty() check, but this does nothing to paper over the race.

Comment 5 Daniel Vetter 2016-01-15 18:40:31 UTC

(In reply to Mika Kuoppala from comment #2)
> I suspect Daniel got confused by the error message. From what I can see,
> gem_storedw_loop triggers the "Hangcheck timer elapsed... render ring idle"
> errors.

Yeah, I screwed up the title. It's "just" that the software tracking got out of sync with reality; the GPU is actually perfectly fine. After all, the testcase does succeed (and it checks that all the CS dword stores did land).
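To make the "software tracking vs. reality" distinction concrete, here is a toy classifier for what a periodic hangcheck sees. As we understand the i915 driver of that era, the "Hangcheck timer elapsed... <ring> idle" message came from the path where a ring had already completed every submitted request while waiters were still asleep, i.e. a lost wakeup rather than a GPU hang. All struct, enum, and function names below are hypothetical illustrations, not i915's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of one engine as a periodic hangcheck might see it. */
struct ring_snapshot {
    uint32_t hw_seqno;       /* last seqno the GPU has written to memory */
    uint32_t last_submitted; /* last seqno software has queued */
    bool waiters_sleeping;   /* some thread is still blocked on an interrupt */
};

enum verdict { RING_BUSY, RING_IDLE_OK, RING_MISSED_IRQ };

/* Seqnos wrap around, so compare via the signed difference. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

static enum verdict classify(const struct ring_snapshot *r)
{
    if (!seqno_passed(r->hw_seqno, r->last_submitted))
        return RING_BUSY; /* work still outstanding: judge by progress */
    /* GPU finished everything; a sleeping waiter means the wakeup was lost. */
    return r->waiters_sleeping ? RING_MISSED_IRQ : RING_IDLE_OK;
}

/* Usage: the GPU completed seqno 7 but a waiter never got its interrupt. */
static void example(void)
{
    struct ring_snapshot lost = { 7, 7, true };
    assert(classify(&lost) == RING_MISSED_IRQ);
}
```

The signed-difference comparison is the usual way to order wrapping 32-bit sequence numbers; the point of the sketch is only that "idle with waiters" is a software bookkeeping failure, consistent with the test still passing.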

Comment 6 Daniel Vetter 2016-01-20 13:44:40 UTC

Same bug most likely in gem_sync/basic-render.

Comment 7 Chris Wilson 2016-01-24 11:59:08 UTC

commit 7c17d377374ddbcfb7873366559fc4ed8b296e11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 20 15:43:35 2016 +0200

    drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists
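The commit title suggests the race: if the seqno store and the user interrupt are not ordered, the interrupt can arrive before the store is visible, the woken waiter reads a stale seqno and goes back to sleep, and no further interrupt comes. The toy model below illustrates that reasoning under that assumption; the tick-based latencies and all names are hypothetical and do not describe the actual hardware commands used by the fix. It also fits comment 8's observation, since extra memory latency (e.g. from an active display) widens the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timing model, in abstract "ticks" after the commands issue. */
struct breadcrumb_model {
    unsigned write_latency; /* ticks until the seqno store is visible to the CPU */
    unsigned irq_delay;     /* ticks until the user interrupt is delivered */
};

/* The waiter re-reads the seqno only when the interrupt wakes it. If the
 * store is not yet visible it sleeps again and the wakeup is lost. */
static bool waiter_sees_seqno(const struct breadcrumb_model *m)
{
    return m->irq_delay >= m->write_latency;
}

/* Unordered: the interrupt may overtake the store (modelled as no delay). */
static struct breadcrumb_model unordered(unsigned write_latency)
{
    struct breadcrumb_model m = { write_latency, 0 };
    return m;
}

/* Ordered: the interrupt is raised only once the store has landed. */
static struct breadcrumb_model ordered(unsigned write_latency)
{
    struct breadcrumb_model m = { write_latency, write_latency };
    return m;
}

/* Usage: with nonzero memory latency only the ordered variant is safe. */
static void example(void)
{
    struct breadcrumb_model racy = unordered(5);
    struct breadcrumb_model safe = ordered(5);
    assert(!waiter_sees_seqno(&racy));
    assert(waiter_sees_seqno(&safe));
}
```

In the unordered case the bug only shows up when the store latency is nonzero, which is one plausible reason it was intermittent and machine-dependent.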

Comment 8 Chris Wilson 2016-01-24 12:01:05 UTC

For the record, this only happens for me when I have an output connected, which suggests some interesting hilarity with memory bandwidth/latency.
