Bug 98470 - [BAT APL] gem_exec_suspend/basic-S3: GPU hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2016-10-28 13:51 UTC by Imre Deak
Modified: 2016-11-07 13:21 UTC

i915 platform: BXT
i915 features: GPU hang


Attachments
dmesg (326.03 KB, text/plain)
2016-10-28 13:51 UTC, Imre Deak
GPU error state (99.28 KB, text/plain)
2016-10-28 14:24 UTC, Imre Deak

Description Imre Deak 2016-10-28 13:51:12 UTC
Created attachment 127587
dmesg

Running igt/gem_exec_suspend/basic-S3 a few times leads to a GPU hang during suspend. The problem seems to happen after a GPU reset and since

commit 1c777c5d1dcdf8fa0223fcff35fb387b5bb9517a
Author: Imre Deak <imre.deak@intel.com>
Date:   Wed Oct 12 17:46:37 2016 +0300

    drm/i915/hsw: Fix GPU hang during resume from S3-devices state

we always perform a reset during suspend. With this patch reverted, the problem can be reproduced by doing suspend/resume cycles after a GPU reset.
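
The reproduction steps above could be scripted roughly as follows. This is only a sketch under assumptions that are not part of the report: the path to the IGT binary is a placeholder, the kernel log is assumed to mark a hang with the text "GPU HANG", and the script is assumed to run as root on a machine that can safely enter S3. The helper names scan_dmesg and reproduce are hypothetical.

```python
import re
import subprocess

# Placeholder path to the IGT test binary; adjust for the local build.
IGT_CMD = ["./tests/gem_exec_suspend", "--run-subtest", "basic-S3"]

# Assumption: the kernel log marks a hang with the text "GPU HANG".
HANG_RE = re.compile(r"GPU HANG")
# Matches lines such as "hpet1: lost 7161 rtc interrupts".
RTC_RE = re.compile(r"hpet\d+: lost (\d+) rtc interrupts")


def scan_dmesg(text):
    """Return (hang_count, lost_rtc_counts) found in a kernel log dump."""
    hangs = len(HANG_RE.findall(text))
    lost = [int(n) for n in RTC_RE.findall(text)]
    return hangs, lost


def reproduce(cycles=10):
    """Run the subtest repeatedly and stop at the first logged hang.

    Returns (cycle_number, hang_count, lost_rtc_counts) or None if no
    hang showed up within the given number of cycles.
    """
    for i in range(1, cycles + 1):
        # Each run suspends and resumes the machine.
        subprocess.run(IGT_CMD, check=False)
        log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
        hangs, lost = scan_dmesg(log)
        if hangs:
            return i, hangs, lost
    return None
```

Since dmesg is cumulative, a hang already present in the log before the first cycle would be reported at cycle 1, so clearing the kernel log beforehand may be worthwhile.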
Comment 1 Chris Wilson 2016-10-28 14:10:46 UTC
That hang is right in the middle of suspend. So should be near the start since we flush the active rendering early on. I wonder if it happens to be something like we stop listening to an irq too early? The error state might be interesting.
Comment 2 Imre Deak 2016-10-28 14:24:47 UTC
Created attachment 127588
GPU error state

(In reply to Chris Wilson from comment #1)
> That hang is right in the middle of suspend. So should be near the start
> since we flush the active rendering early on. I wonder if it happens to be
> something like we stop listening to an irq too early?

Could be. I also noticed
hpet1: lost 7161 rtc interrupts
errors starting to appear after the problem.

> The error state might be interesting.

Attached.
Comment 3 Chris Wilson 2016-10-28 14:59:55 UTC
The render ring is off in a world of its own, flying through space. The blt, bsd and vebox rings all coincidentally wrapped at exactly the same time, just before the breadcrumb for idling on suspend, and on all 3 rings it parsed the MI_FLUSH_DW command but did not execute the seqno write. Otherwise they look solid: all the pointers (acthd, faddr, ring start) match. Oh my.
Comment 4 yann 2016-11-03 16:44:37 UTC
Reference to Imre's patchset: https://patchwork.freedesktop.org/series/14789/
Comment 5 Imre Deak 2016-11-07 12:58:07 UTC
Fix merged to -nightly.
Comment 6 yann 2016-11-07 13:21:42 UTC
(In reply to Imre Deak from comment #5)
> Fix merged to -nightly.

closing as fixed