Bug 98470

Summary: [BAT APL] gem_exec_suspend/basic-S3: GPU hang
Product: DRI
Reporter: Imre Deak <imre.deak>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: BXT
i915 features: GPU hang
Attachments:
Description       Flags
dmesg             none
GPU error state   none

Description Imre Deak 2016-10-28 13:51:12 UTC
Created attachment 127587 [details]
dmesg

Running igt/gem_exec_suspend/basic-S3 a few times leads to a GPU hang during suspend. The problem seems to happen after a GPU reset, and since

commit 1c777c5d1dcdf8fa0223fcff35fb387b5bb9517a
Author: Imre Deak <imre.deak@intel.com>
Date:   Wed Oct 12 17:46:37 2016 +0300

    drm/i915/hsw: Fix GPU hang during resume from S3-devices state

we always perform a reset during suspend. With this patch reverted, the problem can be reproduced by doing suspend/resume cycles after a GPU reset.
Comment 1 Chris Wilson 2016-10-28 14:10:46 UTC
That hang is right in the middle of suspend. So should be near the start since we flush the active rendering early on. I wonder if it happens to be something like we stop listening to an irq too early? The error state might be interesting.
Comment 2 Imre Deak 2016-10-28 14:24:47 UTC
Created attachment 127588 [details]
GPU error state

(In reply to Chris Wilson from comment #1)
> That hang is right in the middle of suspend. So should be near the start
> since we flush the active rendering early on. I wonder if it happens to be
> something like we stop listening to an irq too early?

Could be. I also noticed
hpet1: lost 7161 rtc interrupts
errors starting to appear after the problem.

> The error state might be interesting.

Attached.
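
The "hpet1: lost N rtc interrupts" messages mentioned above can be picked out of a saved dmesg log with a short script. This is only a minimal sketch, assuming the message format quoted in this comment; the function name and the sample log lines (including their timestamps) are hypothetical and not taken from the attached dmesg.

```python
import re

# Matches the message quoted in this report, e.g. "hpet1: lost 7161 rtc interrupts".
LOST_IRQ_RE = re.compile(r"hpet(\d+): lost (\d+) rtc interrupts")


def find_lost_rtc_interrupts(dmesg_text):
    """Return a list of (hpet index, lost interrupt count) tuples, in log order."""
    hits = []
    for line in dmesg_text.splitlines():
        match = LOST_IRQ_RE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample input; only the hpet message text comes from the report.
    sample = "\n".join([
        "[  100.000000] some unrelated kernel message",
        "[  101.000000] hpet1: lost 7161 rtc interrupts",
    ])
    for hpet, lost in find_lost_rtc_interrupts(sample):
        print(f"hpet{hpet}: {lost} rtc interrupts lost")
```

Running it over the output of successive suspend/resume cycles would show whether these messages only start once the hang has occurred.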
Comment 3 Chris Wilson 2016-10-28 14:59:55 UTC
The render ring is off in a world of its own, flying through space. blt, bsd and vebox all coincidentally wrapped at exactly the same time, just before the breadcrumb for idling on suspend, and on all 3 rings the MI_FLUSH_DW command was parsed but the seqno write was not executed. Otherwise they look solid: all the pointers (acthd, faddr, ring start) match. Oh my.
Comment 4 yann 2016-11-03 16:44:37 UTC
Reference to Imre's patchset: https://patchwork.freedesktop.org/series/14789/
Comment 5 Imre Deak 2016-11-07 12:58:07 UTC
Fix merged to -nightly.
Comment 6 yann 2016-11-07 13:21:42 UTC
(In reply to Imre Deak from comment #5)
> Fix merged to -nightly.

Closing as fixed.
