Bug 102831

Summary: GPU hang after resume from suspend to disk
Product: DRI
Reporter: Aaditya Bagga <abchk1234>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Description      Flags
  GPU crash dump   none

Description Aaditya Bagga 2017-09-18 04:48:08 UTC
Hi,

After resuming from hibernation (suspend to disk), the whole system is initially
unresponsive: I can move the mouse, but the keyboard does not work, and the Caps Lock
light does not change when pressed. After a while (about 20-30 s) I can type again and log in via the display manager.

I get logged in, but after a few seconds the system becomes unresponsive again: the screen goes black, and after a while the desktop comes back, but if I try to move the mouse or interact with anything, it flickers.

I checked the logs and got the following output from dmesg:

[   84.718912] [drm] GPU HANG: ecode 8:0:0xa7e661f7, in Xorg [2580], reason: Hang on render ring, action: reset
[   84.718915] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   84.718917] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   84.718918] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   84.718920] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   84.718921] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   84.718966] drm/i915: Resetting chip after gpu hang
[   92.717914] drm/i915: Resetting chip after gpu hang
[  113.773170] drm/i915: Resetting chip after gpu hang
[  123.756738] drm/i915: Resetting chip after gpu hang
[  133.740346] drm/i915: Resetting chip after gpu hang
[  145.771956] drm/i915: Resetting chip after gpu hang
[  155.755527] drm/i915: Resetting chip after gpu hang

I have this bug with Linux 4.9.35 and 4.9.70 (i.e., the 4.9 series); it works as
expected on Linux 4.4.75 (and the 4.4 series in general).

The hardware is a laptop with Intel Broadwell graphics.

$ inxi -Fxz
System:    Host: slackware Kernel: 4.4.88 x86_64 bits: 64 gcc: 5.3.0 Desktop: Xfce 4.12.4 (Gtk 2.24.31)
           Distro: Slackware 14.2
Machine:   Device: laptop System: FUJITSU product: LIFEBOOK A555 serial: N/A
           Mobo: FUJITSU model: FJNBB3E serial: N/A UEFI [Legacy]: FUJITSU // Insyde v: 1.21 date: 05/31/2016
CPU:       Dual core Intel Core i3-5005U (-HT-MCP-) arch: Broadwell rev.4 cache: 3072 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 7981
           clock speeds: max: 2000 MHz 1: 1316 MHz 2: 828 MHz 3: 807 MHz 4: 825 MHz
Graphics:  Card: Intel Broadwell-U Integrated Graphics bus-ID: 00:02.0
           Display Server: X.Org 1.18.3 driver: intel Resolution: 1366x768@60.00hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 5500 (Broadwell GT2)
           version: 4.5 Mesa 13.0.6 Direct Render: Yes

I can provide more info if needed.

Comment 1 Aaditya Bagga 2017-09-18 04:49:12 UTC
Created attachment 134307 [details]
GPU crash dump
Comment 2 Chris Wilson 2017-09-18 09:13:44 UTC
commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 21 14:51:08 2016 +0100

    drm/i915/execlists: Reset RING registers upon resume
    
    There is a disparity in the context image saved to disk and our own
    bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our
    stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the
    ring but may not tell the GPU about them, the GPU may be lagging behind
    our bookkeeping. Upon hibernation we do not save stolen pages, presuming
    that their contents are volatile. This means that although we start
    writing into the ring at tail, the GPU starts executing from its HEAD
    and there may be some garbage in between and so the GPU promptly hangs
    upon resume.
    
    Testcase: igt/gem_exec_suspend/basic-S4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk

*** This bug has been marked as a duplicate of bug 96526 ***
Comment 3 Aaditya Bagga 2017-09-18 11:07:34 UTC
Hi Chris,

Thanks for the info. I have applied the patch and am building the kernel to test it.

Could you tell me whom to report this to so that the fix can be added to the 4.9 kernel series?

Comment 4 Aaditya Bagga 2017-09-18 14:06:04 UTC
I tested it for a couple of hours with several resumes from disk; it seems to be working as expected.