Bug 103394

Summary:

GPU HANG: ecode 9:0:0x5fc8cb4d, in Xorg [3647], reason: Hang on render ring, action: reset

Product:

DRI

Reporter:

Nikolaus Rath <Nikolaus>

Component:

DRM/Intel

Assignee:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Status:

CLOSED DUPLICATE

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

medium

CC:

intel-gfx-bugs

Version:

XOrg git

Hardware:

Other

OS:

All

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
GPU crash dump	none

Description Nikolaus Rath 2017-10-21 17:39:50 UTC

Created attachment 134978
GPU crash dump

Hi,

When resuming from hibernation on a ThinkPad X1 Carbon, the kernel logged the following messages:

Oct 21 18:36:07 thinkpad kernel: [  596.840614] [drm] GPU HANG: ecode 9:0:0x5fc8cb4d, in Xorg [3647], reason: Hang on render ring, action: reset
Oct 21 18:36:07 thinkpad kernel: [  596.840616] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Oct 21 18:36:07 thinkpad kernel: [  596.840618] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Oct 21 18:36:07 thinkpad kernel: [  596.840618] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Oct 21 18:36:07 thinkpad kernel: [  596.840620] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Oct 21 18:36:07 thinkpad kernel: [  596.840621] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Oct 21 18:36:07 thinkpad kernel: [  596.840668] drm/i915: Resetting chip after gpu hang
Oct 21 18:36:07 thinkpad kernel: [  596.840725] [drm] RC6 on
Oct 21 18:36:07 thinkpad kernel: [  596.860966] [drm] GuC firmware load skipped
Oct 21 18:36:16 thinkpad kernel: [  605.864564] drm/i915: Resetting chip after gpu hang
Oct 21 18:36:16 thinkpad kernel: [  605.864627] [drm] RC6 on
Oct 21 18:36:16 thinkpad kernel: [  605.882653] [drm] GuC firmware load skipped
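
For triage it can help to pull the fields out of the "GPU HANG" line mechanically. The following is a minimal sketch that assumes the line has exactly the shape shown in the log above; the pattern `HANG_RE` and the helper `parse_gpu_hang` are hypothetical names, not part of any i915 tooling. The ecode is kept as an opaque string because the log itself does not say what its three parts mean.

```python
import re
from typing import Optional

# Pattern for the "[drm] GPU HANG" line as it appears in this report.
# Assumption: other kernel versions may format the line differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), "
    r"action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[dict]:
    """Return the hang fields as a dict, or None if the line is not a hang report."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the only numeric field
    return info


line = (
    "Oct 21 18:36:07 thinkpad kernel: [  596.840614] [drm] GPU HANG: "
    "ecode 9:0:0x5fc8cb4d, in Xorg [3647], reason: Hang on render ring, "
    "action: reset"
)
print(parse_gpu_hang(line))
```

Lines that are not hang reports (for example "[drm] RC6 on") yield None, so the helper can be applied to a whole journal without pre-filtering.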


As the kernel message requests, I am filing this bug with the GPU crash dump attached.
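
The kernel log says the crash dump is saved to /sys/class/drm/card0/error, and it has to be copied out before it can be attached. Below is a rough sketch of one way to do that; the helper name `save_crash_dump` and the output file name are assumptions for illustration. The file is read in chunks until end of file rather than by reported size, since sysfs nodes do not necessarily report their real length. Reading the node may require root.

```python
from pathlib import Path

# Path taken from the kernel log in this report; the card index may
# differ on other machines.
ERROR_NODE = Path("/sys/class/drm/card0/error")


def save_crash_dump(dest, src=ERROR_NODE) -> int:
    """Copy the GPU error state from src to dest and return the bytes written."""
    total = 0
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        while True:
            chunk = fin.read(65536)
            if not chunk:  # empty read means end of file
                break
            fout.write(chunk)
            total += len(chunk)
    return total
```

A call such as `save_crash_dump("gpu-crash-dump.txt")` raises OSError if the node is missing or unreadable, which callers may want to catch and report.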

Comment 1 Chris Wilson 2017-10-21 17:57:12 UTC

commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 21 14:51:08 2016 +0100

    drm/i915/execlists: Reset RING registers upon resume
    
    There is a disparity in the context image saved to disk and our own
    bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our
    stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the
    ring but may not tell the GPU about them, the GPU may be lagging behind
    our bookkeeping. Upon hibernation we do not save stolen pages, presuming
    that their contents are volatile. This means that although we start
    writing into the ring at tail, the GPU starts executing from its HEAD
    and there may be some garbage in between and so the GPU promptly hangs
    upon resume.
    
    Testcase: igt/gem_exec_suspend/basic-S4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk
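
The failure described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the i915 code and not the actual patch: `ToyRing`, its fields, and the command names are all hypothetical. `hw_head` and `hw_tail` stand in for RING_HEAD and RING_TAIL, `sw_tail` for the driver's stored ce->ring->tail, and `reset_ring_registers` is a guess at the spirit of the commit title.

```python
RING_SIZE = 16
GARBAGE = "GARBAGE"


class ToyRing:
    """Toy ring buffer with separate GPU and driver views of the tail."""

    def __init__(self):
        self.slots = ["NOOP"] * RING_SIZE
        self.hw_head = 0  # where the GPU resumes executing (RING_HEAD)
        self.hw_tail = 0  # last tail the GPU was told about (RING_TAIL)
        self.sw_tail = 0  # driver bookkeeping (ce->ring->tail)

    def emit(self, cmds, submit=True):
        """Write commands at the driver's tail; optionally tell the GPU."""
        for cmd in cmds:
            self.slots[self.sw_tail] = cmd
            self.sw_tail = (self.sw_tail + 1) % RING_SIZE
        if submit:
            self.hw_tail = self.sw_tail

    def hibernate(self):
        """Ring contents are not saved, so they come back as garbage."""
        self.slots = [GARBAGE] * RING_SIZE

    def reset_ring_registers(self):
        """Make the GPU's head and tail agree with the driver's bookkeeping."""
        self.hw_head = self.hw_tail = self.sw_tail

    def gpu_run(self):
        """Execute from hw_head to hw_tail; garbage models a GPU hang."""
        executed = []
        while self.hw_head != self.hw_tail:
            cmd = self.slots[self.hw_head]
            if cmd == GARBAGE:
                raise RuntimeError(f"GPU hang at slot {self.hw_head}")
            executed.append(cmd)
            self.hw_head = (self.hw_head + 1) % RING_SIZE
        return executed


def scenario(apply_fix: bool):
    ring = ToyRing()
    ring.emit(["BATCH1"])
    ring.gpu_run()                           # GPU catches up to the submitted tail
    ring.emit(["WA", "WA"], submit=False)    # extra dwords the GPU is not told about
    ring.hibernate()                         # contents lost across hibernation
    if apply_fix:
        ring.reset_ring_registers()
    ring.emit(["BATCH2"])                    # driver writes at its own tail
    return ring.gpu_run()


if __name__ == "__main__":
    try:
        scenario(apply_fix=False)
    except RuntimeError as exc:
        print("without reset:", exc)
    print("with reset:", scenario(apply_fix=True))
```

Without the reset, the GPU starts from its stale head and walks into slots whose contents were lost, which is the "garbage in between" the commit message refers to. With the reset, execution begins where the driver starts writing.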

*** This bug has been marked as a duplicate of bug 96526 ***
