Bug 101959 - GPU HANG: ecode 9:0:0x30b1fddf, in X [2110], reason: Hang on render ring, action: reset
Status: CLOSED DUPLICATE of bug 96526
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2017-07-28 09:31 UTC by Matwey V. Kornilov
Modified: 2017-07-29 07:44 UTC



Attachments
/sys/class/drm/card0/error (755.74 KB, text/plain)
2017-07-28 09:31 UTC, Matwey V. Kornilov
dmesg (93.59 KB, text/plain)
2017-07-28 09:32 UTC, Matwey V. Kornilov
hwinfo (1.30 MB, text/plain)
2017-07-28 09:32 UTC, Matwey V. Kornilov

Description Matwey V. Kornilov 2017-07-28 09:31:24 UTC
Created attachment 133089 [details]
/sys/class/drm/card0/error

Hello,

I am running openSUSE Leap 42.3 with kernel 4.4.76-1-default and sometimes see the following after resuming from suspend. At other times the system simply hangs forever.

[ 2089.780499] [drm] GPU HANG: ecode 9:0:0x30b1fddf, in X [2110], reason: Hang on render ring, action: reset
[ 2089.780502] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2089.780503] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2089.780504] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2089.780505] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2089.780506] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2089.780570] drm/i915: Resetting chip after gpu hang
[ 2089.780665] [drm] RC6 on
[ 2089.796281] [drm] GuC firmware load skipped
[ 2101.816229] drm/i915: Resetting chip after gpu hang
[ 2101.816323] [drm] RC6 on
[ 2101.830392] [drm] GuC firmware load skipped
Comment 1 Matwey V. Kornilov 2017-07-28 09:32:18 UTC
Created attachment 133090 [details]
dmesg
Comment 2 Matwey V. Kornilov 2017-07-28 09:32:48 UTC
Created attachment 133091 [details]
hwinfo
Comment 3 Chris Wilson 2017-07-28 09:55:25 UTC
commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 21 14:51:08 2016 +0100

    drm/i915/execlists: Reset RING registers upon resume
    
    There is a disparity in the context image saved to disk and our own
    bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our
    stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the
    ring but may not tell the GPU about them, the GPU may be lagging behind
    our bookkeeping. Upon hibernation we do not save stolen pages, presuming
    that their contents are volatile. This means that although we start
    writing into the ring at tail, the GPU starts executing from its HEAD
    and there may be some garbage in between and so the GPU promptly hangs
    upon resume.
    
    Testcase: igt/gem_exec_suspend/basic-S4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk

*** This bug has been marked as a duplicate of bug 96526 ***
Comment 4 Stefan Dirsch 2017-07-28 10:11:06 UTC
Thanks, Chris!

# git describe bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
v4.8-rc2-641-gbafb2f7d4755

With Leap 42.3 we ship a drm-kmp package, which updates DRM/KMS to kernel 4.9, so this fix should already be included.

Matwey, could you verify that the drm-kmp package is installed on your system?

  rpm -qa | grep drm-kmp
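
For anyone reading the `git describe` output above: the string has the form <tag>-<count>-g<abbrev-hash>, and the tag is the nearest one that precedes the commit, not the first release that contains it (`git describe --contains <commit>` answers that second question). A minimal sketch that splits the string; `parse_describe` is a hypothetical helper name used only for illustration:

```shell
# Split `git describe` output of the form <tag>-<count>-g<abbrev-hash>.
# parse_describe is a hypothetical helper, not a git command.
parse_describe() {
    desc=$1
    hash=${desc##*-g}   # text after the last "-g"
    rest=${desc%-g*}    # drop the trailing "-g<hash>"
    count=${rest##*-}   # number of commits on top of the tag
    tag=${rest%-*}      # nearest tag that precedes the commit
    printf 'tag=%s commits_after=%s hash=%s\n' "$tag" "$count" "$hash"
}

parse_describe v4.8-rc2-641-gbafb2f7d4755
# prints: tag=v4.8-rc2 commits_after=641 hash=bafb2f7d4755
```

So the fix sits 641 commits after v4.8-rc2, which is consistent with it being present in a 4.9-based DRM stack unless it was removed again later.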
Comment 5 Matwey V. Kornilov 2017-07-28 10:12:53 UTC
> rpm -qa | grep drm-kmp
drm-kmp-default-4.9.33_k4.4.76_1-3.2.x86_64
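
The package name above encodes two versions. A rough sketch of how to read it, assuming the usual openSUSE kernel module package layout <name>-<module version>_k<kernel version with "-" written as "_">-<release>.<arch>; `parse_kmp` is a hypothetical helper, and the layout is my assumption rather than something stated in this report:

```shell
# Decode a KMP package string into the module version and the kernel
# version it was built for. Assumes the "_k" separator convention.
parse_kmp() {
    nvra=$1
    modver=${nvra%%_k*}     # e.g. drm-kmp-default-4.9.33
    modver=${modver##*-}    # e.g. 4.9.33
    kver=${nvra#*_k}        # e.g. 4.4.76_1-3.2.x86_64
    kver=${kver%%-*}        # e.g. 4.4.76_1
    kver=$(printf '%s' "$kver" | tr _ -)   # e.g. 4.4.76-1
    printf 'drm=%s built_for_kernel=%s\n' "$modver" "$kver"
}

parse_kmp drm-kmp-default-4.9.33_k4.4.76_1-3.2.x86_64
# prints: drm=4.9.33 built_for_kernel=4.4.76-1
```

Read this way, the installed DRM stack is based on 4.9.33 and matches the running 4.4.76-1 kernel.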
Comment 6 Stefan Dirsch 2017-07-28 10:13:41 UTC
Ok. Then it needs to be something different ...
Comment 7 Ricardo 2017-07-28 14:13:22 UTC
closing as this is a duplicate
Comment 8 solitone 2017-07-29 07:44:47 UTC
(In reply to Stefan Dirsch from comment #6)
> Ok. Then it needs to be something different ...

If I understand it right, the commit containing this patch has been reverted in the production kernel:

################
commit 0ee72d8f9b8e17b8e4ccfebc7a25cbc2d395cd6a
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Apr 12 15:49:39 2017 +0200

    Revert "drm/i915/execlists: Reset RING registers upon resume"
    
    This reverts commit f2a0409a08502d64fbe3990354dff5902b08d2fb which is
    commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae upstream.
    
    It was reported to have problems.
################

https://lists.freedesktop.org/archives/intel-gfx/2017-April/125833.html

I therefore wonder whether this means this bug is still there in the production kernel.
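
One way to check this on a given tree is to compare how often the fix and its revert appear in the history. A minimal sketch, assuming the backport and the revert keep the upstream subject line (as the revert quoted above does); `net_state` is a hypothetical helper that reads commit subjects from stdin:

```shell
# Report whether a commit subject is still effectively applied.
# Reads one commit subject per line on stdin, for example from:
#   git log --format=%s <range> -- drivers/gpu/drm/i915
# where <range> is whatever tag range you want to inspect.
net_state() {
    subject=$1
    applied=0
    reverted=0
    while IFS= read -r line; do
        case $line in
            "Revert \"$subject\"") reverted=$((reverted + 1)) ;;
            "$subject")            applied=$((applied + 1)) ;;
        esac
    done
    if [ "$applied" -gt "$reverted" ]; then
        echo present
    elif [ "$applied" -gt 0 ]; then
        echo reverted
    else
        echo absent
    fi
}
```

Piping the `git log` output for the tree behind drm-kmp into `net_state "drm/i915/execlists: Reset RING registers upon resume"` would print `reverted` if the revert is the last word there. This only matches exact subjects, so a reworked re-application under a different title would not be detected.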

