If I hibernate from the console, thaw, and switch between the console and X, the GPU hangs. The machine becomes unresponsive except for the power button.

21:12:16 kernel: [drm] GPU HANG: ecode 8:0:0x0f71ffff, in Xorg [2047], reason: Hang on render ring, action: reset
21:12:16 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
21:12:16 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
21:12:16 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
21:12:16 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
21:12:16 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
21:12:16 kernel: drm/i915: Resetting chip after gpu hang

I tried to bisect, and found

commit 068715b922a6f87c454cdfa15bb8049d2076eee6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Aug 18 17:17:11 2016 +0100

    drm/i915/cmdparser: Add the TIMESTAMP register for the other engines

as the first bad commit, i.e. the first one with a GPU hang after hibernation. This seems spurious: reverting this commit did not help.

This is a Lenovo T450s with Broadwell-U Integrated Graphics. I attach my .config and /sys/class/drm/card0/error.

Regards,
Martin
Created attachment 127348 [details] .config of my Linux kernel
Created attachment 127349 [details] content of /sys/class/drm/card0/error
Can you also attach your kernel log? Please add "drm.debug=0x1e log_buf_len=1M" to your boot command line.
The website did not let me add an attachment. You can find it on my homepage: http://home.mathematik.uni-freiburg.de/ziegler/kern_log
The kernel log contains two runs of linux-4.9-rc1. In the first run I hibernated at 17:13:09 and again at 17:14:13, but could not trigger the GPU hang. The second run started at 17:16:38, with hibernation at 17:17:15 and the GPU hang at 17:17:49.
Created attachment 127364 [details] kernel log with drm.debug=0x1e
the kernel log is attached now.
(In reply to Martin Ziegler from comment #7)
> the kernel log is attached now.

Thanks, Martin. It looks like, prior to the GPU hang, there are many warning messages linked to DP link training (?), coming from intel_dp_aux_transfer (with an i915_hotplug_work_func event):

WARN_ON(!msg->buffer != !msg->size)

This is the same as bug 98304 and bug 97344.

*** This bug has been marked as a duplicate of bug 97344 ***
According to Jani in https://bugs.freedesktop.org/show_bug.cgi?id=97344#c10 the GPU hang is unrelated to bug #97344.

I can confirm a GPU hang on resume from hibernation for 4.9-rc7 plus some mini merges by Linus, compiled yesterday. I am back at 4.8 again, as I cannot afford an unstable kernel at the moment.

I also had a GPU hang with PlaneShift on a slightly older kernel (4.9-rc4 + drm-intel-fixes), but as instructed by the kernel log I reported this as a new bug, #98922. It may still be related to this one and to various others such as #98794, #98860 and #98891.
The RING_HEAD (loaded from the context) is at the old tail pointer (minus the WA_TAIL) which was lost over suspend (due to stolen memory being reused). We wrote the next request after the WA_TAIL leaving 2 dwords of garbage in the ring.
commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Sep 21 14:51:08 2016 +0100

    drm/i915/execlists: Reset RING registers upon resume

    There is a disparity in the context image saved to disk and our own bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the ring but may not tell the GPU about them, the GPU may be lagging behind our bookkeeping. Upon hibernation we do not save stolen pages, presuming that their contents are volatile. This means that although we start writing into the ring at tail, the GPU starts executing from its HEAD and there may be some garbage in between and so the GPU promptly hangs upon resume.

    Testcase: igt/gem_exec_suspend/basic-S4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk

*** This bug has been marked as a duplicate of bug 96526 ***
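For readers following along, the mechanism described above can be sketched as a toy model. This is not driver code: the class `Ring`, the helper `simulate`, the `GARBAGE` marker and the ring size are all made up for illustration, the ring does not wrap around, and `reset_on_resume` only mimics the idea in the commit title (putting the GPU's HEAD/TAIL and the driver's tail back in agreement), not the actual patch. The only number taken from the thread is the 2 dwords of WA_TAIL padding.

```python
NOOP = "NOOP"
GARBAGE = "GARBAGE"   # stands in for ring contents lost over hibernation
WA_TAIL_DWORDS = 2    # the "2 dwords" mentioned in the explanation above
RING_SIZE = 16        # arbitrary; this toy ring never wraps


class Ring:
    def __init__(self):
        self.mem = [NOOP] * RING_SIZE
        self.sw_tail = 0  # driver bookkeeping: where the next request goes
        self.hw_head = 0  # GPU view: next dword to execute
        self.hw_tail = 0  # GPU view: end of the commands it was told about

    def submit(self, cmds):
        for cmd in cmds:
            self.mem[self.sw_tail] = cmd
            self.sw_tail += 1
        # The GPU is told about the request but not about the padding.
        self.hw_tail = self.sw_tail
        for _ in range(WA_TAIL_DWORDS):
            self.mem[self.sw_tail] = NOOP
            self.sw_tail += 1

    def execute(self):
        """Run from HEAD to TAIL; return False (a hang) on garbage."""
        while self.hw_head < self.hw_tail:
            if self.mem[self.hw_head] == GARBAGE:
                return False
            self.hw_head += 1
        return True

    def hibernate(self):
        # Ring contents are not saved; HEAD survives via the context image.
        self.mem = [GARBAGE] * RING_SIZE

    def reset_on_resume(self):
        # Hypothetical fix: make GPU and driver agree on an empty ring.
        self.sw_tail = self.hw_head = self.hw_tail = 0


def simulate(reset_on_resume):
    ring = Ring()
    ring.submit(["CMD_A", "CMD_B"])
    assert ring.execute()
    ring.hibernate()
    if reset_on_resume:
        ring.reset_on_resume()
    ring.submit(["CMD_C"])
    return ring.execute()


print(simulate(reset_on_resume=False))  # False: HEAD runs into the 2 lost padding dwords
print(simulate(reset_on_resume=True))   # True: HEAD and tail agree again
```

Without the reset, the new request is written after the padding, while the GPU resumes from the old HEAD in front of it, so it executes the two lost dwords first.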
Update: The bug is still present in the recent kernel.

commit 045169816b31b10faed984b01c390db1b32ee4c1
Merge: cd66289 678b5c6
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat Dec 10 09:47:13 2016 -0800

Hibernate has been unusable since 4.9-rc1.
Closing resolved as duplicate of closed+fixed.
Chris' patch solved the problem. Thanks. The patch is not yet in Linus' v4.9, though.
Chris's patch appeared in linux-4.10-rc1, but it is still not in linux-4.9.4 (from Jan 15, 2017). The GPU hang is still reproducible.
Chris, is bafb2f7d4755 ("drm/i915/execlists: Reset RING registers upon resume") cc: stable material?
The patch is in 4.9.9-rc1: https://lkml.org/lkml/2017/2/7/311