99545 – [SKL] GPU HANG (after resuming from hibernation): ecode 9:1:0x4c32ff67, in Xorg [1902], reason: Hang on blitter ring, action: reset (kernel 4.9.0)

Status:	CLOSED DUPLICATE of bug 96526

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2017-01-26 06:53 UTC by unki
Modified:	2017-07-24 22:39 UTC
CC List:	3 users

See Also:
i915 platform:	SKL
i915 features:	GPU hang, power/suspend-resume

Attachments
dmesg snapshot after returning from hibernation (8.20 KB, text/plain) 2017-01-26 06:53 UTC, unki
the gpu error log as requested in the dmesg messages from /sys/class/drm/card0/error (334.86 KB, text/plain) 2017-01-26 06:54 UTC, unki
dmesg (9.99 KB, text/plain) 2017-01-31 06:01 UTC, unki
/sys/class/drm/card0/error (334.86 KB, text/plain) 2017-01-31 06:01 UTC, unki

Description unki 2017-01-26 06:53:02 UTC

On returning from hibernation (suspend-to-disk) with kernel 4.9.0, the GPU appears to enter a permanently hung state. A hard reboot is then required.

I had these issues with kernel 4.8.0 too, until I added intel_iommu=igfx_off as a kernel command-line parameter (by the way, Intel VT-d is disabled in the BIOS).
After that, hibernation worked more or less smoothly (at least without GPU issues).

But now, with 4.9.0, intel_iommu=igfx_off no longer seems to help.
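
Since the workaround above depends on intel_iommu=igfx_off actually reaching the running kernel, it can be worth confirming that before comparing kernel versions. The following is a minimal sketch, not part of the original report: the helper name has_kernel_option is hypothetical, and it assumes that intel_iommu= values may be given as a comma-separated list. On a live system the string to check would come from /proc/cmdline.

```python
def has_kernel_option(cmdline, key, value):
    """Return True if `key=...` on the kernel command line includes `value`.

    Assumption: several values for one key may be joined with commas,
    e.g. intel_iommu=on,igfx_off.
    """
    for token in cmdline.split():
        name, sep, rest = token.partition("=")
        if sep and name == key and value in rest.split(","):
            return True
    return False


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


# Typical use on the affected machine (not executed here):
#   has_kernel_option(read_running_cmdline(), "intel_iommu", "igfx_off")
```

If the check returns False after a kernel upgrade, the parameter may have been dropped from the boot loader configuration, which would be a different problem from the one reported here.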

Hardware: Lenovo ThinkPad X1 Carbon 4th Gen

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo HD Graphics 520
	Flags: bus master, fast devsel, latency 0, IRQ 124
	Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=512M]
	I/O ports at e000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915


OS: Debian Stretch (testing) amd64

user@host:/tmp$ uname -a
Linux carbon 4.9.0-1-amd64 #1 SMP Debian 4.9.2-2 (2017-01-12) x86_64 GNU/Linux

user@host:/tmp$ sudo glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
    Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2)  (0x1916)
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 13.0.3
OpenGL version string: 3.0 Mesa 13.0.3
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 13.0.3

Comment 1 unki 2017-01-26 06:53:48 UTC

Created attachment 129159
dmesg snapshot after returning from hibernation

Comment 2 unki 2017-01-26 06:54:39 UTC

Created attachment 129160
the gpu error log as requested in the dmesg messages from /sys/class/drm/card0/error

Comment 3 unki 2017-01-31 06:00:56 UTC

This morning, on resuming from hibernation, the system recovered after several GPU resets, but only for a few minutes.
Then it locked up hard while I was working in X.
I waited a few minutes to see whether it would recover again, but finally performed a hard reset.

I captured dmesg and /sys/class/drm/card0/error during the short window in which the system was still responding. I'm attaching them to this bug in case you notice any differences.
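
Because the window in which the system responds can be short, the two captures described above could be scripted ahead of time. This is a rough sketch only: the function name capture_gpu_state and the output file names are hypothetical, and reading /sys/class/drm/card0/error or running dmesg typically requires root.

```python
import subprocess
import time
from pathlib import Path


def capture_gpu_state(out_dir,
                      error_path="/sys/class/drm/card0/error",
                      dmesg_cmd=("dmesg",)):
    """Save the i915 error state and the kernel log to timestamped files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")

    # Copy the error state by reading it fully; sysfs files do not
    # report a meaningful size, so a plain read/write is used.
    error_copy = out / f"i915-error-{stamp}.txt"
    error_copy.write_bytes(Path(error_path).read_bytes())

    # Store whatever the kernel log command prints, even if it exits non-zero.
    dmesg_copy = out / f"dmesg-{stamp}.txt"
    result = subprocess.run(list(dmesg_cmd), capture_output=True,
                            text=True, check=False)
    dmesg_copy.write_text(result.stdout)

    return error_copy, dmesg_copy
```

Writing to a disk-backed directory rather than a tmpfs would let the files survive the hard reset that follows.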

Comment 4 unki 2017-01-31 06:01:34 UTC

Created attachment 129244
dmesg

Comment 5 unki 2017-01-31 06:01:53 UTC

Created attachment 129245
/sys/class/drm/card0/error

Comment 6 Chris Wilson 2017-02-05 21:13:53 UTC

commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 21 14:51:08 2016 +0100

    drm/i915/execlists: Reset RING registers upon resume
    
    There is a disparity in the context image saved to disk and our own
    bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our
    stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the
    ring but may not tell the GPU about them, the GPU may be lagging behind
    our bookkeeping. Upon hibernation we do not save stolen pages, presuming
    that their contents are volatile. This means that although we start
    writing into the ring at tail, the GPU starts executing from its HEAD
    and there may be some garbage in between and so the GPU promptly hangs
    upon resume.
    
    Testcase: igt/gem_exec_suspend/basic-S4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk
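
The mechanism described in the commit message above can be illustrated with a toy model. This is a simplified sketch, not the i915 code: the Ring class, the dword values, and the padding count of 2 are all made up for illustration, and "resetting the registers" is modelled as simply setting the saved head and tail to the driver's own tail.

```python
NOOP = 0x0          # stand-in for a harmless padding dword
CMD = 0x1           # stand-in for a valid command dword
GARBAGE = 0xDEAD    # stand-in for ring contents lost across hibernation
WA_TAIL_DWORDS = 2  # illustrative padding count, not the real value


class Ring:
    def __init__(self, size=16):
        self.mem = [NOOP] * size
        self.sw_tail = 0  # driver bookkeeping (ce->ring->tail in the commit text)
        self.hw_head = 0  # RING_HEAD as saved in the context image
        self.hw_tail = 0  # RING_TAIL as saved in the context image

    def _write(self, dword):
        self.mem[self.sw_tail] = dword
        self.sw_tail = (self.sw_tail + 1) % len(self.mem)

    def emit_request(self, ncmds):
        for _ in range(ncmds):
            self._write(CMD)
        # The GPU is told about the commands, but not about the padding.
        self.hw_tail = self.sw_tail
        for _ in range(WA_TAIL_DWORDS):
            self._write(NOOP)

    def run(self):
        """Execute from hw_head up to hw_tail; return False on a 'hang'."""
        while self.hw_head != self.hw_tail:
            if self.mem[self.hw_head] == GARBAGE:
                return False
            self.hw_head = (self.hw_head + 1) % len(self.mem)
        return True

    def hibernate(self):
        # Ring contents are treated as volatile and are not saved.
        self.mem = [GARBAGE] * len(self.mem)

    def resume(self, reset_registers):
        if reset_registers:
            self.hw_head = self.hw_tail = self.sw_tail


def resume_scenario(reset_registers):
    ring = Ring()
    ring.emit_request(3)
    assert ring.run()      # GPU stops at hw_tail, behind sw_tail
    ring.hibernate()
    ring.resume(reset_registers)
    ring.emit_request(2)   # driver writes at sw_tail, past the stale head
    return ring.run()
```

In this model, resume_scenario(False) ends in a hang because the GPU starts at its stale head and walks through the lost padding before reaching the new commands, whereas resume_scenario(True) completes because execution starts exactly where the driver resumed writing.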

*** This bug has been marked as a duplicate of bug 96526 ***
