Bug 99545 - [SKL] GPU HANG (after resuming from hibernation): ecode 9:1:0x4c32ff67, in Xorg [1902], reason: Hang on blitter ring, action: reset (kernel 4.9.0)
Status: CLOSED DUPLICATE of bug 96526
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2017-01-26 06:53 UTC by unki
Modified: 2017-07-24 22:39 UTC
3 users

See Also:
i915 platform: SKL
i915 features: GPU hang, power/suspend-resume


Attachments
dmesg snapshot after returning from hibernation (8.20 KB, text/plain)
2017-01-26 06:53 UTC, unki
the gpu error log as requested in the dmesg messages from /sys/class/drm/card0/error (334.86 KB, text/plain)
2017-01-26 06:54 UTC, unki
dmesg (9.99 KB, text/plain)
2017-01-31 06:01 UTC, unki
/sys/class/drm/card0/error (334.86 KB, text/plain)
2017-01-31 06:01 UTC, unki

Description unki 2017-01-26 06:53:02 UTC
On returning from hibernation (suspend-to-disk) with kernel 4.9.0, the GPU seems to end up permanently hung. A hard reboot is then required.

I had the same issue with kernel 4.8.0 until I added intel_iommu=igfx_off as a kernel command-line parameter (Intel VT-d is disabled in the BIOS, by the way).
After that, hibernation worked more or less smoothly (at least without GPU issues).

But with 4.9.0, intel_iommu=igfx_off no longer seems to help.
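Before concluding that intel_iommu=igfx_off stopped helping on 4.9.0, it is worth confirming that the parameter actually reached the running kernel, which lists its boot parameters in /proc/cmdline. The sketch below is a hypothetical helper (`param_options` is a name made up for illustration). It assumes that `intel_iommu=` options may be comma-separated, as in `intel_iommu=on,igfx_off`.

```python
import os
from typing import List


def param_options(cmdline: str, name: str) -> List[str]:
    """Return the comma-separated values of every `name=...` token on a kernel command line."""
    prefix = name + "="
    values: List[str] = []
    for token in cmdline.split():
        if token.startswith(prefix):
            # e.g. "intel_iommu=on,igfx_off" -> ["on", "igfx_off"]
            values.extend(v for v in token[len(prefix):].split(",") if v)
    return values


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            opts = param_options(f.read(), "intel_iommu")
        print("intel_iommu options:", opts or "(none)")
        print("igfx_off active:", "igfx_off" in opts)
    else:
        print("/proc/cmdline not found (not a Linux system?)")
```

If `igfx_off` is missing here, the bootloader change was not picked up for the 4.9.0 kernel, and that should be ruled out first.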

Hardware: Lenovo ThinkPad X1 Carbon 4th

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo HD Graphics 520
	Flags: bus master, fast devsel, latency 0, IRQ 124
	Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=512M]
	I/O ports at e000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915


OS: Debian Stretch (testing) amd64

user@host:/tmp$ uname -a
Linux carbon 4.9.0-1-amd64 #1 SMP Debian 4.9.2-2 (2017-01-12) x86_64 GNU/Linux

user@host:/tmp$ sudo glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
    Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2)  (0x1916)
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 13.0.3
OpenGL version string: 3.0 Mesa 13.0.3
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 13.0.3
Comment 1 unki 2017-01-26 06:53:48 UTC
Created attachment 129159
dmesg snapshot after returning from hibernation
Comment 2 unki 2017-01-26 06:54:39 UTC
Created attachment 129160
the gpu error log as requested in the dmesg messages from /sys/class/drm/card0/error
Comment 3 unki 2017-01-31 06:00:56 UTC
This morning, on resuming from hibernation, the system recovered after several GPU resets, but only for a few minutes.
Then it locked up hard while I was working in X.
I waited a few minutes to see whether it would recover again, but finally performed a hard reset.

I captured dmesg and /sys/class/drm/card0/error during the short window in which the system was still responding. I'm attaching them to this bug in case they show any differences.
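The machine may respond for only a short time, so it can help to capture both files in one step. Here is a minimal sketch, assuming Python 3.6+ and root privileges (the GPU error state file is normally readable only by root). The helper names `snapshot` and `capture_all` and the timestamped output names are illustrative choices, not anything the i915 driver prescribes.

```python
import os
import shutil
import subprocess
import time
from pathlib import Path
from typing import List

ERROR_STATE = "/sys/class/drm/card0/error"


def snapshot(src: str, dest_dir: str) -> Path:
    """Copy src (e.g. a sysfs file) into dest_dir under a timestamped name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"{Path(src).name}-{stamp}.txt"
    # Stream until EOF: sysfs files may not report their real size via stat().
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def capture_all(out_dir: str = "/tmp") -> List[Path]:
    """Save the GPU error state (if present) and the kernel log side by side."""
    saved = []
    if os.path.exists(ERROR_STATE):
        saved.append(snapshot(ERROR_STATE, out_dir))
    dmesg_path = Path(out_dir) / f"dmesg-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    with open(dmesg_path, "wb") as f:
        subprocess.run(["dmesg"], stdout=f, check=True)
    saved.append(dmesg_path)
    return saved
```

Run as root, `capture_all("/tmp")` returns the paths of the saved files, ready to attach to the bug.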
Comment 4 unki 2017-01-31 06:01:34 UTC
Created attachment 129244
dmesg
Comment 5 unki 2017-01-31 06:01:53 UTC
Created attachment 129245
/sys/class/drm/card0/error
Comment 6 Chris Wilson 2017-02-05 21:13:53 UTC
commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 21 14:51:08 2016 +0100

    drm/i915/execlists: Reset RING registers upon resume
    
    There is a disparity in the context image saved to disk and our own
    bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our
    stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the
    ring but may not tell the GPU about them, the GPU may be lagging behind
    our bookkeeping. Upon hibernation we do not save stolen pages, presuming
    that their contents are volatile. This means that although we start
    writing into the ring at tail, the GPU starts executing from its HEAD
    and there may be some garbage in between and so the GPU promptly hangs
    upon resume.
    
    Testcase: igt/gem_exec_suspend/basic-S4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk
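The mechanism described in the commit message can be illustrated with a toy model. This is not i915 code. The `Ring` class, its fields, the ring size and the two-dword padding are hypothetical simplifications chosen to mirror the description:

- a driver-side tail, like ce->ring->tail;
- the hardware RING_HEAD and RING_TAIL;
- WA_TAIL_DWORDS that are emitted but not announced to the GPU;
- ring pages in stolen memory that come back as garbage after hibernation.

The fix is modeled as rewinding HEAD, TAIL and the bookkeeping together to 0. That is an assumption for illustration; the commit title only says that the RING registers are reset upon resume.

```python
"""Toy model of the RING_HEAD/RING_TAIL disparity described above (not i915 code)."""
from typing import List

RING_SIZE = 16       # hypothetical ring size, in "dwords"
WA_TAIL_DWORDS = 2   # hypothetical padding emitted after each request
NOOP, GARBAGE = "NOOP", "GARBAGE"


class Ring:
    def __init__(self) -> None:
        self.buf: List[str] = [NOOP] * RING_SIZE
        self.sw_tail = 0  # driver bookkeeping, analogous to ce->ring->tail
        self.hw_head = 0  # where the GPU fetches next (RING_HEAD)
        self.hw_tail = 0  # last tail the GPU was told about (RING_TAIL)

    def emit(self, cmd: str) -> None:
        self.buf[self.sw_tail % RING_SIZE] = cmd
        self.sw_tail += 1

    def submit(self, cmds: List[str]) -> None:
        for cmd in cmds:
            self.emit(cmd)
        self.hw_tail = self.sw_tail  # the GPU is told about the request...
        for _ in range(WA_TAIL_DWORDS):
            self.emit(NOOP)          # ...but not about the trailing padding

    def gpu_execute(self) -> List[str]:
        executed = []
        while self.hw_head < self.hw_tail:
            executed.append(self.buf[self.hw_head % RING_SIZE])
            self.hw_head += 1
        return executed

    def hibernate_and_resume(self) -> None:
        # Stolen pages are not saved to disk, so the ring contents are lost,
        # while HEAD/TAIL and the driver's tail survive in the saved image.
        self.buf = [GARBAGE] * RING_SIZE

    def reset_ring_registers(self) -> None:
        # Fix, as modeled here: rewind HEAD, TAIL and the bookkeeping together.
        self.hw_head = self.hw_tail = self.sw_tail = 0


def run_scenario(apply_fix: bool) -> List[str]:
    ring = Ring()
    ring.submit(["BATCH1"])
    ring.gpu_execute()           # GPU consumes BATCH1; HEAD stops before the padding
    ring.hibernate_and_resume()
    if apply_fix:
        ring.reset_ring_registers()
    ring.submit(["BATCH2"])      # driver writes at its own tail...
    return ring.gpu_execute()    # ...but the GPU starts at its own HEAD


if __name__ == "__main__":
    print("without reset:", run_scenario(False))  # ['GARBAGE', 'GARBAGE', 'BATCH2']
    print("with reset:   ", run_scenario(True))   # ['BATCH2']
```

Without the reset, the GPU resumes at its stale HEAD and executes the garbage left between HEAD and the new request before reaching it. That matches the "promptly hangs upon resume" behaviour. With HEAD and TAIL brought back into agreement with the bookkeeping, only the new request is executed.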

*** This bug has been marked as a duplicate of bug 96526 ***

