Bug 100280 - [hsw] GPU Hang on freeze and restore -- context restore
Summary: [hsw] GPU Hang on freeze and restore -- context restore
Status: CLOSED DUPLICATE of bug 99993
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-19 13:13 UTC by Jens
Modified: 2017-07-24 22:20 UTC

See Also:
i915 platform: HSW
i915 features: GPU hang


Attachments
/sys/... /card0 error (8.22 KB, application/x-bzip)
2017-03-19 13:13 UTC, Jens
dmesg boot log including error (79.71 KB, text/plain)
2017-03-19 13:14 UTC, Jens
dmesg with drm.debug=0x1e enabled (107.49 KB, application/x-bzip)
2017-03-19 13:49 UTC, Jens
updated sys/.../card0/error matching dmesg output (6.76 KB, application/x-bzip)
2017-03-19 13:50 UTC, Jens
drm-tip kernel dmesg hibernation output (drm.debug=0x1e) (9.00 KB, text/plain)
2017-03-20 08:52 UTC, Jens

Description Jens 2017-03-19 13:13:28 UTC
Created attachment 130312 [details]
/sys/... /card0 error

I just upgraded to the newest Ubuntu mainline kernel (see http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/, as of 2017-03-19) because of stability issues in 4.10.x. When hibernating and resuming, I got this in dmesg:

[  100.149192] Restarting tasks ... 
[  100.149557] pci_bus 0000:04: Allocating resources
[  100.149583] pci 0000:03:00.0: PCI bridge to [bus 04]
[  100.149590] pci 0000:03:00.0:   bridge window [io  0x3000-0x3fff]
[  100.149602] pci 0000:03:00.0:   bridge window [mem 0xdf600000-0xdf7fffff]
[  100.149611] pci 0000:03:00.0:   bridge window [mem 0xdf800000-0xdf9fffff 64bit pref]
[  100.166320] done.
[  100.166620] video LNXVIDEO:00: Restoring backlight state
[  100.351557] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  100.396813] r8169 0000:02:00.0 eth0: link down
[  100.396987] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  103.178781] r8169 0000:02:00.0 eth0: link up
[  103.178786] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  109.825037] [drm] GPU HANG: ecode 7:0:0x86dfbff9, in compiz [3259], reason: Hang on render ring, action: reset
[  109.825040] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  109.825041] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  109.825042] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  109.825043] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  109.825045] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  109.825111] drm/i915: Resetting chip after gpu hang
[  117.813414] drm/i915: Resetting chip after gpu hang
[  125.781369] drm/i915: Resetting chip after gpu hang
[  133.781382] drm/i915: Resetting chip after gpu hang
[  141.781460] drm/i915: Resetting chip after gpu hang
[  149.780713] drm/i915: Resetting chip after gpu hang

... which I am doing right now. After resuming, the system was unresponsive until I switched to the console (Ctrl-Alt-F1) and back to X (Alt-F7) and waited approximately 30 seconds, during which the screen alternately showed frozen but distorted images of my desktop, complete garbage (white noise), and nothing at all.
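For anyone triaging a similar log, here is a minimal Python sketch that pulls the hang record and the reset timestamps out of dmesg text. The regular expressions are inferred only from the lines quoted above, so treat the message layout as an assumption; reading the three ecode fields as hardware generation, engine index, and a hash value is likewise my assumption, not something stated in this report. The names `HANG_RE`, `RESET_RE`, and `summarize` are made up for illustration.

```python
import re

# Layout assumed from the quoted log, not taken from the driver source:
# [<secs>] [drm] GPU HANG: ecode <a>:<b>:0x<hex>, in <comm> [<pid>], reason: ..., action: ...
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm\] GPU HANG: ecode "
    r"(?P<gen>\d+):(?P<ring>\d+):0x(?P<ecode>[0-9a-f]{8}), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.*?), action: (?P<action>\w+)"
)
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] drm/i915: Resetting chip after gpu hang"
)

# A few lines copied from the report, used as sample input.
SAMPLE = """\
[  109.825037] [drm] GPU HANG: ecode 7:0:0x86dfbff9, in compiz [3259], reason: Hang on render ring, action: reset
[  109.825111] drm/i915: Resetting chip after gpu hang
[  117.813414] drm/i915: Resetting chip after gpu hang
[  125.781369] drm/i915: Resetting chip after gpu hang
"""


def summarize(dmesg_text):
    """Return (hang records, reset timestamps, gaps between resets in seconds)."""
    hangs, resets = [], []
    for line in dmesg_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hangs.append(m.groupdict())
            continue
        m = RESET_RE.search(line)
        if m:
            resets.append(float(m["ts"]))
    # Gap between consecutive resets, rounded to 0.1 s.
    gaps = [round(b - a, 1) for a, b in zip(resets, resets[1:])]
    return hangs, resets, gaps


if __name__ == "__main__":
    hangs, resets, gaps = summarize(SAMPLE)
    print(hangs[0]["comm"], hangs[0]["ecode"], len(resets), gaps)
```

On the log above, the resets are spaced roughly 8 seconds apart, which matches the chip being reset repeatedly rather than recovering after the first attempt.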

The kernel version is

Linux linuxkiste 4.11.0-999-generic #201703182201 SMP Sun Mar 19 02:03:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The hardware is an MSI-7817 with Intel Haswell integrated graphics (i915 driver), connected to a 22" Full HD Samsung S22C300 monitor via VGA.

jens@linuxkiste:~$ xrandr --verbose
Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
...
VGA1 connected primary 1920x1080+0+0 (0x48) normal (normal left inverted right x axis y axis) 477mm x 268mm
	Identifier: 0x45
	Timestamp:  13811
	Subpixel:   unknown
	Gamma:      1.0:1.0:1.0
	Brightness: 1.0
	Clones:    
	CRTC:       0
	CRTCs:      0 1 2
	Transform:  1.000000 0.000000 0.000000
	            0.000000 1.000000 0.000000
	            0.000000 0.000000 1.000000
	           filter: 
	EDID: 
		00ffffffffffff004c2d1d0a33383230
		281701030e301b782a90c1a259559c27
		0e5054bfef80714f81c0810081809500
		a9c0b3000101023a801871382d40582c
		4500dd0c1100001e000000fd00384b1e
		5111000a202020202020000000fc0053
		3232433330300a2020202020000000ff
		0048344d443930363134310a2020008e
  1920x1080 (0x48) 148.500MHz +HSync +VSync *current +preferred
        h: width  1920 start 2008 end 2052 total 2200 skew    0 clock  67.50KHz
        v: height 1080 start 1084 end 1089 total 1125           clock  60.00Hz
  ...
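The EDID block in the xrandr output can be cross-checked against the monitor description given earlier. Below is a rough Python sketch, assuming the standard 128-byte EDID base block layout (header, manufacturer ID at bytes 8-9, four 18-byte descriptors starting at byte 54, checksum in the last byte). The function name `parse_edid` and the dictionary keys are hypothetical, and the pixel clock field is only meaningful if the first descriptor is a detailed timing, which I assume here.

```python
# EDID hex copied verbatim from the xrandr output above.
EDID_HEX = """
00ffffffffffff004c2d1d0a33383230
281701030e301b782a90c1a259559c27
0e5054bfef80714f81c0810081809500
a9c0b3000101023a801871382d40582c
4500dd0c1100001e000000fd00384b1e
5111000a202020202020000000fc0053
3232433330300a2020202020000000ff
0048344d443930363134310a2020008e
"""


def parse_edid(hex_text):
    """Decode a few fields from a 128-byte EDID base block."""
    data = bytes.fromhex("".join(hex_text.split()))
    if len(data) < 128 or data[:8] != bytes.fromhex("00ffffffffffff00"):
        raise ValueError("not an EDID base block")

    # Manufacturer ID: big-endian 16-bit word holding three 5-bit letters (1 == 'A').
    word = int.from_bytes(data[8:10], "big")
    vendor = "".join(
        chr(((word >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )

    info = {
        "vendor": vendor,
        # All 128 bytes of the base block must sum to 0 modulo 256.
        "checksum_ok": sum(data[:128]) % 256 == 0,
        # Maximum image size in centimetres (horizontal, vertical).
        "size_cm": (data[21], data[22]),
        # Assumes the first descriptor is a detailed timing; units of 10 kHz.
        "pixel_clock_khz": int.from_bytes(data[54:56], "little") * 10,
    }

    # Display descriptors start with 00 00 00 <tag> 00 followed by 13 text bytes.
    for off in range(54, 126, 18):
        desc = data[off:off + 18]
        if desc[:3] != b"\x00\x00\x00":
            continue
        text = desc[5:18].decode("ascii", "replace").split("\n")[0].strip()
        if desc[3] == 0xFC:
            info["name"] = text
        elif desc[3] == 0xFF:
            info["serial"] = text
    return info


if __name__ == "__main__":
    print(parse_edid(EDID_HEX))
```

The decoded vendor and product name agree with the Samsung S22C300 named in the description, and the pixel clock agrees with the 148.500MHz mode line, so the monitor is being identified correctly over VGA.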
Comment 1 Jens 2017-03-19 13:14:56 UTC
Created attachment 130313 [details]
dmesg boot log including error
Comment 2 Jens 2017-03-19 13:23:11 UTC
This is 100% reproducible. A second hibernate attempt after the first one results in a completely frozen system.
Comment 3 Jens 2017-03-19 13:38:08 UTC
Update: the crash does not happen with 4.10.4 (Ubuntu mainline:
Linux linuxkiste 4.10.4-041004-generic #201703180831 SMP Sat Mar 18 12:34:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux). Two hibernate/resume cycles in a row worked perfectly.
Comment 4 Jens 2017-03-19 13:48:30 UTC
The same hang occurs with an earlier 4.11 kernel build:

Linux linuxkiste 4.11.0-999-generic #201703092101 SMP Fri Mar 10 02:03:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I am attaching dmesg output with drm.debug=0x1e enabled. It contains a lot of noise caused by mouse cursor movement on the desktop, though.
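To keep the tested kernels straight, here is a small Python sketch that extracts the release string and build stamp from the `uname -a` lines quoted in this report. It assumes the `#` field of these Ubuntu mainline builds is a YYYYMMDDHHMM timestamp, which is a guess based on how the three values line up with the build dates. `UNAME_RE`, `parse_uname`, and `REPORTS` are illustrative names, and the hang/no-hang flags simply restate the comments above.

```python
import re
from datetime import datetime

# Assumed shape: "Linux <host> <release> #<12-digit build stamp> ..."
UNAME_RE = re.compile(r"^Linux (?P<host>\S+) (?P<release>\S+) #(?P<build>\d{12}) ")


def parse_uname(line):
    """Return (release, build datetime) for one uname -a line."""
    m = UNAME_RE.match(line.strip())
    if not m:
        raise ValueError("unrecognised uname line")
    return m["release"], datetime.strptime(m["build"], "%Y%m%d%H%M")


# (uname line, hang observed on resume) as reported in this thread.
REPORTS = [
    ("Linux linuxkiste 4.10.4-041004-generic #201703180831 SMP Sat Mar 18 12:34:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux", False),
    ("Linux linuxkiste 4.11.0-999-generic #201703092101 SMP Fri Mar 10 02:03:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux", True),
    ("Linux linuxkiste 4.11.0-999-generic #201703182201 SMP Sun Mar 19 02:03:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux", True),
]

if __name__ == "__main__":
    for line, hang in REPORTS:
        release, built = parse_uname(line)
        print(release, built.isoformat(), "hang" if hang else "ok")
```

Read this way, the hang is present in 4.11 daily builds at least as far back as the 2017-03-09 build and absent in 4.10.4.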
Comment 5 Jens 2017-03-19 13:49:05 UTC
Created attachment 130314 [details]
dmesg with drm.debug=0x1e enabled
Comment 6 Jens 2017-03-19 13:50:11 UTC
Created attachment 130315 [details]
updated sys/.../card0/error matching dmesg output
Comment 7 Chris Wilson 2017-03-19 14:20:50 UTC
dmesg is not required for this; the error state is informative enough. Can you please try a known kernel such as drm-tip [https://cgit.freedesktop.org/drm-tip] so that we can check whether it has already been resolved?
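Since the error state is the key artifact, a short Python sketch for capturing it as a bzip2 file (the same format as the attachments here) may be useful. The source path is the one printed in the dmesg output in the description; the function name `save_error_state` and the default output filename are hypothetical, and reading the sysfs file will probably need root.

```python
import bz2
import shutil


def save_error_state(src="/sys/class/drm/card0/error", dst="card0-error.bz2"):
    """Copy the GPU error state to a bzip2-compressed file and return its path."""
    with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    import os

    # Only attempt the copy where the sysfs node actually exists.
    if os.path.exists("/sys/class/drm/card0/error"):
        print(save_error_state())
```

Capturing the file right after the hang is reported, before another hibernate cycle, keeps the dump matched to the dmesg output it belongs to.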
Comment 8 Jens 2017-03-20 08:52:07 UTC
drm-tip as of 2017-03-18 (this one: http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/2017-03-18/) does not even resume correctly: the system freezes completely after resume has finished. The machine can be pinged but is otherwise unresponsive (no SSH login, local keyboard is frozen, etc.).

I am attaching the dmesg output up to the point where hibernation completed. I cannot access the resume dmesg log, since the machine crashes upon boot and I can't get netconsole working for some reason...
Comment 9 Jens 2017-03-20 08:52:45 UTC
Created attachment 130322 [details]
drm-tip kernel dmesg hibernation output (drm.debug=0x1e)
Comment 10 Chris Wilson 2017-03-22 20:51:36 UTC

*** This bug has been marked as a duplicate of bug 99993 ***

