Bug 97545 - [HSW] DRM/i915 GPU hang after resume
Summary: [HSW] DRM/i915 GPU hang after resume
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-31 02:20 UTC by d0c
Modified: 2017-05-08 15:05 UTC
CC List: 2 users

See Also:
i915 platform: HSW
i915 features: GPU hang


Attachments
./tmp/drm-card0.error (3.09 MB, text/plain)
2016-08-31 02:24 UTC, d0c
./tmp/dmesg.0-ubuntu16.04.1-4.4.0-31-generic (152.58 KB, text/plain)
2016-08-31 02:32 UTC, d0c
./tmp/lspci-d-nn-vvv-list (57.91 KB, text/plain)
2016-08-31 02:34 UTC, d0c
GPU crash dump for Linux-4.4.0-75-generic and OpenGL ES 3.0 Mesa 12.0.6 (3.08 MB, text/plain)
2017-05-06 21:26 UTC, Burak Sezer

Description d0c 2016-08-31 02:20:17 UTC
kern  :info  : [  +4.208065] [drm] stuck on render ring
kern  :info  : [  +0.001191] [drm] GPU HANG: ecode 7:0:0x84dfbffe, in chrome [14343], reason: Ring hung, action: reset
kern  :info  : [  +0.000001] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
kern  :info  : [  +0.000001] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
kern  :info  : [  +0.000001] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
kern  :info  : [  +0.000001] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
kern  :info  : [  +0.000001] [drm] GPU crash dump saved to /sys/class/drm/card0/error
kern  :notice: [  +0.002149] drm/i915: Resetting chip after gpu hang
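When comparing hang reports, it helps to pull the fields out of the `GPU HANG` line. The Python sketch below does that with a regular expression written against the log lines in this report; the names `parse_gpu_hang` and `GpuHang` are hypothetical. Reading the three `ecode` fields as GPU generation, ring index, and a hash derived from the hung command is my understanding of the i915 driver of this era, not something the log states, so treat those labels as assumptions.

```python
import re
from typing import NamedTuple, Optional

# Written against the "GPU HANG" lines quoted in this bug report.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<code>0x[0-9a-f]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    gen: int     # assumed: GPU generation (7 for Haswell)
    ring: int    # assumed: index of the hung ring (0 = render)
    code: int    # assumed: hash derived from the hung command
    comm: str    # process that submitted the hanging batch
    pid: int
    reason: str
    action: str


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the fields of a GPU HANG log line, or None if the line is not one."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        gen=int(m["gen"]),
        ring=int(m["ring"]),
        code=int(m["code"], 16),
        comm=m["comm"],
        pid=int(m["pid"]),
        reason=m["reason"],
        action=m["action"],
    )
```

Run over this log and the one in Comment 9, it returns the same ecode (7:0:0x84dfbffe) from a `chrome` process in both, which hints at, but does not prove, a shared hang signature.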
Comment 1 d0c 2016-08-31 02:24:13 UTC
Created attachment 126125
./tmp/drm-card0.error

The /sys/class/drm/card0/error dump.
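The dump referenced above can be saved for attaching with a plain file copy. A minimal Python sketch, assuming the sysfs path printed in the kernel log and root privileges to read it; `save_gpu_error_state` is a hypothetical helper name:

```python
import shutil
from pathlib import Path

# Path printed by the kernel in the hang log; reading it usually requires root.
DEFAULT_ERROR_PATH = Path("/sys/class/drm/card0/error")


def save_gpu_error_state(dest: Path, src: Path = DEFAULT_ERROR_PATH) -> int:
    """Copy the GPU error state to dest and return the number of bytes written.

    The file is streamed with copyfileobj instead of trusting its reported
    size, because sysfs attributes do not necessarily report their real size.
    """
    with src.open("rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
        return fout.tell()
```

For example, `save_gpu_error_state(Path("drm-card0.error"))` run as root produces a file like the attachment above.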
Comment 2 d0c 2016-08-31 02:32:11 UTC
Created attachment 126126
./tmp/dmesg.0-ubuntu16.04.1-4.4.0-31-generic

The kernel messages since boot-up. This is on an Apple MacBookPro11,2/Mac-3CBD00234E554E41.

The log also contains a crash in a closed-source wireless driver (the Broadcom STA driver), which may or may not be related:

kern  :warn  : [  +0.000001] Hardware name: Apple Inc. MacBookPro11,2/Mac-3CBD00234E554E41, BIOS MBP112.88Z.0138.B14.1501071031 01/07/2015
kern  :warn  : [  +0.000009] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
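To tell the i915 messages apart from the Broadcom/cfg80211 ones in a long dmesg, a simple substring filter is often enough. This sketch assumes that graphics-related lines contain `[drm]` or `i915` (the hypothetical `GFX_MARKERS` tuple); it only sorts lines and says nothing about whether the wireless crash is related.

```python
from typing import Iterable, List, Tuple

# Hypothetical marker list: substrings assumed to identify graphics-stack messages.
GFX_MARKERS = ("[drm]", "i915")


def split_gfx_messages(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition kernel log lines into graphics-related lines and everything else."""
    gfx: List[str] = []
    other: List[str] = []
    for line in lines:
        target = gfx if any(marker in line for marker in GFX_MARKERS) else other
        target.append(line)
    return gfx, other
```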
Comment 3 d0c 2016-08-31 02:34:41 UTC
Created attachment 126127
./tmp/lspci-d-nn-vvv-list

0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Crystal Well Integrated Graphics Controller [8086:0d26] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Apple Inc. Crystal Well Integrated Graphics Controller [106b:012e]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 35
	Region 0: Memory at a0000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 1000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
	Kernel modules: i915
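The line that identifies the GPU is the `[vendor:device]` pair printed by `lspci -nn`. A minimal Python sketch for extracting it; the helper `pci_ids` is hypothetical and simply takes the first bracketed `xxxx:xxxx` tag on the line:

```python
import re
from typing import Optional, Tuple

# lspci -nn prints the class as [cccc] and the vendor:device pair as [vvvv:dddd].
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")
INTEL_VENDOR_ID = 0x8086


def pci_ids(lspci_line: str) -> Optional[Tuple[int, int]]:
    """Return (vendor, device) from the first [vvvv:dddd] tag, or None if absent."""
    m = PCI_ID_RE.search(lspci_line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)
```

On the controller line above it yields (0x8086, 0x0d26), an Intel Crystal Well part; Comment 9 later reports a different Haswell device, 0x412.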
Comment 4 yann 2016-09-01 09:33:04 UTC
Assigning to Mesa product.

From this error dump, the hang occurs in a render ring batch, with the active head
(ACTHD) at 0x083358b4 and 0x7b000005 (3DPRIMITIVE) as IPEHR.
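The IPEHR value is a command header, and its fields can be split out by bit position. The sketch below assumes the gen7 GFXPIPE header layout as I understand it from Intel's documentation (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a DWord Length field in bits 7:0 that excludes the first two dwords); `decode_gfxpipe_header` and `CmdHeader` are hypothetical names.

```python
from typing import NamedTuple


class CmdHeader(NamedTuple):
    cmd_type: int   # bits 31:29; 3 = GFXPIPE
    subtype: int    # bits 28:27; 3 = 3D state/primitive commands
    opcode: int     # bits 26:24
    subopcode: int  # bits 23:16
    length: int     # total command length in dwords (bits 7:0 hold length - 2)


def decode_gfxpipe_header(dw: int) -> CmdHeader:
    """Split a GFXPIPE command header dword into its fields (assumed gen7 layout)."""
    return CmdHeader(
        cmd_type=(dw >> 29) & 0x7,
        subtype=(dw >> 27) & 0x3,
        opcode=(dw >> 24) & 0x7,
        subopcode=(dw >> 16) & 0xFF,
        length=(dw & 0xFF) + 2,
    )
```

For 0x7b000005 this gives opcode 3, sub-opcode 0 and a total length of 7 dwords, consistent with the 3DPRIMITIVE header plus six payload dwords in the batch extract; 0x78090003 (3DSTATE_VERTEX_ELEMENTS) gives 5 dwords, matching its two 2-dword vertex elements. The active head 0x083358b4 is the dword immediately after the 3DPRIMITIVE header at 0x083358b0.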

Batch extract (around 0x083358b4):

0x0833589c:      0x78090003: 3DSTATE_VERTEX_ELEMENTS
0x083358a0:      0x02400000:    buffer 0: valid, type 0x0040, src offset 0x0000 bytes
0x083358a4:      0x11130000:    (X, Y, Z, 1.0), dst offset 0x00 bytes
0x083358a8:      0x02d80014:    buffer 0: valid, type 0x00d8, src offset 0x0014 bytes
0x083358ac:      0x12230000:    (X, 0.0, 0.0, 1.0), dst offset 0x00 bytes
0x083358b0:      0x7b000005: 3DPRIMITIVE:
0x083358b4:      0x00000104:    tri list random
0x083358b8:      0x00000006:    vertex count
0x083358bc:      0x00000000:    start vertex
0x083358c0:      0x00000001:    instance count
0x083358c4:      0x00000000:    start instance
0x083358c8:      0x00000000:    index bias
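The six dwords after the 3DPRIMITIVE header can be decoded the same way. This sketch assumes the gen7 layout (vertex access type in bit 8, primitive topology in bits 5:0, then vertex count, start vertex, instance count, start instance and index bias) and lists only a few topology codes; `decode_3dprimitive` and `TOPOLOGIES` are hypothetical names.

```python
from typing import Dict, List, Union

# Subset of primitive topology codes (assumed gen7 values).
TOPOLOGIES = {
    0x01: "point list",
    0x02: "line list",
    0x03: "line strip",
    0x04: "tri list",
    0x05: "tri strip",
    0x06: "tri fan",
}


def decode_3dprimitive(payload: List[int]) -> Dict[str, Union[int, str]]:
    """Decode the six dwords that follow a gen7 3DPRIMITIVE header."""
    if len(payload) != 6:
        raise ValueError("expected the six payload dwords of a gen7 3DPRIMITIVE")
    dw1 = payload[0]
    topo = dw1 & 0x3F
    return {
        "topology": TOPOLOGIES.get(topo, f"unknown (0x{topo:02x})"),
        # Bit 8 selects random (indexed) rather than sequential vertex access.
        "access": "random" if dw1 & (1 << 8) else "sequential",
        "vertex_count": payload[1],
        "start_vertex": payload[2],
        "instance_count": payload[3],
        "start_instance": payload[4],
        "index_bias": payload[5],
    }
```

For the batch above it reports an indexed ("random" access) triangle list of six vertices with one instance, i.e. a plain draw of two triangles, so the draw parameters themselves look unremarkable.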
Comment 5 Matt Turner 2016-10-25 18:56:51 UTC
We don't have enough information.

- What kernel version are you using? (uname -a)
- What Mesa version are you using? (glxinfo | grep Mesa)
- What were you doing at the time of the hang? Is it reproducible?
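The information requested above can be gathered in one go. A Python sketch, assuming `glxinfo` (from mesa-utils on Ubuntu) may or may not be installed; `mesa_version` and `collect_environment` are hypothetical helpers, and the uname string is assembled from `platform.uname()` rather than by running `uname -a`, so its format differs slightly:

```python
import platform
import re
import subprocess
from typing import Dict, Optional

# First "Mesa <digits>..." occurrence, e.g. "Mesa 12.0.6" or "Mesa 13.0.0-devel".
MESA_RE = re.compile(r"Mesa (\d+\.\d+(?:\.\d+)?(?:-\S+)?)")


def mesa_version(glxinfo_output: str) -> Optional[str]:
    """Return the first Mesa version number found in glxinfo output, if any."""
    m = MESA_RE.search(glxinfo_output)
    return m.group(1) if m else None


def collect_environment() -> Dict[str, Optional[str]]:
    """Gather a uname-style string and the Mesa version for a bug report."""
    u = platform.uname()
    try:
        out = subprocess.run(["glxinfo"], capture_output=True, text=True).stdout
    except FileNotFoundError:  # glxinfo is not installed
        out = ""
    return {
        "uname": f"{u.system} {u.node} {u.release} {u.version} {u.machine}",
        "mesa": mesa_version(out),
    }
```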
Comment 6 Matt Turner 2016-11-02 07:16:16 UTC
Please test a newer version of Mesa (12 or 13) and mark this bug as REOPENED
if you can reproduce the hang, or RESOLVED/* if you cannot.
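Checking whether an installed Mesa meets the "12 or 13" request comes down to comparing version numbers numerically rather than as strings (as strings, "9.2" would sort after "12.0"). A minimal sketch; `version_tuple` and `is_new_enough` are hypothetical, and suffixes such as `-devel` are simply dropped:

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '12.0.6' into (12, 0, 6); any '-suffix' such as '-devel' is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_new_enough(mesa_version: str, minimum: str = "12.0") -> bool:
    """True if mesa_version is at least minimum, compared component by component."""
    return version_tuple(mesa_version) >= version_tuple(minimum)
```

By this check, the Mesa 12.0.6 reported in Comment 9 already qualifies.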
Comment 7 Mark Janes 2016-11-04 19:04:50 UTC
I experienced similar issues with Ubuntu 16.04.1 LTS.

It appears to me that Ubuntu botched the backport of HSW support into their kernel. Updating to the stock Linux kernel fixed the issues.
Comment 8 Annie 2017-02-10 22:38:40 UTC
Dear Reporter,

This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested.

Thank you.
Comment 9 Burak Sezer 2017-05-06 21:23:23 UTC
Hello,

I have the same problem. Here is some information about my environment:

Kernel: Linux turing 4.4.0-75-generic #96-Ubuntu SMP on Ubuntu 16.04.2 LTS

Mesa: client glx vendor string: Mesa Project and SGI
    Device: Mesa DRI Intel(R) Haswell Desktop  (0x412)
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.0.6
OpenGL version string: 3.0 Mesa 12.0.6
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 12.0.6

I was surfing on the Internet at the time of the hang with Google Chrome 58.0.3029.96 (64-bit)

How can I provide useful information to help resolve this problem?

Error messages from the driver:

[45609.749114] [drm] stuck on render ring
[45609.749785] [drm] GPU HANG: ecode 7:0:0x84dfbffe, in chrome [24099], reason: Ring hung, action: reset
[45609.749785] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[45609.749786] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[45609.749786] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[45609.749787] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[45609.749787] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[45609.751891] drm/i915: Resetting chip after gpu hang
Comment 10 Burak Sezer 2017-05-06 21:26:42 UTC
Created attachment 131238
GPU crash dump for Linux-4.4.0-75-generic and OpenGL ES 3.0 Mesa 12.0.6
Comment 11 Mark Janes 2017-05-08 15:05:22 UTC
Burak: You can't provide any helpful information if the problem has already been fixed.

You can probably get your system working properly by installing an up-to-date kernel and Mesa. If that doesn't work, describe the results here.

If that does fix your issue, you can help Ubuntu LTS work better by taking your findings to Launchpad.

