98309 – [SKL] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], reason: Hang on render ring, action: reset

Status:	RESOLVED DUPLICATE of bug 89360

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i915
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Ian Romanick

Reported:	2016-10-18 14:11 UTC by Mads
Modified:	2016-10-19 15:42 UTC
CC List:	2 users

See Also:
i915 platform:	SKL
i915 features:	GPU hang

Attachments
/sys/class/drm/card0/error (31.88 KB, text/plain) 2016-10-18 14:12 UTC, Mads
another /sys/class/drm/card0/error (34.05 KB, text/plain) 2016-10-18 18:49 UTC, Mads

Description Mads 2016-10-18 14:11:27 UTC

With drm-intel-nightly

commit 5b633f423e27af3a7f30d303e243f5a2e82917ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 18 14:27:24 2016 +0100

    drm-intel-nightly: 2016y-10m-18d-13h-24m-11s UTC integration manifest

[   66.622400] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], reason: Hang on render ring, action: reset
[   66.622402] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   66.622403] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   66.622404] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   66.622405] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   66.622406] [drm] GPU crash dump saved to /sys/class/drm/card0/error
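When comparing hang reports across bugs, it can help to pull the fields out of the `GPU HANG` line programmatically. The sketch below parses the line exactly as it appears in the log above. The field names (`gen`, `ring`, `hash`) are my own labels and rest on the assumption that the three `ecode` parts are the GPU generation, the ring id, and a hash of the hang state; nothing in this report confirms that, so treat them as illustrative.

```python
import re

# Pattern for the i915 hang summary line as it appears in the log above.
# Group names are illustrative labels, not official field names.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<hash>0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' dmesg line as a dict, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    fields["gen"] = int(fields["gen"])
    fields["ring"] = int(fields["ring"])
    fields["pid"] = int(fields["pid"])
    fields["hash"] = int(fields["hash"], 16)
    return fields


line = ("[   66.622400] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], "
        "reason: Hang on render ring, action: reset")
info = parse_gpu_hang(line)
print(info)
```

Two reports with the same `ecode` triple are candidates for being the same hang, but the crash dump is still needed to confirm it.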

This happens every time, about a minute after starting X. VirtualBox is installed, so I guess that taints the kernel, but I am posting /sys/class/drm/card0/error anyway.

This is on a Dell XPS 9550.

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)

Comment 1 Mads 2016-10-18 14:12:17 UTC

Created attachment 127377
/sys/class/drm/card0/error

Comment 2 Chris Wilson 2016-10-18 14:52:44 UTC

Recent?

Can you please try with intel_iommu=igfx_off

Comment 3 yann 2016-10-18 16:18:14 UTC

Mads, please retry as Chris advises: with intel_iommu=igfx_off on your boot command line.
There were also improvements pushed to Mesa that will benefit your system, so please re-test with the latest Mesa to see whether this issue still occurs.
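After rebooting, it is worth confirming that the parameter actually reached the kernel before drawing conclusions from a retest. Here is a minimal sketch that checks a command-line string for `intel_iommu=igfx_off`. The helper names and the example command line are hypothetical, and the handling of comma-separated values (such as `intel_iommu=on,igfx_off`) is an assumption about how the option may be written.

```python
def has_kernel_option(cmdline, key, option):
    """Return True if `key=` is present and `option` is one of its
    comma-separated values."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key and option in value.split(","):
            return True
    return False


def running_kernel_has_option(key, option, path="/proc/cmdline"):
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return has_kernel_option(f.read(), key, option)


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro quiet intel_iommu=igfx_off"
print(has_kernel_option(example, "intel_iommu", "igfx_off"))
```

On the affected machine, `running_kernel_has_option("intel_iommu", "igfx_off")` would perform the same check against `/proc/cmdline`.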

In parallel, assigning to Mesa product (please let me know if I am mistaken with this GPU Hang).

Kernel: 4.9.0-rc1+
Platform: Skylake (pci id: 0x191b; PCI Revision: 0x06; PCI Subsystem: 1028:06e4)
Mesa: [Please confirm your mesa version]

From this error dump, the hang is happening in the render ring batch with the active head at 0xc7ffe38c, with 0x7a000004 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0xc7ffe38c):

0xc7ffe358:      0x7b000005: 3DPRIMITIVE: fail sequential
0xc7ffe35c:      0x00000006:    vertex count
0xc7ffe360:      0x00000004:    start vertex
0xc7ffe364:      0x00000000:    instance count
0xc7ffe368:      0x00000001:    start instance
0xc7ffe36c:      0x00000000:    index bias
0xc7ffe370:      0x00000000: MI_NOOP
Bad count in PIPE_CONTROL
0xc7ffe374:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xc7ffe378:      0x00101001:    destination address
0xc7ffe37c:      0x00000000:    immediate dword low
0xc7ffe380:      0x00000000:    immediate dword high
Bad count in PIPE_CONTROL
0xc7ffe38c:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xc7ffe390:      0x00000408:    destination address
0xc7ffe394:      0x00000000:    immediate dword low
0xc7ffe398:      0x00000000:    immediate dword high
0xc7ffe3a4:      0x78230000: 3D UNKNOWN: 3d_965 opcode = 0x7823
0xc7ffe3a8:      0x00007e00: MI_NOOP
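
As a cross-check on the batch extract above, the command headers can be decoded by hand. This sketch assumes the usual 3D command header layout: command type in bits 31:29, subtype in bits 28:27, opcode in bits 26:24, sub-opcode in bits 23:16, and a length field in bits 7:0 that stores the total dword count minus 2. The function names are my own. Under that assumption, each header's length lands exactly on the address of the next command in the listing.

```python
def decode_gfxpipe_header(dword):
    """Split a 3D command header dword into its fields.

    Assumed layout: type 31:29, subtype 28:27, opcode 26:24,
    sub-opcode 23:16, length (total dwords - 2) 7:0.
    """
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "subopcode": (dword >> 16) & 0xFF,
        "total_dwords": (dword & 0xFF) + 2,
    }


def next_command_address(address, header):
    """Address of the command that follows the one starting at `address`."""
    return address + 4 * decode_gfxpipe_header(header)["total_dwords"]


# (address, header dword) pairs taken from the batch extract above.
commands = [
    (0xc7ffe358, 0x7b000005),  # 3DPRIMITIVE
    (0xc7ffe374, 0x7a000004),  # PIPE_CONTROL
    (0xc7ffe38c, 0x7a000004),  # PIPE_CONTROL (active head)
]
for addr, header in commands:
    print(hex(addr), "->", hex(next_command_address(addr, header)))
```

Because the computed lengths (7, 6, and 6 dwords) line up with the listing, the "Bad count in PIPE_CONTROL" warnings look more like the decoder expecting a shorter PIPE_CONTROL than like a malformed batch. That reading is an inference from the addresses, not something stated in this report.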

Comment 4 Mads 2016-10-18 18:46:40 UTC

My mesa version:

commit a4622305e67dbb3ed224fa966160616688e43ee8
Author: Emil Velikov <emil.velikov@collabora.com>
Date:   Wed Oct 12 16:06:47 2016 +0100

    swr: automake: add ar_eventhandlerfile_h.template to the tarball
    
    Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

It hasn't crashed yet when I booted with intel_iommu=igfx_off.

Comment 5 Mads 2016-10-18 18:49:02 UTC

Created attachment 127382
another /sys/class/drm/card0/error

Crash when using chromium, this time without any connected monitors/docking stations.

This is _not_ with intel_iommu=igfx_off, I thought I should upload one more crash with less hw connected.

Comment 6 Mads 2016-10-18 19:09:59 UTC

> It hasn't crashed yet when I booted with intel_iommu=igfx_off.

By the way, is this a fix or workaround? :) Does this have any caveats other than not being able to use the graphics controller inside of a VM?

Comment 7 Mads 2016-10-18 20:01:13 UTC

I guess I also must mention that your tip with intel_iommu=igfx_off also solved another bug I reported: https://bugs.freedesktop.org/show_bug.cgi?id=97211

Extremely nice!

Now the only issue I have left with my XPS 15 is https://bugs.freedesktop.org/show_bug.cgi?id=93578

Comment 8 yann 2016-10-19 09:01:24 UTC

intel_iommu=igfx_off is not a fix for this defect; it only avoids it.

btw dup of bug 89360?

Comment 9 Mads 2016-10-19 14:14:13 UTC

I saw the same kind of messages in dmesg, so I would guess this is a duplicate of bug 89360 yes...

Comment 10 yann 2016-10-19 15:42:47 UTC


*** This bug has been marked as a duplicate of bug 89360 ***
