Bug 98309 - [SKL] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], reason: Hang on render ring, action: reset
Summary: [SKL] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], reason: Hang on rende...
Status: RESOLVED DUPLICATE of bug 89360
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Ian Romanick
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-18 14:11 UTC by Mads
Modified: 2016-10-19 15:42 UTC

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (31.88 KB, text/plain)
2016-10-18 14:12 UTC, Mads
another /sys/class/drm/card0/error (34.05 KB, text/plain)
2016-10-18 18:49 UTC, Mads

Description Mads 2016-10-18 14:11:27 UTC
With drm-intel-nightly

commit 5b633f423e27af3a7f30d303e243f5a2e82917ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 18 14:27:24 2016 +0100

    drm-intel-nightly: 2016y-10m-18d-13h-24m-11s UTC integration manifest

[   66.622400] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], reason: Hang on render ring, action: reset
[   66.622402] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   66.622403] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   66.622404] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   66.622405] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   66.622406] [drm] GPU crash dump saved to /sys/class/drm/card0/error
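
For triage, the fields of the "GPU HANG" line above can be pulled apart mechanically. The following is a minimal sketch, not part of the original report. It assumes the three colon-separated ecode values are the GPU generation, the engine (ring) id, and a hash of the hang state; the function name `parse_gpu_hang` and the dictionary keys are hypothetical.

```python
import re

# Matches the i915 hang line as it appears in the dmesg excerpt above.
# Assumed field meaning (not stated in the report): gen : engine id : hash.
HANG_RE = re.compile(
    r"GPU HANG: ecode (\d+):(\d+):(0x[0-9a-fA-F]+), "
    r"in (.+?) \[(\d+)\], reason: (.+?), action: (\w+)"
)


def parse_gpu_hang(line):
    """Return the hang fields as a dict, or None if the line does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return {
        "gen": int(m.group(1)),        # assumed: GPU generation
        "engine": int(m.group(2)),     # assumed: engine (ring) id
        "ecode": int(m.group(3), 16),  # assumed: hash of the hang state
        "process": m.group(4),
        "pid": int(m.group(5)),
        "reason": m.group(6),
        "action": m.group(7),
    }


if __name__ == "__main__":
    sample = ("[   66.622400] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [447], "
              "reason: Hang on render ring, action: reset")
    print(parse_gpu_hang(sample))
```

Comparing the ecode value between two reports is only a first hint that they might describe the same hang; the error dump is still needed to confirm it.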

This happens every time, about a minute after starting X. VirtualBox is installed, so I guess that taints the kernel, but I am posting the /sys/class/drm/card0/error anyway.

This is on a Dell XPS 9550.

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
Comment 1 Mads 2016-10-18 14:12:17 UTC
Created attachment 127377 [details]
/sys/class/drm/card0/error
Comment 2 Chris Wilson 2016-10-18 14:52:44 UTC
Recent?

Can you please try with intel_iommu=igfx_off
Comment 3 yann 2016-10-18 16:18:14 UTC
Mads, please retry as Chris advises, with intel_iommu=igfx_off on your boot command line.
Improvements that should benefit your system have also been pushed to Mesa, so please re-test with the latest Mesa to see whether this issue still occurs.

In parallel, I am assigning this to the Mesa product (please let me know if I am mistaken about this GPU hang).

Kernel: 4.9.0-rc1+
Platform: Skylake (pci id: 0x191b; PCI Revision: 0x06; PCI Subsystem: 1028:06e4)
Mesa: [Please confirm your mesa version]

From this error dump, the hang occurs in the render ring batch with the active head at 0xc7ffe38c and 0x7a000004 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0xc7ffe38c):

0xc7ffe358:      0x7b000005: 3DPRIMITIVE: fail sequential
0xc7ffe35c:      0x00000006:    vertex count
0xc7ffe360:      0x00000004:    start vertex
0xc7ffe364:      0x00000000:    instance count
0xc7ffe368:      0x00000001:    start instance
0xc7ffe36c:      0x00000000:    index bias
0xc7ffe370:      0x00000000: MI_NOOP
Bad count in PIPE_CONTROL
0xc7ffe374:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xc7ffe378:      0x00101001:    destination address
0xc7ffe37c:      0x00000000:    immediate dword low
0xc7ffe380:      0x00000000:    immediate dword high
Bad count in PIPE_CONTROL
0xc7ffe38c:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xc7ffe390:      0x00000408:    destination address
0xc7ffe394:      0x00000000:    immediate dword low
0xc7ffe398:      0x00000000:    immediate dword high
0xc7ffe3a4:      0x78230000: 3D UNKNOWN: 3d_965 opcode = 0x7823
0xc7ffe3a8:      0x00007e00: MI_NOOP
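
The command header dwords in the extract above can be split into their fields by hand. The following is a minimal sketch, not taken from the decoder that produced the dump. It assumes the usual Intel 3D command header layout (type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 that holds the total dword count minus 2); the function name `decode_gfxpipe_header` is hypothetical.

```python
def decode_gfxpipe_header(dword):
    """Split a 3D command header dword into fields (assumed layout, see lead-in)."""
    return {
        "type": (dword >> 29) & 0x7,         # 3 is assumed to mean a 3D/GFXPIPE command
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        "total_dwords": (dword & 0xFF) + 2,  # length field is stored biased by 2
    }


def next_command_address(address, header):
    """Address of the command following the one whose header sits at `address`."""
    return address + 4 * decode_gfxpipe_header(header)["total_dwords"]


if __name__ == "__main__":
    # PIPE_CONTROL header from the extract: the length field encodes 6 dwords.
    print(decode_gfxpipe_header(0x7A000004))
    # 0xc7ffe374 + 6 * 4 = 0xc7ffe38c, the active head reported in the dump.
    print(hex(next_command_address(0xC7FFE374, 0x7A000004)))
```

Under that assumption each PIPE_CONTROL here spans 6 dwords, which is consistent with the address gaps in the extract (0xc7ffe374 to 0xc7ffe38c, then 0xc7ffe38c to 0xc7ffe3a4). The "Bad count in PIPE_CONTROL" lines would then suggest that the decoder expected a shorter command, not that the batch itself is malformed.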
Comment 4 Mads 2016-10-18 18:46:40 UTC
My mesa version:

commit a4622305e67dbb3ed224fa966160616688e43ee8
Author: Emil Velikov <emil.velikov@collabora.com>
Date:   Wed Oct 12 16:06:47 2016 +0100

    swr: automake: add ar_eventhandlerfile_h.template to the tarball
    
    Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

It hasn't crashed yet when I booted with intel_iommu=igfx_off.
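
When comparing runs with and without the workaround, it helps to confirm that the option actually reached the kernel. The following is a minimal sketch that checks a kernel command line string for the igfx_off option. It assumes intel_iommu= takes a comma-separated option list; the function names are hypothetical.

```python
def intel_iommu_options(cmdline):
    """Collect every option passed through intel_iommu= on a kernel command line."""
    options = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            # Assumed: options may be combined, e.g. intel_iommu=on,igfx_off
            options.update(opt for opt in value.split(",") if opt)
    return options


def igfx_off_enabled(cmdline):
    """True if igfx_off was requested on the given command line."""
    return "igfx_off" in intel_iommu_options(cmdline)


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=igfx_off quiet"
    print(igfx_off_enabled(example))
```

On a running system the string to check would be the contents of /proc/cmdline.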
Comment 5 Mads 2016-10-18 18:49:02 UTC
Created attachment 127382 [details]
another /sys/class/drm/card0/error

Crash when using chromium, this time without any connected monitors/docking stations.

This is _not_ with intel_iommu=igfx_off; I thought I should upload one more crash with less hardware connected.
Comment 6 Mads 2016-10-18 19:09:59 UTC
> It hasn't crashed yet when I booted with intel_iommu=igfx_off.

By the way, is this a fix or workaround? :) Does this have any caveats other than not being able to use the graphics controller inside of a VM?
Comment 7 Mads 2016-10-18 20:01:13 UTC
I guess I also must mention that your tip with intel_iommu=igfx_off also solved another bug I reported: https://bugs.freedesktop.org/show_bug.cgi?id=97211

Extremely nice!

Now the only issue I have left with my XPS 15 is https://bugs.freedesktop.org/show_bug.cgi?id=93578
Comment 8 yann 2016-10-19 09:01:24 UTC
intel_iommu=igfx_off is not a fix for this defect; it just avoids it.

btw dup of bug 89360?
Comment 9 Mads 2016-10-19 14:14:13 UTC
I saw the same kind of messages in dmesg, so I would guess this is a duplicate of bug 89360 yes...
Comment 10 yann 2016-10-19 15:42:47 UTC

*** This bug has been marked as a duplicate of bug 89360 ***

