106002 – GPU HANG: ecode 9:2:0xa8dfbffd

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-04-12 12:02 UTC by Jiri Slaby
Modified:	2018-09-10 13:14 UTC
CC List:	2 users

See Also:
i915 platform:	SKL
i915 features:	GPU hang

Attachments
/sys/class/drm/card0/error (28.73 KB, text/plain) 2018-04-12 12:02 UTC, Jiri Slaby
dmesg (688.67 KB, text/plain) 2018-04-12 12:02 UTC, Jiri Slaby

Description Jiri Slaby 2018-04-12 12:02:06 UTC

Created attachment 138782
/sys/class/drm/card0/error

[15146.471169] [drm] GPU HANG: ecode 9:2:0xa8dfbffd, in mpv/vo [18693], reason: Hang on vcs0, action: reset
[15146.471173] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[15146.471176] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[15146.471178] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[15146.471180] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[15146.471182] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[15146.471566] i915 0000:00:02.0: Resetting vcs0 after gpu hang
[15159.422798] i915 0000:00:02.0: Resetting vcs0 after gpu hang
[15168.414948] i915 0000:00:02.0: Resetting vcs0 after gpu hang
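For anyone triaging a long dmesg, the hang line and the follow-up engine resets can be pulled out mechanically. The following is a minimal sketch, assuming the log lines have exactly the shape shown above. The helper names (`parse_hang`, `reset_times`) and the group names `gen`, `engine` and `code` are hypothetical labels chosen here; reading the first ecode field as the hardware generation is an assumption, not something stated in this report.

```python
import re

# Matches the "[drm] GPU HANG" line in the form seen in this report.
# Group names are illustrative labels, not official i915 terminology.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode "
    r"(?P<gen>\d+):(?P<engine>[0-9a-fA-F]+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)

# Matches the "Resetting <engine> after gpu hang" follow-up lines.
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 \S+ Resetting (?P<engine>\w+) after gpu hang"
)


def parse_hang(line):
    """Return a dict describing the hang, or None if the line does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["ts"] = float(info["ts"])
    info["pid"] = int(info["pid"])
    return info


def reset_times(log_text):
    """Return (timestamp, engine) pairs for every engine reset in the log."""
    return [
        (float(m.group("ts")), m.group("engine"))
        for m in RESET_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[15146.471169] [drm] GPU HANG: ecode 9:2:0xa8dfbffd, in mpv/vo [18693], "
        "reason: Hang on vcs0, action: reset\n"
        "[15146.471566] i915 0000:00:02.0: Resetting vcs0 after gpu hang\n"
        "[15159.422798] i915 0000:00:02.0: Resetting vcs0 after gpu hang\n"
    )
    print(parse_hang(sample.splitlines()[0]))
    print(reset_times(sample))
```

Repeated resets of the same engine a few seconds apart, as in this log, suggest the workload keeps re-triggering the hang rather than a one-off glitch.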

Comment 1 Jiri Slaby 2018-04-12 12:02:57 UTC

Created attachment 138783
dmesg

Comment 2 Chris Wilson 2018-04-12 12:40:30 UTC

Sigh, libva left the bugs.fd.o collective.

Comment 3 Jani Saarinen 2018-04-25 11:56:24 UTC

Chris, is this a valid issue for i915?

Comment 4 Lionel Landwerlin 2018-05-04 13:41:25 UTC

It's unfortunate that the batchbuffers are not flagged by userspace so that we could look at what caused the hang.

The instruction on which the command streamer hung (MI_FLUSH_DW) doesn't appear to be something that intel-vaapi-driver would emit (the set bits differ).
It looks more like something from gen6_bsd_ring_flush().
Though that's really confusing because I wouldn't expect anybody to run with legacy submission on SKL & 4.16.

Comment 5 Jani Saarinen 2018-05-17 09:58:08 UTC

Jiri, is this still valid on the latest drm-tip?

Comment 6 Lakshmi 2018-09-10 12:26:01 UTC

Jiri, ping?

Comment 7 Jiri Slaby 2018-09-10 12:48:12 UTC

I don't think I saw it recently.

Comment 8 Lakshmi 2018-09-10 13:14:36 UTC

I assume this issue has been fixed.
Closing now. Feel free to reopen if you still see the issue with the latest drm-tip (https://cgit.freedesktop.org/drm-tip).

If the problem persists, attach the full dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M.
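The drm.debug value is a bitmask of message categories. As a rough sketch of what 0x1e enables, the snippet below decodes the mask. The bit-to-name table is an assumption based on the categories listed in the kernel's drm_print.h and may differ between kernel versions; `decode_drm_debug` is a hypothetical helper, not part of any kernel tooling.

```python
# Assumed mapping of drm.debug bits to categories (check drm_print.h
# in the kernel tree you are running; newer kernels may add bits).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the categories whose bits are set in mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1E))
```

Under that assumed table, 0x1e selects the driver, KMS, PRIME and atomic categories and leaves the core and vblank messages off, which keeps the log smaller; log_buf_len=4M enlarges the kernel ring buffer so the early boot messages are not overwritten.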