110982 – [SKL]GPU HANG: ecode 9:0:0x00000000, hang on rcs0

Summary: [SKL]GPU HANG: ecode 9:0:0x00000000, hang on rcs0

Status:	CLOSED NOTOURBUG

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	Triaged
Keywords:

Depends on:
Blocks:

Reported:	2019-06-24 16:56 UTC by Rik
Modified:	2019-08-29 07:41 UTC
CC List:	1 user

See Also:
i915 platform:	SKL
i915 features:	GPU hang

Attachments
GPU crash dump (8.10 KB, application/x-bzip) 2019-06-24 16:56 UTC, Rik
i915 crash dump (14.34 KB, application/gzip) 2019-07-17 16:30 UTC, Rik

Description Rik 2019-06-24 16:56:46 UTC

Created attachment 144626
GPU crash dump

dmesg output:
[ 4337.121156] i915 0000:00:02.0: GPU HANG: ecode 9:0:0x00000000, hang on rcs0
[....]
[ 4337.121168] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 4337.122893] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 4337.122918] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 4337.124650] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 4337.126370] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
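
For triage, the hang line in the dmesg output above can be pulled apart mechanically. The following is a minimal sketch, not part of the original report: the function name `parse_gpu_hang` and the field labels (`gen`, `engines`, `code`) are assumptions chosen for illustration, and the pattern is written only against the log lines quoted in this bug.

```python
import re

# Pattern written against the "GPU HANG" lines quoted in this report.
# The labels gen/engines/code for the three ecode fields are assumptions.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engines>[0-9a-fA-F]+):(?P<code>0x[0-9a-fA-F]+)"
    r"(?:, in (?P<comm>\S+) \[(?P<pid>\d+)\])?"   # optional "in <process> [pid]"
    r", hang on (?P<hung>.+)$"
)

def parse_gpu_hang(line):
    """Return a dict describing an i915 GPU HANG line, or None if no match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return {
        "gen": int(m.group("gen")),
        "engines": int(m.group("engines"), 16),
        "code": int(m.group("code"), 16),
        "process": m.group("comm"),
        "pid": int(m.group("pid")) if m.group("pid") else None,
        "hung_engines": [e.strip() for e in m.group("hung").split(",")],
    }

if __name__ == "__main__":
    line = ("[ 4337.121156] i915 0000:00:02.0: GPU HANG: "
            "ecode 9:0:0x00000000, hang on rcs0")
    print(parse_gpu_hang(line))
```

Lines that do not contain a hang message return `None`, so the function can be applied to a whole dmesg log line by line.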

Kernel version: 5.1.14

OS: Arch Linux

Mobo: ASRock Z170M Extreme4

Comment 1 Chris Wilson 2019-06-25 15:45:07 UTC

rcs:
  START: 0x0000d000
  HEAD:  0x00e01470 [0x00000000]
    head = 0x00001470, wraps = 7
  TAIL:  0x00001470 [0x00000000, 0x00000000]
  CTL:   0x00003001
    len=16384, enabled
  MODE:  0x00000000
  HWS:   0xffffe000
  ACTHD: 0x00000000 00e01470
    at ring: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x02800000

  ELSP[0]:  pid 688, ban score 0, seqno        3:0002b24b!, prio 2, emitted -1119ms, start 0000d000, head 000013d0, tail 00001470
  ELSP[1]:  pid 611, ban score 0, seqno        2:0002b24c, prio 0, emitted -1114ms, start 00009000, head 00003b60, tail 00003bf8

The CS reached the end of the ring, but the GPU failed to switch contexts.

The GPU then failed to respond to reset requests, further indicating that it had suffered a terminal shock. The immediate suspicion is powergating, given the similarity with, say, bug 110450.
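
The decoded values in the register dump above (head, wraps, len, enabled) can be reproduced from the raw HEAD, TAIL and CTL words. This is a rough sketch only: the bit masks below are assumptions about the ring register layout, picked because they reproduce the decoded values shown in the dump, and `decode_ring` is a hypothetical helper name.

```python
# Assumed bit layout of the ring registers; these masks reproduce the
# decoded values printed in the dump but are not taken from this report.
HEAD_ADDR_MASK = 0x001FFFFC     # ring offset bits of HEAD
HEAD_WRAP_SHIFT = 21            # wrap counter sits above the offset bits
TAIL_ADDR_MASK = 0x001FFFF8     # ring offset bits of TAIL
CTL_NR_PAGES_MASK = 0x001FF000  # ring length in 4 KiB pages, minus one page
CTL_ENABLED = 0x1
PAGE_SIZE = 4096

def decode_ring(head, tail, ctl):
    """Decode raw HEAD/TAIL/CTL values into offsets, wrap count and length."""
    head_off = head & HEAD_ADDR_MASK
    tail_off = tail & TAIL_ADDR_MASK
    return {
        "head": head_off,
        "wraps": head >> HEAD_WRAP_SHIFT,
        "tail": tail_off,
        "len": (ctl & CTL_NR_PAGES_MASK) + PAGE_SIZE,
        "enabled": bool(ctl & CTL_ENABLED),
        # head == tail: everything written to this ring has been consumed
        "drained": head_off == tail_off,
    }

if __name__ == "__main__":
    # Values from the rcs dump: HEAD 0x00e01470, TAIL 0x00001470, CTL 0x00003001
    info = decode_ring(0x00E01470, 0x00001470, 0x00003001)
    print({k: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
           for k, v in info.items()})
```

With the values from the dump, head and tail both decode to 0x1470, which matches the observation that the CS had consumed everything in the ring before the context switch failed.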

Comment 2 Chris Wilson 2019-06-25 16:53:25 UTC

Or it could just be a userspace fault that leaves the GPU barely functional up to the point it fails completely.

Comment 3 Rik 2019-07-17 16:28:02 UTC

A new one:
[40476.867483] i915 0000:00:02.0: GPU HANG: ecode 9:1:0xfffffffe, in ffmpeg [30494], hang on rcs0, vcs0
[40476.867484] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[40476.867484] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[40476.867485] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[40476.867485] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[40476.867485] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[40476.868493] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0, vcs0
[40476.868560] i915 0000:00:02.0: Resetting vcs0 for hang on rcs0, vcs0
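
Since the log asks for the crash dump to be attached, here is one way the file named in the message could be copied into a compressed attachment. It is a sketch under assumptions: `save_crash_dump` and the output file name are hypothetical, and reading the sysfs path typically requires root.

```python
import gzip
import shutil

# Path taken from the dmesg message above.
DEFAULT_ERROR_PATH = "/sys/class/drm/card0/error"

def save_crash_dump(dest, src=DEFAULT_ERROR_PATH):
    """Copy the i915 error state at `src` into a gzip file at `dest`."""
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large dumps are fine
    return dest

if __name__ == "__main__":
    # Hypothetical output name; run as a user allowed to read the sysfs file.
    print(save_crash_dump("i915-crash-dump.gz"))
```

Copying the dump soon after the hang is the safer choice, on the assumption that the error state does not persist across a reboot.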

ffmpeg command:
ffmpeg -vaapi_device /dev/dri/renderD128 -i 'http://****' -c:v hevc_vaapi -b:v 840K -vf 'format=nv12,hwupload' -c:a copy -y <output_file.mkv>

vaapi driver: iHD_drv_video.so

System:
Machine:   Device: desktop Mobo: ASRock model: Z170M Extreme4 serial: M80-68001700529
           UEFI [Legacy]: American Megatrends v: P7.20 date: 12/13/2016
CPU:       Dual core Intel Core i3-6100 (-MT-MCP-) cache: 3072 KB
           clock speeds: max: 3700 MHz 1: 800 MHz 2: 3223 MHz 3: 2982 MHz 4: 3049 MHz
Graphics:  Card: Intel HD Graphics 530
           Display Server: X.Org 1.20.5 drivers: intel (unloaded: modesetting) Resolution: 1920x1080@60.00hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 530 (Skylake GT2) version: 4.5 Mesa 19.1.2

Comment 4 Rik 2019-07-17 16:30:56 UTC

Created attachment 144812
i915 crash dump

Crash dump attached.

Comment 5 Chris Wilson 2019-07-18 11:05:17 UTC

Ah, libva. A userspace hang leaving the GPU barely functional is no longer a surprise.

Comment 6 Lakshmi 2019-08-28 10:25:33 UTC

Closing this issue as NOTOURBUG.
Rik, can you please create an issue for libva? https://github.com/intel/libva/issues

Comment 7 Rik 2019-08-29 07:41:55 UTC

(In reply to Lakshmi from comment #6)
> Closing this issue as NOTOURBUG.
> Rik, can you please create an issue for libva?
> https://github.com/intel/libva/issues

No,
but thanks for your support.
