Bug 108695 - [drm] GPU HANG: ecode 8:0:0x37974124, in spotify [10995], reason: hang on rcs0, action: reset
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Joonas Lahtinen
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-08 13:57 UTC by Eric Blau
Modified: 2019-03-20 14:31 UTC
CC List: 1 user

See Also:
i915 platform: BDW
i915 features: GPU hang


Attachments
Crash dump from the reported hang (3.53 KB, application/x-bzip), 2018-11-08 13:57 UTC, Eric Blau

Description Eric Blau 2018-11-08 13:57:37 UTC
Created attachment 142409
Crash dump from the reported hang

When resuming from hibernation, my X session froze several times before eventually recovering. Here is the message that was logged (crash dump attached):

Nov 08 08:52:25 eric-macbookpro kernel: [drm] GPU HANG: ecode 8:0:0x37974124, in spotify [10995], reason: hang on rcs0, action: reset
Nov 08 08:52:25 eric-macbookpro kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 08 08:52:25 eric-macbookpro kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 08 08:52:25 eric-macbookpro kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 08 08:52:25 eric-macbookpro kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 08 08:52:25 eric-macbookpro kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 08 08:52:25 eric-macbookpro kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Nov 08 08:52:35 eric-macbookpro kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0


Other details:

Linux Distribution: Arch Linux
System Architecture: x86_64
Kernel Version: 4.18.16-arch1-1-ARCH
Display Connector: 2x eDP
Comment 1 Chris Wilson 2018-11-08 14:11:26 UTC
  ELSP[0]:  pid 10995, ban score 0, seqno       18:0034da3d, prio 0, emitted 1431653982ms, start 010d2000, head 00000000, tail 00000070
  ELSP[1]:  pid 2718, ban score 0, seqno        1:0034da3e, prio 1024, emitted 1431653982ms, start 02eea000, head 00000bf8, tail 00000c60

  START: 0x02eea000
  HEAD:  0x00400bf0 [0x00000000]
    head = 0x00000bf0, wraps = 2
  TAIL:  0x00000bf0 [0x00000048, 0x00000070]
  CTL:   0x00003000

  seqno: 0x0034da3c
  last_seqno: 0x0034da3e

=> The GPU switched contexts before completing the first one, but then failed to start the second context: it never even saw the TAIL update.
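For readers decoding the dump in Comment 1: the HEAD register packs the ring offset and a wrap counter into one 32-bit word, and seqnos are compared with wrap-around arithmetic. The sketch assumes the field layout of the i915 driver's HEAD_ADDR (0x001FFFFC) and HEAD_WRAP_COUNT (0xFFE00000) masks; the helper names ring_head_offset, ring_head_wraps and seqno_passed are hypothetical, although the last one follows the same logic as the kernel's i915_seqno_passed().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ring HEAD register layout (mirrors HEAD_ADDR / HEAD_WRAP_COUNT
 * in the i915 driver; treat as an assumption, not a spec quote). */
#define RING_HEAD_ADDR_MASK  0x001FFFFCu /* byte offset into the ring buffer */
#define RING_HEAD_WRAP_MASK  0xFFE00000u /* times the ring has wrapped */
#define RING_HEAD_WRAP_SHIFT 21

/* Hypothetical helper: extract the ring offset from a raw HEAD value. */
uint32_t ring_head_offset(uint32_t head_reg)
{
    return head_reg & RING_HEAD_ADDR_MASK;
}

/* Hypothetical helper: extract the wrap counter from a raw HEAD value. */
uint32_t ring_head_wraps(uint32_t head_reg)
{
    return (head_reg & RING_HEAD_WRAP_MASK) >> RING_HEAD_WRAP_SHIFT;
}

/* Wrap-safe seqno comparison: true if seq1 is at or after seq2.
 * Relies on two's-complement conversion of the 32-bit difference. */
bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) >= 0;
}
```

Applied to the dump, ring_head_offset(0x00400bf0) gives 0xbf0 and ring_head_wraps(0x00400bf0) gives 2, matching the decoded line. Since seqno_passed(0x0034da3c, 0x0034da3d) is false, the request from pid 10995 in ELSP[0] had not completed, and neither had the one in ELSP[1] (0x0034da3e).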
Comment 2 Lakshmi 2018-11-09 12:30:26 UTC
Eric, how often can you reproduce this issue? Is there any particular pattern that triggers it?
Have you tried verifying this issue with the latest drm-tip (https://cgit.freedesktop.org/drm-tip)?
Comment 3 Eric Blau 2018-11-09 12:38:19 UTC
I can reproduce this issue fairly often, maybe once in every four attempts or so, when I'm using Chromium as my web browser. With Firefox it never seems to happen. I'm not sure whether that is because Chromium uses hardware acceleration or other features that Firefox does not, but it definitely happens more frequently with Chromium.

The issue occurs when I hibernate and then resume. It's a long-standing problem that keeps recurring for me. See bug 102658, for example, which was closed without a fix.
Comment 4 Francesco Balestrieri 2019-02-25 09:16:30 UTC
Currently our ability to reproduce this issue is somewhat limited. It would help if you could run the latest drm-tip on your system and report the logs.
Comment 5 Eric Blau 2019-02-25 14:23:54 UTC
Sure, I will try to reproduce the problem with drm-tip.

Since I was hitting this problem so often, I tried a few workarounds. Disabling hardware acceleration in Chromium has made the problem go away, although obviously I would prefer using hardware acceleration.

I'll build and run drm-tip, turn on Chromium hardware acceleration again and report back. Thanks.
Comment 6 Chris Wilson 2019-03-15 23:20:01 UTC
Any news with recent kernels, Chromium rendering and hibernation?
Comment 7 Eric Blau 2019-03-16 12:40:46 UTC
I have not yet been able to try reproducing the problem with drm-tip. The ZFS kernel module I require did not build against Linux 5.0+ until version 0.7.13 was released on March 4th, so my build was unsuccessful.

In addition, I find that power consumption on my laptop is lower when I use Chromium in software rendering mode than with hardware acceleration, so I've been tempted to leave hardware acceleration disabled.

I'll try the drm-tip build again this week now that the ZFS build issues are fixed and see how it goes.
Comment 8 Eric Blau 2019-03-20 14:02:26 UTC
I retried with drm-tip at d33bf3f6a140 now that I was able to get ZFS to build. With Chromium hardware acceleration enabled, I ran a number of hibernate/resume cycles without hitting any hangs. Closing this one out. Thanks.
Comment 9 Francesco Balestrieri 2019-03-20 14:31:51 UTC
Thanks for testing!

