Bug 111920 - NON-GuC constant i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Summary: NON-GuC constant i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-08 00:26 UTC by Kenneth C
Modified: 2019-11-06 09:50 UTC
CC List: 4 users

See Also:
i915 platform: CFL
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (4.81 KB, text/plain)
2019-10-08 00:26 UTC, Kenneth C
/sys/class/drm/card0/error (4.75 KB, text/plain)
2019-10-08 00:30 UTC, Kenneth C

Description Kenneth C 2019-10-08 00:26:16 UTC
Created attachment 145678 [details]
/sys/class/drm/card0/error

In bug 111805 (https://bugs.freedesktop.org/show_bug.cgi?id=111805), lakshminarayana.vudum@intel.com asked me to try running without the GuC enabled.

I did that, and it's still hanging. This is drm-tip from just before commit c1132367, since that commit prevents my box from going into S0/s2idle suspend (see https://bugs.freedesktop.org/show_bug.cgi?id=111909).

Here's the worst part: if I can wrench control to a VT, I can usually run "sudo systemctl hibernate" to force a power cycle that unwedges the i915. But THIS time, right after the resume:

----
Oct  7 17:03:36 hp-x360n systemd-sleep[16719]: System resumed.
Oct  7 17:03:36 hp-x360n systemd[1]: Stopping TLP suspend/resume...
Oct  7 17:03:36 hp-x360n systemd[1]: Stopped TLP suspend/resume.
Oct  7 17:04:40 hp-x360n kernel: [20868.899672] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 17:05:16 hp-x360n kernel: [20904.931581] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 17:07:04 hp-x360n kernel: [21012.899361] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
----

<facepalm>

The latest i915 changes on Sept 26th are really killing my workflow, as I can never tell when my laptop will just decide to hang (and I can be doing tasks as mundane as viewing a webpage or building some software in a konsole; I don't game, and this time I wasn't even watching video).

Is there ANYTHING I can do to help you guys diagnose or mitigate this, or to get a warning when it's likely to occur? I've posted some 7 .../card0/error files, and apparently they don't contain enough info to figure out what's going on. Are there any debug flags (that won't ruin daily-driver performance) that I can try, so there's more info the next time this happens?

(Is there any way to just hack out a merge from a GIT tree?)
Comment 1 Kenneth C 2019-10-08 00:30:18 UTC
Created attachment 145679 [details]
/sys/class/drm/card0/error

This is another non-GuC hang, from yesterday. (It is not from drm-tip, however.)
Comment 2 Kenneth C 2019-10-08 00:35:54 UTC
The dmesg from today's hang is below.

I did notice this line, which I hadn't seen before:

Asynchronous wait on fence i915:kwin_x11[3017]:d88a4 timed out (hint:intel_atomic_commit_ready+0x0/0x4c [i915])

----
Oct  7 16:54:54 hp-x360n kernel: [20328.929256] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Oct  7 16:54:54 hp-x360n kernel: [20328.929260] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Oct  7 16:54:54 hp-x360n kernel: [20328.929261] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Oct  7 16:54:54 hp-x360n kernel: [20328.929262] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Oct  7 16:54:54 hp-x360n kernel: [20328.929263] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Oct  7 16:54:54 hp-x360n kernel: [20328.929265] GPU crash dump saved to /sys/class/drm/card0/error
Oct  7 16:54:54 hp-x360n kernel: [20328.930273] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:54:54 hp-x360n kernel: [20328.931019] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Oct  7 16:54:54 hp-x360n kernel: [20328.934266] i915 0000:00:02.0: Resetting chip for hang on rcs0
Oct  7 16:54:54 hp-x360n kernel: [20328.936037] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Oct  7 16:54:54 hp-x360n kernel: [20328.936783] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Oct  7 16:55:02 hp-x360n kernel: [20336.929187] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:10 hp-x360n kernel: [20344.929132] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:12 hp-x360n kernel: [20346.913128] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:14 hp-x360n kernel: [20348.897114] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:16 hp-x360n kernel: [20350.881102] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:18 hp-x360n kernel: [20352.929087] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:20 hp-x360n kernel: [20354.913078] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:22 hp-x360n kernel: [20356.897068] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:24 hp-x360n kernel: [20358.881055] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:26 hp-x360n kernel: [20360.929066] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:28 hp-x360n kernel: [20362.913023] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:30 hp-x360n kernel: [20364.897054] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:32 hp-x360n kernel: [20366.881001] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:34 hp-x360n kernel: [20368.928989] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:35 hp-x360n kernel: [20370.527934] mce: CPU2: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.527935] mce: CPU6: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.528006] mce: CPU1: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.528007] mce: CPU0: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.528007] mce: CPU4: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.528008] mce: CPU5: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.528009] mce: CPU3: Package temperature/speed normal
Oct  7 16:55:35 hp-x360n kernel: [20370.528010] mce: CPU7: Package temperature/speed normal
Oct  7 16:55:36 hp-x360n kernel: [20370.913036] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:38 hp-x360n kernel: [20372.897003] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:40 hp-x360n kernel: [20374.880977] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:42 hp-x360n kernel: [20376.928995] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:44 hp-x360n kernel: [20378.912956] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:46 hp-x360n kernel: [20380.896965] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:48 hp-x360n kernel: [20382.880929] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:50 hp-x360n kernel: [20384.928929] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:52 hp-x360n kernel: [20386.912919] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:54 hp-x360n kernel: [20388.896907] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:56 hp-x360n kernel: [20390.880898] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:55:58 hp-x360n kernel: [20392.929904] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:56:00 hp-x360n kernel: [20394.912873] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:56:02 hp-x360n kernel: [20396.896862] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:56:04 hp-x360n kernel: [20398.880874] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:56:06 hp-x360n kernel: [20400.928837] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
Oct  7 16:56:06 hp-x360n kernel: [20400.929027] i915 0000:00:02.0: Resetting chip for hang on rcs0
Oct  7 16:56:08 hp-x360n kernel: [20402.912858] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
Oct  7 16:56:08 hp-x360n kernel: [20402.913079] i915 0000:00:02.0: Resetting chip for hang on rcs0
Oct  7 16:56:16 hp-x360n kernel: [20410.912788] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:56:18 hp-x360n kernel: [20412.896789] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 16:56:19 hp-x360n kernel: [20414.049730] Asynchronous wait on fence i915:kwin_x11[3017]:d88a4 timed out (hint:intel_atomic_commit_ready+0x0/0x4c [i915])
Oct  7 16:56:20 hp-x360n kernel: [20414.880759] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
----
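
The reset cadence in the log above can be tallied with a short script. This is only a minimal sketch, not an i915 tool: the function name `summarize_i915_log`, the regular expression, and the category labels are assumptions made for illustration, and the sample lines are copied from the dmesg excerpt in this comment. It counts hang reports, engine resets, chip resets, and recovery timeouts, and returns the gaps in seconds between consecutive engine resets.

```python
import re
from collections import Counter

# Matches the bracketed kernel timestamp and the i915 message that follows it.
# The pattern is an assumption based on the log lines quoted in this report.
LINE_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 \S+: (?P<msg>.+)$")


def summarize_i915_log(lines):
    """Count i915 message kinds and collect gaps between engine resets."""
    counts = Counter()
    reset_times = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an "i915 <pci-id>: ..." line (e.g. mce or [drm:...] lines)
        msg = m.group("msg")
        if msg.startswith("GPU HANG"):
            counts["hang"] += 1
        elif msg.startswith("Resetting chip"):
            counts["chip_reset"] += 1
        elif msg.startswith("Resetting"):
            counts["engine_reset"] += 1
            reset_times.append(float(m.group("ts")))
        elif msg.startswith("GPU recovery timed out"):
            counts["recovery_timeout"] += 1
    # Seconds between consecutive engine resets, rounded to milliseconds.
    gaps = [round(b - a, 3) for a, b in zip(reset_times, reset_times[1:])]
    return counts, gaps


SAMPLE = [
    "Oct  7 16:54:54 hp-x360n kernel: [20328.929256] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0",
    "Oct  7 16:54:54 hp-x360n kernel: [20328.930273] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0",
    "Oct  7 16:54:54 hp-x360n kernel: [20328.934266] i915 0000:00:02.0: Resetting chip for hang on rcs0",
    "Oct  7 16:55:02 hp-x360n kernel: [20336.929187] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0",
    "Oct  7 16:55:35 hp-x360n kernel: [20370.527934] mce: CPU2: Package temperature/speed normal",
    "Oct  7 16:56:06 hp-x360n kernel: [20400.928837] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.",
]

if __name__ == "__main__":
    counts, gaps = summarize_i915_log(SAMPLE)
    print(dict(counts))
    print(gaps)
```

Feeding the full log to the same function (for example, the lines of a saved dmesg file) would show how regularly the driver retries the engine reset before it gives up and cancels in-flight rendering.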
Comment 3 Lakshmi 2019-10-08 14:52:11 UTC
(In reply to Kenneth C from comment #0)

Mika, any suggestions here?

