Bug 106676 - [SNB] [drm] GPU HANG: ecode 6:0:0x87e8effd, in Xorg [1364], reason: Hang on rcs0, action: reset
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other (OS: All)
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: Triaged, ReadyForDev
Reported: 2018-05-28 06:42 UTC by ValdikSS
Modified: 2018-10-23 11:56 UTC
CC List: 2 users

i915 platform: SNB
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (14.49 KB, text/plain)
2018-05-28 06:42 UTC, ValdikSS
gpu_error_4 (38.95 KB, text/plain)
2018-07-06 13:25 UTC, ValdikSS
GPU error 5 (29.69 KB, text/plain)
2018-09-18 22:39 UTC, ValdikSS
Video which crashes the GPU (3.29 MB, video/x-matroska)
2018-09-18 22:40 UTC, ValdikSS
GPU crash (633.33 KB, application/zip)
2018-09-21 15:51 UTC, ValdikSS

Description ValdikSS 2018-05-28 06:42:37 UTC
Created attachment 139807
/sys/class/drm/card0/error

I'm using Fedora 28 with kernel 4.16.9-300.fc28.x86_64
mesa-dri-drivers 18.0.2-1.fc28
xorg-x11-drv-intel 2.99.917-32.20171025.fc28

This happened when I tried to open a website in Firefox while playing high-bitrate H.264 video using mpv (vaapi-copy).

[May27 04:50] [drm] GPU HANG: ecode 6:0:0x87e8effd, in Xorg [1364], reason: Hang on rcs0, action: reset
[  +0,000002] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  +0,000000] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  +0,000001] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  +0,000000] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  +0,000001] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  +0,000047] i915 0000:00:02.0: Resetting chip after gpu hang
[  +3,071473] asynchronous wait on fence i915:[global]:491aed timed out
[  +4,928033] i915 0000:00:02.0: Resetting chip after gpu hang
[  +8,960256] i915 0000:00:02.0: Resetting chip after gpu hang
[May27 04:51] i915 0000:00:02.0: Resetting chip after gpu hang
[  +9,023965] i915 0000:00:02.0: Resetting chip after gpu hang
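The hang line in the log above has a regular shape, so hangs from several dmesg captures can be compared mechanically. A minimal Python sketch, assuming the `ecode` triple is `<GPU generation>:<engine index>:<32-bit value>` as in i915's error capture of this era (6 would then be Sandy Bridge's generation and 0 the render engine, rcs0). The regex, `GpuHang` and `parse_hang` are hypothetical names, not part of any i915 tool:

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of the i915 hang message (4.16-era kernels):
#   GPU HANG: ecode <gen>:<engine>:0x<value>, in <comm> [<pid>], reason: <text>, action: <action>
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):0x(?P<code>[0-9a-fA-F]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    gen: int      # assumed: GPU generation (6 = Sandy Bridge)
    engine: int   # assumed: index of the hung engine
    code: int     # 32-bit value printed in hex
    comm: str     # process name shown in the message
    pid: int      # process ID shown in the message
    reason: str
    action: str


def parse_hang(line: str) -> Optional[GpuHang]:
    """Extract the fields of an i915 'GPU HANG' dmesg line, or None if absent."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        gen=int(m["gen"]),
        engine=int(m["engine"]),
        code=int(m["code"], 16),
        comm=m["comm"],
        pid=int(m["pid"]),
        reason=m["reason"],
        action=m["action"],
    )


if __name__ == "__main__":
    line = ("[May27 04:50] [drm] GPU HANG: ecode 6:0:0x87e8effd, in Xorg [1364], "
            "reason: Hang on rcs0, action: reset")
    hang = parse_hang(line)
    if hang is not None:
        print(hang.comm, hang.pid, hex(hang.code), hang.reason)
```

Matching the `code` field across reports is only a rough heuristic for spotting repeats; the crash dump itself is still what the kernel message asks to be attached.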
Comment 1 Jani Saarinen 2018-05-28 11:16:27 UTC
Can you try the latest drm-tip (https://cgit.freedesktop.org/drm-tip), currently at Linux 4.17.0-rc6, and send the dmesg captured with the kernel parameters drm.debug=0x1e log_buf_len=4M?
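`drm.debug` is a bitmask of debug-message categories and `log_buf_len` sizes the kernel log ring buffer. A small Python sketch that decodes both values, assuming the `DRM_UT_*` bit assignments of the 4.17-era kernel DRM headers (CORE=0x01 through LEASE=0x80). `decode_drm_debug` and `parse_size` are illustrative helpers, and `parse_size` handles only the K, M and G suffixes rather than everything the kernel's own size parser accepts:

```python
from typing import Dict, List

# Assumed DRM_UT_* bit values from 4.17-era kernel DRM headers.
DRM_DEBUG_CATEGORIES: Dict[int, str] = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


def parse_size(text: str) -> int:
    """Parse sizes such as '4M' or '512k' into bytes (K, M, G suffixes only)."""
    multipliers = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(text[:-1], 0) * multipliers[suffix]
    return int(text, 0)


if __name__ == "__main__":
    print(decode_drm_debug(0x1e))
    print(parse_size("4M"))
```

Under these assumptions, `drm.debug=0x1e` enables the DRIVER, KMS, PRIME and ATOMIC categories (but not CORE), and `log_buf_len=4M` requests a 4 MiB (4,194,304-byte) buffer so the extra debug output is less likely to push the hang out of the log.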
Comment 2 ValdikSS 2018-05-28 20:13:25 UTC
I'll try it, but it's rather hard to reproduce. It happens perhaps once every few days of video playback.
Comment 3 Chris Wilson 2018-05-29 08:03:52 UTC
It's a libva-related bug. Once in a while the GPU gets itself into a state in which it stops writing to memory (SDM specifically, but possibly not limited to that).
Comment 4 ValdikSS 2018-05-29 08:05:18 UTC
(In reply to Chris Wilson from comment #3)
> It's a libva related bug. Once in a while the gpu gets itself into a state
> that it stops writing to memory (SDM specifically, but may not be limited
> to).

Just to note, this also happened while the video was paused.
Comment 5 Jani Saarinen 2018-06-25 10:00:57 UTC
Reported,m would you be able to test drm-tip? Now 4.18-rc2.
Comment 6 Jani Saarinen 2018-06-26 06:06:18 UTC
(In reply to Jani Saarinen from comment #5)
> Reported,m would you be able to test drm-tip? Now 4.18-rc2.
I meant: reporter, would you be able to test drm-tip? It is now at 4.18-rc2.
Comment 7 ValdikSS 2018-06-30 10:44:10 UTC
(In reply to Jani Saarinen from comment #6)
> Was meant reporter, would you be able to test drm-tip? Now 4.18-rc2.

So far I have been unable to trigger the bug. I've been using hardware video decoding on the stock Fedora kernel 4.17.2-200.fc28.x86_64 for several days with no problems.
The problem occurs only under specific conditions, which I don't know how to reproduce on purpose.
Comment 8 ValdikSS 2018-07-06 13:24:29 UTC
It happened again. I'll try to build drm-tip and trigger the bug with it.
Comment 9 ValdikSS 2018-07-06 13:25:03 UTC
Created attachment 140482
gpu_error_4
Comment 10 ValdikSS 2018-07-06 16:46:38 UTC
With drm-tip 95944426a9ffda186843c78f2f925494e1bc53c5 I get a complete system lockup less than an hour after boot. The system does not respond to SysRq and does not recover within 5 minutes.

All I do is play an H.264 50 fps 23 Mbit/s video using mpv with vaapi-copy.

It has already happened 3 times. Because the system locks up, I can't provide a debug log, and I doubt netconsole would capture anything.

I can't confirm that the video subsystem is the cause of this lockup.
Comment 11 ValdikSS 2018-07-06 16:51:20 UTC
When the lockup occurs, the audio output keeps repeating the last second of audio from the video file.
Comment 12 ValdikSS 2018-07-17 10:13:37 UTC
This is a kernel regression that is probably not GPU-related. After updating to the released 4.17.5 kernel (a regular kernel, not drm-tip), I get the same complete system lockups I had with drm-tip in comment 10.
Comment 13 ValdikSS 2018-07-20 13:56:26 UTC
The problem does not occur with a kernel built from drm-tip commit 4aa6797dfafaf527949bf55d3c8513c6902dfec2 (plus patch 5ea45736209c8efd04ed793f81084925097f84ed from kernel 4.17.7, which fixes the non-GPU lockup mentioned in comment 10).

I've been running it for 2 days with video playing constantly using both the vaapi and vaapi-copy hardware acceleration methods, and no lockups have occurred.

Is it possible to backport the drm-tip patches to the mainline kernel?
Comment 14 Jani Saarinen 2018-08-13 09:39:07 UTC
drm-tip is our pre-upstream tree; its changes reach mainline "automatically".
Jani, what do you think?
Comment 15 Jani Saarinen 2018-08-14 06:16:02 UTC
Please report whether the latest drm-tip, now at Linux 4.18.0, works for you.
Comment 16 Lakshmi 2018-08-30 07:09:47 UTC
Reporter, can you still reproduce this issue with the latest drm-tip? If not, I will close this bug.
Comment 17 ValdikSS 2018-09-05 16:02:12 UTC
(In reply to Lakshmi from comment #16)
> Reporter, were you able to see this issue with latest drmtip? If not, I can
> close this bug.

With the Fedora kernel 4.17.19-200.fc28.x86_64 and a fully updated system, I no longer get GPU hangs. This bug is probably resolved.
Comment 18 Lakshmi 2018-09-05 17:16:34 UTC
Closing the bug.
Comment 19 ValdikSS 2018-09-18 22:39:28 UTC
Created attachment 141647
GPU error 5

I think I found a video which instantly crashes the GPU.
Comment 20 ValdikSS 2018-09-18 22:40:33 UTC
Created attachment 141648
Video which crashes the GPU

Here's the video.
Tested on 4.18.7-200.fc28.x86_64.
Comment 21 Lakshmi 2018-09-21 09:51:58 UTC
Reporter, please try to reproduce the issue using drm-tip (https://cgit.freedesktop.org/drm-tip) with the kernel parameters drm.debug=0x1e log_buf_len=4M and, if the problem persists, attach the full dmesg from boot.
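Before attaching a dmesg, it can help to confirm that the running kernel was actually booted with the requested parameters. A minimal Python sketch that reads `/proc/cmdline` and reports which of the required `key=value` pairs are missing; `missing_params` and `REQUIRED` are hypothetical names, and the exact string comparison (for example `0x1e` vs `0x1E`) is a simplification:

```python
from pathlib import Path
from typing import Dict, List, Optional

# Parameters requested in the comment above (exact string match assumed).
REQUIRED: Dict[str, str] = {"drm.debug": "0x1e", "log_buf_len": "4M"}


def missing_params(cmdline: str, required: Dict[str, str]) -> List[str]:
    """Return the required key=value pairs not present in a kernel command line."""
    present: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # A bare flag such as "quiet" has no value.
        present[key] = value if sep else None
    return [f"{k}={v}" for k, v in required.items() if present.get(k) != v]


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")  # Linux-only; skipped elsewhere
    if cmdline_path.exists():
        missing = missing_params(cmdline_path.read_text(), REQUIRED)
        if missing:
            print("Reboot with:", " ".join(missing))
        else:
            print("All requested parameters are set.")
```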
Comment 22 ValdikSS 2018-09-21 15:51:36 UTC
Created attachment 141676
GPU crash

(In reply to Lakshmi from comment #21)
> Reporter, Please try to reproduce the issue using drm-tip
> (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e
> log_buf_len=4M, and if the problem persists attach the full dmesg from boot.

Done.
Comment 23 Lakshmi 2018-10-23 11:56:32 UTC
Closing this bug, as it is a userspace issue rather than a kernel bug.
Please report it to the VA-API driver team:
https://github.com/intel/intel-vaapi-driver/issues/new

