Bug 108966 - drm i915 "GPU HANG" in kernel-4.19.7 on starting X
Summary: drm i915 "GPU HANG" in kernel-4.19.7 on starting X
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-07 12:04 UTC by William
Modified: 2019-07-26 06:10 UTC

See Also:
i915 platform: ILK
i915 features: GPU hang


Attachments
Xorg log (26.58 KB, text/plain)
2018-12-07 12:04 UTC, William
/sys/class/drm/card0/error (66.77 KB, text/plain)
2018-12-07 12:05 UTC, William

Description William 2018-12-07 12:04:10 UTC
Created attachment 142745
Xorg log

On starting X Windows, or very soon after, keyboard and mouse response
became slow and I got this in /var/log/messages:

Dec  7 05:52:23 localhost kernel: [drm] GPU HANG: ecode 5:0:0xeeccffff, in Xorg [784], reason: hang on rcs0, action: reset
Dec  7 05:52:23 localhost kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec  7 05:52:23 localhost kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec  7 05:52:23 localhost kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec  7 05:52:23 localhost kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Dec  7 05:52:23 localhost kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Dec  7 05:52:23 localhost kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0

The system was just about usable but very slow. I shut X down and started it
up again and it was still slow. It was all ok again after a full reboot.

I had installed kernel-4.19.7 the previous day and had booted and run with
kernel-4.19.7 four times during the day without any problems.

------------------------------------
Hardware:
  Asus P7H55-M SI with Intel Core i5 processor
  and Integrated Graphics [i915 driver].

Distro: Slackware-14.2, but with newer kernel.
------------------------------------

Will attach Xorg.0.0.log file and error file.
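
For triage, the fields in the "GPU HANG" line quoted above can be pulled out
of /var/log/messages programmatically. The following is a minimal sketch, not
part of the original report: the function name parse_gpu_hang and the regular
expression are illustrative assumptions, and the pattern is inferred only from
the log lines in this bug, so other kernel versions may format the message
differently.

```python
import re

# Pattern inferred from the log lines quoted in this report only (assumption);
# it is not taken from the i915 source and may not match other kernels.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\S+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' log line as a dict, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # process ID of the client named in the log
    return info


if __name__ == "__main__":
    sample = (
        "Dec  7 05:52:23 localhost kernel: [drm] GPU HANG: ecode "
        "5:0:0xeeccffff, in Xorg [784], reason: hang on rcs0, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Lines that do not contain a hang message, such as the "Resetting chip" line,
return None, so the function can be applied to every line of a log file.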
Comment 1 William 2018-12-07 12:05:01 UTC
Created attachment 142746
/sys/class/drm/card0/error
Comment 2 Chris Wilson 2018-12-07 13:03:54 UTC
Something overwrote a portion (one tile) of the batchbuffer. We only caught the damage, not the culprit.
Comment 3 Lakshmi 2019-07-19 08:40:55 UTC
William, do you still have this issue?
Have you tried to verify with drm-tip (https://cgit.freedesktop.org/drm-tip)?
Comment 4 William 2019-07-25 09:25:12 UTC
It does seem to be a very rare occurrence. Since my initial bug report
I have had only one similar hang, in May 2019 under kernel-5.0.15, again
soon after startup. I did not notice it at the time and so failed to
get a GPU crash dump. /var/log/messages showed:

May 15 18:46:35 localhost kernel: [drm] GPU HANG: ecode 5:0:0x005f5f5f, in Xorg [796], reason: hang on rcs0, action: reset
May 15 18:46:35 localhost kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
May 15 18:46:35 localhost kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
May 15 18:46:35 localhost kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
May 15 18:46:35 localhost kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
May 15 18:46:35 localhost kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
May 15 18:46:35 localhost kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0

It has not happened again since and is more of a curiosity than a problem
so if you wish to close the bug that is ok.
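
Because the crash dump lives in kernel memory, it is lost on reboot if it is
not copied out first. A minimal sketch of saving it to a regular file follows.
The function name save_error_state and the output file naming are illustrative
assumptions; the source path is the one printed in the kernel log above, and
reading it typically needs root.

```python
import time
from pathlib import Path


def save_error_state(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the i915 error state to a timestamped file.

    Returns the path of the copy, or None if the source is empty.
    The default path is the one named in the kernel log message.
    """
    data = Path(src).read_bytes()
    if not data.strip():
        return None  # nothing to save
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"  # hypothetical naming
    dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    saved = save_error_state(dest_dir="/tmp")
    print(saved if saved else "no error state to save")
```

Running this soon after a "GPU HANG" message appears in the log would give a
file that can be attached to a bug report later.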
Comment 5 Lakshmi 2019-07-26 06:10:37 UTC
(In reply to William from comment #4)

Thanks for the feedback. I will close this bug as WORKSFORME.
In general, I recommend updating to the latest kernel (drm-tip) when you see an issue. If the issue persists with drm-tip and you cannot find a similar open bug, please file a new one.

