Bug 110653 - Repeat resetting chip every 8seconds after GPU HANG: ecode 9:0:0x85dffffb
Summary: Repeat resetting chip every 8seconds after GPU HANG: ecode 9:0:0x85dffffb
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-09 08:14 UTC by Yoshinori Gento
Modified: 2019-07-29 00:25 UTC

See Also:
i915 platform:
i915 features: GPU hang


Attachments
Part of kernel log. (5.88 KB, text/plain)
2019-05-09 08:20 UTC, Yoshinori Gento
/sys/class/drm/card0/error (56.33 KB, text/plain)
2019-05-30 02:53 UTC, Yoshinori Gento

Description Yoshinori Gento 2019-05-09 08:14:28 UTC

    
Comment 1 Yoshinori Gento 2019-05-09 08:20:03 UTC
Created attachment 144205
Part of kernel log.
Comment 2 Chris Wilson 2019-05-09 08:31:51 UTC
Please disable guc submission and all unsafe cmdline options. If it occurs again, please attach the /sys/class/drm/card0/error.
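Before retesting, it can help to confirm which i915 options are actually in effect and to save the error state as soon as a hang occurs. The following is a minimal sketch, not part of the original request: it lists every `i915.*` parameter on the kernel command line and copies `/sys/class/drm/card0/error` to a local file. The helper names and the output file name `i915_error_state.txt` are made up for illustration. It assumes options are passed on the command line; options set through modprobe configuration would not show up here, and reading the error file usually needs root.

```python
from pathlib import Path


def i915_cmdline_options(cmdline):
    """Return {name: value} for every i915.* parameter in a kernel command line."""
    opts = {}
    for token in cmdline.split():
        if token.startswith("i915."):
            # "i915.foo=1" -> ("foo", "1"); a bare "i915.foo" gets an empty value.
            name, _, value = token[len("i915."):].partition("=")
            opts[name] = value
    return opts


def read_text_if_present(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_if_present("/proc/cmdline") or ""
    options = i915_cmdline_options(cmdline)
    if options:
        print("i915 options on the kernel command line:")
        for name, value in sorted(options.items()):
            print(f"  i915.{name}={value}")
    else:
        print("No i915.* options on the kernel command line.")

    # Hypothetical output file name; attach the saved file to the bug.
    error_state = read_text_if_present("/sys/class/drm/card0/error")
    if error_state is None:
        print("Could not read /sys/class/drm/card0/error (try running as root).")
    else:
        Path("i915_error_state.txt").write_text(error_state)
        print("Saved error state to i915_error_state.txt")
```

Any option the script lists should be reviewed and removed unless it is known to be safe on this kernel version.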
Comment 3 Yoshinori Gento 2019-05-09 08:53:39 UTC
After a GPU hang, the driver kept resetting the chip.
Drawing also kept stalling and resuming.
This problem has occurred only once in more than 200 days of total run time.
At that time I was not operating the machine; it was only drawing.
So I have not been able to reproduce it yet.

[Environment]
CPU: Skylake (Core i5-6500TE)
Distribution: Debian (customised)
Kernel: 4.14.98
Mesa: 18.3.3
libdrm: 2.4.89

> Chris Wilson
Sorry for the incomplete report.
I will try that.
Comment 4 Yoshinori Gento 2019-05-30 02:53:54 UTC
Created attachment 144379
/sys/class/drm/card0/error

I disabled guc submission and the unsafe cmdline options,
but a similar issue occurred again yesterday.
I have attached /sys/class/drm/card0/error.

The kernel messages are as follows.
----
[38893.560462] [drm] GPU HANG: ecode 9:0:0x85dffffb, in mfd_draw [2656], reason: Hang on rcs0, action: reset
[38893.560470] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38901.583889] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38909.579903] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38917.583892] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38925.579889] i915 0000:00:02.0: Resetting rcs0 after gpu hang
:
:
----
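The reset cadence can be checked directly from the timestamps in the log above. The following is a minimal sketch, not part of the original report: it extracts the kernel timestamps of the "Resetting ... after gpu hang" lines and prints the gap between consecutive resets. The function names are made up for illustration, and the regular expression assumes the bracketed-timestamp message format shown in this excerpt.

```python
import re

# Log lines copied from the kernel messages quoted in this comment.
LOG = """\
[38893.560462] [drm] GPU HANG: ecode 9:0:0x85dffffb, in mfd_draw [2656], reason: Hang on rcs0, action: reset
[38893.560470] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38901.583889] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38909.579903] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38917.583892] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[38925.579889] i915 0000:00:02.0: Resetting rcs0 after gpu hang
"""

# Matches "[<seconds>] ... Resetting <engine> after gpu hang".
RESET_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*Resetting (\w+) after gpu hang")


def reset_times(log_text):
    """Return the timestamps (in seconds) of every engine-reset line."""
    times = []
    for line in log_text.splitlines():
        match = RESET_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def reset_intervals(log_text):
    """Return the gaps in seconds between consecutive engine resets."""
    times = reset_times(log_text)
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in reset_intervals(LOG):
        print(f"{gap:.3f} s between resets")
```

For this excerpt every gap comes out within a few hundredths of a second of 8 seconds, which matches the interval named in the bug title.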
Comment 5 Lakshmi 2019-07-03 09:29:08 UTC
(In reply to Yoshinori Gento from comment #4)

@Chris, what is your view on this issue?
Comment 6 Chris Wilson 2019-07-03 09:33:24 UTC
It's hanging in userspace (Mesa), but the kernel is so old that it is using known-buggy DMC firmware, which is responsible for a variety of GPU hangs all by itself.
Comment 7 Yoshinori Gento 2019-07-03 10:25:28 UTC
(In reply to Chris Wilson from comment #6)
> It's hanging in userspace (mesa), but the kernel is so old it is using known
> buggy dmc firmware that alone is responsible for a variety of GPU hangs all
> by itself.

Thank you for your comment.
If possible, please tell me which kernel version I should update to:
4.14.131, 4.19.56, or 5.1.15?
(I hope it is an LTS version...)

I would like to try the new one soon.
Comment 8 Lakshmi 2019-07-05 05:33:36 UTC
(In reply to Yoshinori Gento from comment #7)
> (In reply to Chris Wilson from comment #6)
> > It's hanging in userspace (mesa), but the kernel is so old it is using known
> > buggy dmc firmware that alone is responsible for a variety of GPU hangs all
> > by itself.
> 
> Thank you for your comment.
> If you can, please tell me version that I should update kernel to.
> 4.14.131?, 4.19.56? or 5.1.15?
> (I hope the version is LTS...)
> 
> I want to try the new one soon.

I recommend that you verify the issue with drm-tip (https://cgit.freedesktop.org/drm-tip).
Comment 9 Yoshinori Gento 2019-07-05 06:57:35 UTC
OK.
I updated the kernel to 4.19 and started continuous operation.
If I hit the same issue or another one, I will verify it with drm-tip.
Comment 10 Francesco Balestrieri 2019-07-23 09:05:47 UTC
Yoshinori Gento, were you able to verify this? Thanks!
Comment 11 Yoshinori Gento 2019-07-29 00:25:38 UTC
I updated the kernel to 4.19.57 and Mesa to 18.3.6 about 3 weeks ago.
Since then, I have not seen any problems.

I think the bug I hit is fixed.
Thanks to all!

