Bug 111978 - GPU Hang and Failed to reset chip.
Summary: GPU Hang and Failed to reset chip.
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-11 10:16 UTC by Yoshinori Gento
Modified: 2019-10-16 01:33 UTC

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (22.89 KB, application/x-bzip)
2019-10-11 10:16 UTC, Yoshinori Gento

Description Yoshinori Gento 2019-10-11 10:16:44 UTC
Created attachment 145706
/sys/class/drm/card0/error

[Environment]
CPU: Skylake (Core i5-6500TE)
Distribution: Debian (customised)
Kernel: 4.19.57
Mesa: 18.3.6
libdrm: 2.4.89

[dmesg]
[10524.095632] [drm] GPU HANG: ecode 9:0:0x85dffffb, in xxxx [2606], reason: hang on rcs0, action: reset
[10524.096671] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[160603.626044] mod_lipc:lipc_write_lipc() destination is full, tid=3275, comm=CcmTimer
[160608.822588] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0, bcs0
[160608.825117] i915 0000:00:02.0: Resetting chip for hang on rcs0, bcs0
[160608.829851] [drm:gen8_reset_engines] *ERROR* bcs0: reset request timeout
[160608.940281] [drm:gen8_reset_engines] *ERROR* bcs0: reset request timeout
[160609.048277] [drm:gen8_reset_engines] *ERROR* bcs0: reset request timeout
[160609.153847] i915 0000:00:02.0: Failed to reset chip
[160609.158573] [drm:gen8_reset_engines] *ERROR* bcs0: reset request timeout
i965: Failed to submit batchbuffer: Input/output error

[description]
It occurred only once in about 400 days of total (non-continuous) operation.
It seems the GPU hung twice on this machine.
The first hang was recovered by an rcs0 engine reset.
The second could not be recovered, even by a chip reset.
I have attached the error file.
It contains information about the first hang only.
I do not know whether the second hang is related to the first.
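
For anyone trying to spot the same sequence in their own logs, here is a minimal sketch that scans dmesg text and labels the i915 hang and reset messages. It is only an illustration: the function names (`classify`, `scan`) and event labels are made up for this example, and the matched strings are copied from the dmesg excerpt in this report, so other kernel versions may word the messages differently.

```python
import re

# Matches a dmesg line of the form "[  123.456789] message".
LINE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")
# Matches "Resetting <target> for hang on <engines>", as seen in this report.
RESET = re.compile(r"Resetting (?P<target>\w+) for hang on (?P<engines>.+)$")


def classify(msg):
    """Return a label for an i915 hang/reset message, or None (labels are hypothetical)."""
    if "GPU HANG:" in msg:
        return "gpu_hang"
    m = RESET.search(msg)
    if m:
        # A target of "chip" means a full GPU reset; anything else is one engine.
        return "chip_reset" if m.group("target") == "chip" else "engine_reset"
    if "reset request timeout" in msg:
        return "reset_timeout"
    if "Failed to reset chip" in msg:
        return "reset_failed"
    return None


def scan(text):
    """Return (timestamp, label) pairs for every recognised line in dmesg text."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # lines without a kernel timestamp are skipped
        kind = classify(m.group("msg"))
        if kind:
            events.append((float(m.group("ts")), kind))
    return events


# Lines copied from the dmesg excerpt in this report.
SAMPLE = """\
[10524.095632] [drm] GPU HANG: ecode 9:0:0x85dffffb, in xxxx [2606], reason: hang on rcs0, action: reset
[10524.096671] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[160603.626044] mod_lipc:lipc_write_lipc() destination is full, tid=3275, comm=CcmTimer
[160608.822588] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0, bcs0
[160608.825117] i915 0000:00:02.0: Resetting chip for hang on rcs0, bcs0
[160608.829851] [drm:gen8_reset_engines] *ERROR* bcs0: reset request timeout
[160609.153847] i915 0000:00:02.0: Failed to reset chip
i965: Failed to submit batchbuffer: Input/output error
"""

if __name__ == "__main__":
    for ts, kind in scan(SAMPLE):
        print(f"{ts:>14.6f}  {kind}")
```

A `reset_failed` event following a `chip_reset` would correspond to the unrecoverable second hang described above, while a `gpu_hang` followed only by an `engine_reset` would correspond to the first, recovered one.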
Comment 1 Chris Wilson 2019-10-11 20:07:17 UTC
The GPU dying as a result of an invalid sequence of instructions is not entirely impossible. It could be a result of a missed application of a workaround after the reset, my memory suggests that there was such a bug on Skylake circa v4.19. I would strongly suggest checking with a later kernel. However, that is only likely to fix the subsequent lockup...
Comment 2 Yoshinori Gento 2019-10-16 01:33:47 UTC
> The GPU dying as a result of an invalid sequence of instructions is not
> entirely impossible. It could be a result of a missed application of a
> workaround after the reset, my memory suggests that there was such a bug on
> Skylake circa v4.19. I would strongly suggest checking with a later kernel.
> However, that is only likely to fix the subsequent lockup...

I will try a newer kernel.
It seems to be a rare bug.
I will close this ticket as fixed and re-open it if I see the same symptom.
Thank you.

