Bug 106333 - GPU HANG: ecode 9:1:0xfefffffe, in Xorg [600], reason: Hang on bcs0, action: reset
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-01 12:36 UTC by udo
Modified: 2018-05-14 12:12 UTC
CC List: 3 users

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
GPU crash dump (20.40 KB, text/plain)
2018-05-01 12:36 UTC, udo

Description udo 2018-05-01 12:36:27 UTC
Created attachment 139245
GPU crash dump

With Linux-4.16.6 on a Lenovo X1 laptop (Skylake GT2 [HD Graphics 520]) I just hit the following:

[   24.801885] [drm] GPU HANG: ecode 9:1:0xfefffffe, in Xorg [600], reason: Hang on bcs0, action: reset
[   24.801886] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   24.801886] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   24.801886] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   24.801887] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   24.801887] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   24.801927] i915 0000:00:02.0: Resetting bcs0 after gpu hang
[   27.868430] asynchronous wait on fence i915:clflush:0 timed out
[   27.868450] asynchronous wait on fence i915:Xorg[600]/0:3 timed out
[   32.796660] i915 0000:00:02.0: Resetting chip after gpu hang
[   32.796739] i915 0000:00:02.0: GPU recovery failed
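The first line of this log has a fixed shape: an ecode, the offending process and PID, a reason naming the engine, and the recovery action. Below is a minimal Python sketch that splits such a line into fields so that reports can be compared side by side. It assumes only the wording of the single line shown above, and other kernel versions may phrase it differently. `HANG_RE` and `parse_hang` are hypothetical helper names, and the three ecode sub-fields are kept as opaque strings rather than interpreted.

```python
import re
from typing import Optional

# Pattern assumed from the one message in this report:
# "GPU HANG: ecode <n>:<n>:0x<hex>, in <process> [<pid>], reason: <text>, action: <word>"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:0x[0-9a-f]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang(line: str) -> Optional[dict]:
    """Return the fields of a 'GPU HANG' dmesg line, or None if it does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    # Split the ecode into its three parts without assigning them any meaning.
    fields["ecode_parts"] = fields["ecode"].split(":")
    return fields


line = ("[   24.801885] [drm] GPU HANG: ecode 9:1:0xfefffffe, in Xorg [600], "
        "reason: Hang on bcs0, action: reset")
info = parse_hang(line)
print(info["process"], info["pid"], info["reason"], info["action"])
```

Running it on the line from this report prints `Xorg 600 Hang on bcs0 reset`. A plain regular expression is used instead of a full dmesg parser because only this one headline message needs to be matched.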
Comment 1 Chris Wilson 2018-05-01 13:24:50 UTC
bcs completed with HEAD==TAIL, and the ELSP is loaded with the next request. It looks like the engine never switched to and started the new request. I see no other clues in the register state. Is it relevant that this is an early request?
Comment 2 udo 2018-05-01 13:41:32 UTC
It seems to happen right at the moment the X server starts, and it is easily reproducible here.
Comment 3 Jani Saarinen 2018-05-02 06:47:49 UTC
Are you able to try the latest drm-tip (https://cgit.freedesktop.org/drm-tip) and send a dmesg log captured with drm.debug=0x1e log_buf_len=4M on the kernel command line?

Chris, does this help?
Comment 4 Jani Saarinen 2018-05-03 06:41:30 UTC
Mika, Chris, any help here?
Comment 5 Jani Saarinen 2018-05-09 05:48:30 UTC
Udo, have you been able to test the latest drm-tip?
Comment 6 udo 2018-05-10 15:18:37 UTC
I just tried drm-tip commit 8e1dab6e and with that kernel the hang no longer occurs.

With 4.16.8 it still reliably hangs.
Comment 7 Jani Saarinen 2018-05-11 04:59:55 UTC
OK, thanks. Chris, Mika, Jani, any idea what has changed?
Comment 8 Jani Saarinen 2018-05-14 06:18:21 UTC
Based on comment 6, resolving as fixed in drm-tip.

