Bug 104959 - GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on bcs0
Summary: GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on bcs0
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-06 02:40 UTC by Thiago Macieira
Modified: 2018-04-25 11:16 UTC

See Also:
i915 platform: SKL
i915 features: firmware/dmc


Attachments
card0/error (28.79 KB, text/plain)
2018-02-06 02:40 UTC, Thiago Macieira

Description Thiago Macieira 2018-02-06 02:40:30 UTC
Created attachment 137180
card0/error

Kernel: 4.15.0 (openSUSE's build 1)
DMC: 1.26
Platform: Dell XPS 13 9350, Intel(R) Core(TM) i7-6560U CPU
Dell BIOS: 1.5.1

dmesg:

[drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on bcs0, action: continue
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error

card0/error file attached.
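
For anyone triaging similar reports, the hang line in the dmesg excerpt above can be pulled apart mechanically. This is only a rough sketch: the regular expression and the `parse_gpu_hang` name are illustrative assumptions based on the single line quoted here, not on any kernel-defined format. The report does not say what the three `ecode` fields mean, so they are kept as raw strings.

```python
import re

# Pattern modelled on the one hang line quoted in this report (an assumption,
# not a documented format).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"reason: (?P<reason>.+?), "
    r"action: (?P<action>\w+)"
)

def parse_gpu_hang(line):
    """Return the ecode fields, reason and action of a GPU HANG line, or None."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return {
        # Kept as uninterpreted strings; the report does not define them.
        "ecode": m.group("ecode").split(":"),
        "reason": m.group("reason"),
        "action": m.group("action"),
    }

line = ("[drm] GPU HANG: ecode 9:-1:0x00000000, "
        "reason: Kicking stuck wait on bcs0, action: continue")
print(parse_gpu_hang(line))
```

Grouping reports by the `reason` string is one way to spot look-alike hangs such as the one in bug #101991.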

Similar to bug #101991, but not the same. That bug has as a defining characteristic the use of hibernation. This problem now happened without hibernating, though I did suspend twice and connected to a Dell dock and to an external monitor via USB-C.

Uptime was about 24 hours. That was my first boot using kernel 4.15. This problem was not observed when using 4.14 and earlier kernels.
Comment 1 Chris Wilson 2018-02-06 16:52:56 UTC
DERRMR: 0x2077efef

That's not the right DERRMR, someone (fw? dmc probably) has been fiddling.
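
To make this kind of check repeatable, here is a small sketch that reads `NAME: 0xVALUE` lines such as the DERRMR line quoted above and lists the bit positions that differ from a reference value. The helper names are hypothetical. The expected DERRMR value is not given in this report, so `REFERENCE` below is a placeholder chosen purely for illustration and should be replaced with the value the driver actually programs.

```python
import re

# Matches lines of the form "DERRMR: 0x2077efef".
REG_RE = re.compile(r"^\s*(?P<name>[A-Za-z0-9_]+):\s*(?P<value>0x[0-9a-fA-F]+)\s*$")

def parse_registers(text):
    """Collect NAME -> integer value for every 'NAME: 0x...' line in text."""
    regs = {}
    for line in text.splitlines():
        m = REG_RE.match(line)
        if m:
            regs[m.group("name")] = int(m.group("value"), 16)
    return regs

def differing_bits(actual, reference, width=32):
    """Return the bit positions where actual and reference disagree."""
    diff = (actual ^ reference) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]

sample = "DERRMR: 0x2077efef\n"
regs = parse_registers(sample)

# ASSUMPTION: placeholder reference, not the driver's real expected value.
REFERENCE = 0xFFFFFFFF
print(differing_bits(regs["DERRMR"], REFERENCE))
```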
Comment 2 Imre Deak 2018-02-07 15:11:26 UTC
(In reply to Chris Wilson from comment #1)
> DERRMR: 0x2077efef
> 
> That's not the right DERRMR, someone (fw? dmc probably) has been fiddling.

Yes, the DMC saves/restores the DERRMR value across DC5/6 state transitions. It could be the corruption issue:

(In reply to Thiago Macieira from comment #0)
> ...
> DMC: 1.26

so could you try version 1.27 where that's fixed?
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/i915/skl_dmc_ver1_27.bin

and

commit 39ccc9852e2b46964c9c44eba52db57413ba6d27
Author: Anusha Srivatsa <anusha.srivatsa@intel.com>
Date:   Thu Nov 9 17:18:32 2017 -0800

    drm/i915/skl: DMC firmware for skylake v1.27


Yes, we need to start backporting these:/
Comment 3 Thiago Macieira 2018-02-07 18:09:29 UTC
(In reply to Imre Deak from comment #2)
> > DMC: 1.26
> 
> so could you try version 1.27 where that's fixed?

I asked Anusha and the reply I got is that 1.27 is loaded only by the kernel that has been tested with 1.27. The file is sitting there in /lib/firmware, but doesn't get loaded by the kernel.

I'm told that won't happen until kernel 4.16.

Also please note that 4.14 and earlier have been using 1.26 without this particular problem. I'm reporting it in case it's a new regression.

> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/i915/skl_dmc_ver1_27.bin

$ ls -l /lib/firmware/i915/skl_dmc_ver1*
-rw-r--r-- 1 root root 8824 jan  4 07:06 /lib/firmware/i915/skl_dmc_ver1_23.bin
-rw-r--r-- 1 root root 8928 jan  4 07:06 /lib/firmware/i915/skl_dmc_ver1_26.bin
-rw-r--r-- 1 root root 8928 jan  4 07:06 /lib/firmware/i915/skl_dmc_ver1_27.bin
lrwxrwxrwx 1 root root   19 jan  4 07:06 /lib/firmware/i915/skl_dmc_ver1.bin -> skl_dmc_ver1_26.bin
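
The same check can be scripted. The sketch below parses the version out of filenames shaped like the ones in the listing above and reports which version the unversioned symlink points at. The function names are hypothetical. It only describes what is on disk: as noted above, the kernel picks the firmware file itself, so the symlink target does not prove which version was loaded.

```python
import os
import re

# Filename shape taken from the listing above, e.g. skl_dmc_ver1_26.bin.
FW_RE = re.compile(r"skl_dmc_ver(\d+)_(\d+)\.bin$")

def dmc_version(path):
    """Return (major, minor) from a versioned DMC filename, else None."""
    m = FW_RE.search(os.path.basename(path))
    return (int(m.group(1)), int(m.group(2))) if m else None

def linked_dmc_version(link_path):
    """Return the version the symlink at link_path points at, else None."""
    if not os.path.islink(link_path):
        return None
    return dmc_version(os.readlink(link_path))

print(dmc_version("skl_dmc_ver1_27.bin"))
```

On the reporter's system, `linked_dmc_version("/lib/firmware/i915/skl_dmc_ver1.bin")` would be the call to try.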


> and
> 
> commit 39ccc9852e2b46964c9c44eba52db57413ba6d27
> Author: Anusha Srivatsa <anusha.srivatsa@intel.com>
> Date:   Thu Nov 9 17:18:32 2017 -0800
> 
>     drm/i915/skl: DMC firmware for skylake v1.27
> 
> 
> Yes, we need to start backporting these:/

That would be appreciated. I can't build my own kernels (secure boot).
Comment 4 Jani Saarinen 2018-03-29 07:10:05 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot reproduce the issue there, please comment on the bug.
Comment 5 Jani Saarinen 2018-04-25 11:16:14 UTC
Closing, please re-open if the issue still exists.

