Bug 94161 - [skl rc6] GPU HANG
Summary: [skl rc6] GPU HANG
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 94029 94462 94575 94768
Depends on:
Blocks:
 
Reported: 2016-02-15 16:34 UTC by Mikael Djurfeldt
Modified: 2017-10-03 13:39 UTC (History)
CC List: 23 users

See Also:
i915 platform: SKL
i915 features: GPU hang, power/GT


Attachments
GPU crash dump (82.96 KB, text/plain)
2016-02-15 16:34 UTC, Mikael Djurfeldt
gpu-rc4-crash.log.gz (81.81 KB, application/x-gzip)
2016-03-10 13:37 UTC, Mikael Djurfeldt
drm/i915/skl: Use WaForceContextSaveRestoreNonCoherent for all revs (1.27 KB, patch)
2016-04-01 12:14 UTC, Mika Kuoppala
drm/i915/skl: Use WaRsDisableCoarsePowerGating for all revs (1.26 KB, patch)
2016-04-01 15:18 UTC, Mika Kuoppala
attachment-11432-0.html (1.49 KB, text/html)
2016-04-11 07:07 UTC, Odd Rune Lykkebø
Set NEEDS_WaRsDisableCoarsePowerGating for Skylake GT2 GPUs (1.41 KB, patch)
2017-08-02 15:48 UTC, Gordon Messmer

Description Mikael Djurfeldt 2016-02-15 16:34:18 UTC
Created attachment 121772 [details]
GPU crash dump

I have experienced GPU hangs with all kernels after 4.3.

I'm running a MS Surface Pro 4.

Feb 15 17:27:22 hat kernel: [  478.912402] [drm] stuck on render ring
Feb 15 17:27:22 hat kernel: [  478.913345] [drm] GPU HANG: ecode 9:0:0x85df9fff, in gnome-shell [1956], reason: Ring hung, action: reset
Feb 15 17:27:22 hat kernel: [  478.913357] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 15 17:27:22 hat kernel: [  478.913361] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 15 17:27:22 hat kernel: [  478.913364] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 15 17:27:22 hat kernel: [  478.913367] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 15 17:27:22 hat kernel: [  478.913371] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 15 17:27:22 hat kernel: [  478.915833] drm/i915: Resetting chip after gpu hang
Feb 15 17:27:24 hat kernel: [  480.901312] [drm] RC6 on
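
Editorial note: when comparing several reports like this one, the GPU HANG line in the log above can be picked apart mechanically. The Python sketch below is illustrative only. Its regular expression is inferred from the single log line quoted in this report, not from the i915 source, and the helper name parse_gpu_hang is hypothetical.

```python
import re

# Pattern inferred from the "GPU HANG" line quoted in this report (an assumption,
# not taken from the kernel source).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG log line as a dict, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the bracketed number after the process name
    return info


# The log line from the description, split only to keep the source line short.
sample = (
    "Feb 15 17:27:22 hat kernel: [  478.913345] [drm] GPU HANG: ecode "
    "9:0:0x85df9fff, in gnome-shell [1956], reason: Ring hung, action: reset"
)
print(parse_gpu_hang(sample))
```

Lines that do not contain a hang message, such as "[drm] RC6 on", simply yield None.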
Comment 1 cprigent 2016-02-16 11:10:18 UTC
Hi Mikael,
Which GPU is it: m3 Intel HD graphics 515 / i5 Intel HD graphics 520 / i7 Intel Iris graphics?
Which steps are causing the GPU hang?
Comment 2 Mikael Djurfeldt 2016-02-16 11:56:42 UTC
It's Intel Iris (HD 540).

It's hard to say what exactly is causing it. Once it was caused by a "tail -f /var/log/syslog" scrolling text in Gnome Terminal. Another time it was caused by a web page being displayed in Firefox. A third time it was caused by switching workspace in Gnome Shell.
Comment 3 Mikael Djurfeldt 2016-02-16 12:01:27 UTC
I should add that these hangs happen every other minute when things change on the screen.
Comment 4 Ivan Giuliani 2016-02-20 18:09:12 UTC
I seem to be affected by this as well (same GPU, on an XPS 13" (2016)). Tried any kernel from 4.3 to 4.5 on Ubuntu, including the drm-intel-next kernel (4.5.0-997-generic).

A workaround is to add i915.enable_rc6=0 to the kernel boot parameters.
Comment 5 Mikael Djurfeldt 2016-02-24 21:43:24 UTC
I have now tried this with the latest drm-intel kernel and the newest skl-dmc firmware (1.26). My libdrm is 2.4.67.

The problem still persists.

A deterministic way to provoke the hang is to run glmark2 (github.com/glmark2).

I can confirm that if I give i915.enable_rc6=0 as a kernel option, the problem disappears.
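
Editorial note: to double-check that the workaround is actually in effect on a given boot, the kernel command line can be inspected. The following Python sketch is one possible way to do that, under stated assumptions: the helper names parse_cmdline and rc6_disabled are hypothetical, the sample command line is invented for illustration, and quoted parameter values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "quiet" map to None. Quoted values with embedded
    spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def rc6_disabled(cmdline):
    """True if the i915.enable_rc6=0 workaround from this bug is present."""
    return parse_cmdline(cmdline).get("i915.enable_rc6") == "0"


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro quiet i915.enable_rc6=0"
print(rc6_disabled(sample))  # True
```

On a running Linux system the same check can be applied to the contents of /proc/cmdline, for example rc6_disabled(open("/proc/cmdline").read()).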
Comment 6 Chris Wilson 2016-03-01 20:42:51 UTC
*** Bug 94029 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2016-03-02 21:04:46 UTC
To attempt to distinguish another source of bugs, does intel_pstate=disable make any difference?
Comment 8 Mikael Djurfeldt 2016-03-02 21:43:13 UTC
(In reply to Chris Wilson from comment #7)
> To attempt to distinguish another source of bugs, does intel_pstate=disable
> make any difference?

Replacing i915.enable_rc6=0 with intel_pstate=disable reintroduces the GPU crashes.
Comment 9 Chris Wilson 2016-03-10 12:50:29 UTC
*** Bug 94462 has been marked as a duplicate of this bug. ***
Comment 10 Chris Wilson 2016-03-10 12:52:24 UTC
Next on the possible list of interactions, can we please test rc6 vs iommu? Leave rc6 as default (remove it from the command line) and add intel_iommu=igfx_off
Comment 11 Odd Rune Lykkebø 2016-03-10 13:13:28 UTC
Adding intel_iommu=igfx_off and removing rc6=0 from the kernel boot parameters of 4.5-rc4 reintroduces the hang problems.
Comment 12 Mikael Djurfeldt 2016-03-10 13:37:29 UTC
Created attachment 122204 [details]
gpu-rc4-crash.log.gz

I tested this both with Linus rc4 and drm-intel-nightly from today (rc7).
In both cases I still experience a GPU hang with the single (apart from
noresume) kernel cmd line option intel_iommu=igfx_off.

For rc4, I saw a new error code, though:

Mar 10 14:27:46 hat kernel: [   56.843611] [drm] GPU HANG: ecode 9:0:0x87f99ff9, in gnome-shell [1742], reason: Ring hung, action: reset

I attach the corresponding crash dump file.

This means that the only way, so far, to avoid hangs is i915.enable_rc6=0.
I have confirmed that this is also true for rc7 (drm-intel-nightly).

Comment 13 Odd Rune Lykkebø 2016-03-11 14:19:30 UTC
fyi tried  4.5.0-994-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-11-wily/

...and still see hangs without i915.enable_rc6=0

cheers,
Comment 14 Odd Rune Lykkebø 2016-03-14 11:52:13 UTC
still present in daily build 14th of march found in http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-14-wily/

cheers
Comment 15 lister.lists 2016-03-16 11:04:40 UTC
I have the XPS 13 with the Iris 540. I managed to get Arch working about a week ago. At the time, the core repo included 4.4.1 and had some problems. About the next day I think 4.4.3 hit and I managed to get a working system with that following the Arch wiki (mkinitcpio "... intel_agp i915 ...").
However, after upgrading to 4.4.5 I encounter problems. I don't know if they're hangs per se. Mostly I get black screens on boot. But either way, visually the only nicely working system I've got on Iris 540 is:
4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux
Comment 16 lister.lists 2016-03-16 11:05:34 UTC
(In reply to lister.lists from comment #15)
> I have the XPS 13 with the Iris 540. I managed to get Arch working about a
> week ago. At the time, the core repo included 4.4.1 and had some problems.
> About the next day I think 4.4.3 hit and I managed to get a working system
> with that following the Arch wiki (mkinitcpio "... intel_agp i915 ...").
> However, after upgrading to 4.4.5 I encounter problems. I don't know if
> they're hangs per se. Mostly I get black screens on boot. But either way,
> visually the only nicely working system I've got on Iris 540 is:
> 4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux

I realise my timeline is out, but the point remains...
Comment 17 Odd Rune Lykkebø 2016-03-22 08:30:18 UTC
still present in http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-22-wily/
Comment 18 dump 2016-03-22 13:34:00 UTC
*** Bug 94575 has been marked as a duplicate of this bug. ***
Comment 19 Robert LeBlanc 2016-03-24 22:01:46 UTC
I can confirm this on a Dell XPS 13. i915.enable_rc6=0 works. I tried i915.enable_rc6=1 to see if it was a deep sleep problem, but the hang shows up with i915.enable_rc6=1 as well. I tried setting semaphores to 0 and to 1, and neither helped. Another bug report mentioned that commenting out a couple of lines in the kernel helped its reporter, but it didn't help me on 4.5.0.

I'm running Debian Stretch with KDE and can reproduce very quickly by logging in, opening chrome, visit youtube and play a video and set it to full screen.

Good luck
Comment 20 Odd Rune Lykkebø 2016-03-25 08:52:09 UTC
Bug present in 
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-25-wily/
Comment 21 Jess Frazelle 2016-03-26 22:53:44 UTC
I am experiencing this as well. Happens on the Dell XPS 13 (2016) w 6th Generation Intel Core i7-6560U (4M Cache, up to 3.2 GHz), Intel® Iris™ Graphics 540.

Kernel 4.4.6, also experienced on 4.4.2.

Let me know what you need to help. It's super easy to trigger by just playing a video or even just using chrome for more than 5 min.

Can confirm i915.enable_rc6=0 fixes it. I still get hiccups if watching a video etc, but no full crashes. Kinda hate doing that to my battery life though.
Comment 22 Timo Aaltonen 2016-04-01 06:37:10 UTC
*** Bug 94768 has been marked as a duplicate of this bug. ***
Comment 23 Michaël D. 2016-04-01 11:48:34 UTC
I can confirm this issue on an Intel NUC6i5SYH, Iris Graphics 540 with kernel 4.4.6-300.fc23.x86_64 (exact same symptoms, logs ...). I bet a whole lot of people must be affected ...
Is this driver supported by Intel themselves or the community?
Comment 24 Mika Kuoppala 2016-04-01 12:14:15 UTC
Created attachment 122661 [details] [review]
drm/i915/skl: Use WaForceContextSaveRestoreNonCoherent for all revs
Comment 25 Timo Aaltonen 2016-04-01 12:44:28 UTC
#24 didn't fix it for me
Comment 26 Odd Rune Lykkebø 2016-04-01 12:46:13 UTC
Negative on #24.
Comment 27 Timo Aaltonen 2016-04-01 13:40:06 UTC
I heard this might be due to old bios, which my system certainly has.. so verify you have the latest from the vendor (mine is from intel, and no updates available for test hw, so..)
Comment 28 Mika Kuoppala 2016-04-01 15:18:55 UTC
Created attachment 122664 [details] [review]
drm/i915/skl: Use WaRsDisableCoarsePowerGating for all revs
Comment 29 Michael Sartain 2016-04-01 19:16:32 UTC
(In reply to Timo Aaltonen from comment #27)
> I heard this might be due to old bios, which my system certainly has.. so
> verify you have the latest from the vendor

I've got a Skylake Dell XPS 13 9350 with the very latest bios from a couple days ago (1.3.3), and the bug still happens on this machine if I remove rc6=0 from my boot line.
Comment 30 dump 2016-04-01 20:33:14 UTC
I'm running patch from comment 28 over the mainline kernel (4.6rc1)
No freeze/crash so far even when i stress test it.

Thanks Mika!
Comment 31 Timo Aaltonen 2016-04-01 21:39:26 UTC
#28 plus #5 from 93491 seem to have fixed glmark2 here, could be that #28 alone would be enough but doesn't hurt to test with both..
Comment 32 miticotoby 2016-04-03 06:59:05 UTC
Tested as Timo using #28 plus #5 from 93491. seems to fix the issue for me too. has been stable for a few hours now without disabling rc6
Comment 33 dump 2016-04-06 23:24:26 UTC
Note that I notice sluggishness (my 3-year-old Intel 2D graphics, and CPU-rendered graphics on this computer, are faster) and display freezes with the fix and DRM enabled, though this might need a separate bug (not sure if it's related or just another bug).
Comment 34 jason 2016-04-09 17:25:42 UTC
(In reply to miticotoby from comment #32)
> Tested as Timo using #28 plus #5 from 93491. seems to fix the issue for me
> too. has been stable for a few hours now without disabling rc6

seconding miticotoby.  applied #28 plus #5 from 93491 working fine for few days now.  (without disabling rc6)
Comment 35 Gerard Farré 2016-04-10 12:32:36 UTC
Compiled kernel 4.6 rc2 drm-intel-nightly with the Mika patch (comment #28) and everything is working fine, no gpu hang at the moment (4 days testing).
Why is this patch not merged? Maybe because it needs more testing?

Thanks Mika.
Comment 36 Odd Rune Lykkebø 2016-04-11 07:07:39 UTC
Created attachment 122856 [details]
attachment-11432-0.html

As far as I understand, the patch disables power gating which is a very bad
thing in terms of power usage so this is not a fix, just a temp workaround.
Comment 37 Markus Schauler 2016-04-12 19:59:32 UTC
I can confirm that the patch in comment 28 (Use WaRsDisableCoarsePowerGating)
solved the issue on my Intel NUC6i5 with only a moderate increase in power consumption.

With an idle desktop using kernel 4.6.0-rc3, the system consumes:

7 Watts without patch, RC6 enabled, frequent crashes
17 Watts with i915.enable_rc6=0, no crashes
9 Watts with patch, no crashes
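
Editorial note: the three idle readings above can be put on a common footing by expressing each as an overhead relative to the unpatched RC6-enabled baseline. The Python sketch below only redoes that arithmetic with the figures from this comment. The helper name overhead and the dictionary labels are hypothetical.

```python
# Idle power readings reported in this comment, in watts.
idle_watts = {
    "no patch, RC6 enabled (crashes)": 7,
    "i915.enable_rc6=0": 17,
    "patch applied, RC6 enabled": 9,
}


def overhead(watts, baseline):
    """Return (extra watts, extra as a rounded percentage of the baseline)."""
    extra = watts - baseline
    return extra, round(100 * extra / baseline)


baseline = idle_watts["no patch, RC6 enabled (crashes)"]
for label, watts in idle_watts.items():
    extra, percent = overhead(watts, baseline)
    print(f"{label}: {watts} W (+{extra} W, +{percent}%)")
```

By this measure the patch costs about 2 W (roughly 29%) at idle, against about 10 W (roughly 143%) for disabling RC6 entirely.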
Comment 38 Jess Frazelle 2016-04-19 02:30:20 UTC
I just used the patch on 4.6-rc4 from upstream source and it works for me!
Comment 39 Jess Frazelle 2016-04-19 04:54:55 UTC
Nevermind, I compiled a binary and my whole computer froze
Comment 40 Kimmo Nikkanen 2016-04-21 11:30:49 UTC
This is fixed by:

commit d528a6a0f3fd346bd7cc2de611a4149b6ebaab41
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Apr 5 15:56:16 2016 +0300

drm/i915/skl: Fix rc6 based gpu/system hang
Comment 41 Gordon Messmer 2017-08-02 15:48:33 UTC
Created attachment 133202 [details] [review]
Set NEEDS_WaRsDisableCoarsePowerGating for Skylake GT2 GPUs

Mika, I'm seeing the same hang and error message on a Dell Latitude E7470 with a GT2 GPU.  Would you consider further extending the list of parts that require this fix?  I'm testing this patch now, and it seems to work.

lspci describes my GPU as:
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 520 [8086:1916] (rev 07)
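
Editorial note: the bracketed pair at the end of the lspci line is the PCI vendor and device ID, which is what a per-device workaround list would need to match on. The Python sketch below shows one way to extract it. The regular expression is based only on the line quoted above, and the helper name pci_ids is hypothetical.

```python
import re

# Matches a bracketed vendor:device pair such as [8086:1916].
# The class code [0300] has no colon, so it is not matched.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def pci_ids(lspci_line):
    """Return (vendor, device) as lowercase hex strings, or None if not found."""
    match = PCI_ID_RE.search(lspci_line)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


# The lspci line from this comment.
sample = (
    "00:02.0 VGA compatible controller [0300]: "
    "Intel Corporation HD Graphics 520 [8086:1916] (rev 07)"
)
print(pci_ids(sample))  # ('8086', '1916')
```

Here 8086 is Intel's PCI vendor ID and 1916 is the device ID of the HD Graphics 520 part described in this comment.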

