Summary: | [skl rc6] GPU HANG | ||
---|---|---|---|
Product: | DRI | Reporter: | Mikael Djurfeldt <mikael> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | blocker | ||
Priority: | highest | CC: | b, carlo.cabanilla, crow, dblack, dump, florian, gary.c.wang, giuliani.v, gordon.messmer, intel-gfx-bugs, john.stultz, marci_r, martin, mikael, miticotoby, nell, oddrunesl, q3aiml, reyad.attiyat, tjaalton, ufsxgxlg, wengxt, xiong.y.zhang |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | SKL | i915 features: | GPU hang, power/GT |
Attachments: |
Hi Mikael, Which GPU is it: m3 Intel HD graphics 515 / i5 Intel HD graphics 520 / i7 Intel Iris graphics? Which steps are causing the GPU hang?

It's Intel Iris (HD 540). It's hard to say what exactly is causing it. Once it was caused by a "tail -f /var/log/syslog" scrolling text in Gnome Terminal. Another time it was caused by a web page being displayed in Firefox. A third time it was caused by switching workspace in Gnome Shell.

I should add that these hangs happen every other minute when things change on the screen.

I seem to be affected by this as well (same GPU, on an XPS 13" (2016)). Tried every kernel from 4.3 to 4.5 on Ubuntu, including the drm-intel-next kernel (4.5.0-997-generic). A workaround is to add i915.enable_rc6=0 to the kernel boot parameters.

I have now tried this with the latest drm-intel kernel and the newest skl-dmc firmware (1.26). My libdrm is 2.4.67. The problem still persists. A deterministic way to provoke the hang is to run glmark2 (github.com/glmark2).

I can confirm that if I give i915.enable_rc6=0 as a kernel option, the problem disappears.

*** Bug 94029 has been marked as a duplicate of this bug. ***

To attempt to distinguish another source of bugs, does intel_pstate=disable make any difference?

(In reply to Chris Wilson from comment #7)
> To attempt to distinguish another source of bugs, does intel_pstate=disable
> make any difference?

Replacing i915.enable_rc6=0 with intel_pstate=disable reintroduces the GPU crashes.

*** Bug 94462 has been marked as a duplicate of this bug. ***

Next on the possible list of interactions, can we please test rc6 vs iommu? Leave rc6 as default (remove it from the command line) and add intel_iommu=igfx_off

Adding intel_iommu=igfx_off and removing rc6=0 from the kernel boot parameters of 4.5-rc4 reintroduces hang problems.

Created attachment 122204: gpu-rc4-crash.log.gz

I tested this both with Linus rc4 and drm-intel-nightly from today (rc7).
In both cases I still experience a GPU hang with the single (apart from noresume) kernel cmd line option intel_iommu=igfx_off. For rc4, I saw a new error code, though:

Mar 10 14:27:46 hat kernel: [ 56.843611] [drm] GPU HANG: ecode 9:0:0x87f99ff9, in gnome-shell [1742], reason: Ring hung, action: reset

I attach the corresponding crash dump file.

This means that the only way, so far, to avoid hangs is i915.enable_rc6=0. I have confirmed that this is also true for rc7 (drm-intel-nightly).

On Thu, Mar 10, 2016 at 2:13 PM, <bugzilla-daemon@freedesktop.org> wrote:
> Comment #11 <https://bugs.freedesktop.org/show_bug.cgi?id=94161#c11> on bug 94161 <https://bugs.freedesktop.org/show_bug.cgi?id=94161> from oddrunesl@gmail.com
>
> adding intel_iommu=igfx_off and removing rc6=0 from kernel boot parameters of
> 4.5-rc4 reintroduces hang problems.

fyi tried 4.5.0-994-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-11-wily/ ...and still see hangs without i915.enable_rc6=0

cheers,

Still present in the daily build of 14 March, found at http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-14-wily/

cheers

I have the XPS 13 with the Iris 540. I managed to get Arch working about a week ago. At the time, the core repo included 4.4.1 and had some problems. About the next day I think 4.4.3 hit and I managed to get a working system with that following the Arch wiki (mkinitcpio "... intel_agp i915 ..."). However, after upgrading to 4.4.5 I encounter problems. I don't know if they're hangs per se. Mostly I get black screens on boot. But either way, visually the only nicely working system I've got on Iris 540 is:

4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux

(In reply to lister.lists from comment #15)
> I have the XPS 13 with the Iris 540. I managed to get Arch working about a
> week ago. At the time, the core repo included 4.4.1 and had some problems.
> About the next day I think 4.4.3 hit and I managed to get a working system
> with that following the Arch wiki (mkinitcpio "... intel_agp i915 ...").
> However, after upgrading to 4.4.5 I encounter problems. I don't know if
> they're hangs per se. Mostly I get black screens on boot. But either way,
> visually the only nicely working system I've got on Iris 540 is:
> 4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux

I realise my timeline is out, but the point remains...

*** Bug 94575 has been marked as a duplicate of this bug. ***

I can confirm on Dell XPS 13. i915.enable_rc6=0 works. I've tried i915.enable_rc6=1 to see whether it was a deep sleep problem, but the hang shows up with i915.enable_rc6=1 as well. I tried setting semaphores to 0 and to 1 and neither of those helped either. Another bug report mentioned that commenting out a couple of lines in the kernel helped its reporter, but it didn't help me on 4.5.0. I'm running Debian Stretch with KDE and can reproduce very quickly by logging in, opening Chrome, visiting YouTube, playing a video and setting it to full screen. Good luck

I am experiencing this as well. Happens on the Dell XPS 13 (2016) w 6th Generation Intel Core i7-6560U (4M Cache, up to 3.2 GHz), Intel® Iris™ Graphics 540. Kernel 4.4.6, also experienced on 4.4.2. Let me know what you need to help. It's super easy to trigger by just playing a video or even just using Chrome for more than 5 min.

Can confirm i915.enable_rc6=0 fixes it. I still get hiccups when watching a video etc., but no full crashes. Kinda hate doing that to my battery life though.

*** Bug 94768 has been marked as a duplicate of this bug. ***

I can confirm this issue on an Intel NUC6i5SYH, Iris Graphics 540 with kernel 4.4.6-300.fc23.x86_64 (exact same symptoms, logs ...). I bet a whole lot of people must be affected ... Is this driver supported by Intel themselves or the community?
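Several of the comments above swap kernel boot parameters in and out (i915.enable_rc6=0, intel_pstate=disable, intel_iommu=igfx_off). As a minimal sketch for anyone comparing setups, the following Python checks whether a given kernel command line carries the rc6 workaround. The function names are hypothetical helpers, not part of any tool mentioned in this report, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {parameter: value}.

    Parameters without '=' (for example 'noresume') map to None.
    If a parameter is repeated, the last occurrence wins in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def rc6_disabled(cmdline: str) -> bool:
    """Return True if the command line contains i915.enable_rc6=0."""
    return parse_cmdline(cmdline).get("i915.enable_rc6") == "0"
```

On a running Linux system the string to pass in can be read from /proc/cmdline, for example `rc6_disabled(open("/proc/cmdline").read())`.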
Created attachment 122661 [review]: drm/i915/skl: Use WaForceContextSaveRestoreNonCoherent for all revs

#24 didn't fix it for me

Negative on #24. I heard this might be due to old bios, which my system certainly has.. so verify you have the latest from the vendor (mine is from intel, and no updates available for test hw, so..)

Created attachment 122664 [review]: drm/i915/skl: Use WaRsDisableCoarsePowerGating for all revs

(In reply to Timo Aaltonen from comment #27)
> I heard this might be due to old bios, which my system certainly has.. so
> verify you have the latest from the vendor

I've got a Skylake Dell XPS 13 9350 with the very latest bios from a couple days ago (1.3.3), and the bug still happens on this machine if I remove rc6=0 from my boot line.

I'm running the patch from comment 28 over the mainline kernel (4.6rc1). No freeze/crash so far even when I stress test it. Thanks Mika!

#28 plus #5 from 93491 seem to have fixed glmark2 here, could be that #28 alone would be enough but doesn't hurt to test with both..

Tested as Timo using #28 plus #5 from 93491. Seems to fix the issue for me too. Has been stable for a few hours now without disabling rc6.

Note that I notice sluggishness (my 3y old intel 2D graphics - and CPU rendered graphics on this computer are faster) and display freezes with the fix and DRM enabled, though this might need a separate bug (not sure if it's related or just another bug)

(In reply to miticotoby from comment #32)
> Tested as Timo using #28 plus #5 from 93491. seems to fix the issue for me
> too. has been stable for a few hours now without disabling rc6

Seconding miticotoby. Applied #28 plus #5 from 93491, working fine for a few days now (without disabling rc6).

Compiled kernel 4.6 rc2 drm-intel-nightly with the Mika patch (comment #28) and everything is working fine, no gpu hang at the moment (4 days testing).
Why is this patch not merged? Maybe because it needs more testing?

Thanks Mika.
Created attachment 122856: attachment-11432-0.html

As far as I understand, the patch disables power gating, which is a very bad thing in terms of power usage, so this is not a fix, just a temporary workaround.

On 10 Apr 2016 14:32, <bugzilla-daemon@freedesktop.org> wrote:
> Comment #35 <https://bugs.freedesktop.org/show_bug.cgi?id=94161#c35> on bug 94161 <https://bugs.freedesktop.org/show_bug.cgi?id=94161> from Gerard Farré <gerar.f87@gmail.com>
>
> Compiled kernel 4.6 rc2 drm-intel-nightly with the Mika patch (comment #28 <https://bugs.freedesktop.org/show_bug.cgi?id=94161#c28>) and
> everything is working fine, no gpu hang at the moment (4 days testing).
> Why this patch is not merged? Maybe because needs more testing?
>
> Thanks Mika.

I can confirm that the patch in comment 28 (Use WaRsDisableCoarsePowerGating) solved the issue on my Intel NUC6i5 with only a moderate increase in power consumption. With an idle desktop using kernel 4.6.0-rc3, the system consumes:

7 Watts without patch, RC6 enabled, frequent crashes
17 Watts with i915.enable_rc6=0, no crashes
9 Watts with patch, no crashes

I just used the patch on 4.6-rc4 from upstream source and it works for me!

Nevermind, I compiled a binary and my whole computer froze

This is fixed by:

commit d528a6a0f3fd346bd7cc2de611a4149b6ebaab41
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 5 15:56:16 2016 +0300

    drm/i915/skl: Fix rc6 based gpu/system hang

Created attachment 133202 [review]: Set NEEDS_WaRsDisableCoarsePowerGating for Skylake GT2 GPUs

Mika, I'm seeing the same hang and error message on a Dell Latitude E7470 with a GT2 GPU. Would you consider further extending the list of parts that require this fix? I'm testing this patch now, and it seems to work. lspci describes my GPU as:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 520 [8086:1916] (rev 07)
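To put the idle power figures reported above for the NUC6i5 (7 W, 17 W and 9 W on kernel 4.6.0-rc3) side by side, here is a small Python sketch that computes the extra draw of each configuration relative to the unpatched, crashing baseline. The dictionary labels and the `overhead` helper are illustrative names only; the numbers are the ones quoted in the comment and come from a single machine.

```python
# Idle power draw in watts as reported in this thread for an Intel NUC6i5
# running kernel 4.6.0-rc3. Labels are illustrative, not official names.
IDLE_WATTS = {
    "rc6 enabled, unpatched (frequent crashes)": 7,
    "i915.enable_rc6=0 (no crashes)": 17,
    "coarse power gating patch (no crashes)": 9,
}


def overhead(baseline: float, measured: float) -> tuple:
    """Return (extra watts, extra percent) of measured over baseline."""
    extra = measured - baseline
    return extra, 100.0 * extra / baseline


baseline = IDLE_WATTS["rc6 enabled, unpatched (frequent crashes)"]
for name, watts in IDLE_WATTS.items():
    extra, pct = overhead(baseline, watts)
    print(f"{name}: {watts} W (+{extra} W, +{pct:.0f}%)")
```

On these figures the patch costs about 2 W at idle (roughly 29% over baseline), while disabling rc6 entirely costs about 10 W (roughly 143%).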
Created attachment 121772: GPU crash dump

I have experienced GPU hangs with all kernels after 4.3. I'm running a MS Surface Pro 4.

Feb 15 17:27:22 hat kernel: [ 478.912402] [drm] stuck on render ring
Feb 15 17:27:22 hat kernel: [ 478.913345] [drm] GPU HANG: ecode 9:0:0x85df9fff, in gnome-shell [1956], reason: Ring hung, action: reset
Feb 15 17:27:22 hat kernel: [ 478.913357] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 15 17:27:22 hat kernel: [ 478.913361] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 15 17:27:22 hat kernel: [ 478.913364] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 15 17:27:22 hat kernel: [ 478.913367] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 15 17:27:22 hat kernel: [ 478.913371] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 15 17:27:22 hat kernel: [ 478.915833] drm/i915: Resetting chip after gpu hang
Feb 15 17:27:24 hat kernel: [ 480.901312] [drm] RC6 on
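For anyone collecting hang signatures from syslog to compare against the ones in this report, the following is a minimal Python sketch that extracts the fields of a "GPU HANG" line in the format shown above. The group names `gen` and `ring` for the first two ecode fields are my own assumed labels, not something this report confirms, and the pattern is only known to match the two log lines quoted in this thread.

```python
import re
from typing import Optional

# Pattern for the "[drm] GPU HANG: ..." line format seen in this report.
# 'gen' and 'ring' are assumed labels for the first two ecode fields.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[dict]:
    """Return the fields of a GPU HANG log line, or None if it does not match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the only numeric field converted
    return info
```

Lines that do not contain a hang message, such as "[drm] RC6 on", return None, so the function can be mapped over a whole log file and the results filtered.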