Bug 90854

Summary: [SKL] gem_ringfill causes hard hang
Product: DRI
Reporter: Ben Widawsky <ben>
Component: DRM/Intel
Assignee: Mika Kuoppala <mika.kuoppala>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: eero.t.tamminen, gary.c.wang, gavin.hindman, gordon.jin, intel-gfx-bugs, nroberts, tjaalton, ypwong
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: SKL
i915 features: GEM/Other
Attachments:
Description                                  Flags
Last whisper of dying skl                    none
drm/i915:skl: Add WaEnableGapsTsvCreditFix   none
drm/i915:skl: Add WaEnableGapsTsvCreditFix   none

Description Ben Widawsky 2015-06-05 03:13:24 UTC
This is reproducible with a piglit test as well. A bit easier:
fbo-depth-array depth-clear

Timo, you can get error state if you disable reset FWIW (also you can reboot the machine still)
i915.reset=0
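
A minimal sketch for confirming that the option above actually reached the running kernel. It assumes the parameter was passed on the kernel command line (not through modprobe options), and the helper name i915_reset_setting is hypothetical:

```python
def i915_reset_setting(cmdline):
    """Return the value of the last i915.reset= token in a kernel
    command line string, or None if the option is absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # Only accept tokens of the form i915.reset=<value>.
        if key == "i915.reset" and sep:
            value = val  # a later occurrence replaces an earlier one
    return value
```

On a running system the input would typically be the contents of /proc/cmdline, e.g. i915_reset_setting(open("/proc/cmdline").read()).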

Assigning to Mika per Jesse's recommendation

We can only reproduce this on D-step silicon with new BIOS. Older platforms do not have the problem.

This was tested on drm-intel-fixes from today (4f47c99a9be7e9b90a7f3c2c4599ea6b7c2ec49d)

Mesa master from today:
c820407ef0aac87546d1a778e169cfa1a915a219
Comment 1 Ben Widawsky 2015-06-05 18:10:43 UTC
We have another report of not being able to reproduce this on C0 (Tapani). I don't have a BIOS version.
Comment 3 Ben Widawsky 2015-06-18 18:16:01 UTC
I'm assuming you changed this to needinfo to find out if it fixes the hard hang, from me. I can test that, but note that fixing the hard hang without actually fixing the problem (GPU is unusable after reset) isn't very interesting.

In other words, I'll do it when I get around to it. The highest/blocker portion of this bug is still unresolved AFAIK
Comment 4 Gavin Hindman 2015-06-29 03:30:27 UTC
Where are we on this?  Mika, is this back to you?
Comment 5 Ben Widawsky 2015-06-29 17:28:02 UTC
BTW, I am fine with any of the following:
1. Close this and open a new bug for unusable GPU after reset
2. Re-title this bug to address the lack of hard hang now
3. Keep going with this bug as is.
Comment 6 Timo Aaltonen 2015-07-06 17:41:37 UTC
Forgot to reply to the question; no, I can't get an error state even with i915.reset=0
Comment 7 Ben Widawsky 2015-07-09 18:55:33 UTC

-nightly still hard hangs for me on fbo-depth-array

bwidawsk@snipes:~/intel-gfx/drm-intel (drm-intel-nightly $)$ git show
commit 8d18850fa56a8353abcc4073e027b82a370f3d94
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jul 9 19:44:59 2015 +0200

    drm-intel-nightly: 2015y-07m-09d-17h-44m-10s UTC integration manifest

This contains:
commit 7fd2d26921d1dd70732d8765d714ec3a023a3ca9
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu Jun 18 12:51:40 2015 +0300

    drm/i915: Reset request handling for gen8+
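
The "this contains" check above can be scripted. A rough sketch, assuming a local clone of the kernel tree; it relies on git merge-base --is-ancestor, which exits 0 when the commit is reachable from the tip and 1 when it is not. The function name nightly_contains and the injectable run parameter are hypothetical, added only so the logic can be exercised without a repository:

```python
import subprocess


def nightly_contains(repo, commit, tip="HEAD", run=subprocess.run):
    """Return True if `commit` is an ancestor of `tip` in the git
    repository at `repo`, False if it is not."""
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, tip]
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (bad ref, not a repo...).
    raise RuntimeError("git merge-base failed with status %d" % result.returncode)
```

For the tree in this comment the call would be nightly_contains(".", "7fd2d26921d1dd70732d8765d714ec3a023a3ca9") with drm-intel-nightly checked out.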
Comment 8 Mika Kuoppala 2015-07-16 17:06:08 UTC
(In reply to Gavin Hindman from comment #4)
> Where are we on this?  Mika, is this back to you?

To reproduce:

./gem_ringfill --r render

As https://bugs.freedesktop.org/show_bug.cgi?id=89959 also sometimes
shows similar hard hang behaviour but usually only kills the gpu,
I have concentrated on that one, as there is some state to work on.

As the same usual suspects have been cleared, there is a slight indication
that this could be a duplicate of 89959.
Comment 9 Mika Kuoppala 2015-07-17 17:27:35 UTC
Created attachment 117204 [details]
Last whisper of dying skl

The system hang is typically quiet, but I managed to get this much on netconsole at module load before system death.
Comment 10 Rami 2015-08-05 08:34:18 UTC
Reproduced with ./gem_ringfill --r render
Not reproduced with fbo-depth-array depth-clear

Hardware
Platform: SKY LAKE Y A0
CPU : Intel(R) Core(TM) m3-6Y30 CPU @ 0.8GHz 4MB (family: 6, model: 78  stepping: 3)
MCP : SKL-Y  D1  2+2 (or ULX-D1)
QDF : QVY3 
CPU : SKL D0
Chipset PCH: Sunrise Point LP C1       
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories + FBS02 & FBS03, O-06
Software 
Kernel : drm-intel-nightly 7ac3d6977b359242ecabc0b155edf63cf5404913 4.2.0-rc4 from git://anongit.freedesktop.org/drm-intel
Bios: SKLSE2R1.R00.X093.1507222151
ME FW : 11.0.0.1165
Ksc (EC FW): 1.16
drm: (HEAD, origin/master, origin/HEAD, master) fc083322b0c8a58b51976adf23a582bce8bb75f1 from git://git.freedesktop.org/git/mesa/drm
intel-driver: (HEAD, origin/master, origin/HEAD, master) 611d8ea9d75dc026c203e3ebe53b434769d4587c from git://git.freedesktop.org/git/vaapi/intel-driver
libva: (HEAD, origin/master, origin/HEAD, master) 70b80c0dd2effb4956b208775641f7c68a67a9df from git://git.freedesktop.org/git/vaapi/libva
mesa: (HEAD, origin/master, origin/HEAD, master) 1b2b0e42ce47bfd1fcb5513ed2c23b9bb7a5a5b8 from git://git.freedesktop.org/git/mesa/mesa
xf86-video-intel: (HEAD, origin/master, origin/HEAD, master) 4246c63347290390a2104739c719f5ff6a05a0e2 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
xserver: (HEAD, origin/master, origin/HEAD, master) ea03e314f98e5d8ed7bf7a508006a3d84014bde5 from git://git.freedesktop.org/git/xorg/xserver
Comment 11 Mika Kuoppala 2015-08-05 08:56:10 UTC
Created attachment 117533 [details] [review]
drm/i915:skl: Add WaEnableGapsTsvCreditFix
Comment 12 Mika Kuoppala 2015-08-05 08:59:35 UTC
Created attachment 117534 [details] [review]
drm/i915:skl: Add WaEnableGapsTsvCreditFix
Comment 13 Mika Kuoppala 2015-08-05 09:00:06 UTC
Please test with: https://bugs.freedesktop.org/attachment.cgi?id=117534
Comment 14 Rami 2015-08-06 14:43:08 UTC
Test passed (157.254s). The patch had already been applied in the last kernel.
