Bug 90685 - [SKL] WebGL GPU hangs
Summary: [SKL] WebGL GPU hangs
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 10.6
Hardware: Other All
Importance: highest critical
Assignee: Ben Widawsky
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 90865
Depends on:
Blocks:
 
Reported: 2015-05-27 19:45 UTC by Joon Jung
Modified: 2015-10-07 03:17 UTC
CC List: 13 users

See Also:
i915 platform:
i915 features:


Attachments
GPU dump (399.99 KB, text/plain)
2015-05-27 19:54 UTC, Joon Jung
kernel log (228.64 KB, text/plain)
2015-05-27 19:55 UTC, Joon Jung
dmesg for GPU hang (908 bytes, text/plain)
2015-05-29 08:37 UTC, shuo.wang
i915_error_state for GPU hang (2.83 MB, text/plain)
2015-05-29 08:38 UTC, shuo.wang

Description Joon Jung 2015-05-27 19:45:02 UTC
System Environment:
--------------------------
Platform: SKL ChromiumOS
Libdrm: libdrm-2.4.60
Mesa: 10.6.0 - a04b520890c669ce012b4b18165392dcabe0b27b
Kernel: (drm-intel-nightly: 2015y-05m-26d-17h-31m-17s)5267b96e584de7aa76434cc9fefad61169664c2f

Bug detailed description:
-----------------------------
The easiest way to reproduce this is to run a WebGL app in Chromium; even a simple 3D box animation can trigger it. It appears easier to reproduce under heavy load, for example WebGL Aquarium with 4000 fish rather than 50. If left running, the GPU reset kicks in and recovers, but after multiple hangs and recoveries the browser eventually reports a WebGL failure or the system reboots.

Reproduce steps:
---------------------------- 
Open the Chromium browser and run WebGL Aquarium with 4000 fish.
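
To quantify how often the hang occurs during a run, the kernel log can be scanned for hang reports. The following is a minimal sketch, assuming the i915 driver logs a line containing `GPU HANG` for each detected hang; the function name and the file name `dmesg.txt` are hypothetical, and the exact message format may differ between kernel versions.

```python
import re
import sys
from typing import Iterable, List

# Assumption: each detected hang produces a kernel log line containing "GPU HANG".
HANG_MARKER = re.compile(r"GPU HANG", re.IGNORECASE)


def find_gpu_hangs(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report a GPU hang, in order of appearance."""
    return [line.rstrip("\n") for line in lines if HANG_MARKER.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg > dmesg.txt && python3 find_hangs.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hangs = find_gpu_hangs(log)
    print(f"{len(hangs)} hang report(s) found")
    for entry in hangs:
        print(entry)
```

Counting reports over a fixed run time gives a rough hang rate that can be compared between Mesa and kernel builds.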
Comment 1 Joon Jung 2015-05-27 19:54:45 UTC
Created attachment 116092
GPU dump
Comment 2 Joon Jung 2015-05-27 19:55:03 UTC
Created attachment 116093
kernel log
Comment 3 shuo.wang 2015-05-29 08:37:47 UTC
Created attachment 116143
dmesg for GPU hang

I just tested Mesa: 3ec18152858fd9aadb398d78d5ad2d2b938507c1

The GPU hangs and the browser crashes.
Comment 4 shuo.wang 2015-05-29 08:38:54 UTC
Created attachment 116144
i915_error_state for GPU hang

I just tested Mesa: 3ec18152858fd9aadb398d78d5ad2d2b938507c1

The GPU hangs and the browser crashes.
Comment 5 shuo.wang 2015-05-29 08:53:24 UTC
I just tried the WebGL demo below with the same Mesa commit (3ec18152858fd9aadb398d78d5ad2d2b938507c1), and the system hangs:
Dynamic Cubemap:
http://webglsamples.googlecode.com/hg/dynamic-cubemap/dynamic-cubemap.html
Comment 6 Ben Widawsky 2015-06-16 23:27:31 UTC
*** Bug 90865 has been marked as a duplicate of this bug. ***
Comment 7 Ben Widawsky 2015-06-26 17:27:51 UTC
Does this hang with latest mesa master? Since I pushed the push constants fix, I haven't seen any issues on my platform.
Comment 8 Ben Widawsky 2015-06-29 17:34:24 UTC
Ping
Comment 9 Joon Jung 2015-06-29 18:48:15 UTC
Apologies for the delay, Ben.

Is "i965/gen9: Implement Push Constant Buffer workaround" the right one? I've tried on mesa-10.6.0_pre20150611-r1 and still see the GPU hang.
Comment 10 Ben Widawsky 2015-06-29 20:23:02 UTC
I'd like to know if master hangs. There have been a few fixes, and it seems like nobody has tested master in a few weeks.
Comment 11 Joon Jung 2015-06-29 21:21:40 UTC
The UI hangs at boot when I pick up the latest Mesa. I was hoping for a quick verification, but it looks like it will need some ChromiumOS-specific porting.
Comment 12 Joon Jung 2015-07-02 22:19:48 UTC
Managed to fix the UI hang with recent Mesa (commit f49e51ef44ac6400967731b "nir: remove nir_src_get_parent_instr()").

The GPU hang is still seen.
Comment 13 Rami 2015-08-05 13:46:20 UTC
Info: tested on chrome-browser without error

Tested on SKL 

Hardware
Platform: SKY LAKE Y A0
CPU : Intel(R) Core(TM) m3-6Y30 CPU @ 0.8GHz 4MB (family: 6, model: 78  stepping: 3)
MCP : SKL-Y  D1  2+2 (or ULX-D1)
QDF : QVY3 
CPU : SKL D1
Chipset PCH: Sunrise Point LP C1       
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories + FBS02 & FBS03, O-06
Software 
Kernel : drm-intel-nightly 7ac3d6977b359242ecabc0b155edf63cf5404913 4.2.0-rc4 from git://anongit.freedesktop.org/drm-intel
Bios: SKLSE2R1.R00.X093.1507222151
ME FW : 11.0.0.1165
Ksc (EC FW): 1.16
drm: (HEAD, origin/master, origin/HEAD, master) fc083322b0c8a58b51976adf23a582bce8bb75f1 from git://git.freedesktop.org/git/mesa/drm
intel-driver: (HEAD, origin/master, origin/HEAD, master) 611d8ea9d75dc026c203e3ebe53b434769d4587c from git://git.freedesktop.org/git/vaapi/intel-driver
libva: (HEAD, origin/master, origin/HEAD, master) 70b80c0dd2effb4956b208775641f7c68a67a9df from git://git.freedesktop.org/git/vaapi/libva
mesa: (HEAD, origin/master, origin/HEAD, master) 1b2b0e42ce47bfd1fcb5513ed2c23b9bb7a5a5b8 from git://git.freedesktop.org/git/mesa/mesa
xf86-video-intel: (HEAD, origin/master, origin/HEAD, master) 4246c63347290390a2104739c719f5ff6a05a0e2 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
xserver: (HEAD, origin/master, origin/HEAD, master) ea03e314f98e5d8ed7bf7a508006a3d84014bde5 from git://git.freedesktop.org/git/xorg/xserver
Comment 14 Akshu Agrawal 2015-08-12 11:02:12 UTC
I am able to reproduce the issue on ChromiumOS with cherry pick of
drm/i915:skl: Add WaEnableGapsTsvCreditFix

Ran one window with WebGL Aquarium (4000 fish) and another with WebGL fishie tank (1000 fish) for quick reproduction.
Comment 15 Timo Aaltonen 2015-08-12 11:15:56 UTC
Yeah, I'm able to reproduce it too, with the current nightly and the Trip Under the Moonlight WebGL demo on Chromium. But it no longer kills the machine as it did before.
Comment 16 dog 2015-08-19 00:36:50 UTC
Ben, what is the next step here?
Comment 17 Ben Widawsky 2015-08-19 03:04:46 UTC
Timo, I assume you're okay now (https://bugs.freedesktop.org/show_bug.cgi?id=90425). Please make sure you try mesa master and see if you can reproduce the bug. You can also see the aforementioned bug which Timo diligently bisected to find the commit which fixes it.

FYI:

commit 74fd226e34d0cf5e9ff43174ae69b4a66f5de1ab
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Wed Mar 25 16:52:46 2015 -0700

    i965/skl: Don't use the PMA depth stall workaround
Comment 18 Ben Widawsky 2015-08-19 03:05:45 UTC
dog, as far as I'm concerned there are no more hangs on mesa master with drm-intel-nightly. Anything other than that is an integration/backport issue.
Comment 19 Ben Widawsky 2015-08-19 03:10:14 UTC
(In reply to Ben Widawsky from comment #17)
> Timo, I assume you're okay now
> (https://bugs.freedesktop.org/show_bug.cgi?id=90425). Please make sure you
> try mesa master and see if you can reproduce the bug. You can also see the
> aforementioned bug which Timo diligently bisected to find the commit which
> fixes it.
> 
> FYI:
> 
> commit 74fd226e34d0cf5e9ff43174ae69b4a66f5de1ab
> Author: Ben Widawsky <benjamin.widawsky@intel.com>
> Date:   Wed Mar 25 16:52:46 2015 -0700
> 
>     i965/skl: Don't use the PMA depth stall workaround

The experiment request was to Akshu, not Timo.
Comment 20 Akshu Agrawal 2015-08-25 11:25:41 UTC
Comment #14 was tested with the June 11 ToT of Mesa, which had the patch
i965/skl: Don't use the PMA depth stall workaround

Even with that patch, I still see the GPU hang.
Comment 21 Ben Widawsky 2015-08-25 15:49:02 UTC
(In reply to Akshu Agrawal from comment #20)
> Comment #14 was tested with the June 11 ToT of Mesa, which had the patch
> i965/skl: Don't use the PMA depth stall workaround
> 
> Even with that patch, I still see the GPU hang.

Is it reproducible with drm-intel-nightly also?
Comment 22 Joon Jung 2015-08-25 18:38:21 UTC
Thanks guys for the traction on this problem.

Using yesterday's ChromiumOS ToT (Chromium 47.0.2491.0, Platform 7394), the behavior is better. It used to take less than 5 minutes to reproduce a hang using WebGL Aquarium with 4000 fish. Now it takes between 30 and 120 minutes to get one. The ToT build has Ben's PMA stall WA fix.

On top of this new ToT, we will try to build the latest drm-intel-nightly with the latest Mesa and run the test.
Comment 23 Ben Widawsky 2015-08-25 19:01:08 UTC
After testing nightly, can you please attach an error state if you can get a hang?
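
For reference, one way to capture the error state right after a hang is sketched below. It assumes the i915 error state is exposed at `/sys/class/drm/card0/error` or at `/sys/kernel/debug/dri/0/i915_error_state` (the card index may differ per system, and reading usually requires root), and that the file begins with a "no error state" message when no hang has been recorded. The function name and output file name are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

# Assumed locations; the card/minor index may differ on a given system.
CANDIDATE_PATHS = (
    Path("/sys/class/drm/card0/error"),
    Path("/sys/kernel/debug/dri/0/i915_error_state"),
)


def save_error_state(src: Path, dest_dir: Path) -> Optional[Path]:
    """Copy the error state at src into dest_dir.

    Returns the path of the saved copy, or None if src is unreadable,
    empty, or reports that no error state was recorded.
    """
    try:
        data = src.read_bytes()
    except OSError:
        return None
    # Assumption: without a recorded hang the file starts with "no error state".
    if not data.strip() or data.lstrip().lower().startswith(b"no error state"):
        return None
    dest_dir.mkdir(parents=True, exist_ok=True)
    out = dest_dir / f"i915_error_state-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    out.write_bytes(data)
    return out


if __name__ == "__main__":
    for candidate in CANDIDATE_PATHS:
        saved = save_error_state(candidate, Path.cwd())
        if saved:
            print(f"saved {saved}")
            break
    else:
        print("no error state found (no hang recorded, or insufficient permissions)")
```

The saved file can then be attached to the bug, as was done with the i915_error_state attachment above.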
Comment 24 Joon Jung 2015-08-25 20:44:34 UTC
Sure, will do.
Comment 25 Gavin Hindman 2015-08-26 04:07:15 UTC
Joon,

ToT has the PMA Stall W/A, but it does not yet have the general SKL GPU hang fixes, which are still going through integration, so you'll likely still see intermittent hangs there. Please reproduce against drm-nightly, not Chromium ToT, since this is an upstream bug.
Comment 26 Akshu Agrawal 2015-08-26 08:21:55 UTC
Tested with ToT mesa and nightly and could not reproduce the hang.

Ran WebGL Aquarium (4000 fish) and WebGL fishie tank (1000 fish) simultaneously for more than 2 hours.
Comment 27 Ben Widawsky 2015-08-26 14:49:27 UTC
Sounds like we're good here. Please re-open if you see issues with the latest versions of kernel/mesa.
Comment 28 appala 2015-09-15 07:37:18 UTC
We are able to reproduce the GPU hang on SKL-S (SKL RVP8). Is it OK to continue tracking this issue in the same thread?
Comment 29 Gavin Hindman 2015-09-15 14:49:43 UTC
This issue was very stepping-dependent.  Are you using the SKL-S equivalent of D1?
Comment 30 Girish 2015-09-28 09:17:44 UTC
We have shared the stepping details we were using. Please provide an update on this.
Comment 31 Gavin Hindman 2015-10-01 04:33:05 UTC
Per email, the test team was using a fairly old Mesa version. Girish, have you been able to retest against master or the latest stable?
Comment 32 Ben Widawsky 2015-10-07 03:17:56 UTC
Please re-open if this is still an issue.

