87564 – [hsw gt1] GPU HANG: ecode 0:0x85dffffc in xbmc

Summary: [hsw gt1] GPU HANG: ecode 0:0x85dffffc in xbmc

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Ben Widawsky
QA Contact:	Intel 3D Bugs Mailing List

Reported:	2014-12-21 18:55 UTC by Knut Rupprecht
Modified:	2015-10-07 03:19 UTC
CC List:	7 users

Attachments
dmesg (50.48 KB, text/plain) 2014-12-21 18:55 UTC, Knut Rupprecht
Error state (438.48 KB, text/plain) 2014-12-21 18:56 UTC, Knut Rupprecht
xbmc debuglog (52.36 KB, text/plain) 2014-12-21 18:58 UTC, Knut Rupprecht
3.13.0-dmesg (56.30 KB, text/plain) 2014-12-22 08:35 UTC, Knut Rupprecht
3.13.0-error.tar.gz (790.97 KB, text/plain) 2014-12-22 08:35 UTC, Knut Rupprecht
3.13.0-kodi.log (91.37 KB, text/plain) 2014-12-22 08:36 UTC, Knut Rupprecht
3.15.0_enable_rc6=0-dmesg (53.97 KB, text/plain) 2014-12-22 08:36 UTC, Knut Rupprecht
3.15.0_enable_rc6=0-error.tar.gz (393.55 KB, text/plain) 2014-12-22 08:37 UTC, Knut Rupprecht
3.15.0_enable_rc6=0-kodi.log (782.31 KB, text/plain) 2014-12-22 08:37 UTC, Knut Rupprecht
3.18.0-dmesg (57.05 KB, text/plain) 2014-12-22 08:37 UTC, Knut Rupprecht
3.18.0-error.tar.gz (776.97 KB, text/plain) 2014-12-22 08:38 UTC, Knut Rupprecht
3.18.0-kodi.log (98.73 KB, text/plain) 2014-12-22 08:38 UTC, Knut Rupprecht
dmidecode (7.75 KB, text/plain) 2014-12-22 08:38 UTC, Knut Rupprecht
Error State (792.09 KB, application/x-gzip) 2014-12-22 23:40 UTC, Knut Rupprecht
Playing clannad.mkv with vaapi and tracing batch and dword (352.81 KB, application/octet-stream) 2014-12-23 19:06 UTC, Peter Frühberger
bw1 Error no video playback only browsing the menu (447.64 KB, application/x-gzip) 2014-12-23 19:15 UTC, Knut Rupprecht
no video playback only browsing the menu va-log1 (495.18 KB, text/plain) 2014-12-23 19:16 UTC, Knut Rupprecht
no video playback only browsing the menu va-log2 (42.65 KB, text/plain) 2014-12-23 19:17 UTC, Knut Rupprecht
limit max PS threads for gt1 (645 bytes, patch) 2014-12-23 19:35 UTC, Ben Widawsky
The equivalent mesa patch (872 bytes, patch) 2014-12-23 19:52 UTC, Ben Widawsky
max_wm_threads_70-error.tar.gz (488.61 KB, text/plain) 2014-12-23 22:48 UTC, Knut Rupprecht
dmesg including kernel, intel and mesa patch (56.02 KB, text/plain) 2014-12-24 11:04 UTC, Knut Rupprecht
Xorg.0.log including kernel, mesa and intel patches. (22.90 KB, text/plain) 2014-12-24 11:06 UTC, Knut Rupprecht
dump including kernel, intel and mesa patch (1.15 MB, text/plain) 2014-12-24 23:02 UTC, Knut Rupprecht
dmesg including kernel, intel and mesa patch (61.60 KB, text/plain) 2014-12-24 23:03 UTC, Knut Rupprecht
Xorg.log including kernel, intel and mesa patch (22.90 KB, text/plain) 2014-12-24 23:04 UTC, Knut Rupprecht
Same as previous, but for the kernel (951 bytes, patch) 2014-12-24 23:54 UTC, Ben Widawsky
Change PS thread count for null render context (kernel) (1.02 KB, patch) 2014-12-24 23:58 UTC, Ben Widawsky
This should include Bens latest patch. (1.18 MB, text/plain) 2014-12-25 00:43 UTC, Knut Rupprecht
error dump with all patches and SNA using generic backend (1.49 MB, application/x-gzip) 2014-12-25 09:31 UTC, Knut Rupprecht
dmesg with all patches and SNA using generic backend (57.61 KB, text/plain) 2014-12-25 09:32 UTC, Knut Rupprecht
Xorg-log with all patches and SNA using generic backend (20.96 KB, text/plain) 2014-12-25 09:33 UTC, Knut Rupprecht
Error state with pipecontrol patch (470.96 KB, application/x-gzip) 2014-12-25 22:11 UTC, Knut Rupprecht
Error State mesa 10.4.0 (484.52 KB, application/x-gzip) 2014-12-31 16:01 UTC, Knut Rupprecht
dmesg mesa 10.4.0 (49.34 KB, text/plain) 2014-12-31 16:02 UTC, Knut Rupprecht
Xorg.log mesa 10.4.0 (16.23 KB, text/plain) 2014-12-31 16:03 UTC, Knut Rupprecht
Always initialize streamout buffers (2.05 KB, patch) 2014-12-31 19:33 UTC, Ben Widawsky
Error State incl. 'Always initialize streamout buffers' patch (468.17 KB, application/gzip) 2015-01-01 01:33 UTC, Knut Rupprecht
Error dump for 'Always initialize streamout buffers' (473.81 KB, application/x-gzip) 2015-01-01 02:23 UTC, Knut Rupprecht
Also initialize the streamout declaration list (1.69 KB, patch) 2015-01-01 03:07 UTC, Ben Widawsky
Error State including 'initialize the streamout declaration list' (503.93 KB, application/x-gzip) 2015-01-01 10:48 UTC, Knut Rupprecht
dmesg including 'initialize the streamout declaration list' (49.83 KB, text/plain) 2015-01-01 10:51 UTC, Knut Rupprecht
Only initialize, don't enable the SOL for null state (1.52 KB, text/plain) 2015-01-02 00:20 UTC, Ben Widawsky
Error State including 'vertex fetcher NULL state' (488.44 KB, application/x-gzip) 2015-01-02 05:14 UTC, Knut Rupprecht
We can't set non-zero streamout for inactive streams (1.50 KB, text/plain) 2015-01-02 05:51 UTC, Ben Widawsky
Error State including 'fix the so_decl_list initialization' (835.77 KB, application/x-gzip) 2015-01-02 10:23 UTC, Knut Rupprecht
Kernel patch - single port dispatch (917 bytes, patch) 2015-01-03 20:25 UTC, Ben Widawsky
kernel patch - wait for SBE (1.44 KB, patch) 2015-01-03 20:25 UTC, Ben Widawsky
kernel patch - scoreboard even on idle PSD (1.58 KB, patch) 2015-01-03 20:26 UTC, Ben Widawsky
enable batch buffer end workaround (1.49 KB, patch) 2015-01-05 00:01 UTC, Ben Widawsky

Description Knut Rupprecht 2014-12-21 18:55:57 UTC

Created attachment 111123
dmesg

Hi,
I'm using a clean OpenELEC 4.2.1 installation, and just browsing the menus makes the system hang for a couple of seconds.

[   74.253274] [drm] stuck on render ring
[   74.254915] [drm] GPU HANG: ecode 0:0x85df3c1d, in xbmc.bin [710], reason: Ring hung, action: reset
[   74.254919] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   74.254922] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   74.254925] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   74.254928] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   74.254931] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   76.250758] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[   80.245684] [drm] stuck on render ring

Comment 1 Knut Rupprecht 2014-12-21 18:56:59 UTC

Created attachment 111124
Error state

Comment 2 Knut Rupprecht 2014-12-21 18:58:08 UTC

Created attachment 111125
xbmc debuglog

Comment 3 Chris Wilson 2014-12-21 19:26:24 UTC

The symptoms don't match the recently fixed HSW GT1 bug,

commit 2c550183476dfa25641309ae9a28d30feed14379
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 16 10:02:27 2014 +0000

    drm/i915: Disable PSMI sleep messages on all rings around context switches

but it is worthwhile trying drm-intel-nightly just in case. I presume it is a different bug though...

Comment 4 Knut Rupprecht 2014-12-22 08:32:41 UTC

I also tried the following kernels and get hangs with each of them:
- 3.13 Ubuntu
- 3.15 Ubuntu
- 3.15.0 Ubuntu with i915.enable_rc6=0
- 3.17 OpenELEC RC3, which includes these GPU hang fixes:
https://github.com/OpenELEC/OpenELEC.tv/blob/openelec-5.0/packages/linux/patches/3.17.7/linux-010-intel-flush-flags.patch

- 3.18
Using Ubuntu:
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/
http://es.archive.ubuntu.com/ubuntu/pool/main/l/linux-firmware/linux-firmware_1.140_all.deb

I let memtest run once with no errors, and have not updated to the latest BIOS yet.
The system is:
4GB-Kit G-Skill PC3-12800U CL9
ASRock B85M-ITX
Intel Pentium G3220 Box
WD Purple WD10PURX 1TB

I can upload the logs for 3.15 and 3.17 later.

Comment 5 Knut Rupprecht 2014-12-22 08:35:07 UTC

Created attachment 111149
3.13.0-dmesg

Comment 6 Knut Rupprecht 2014-12-22 08:35:36 UTC

Created attachment 111150
3.13.0-error.tar.gz

Comment 7 Knut Rupprecht 2014-12-22 08:36:02 UTC

Created attachment 111151
3.13.0-kodi.log

Comment 8 Knut Rupprecht 2014-12-22 08:36:51 UTC

Created attachment 111153
3.15.0_enable_rc6=0-dmesg

Comment 9 Knut Rupprecht 2014-12-22 08:37:09 UTC

Created attachment 111154
3.15.0_enable_rc6=0-error.tar.gz

Comment 10 Knut Rupprecht 2014-12-22 08:37:29 UTC

Created attachment 111156
3.15.0_enable_rc6=0-kodi.log

Comment 11 Knut Rupprecht 2014-12-22 08:37:57 UTC

Created attachment 111158
3.18.0-dmesg

Comment 12 Knut Rupprecht 2014-12-22 08:38:19 UTC

Created attachment 111159
3.18.0-error.tar.gz

Comment 13 Knut Rupprecht 2014-12-22 08:38:36 UTC

Created attachment 111161
3.18.0-kodi.log

Comment 14 Knut Rupprecht 2014-12-22 08:38:56 UTC

Created attachment 111162
dmidecode

Comment 15 Kenneth Graunke 2014-12-22 08:43:30 UTC

CC'ing Ben as he was recently looking at HSW GT1 hangs.

Comment 16 Ben Widawsky 2014-12-22 19:20:44 UTC

Hi Knut. Have you tried the latest drm-intel-nightly? A fix from Chris Wilson went in to address issues like this, which were mostly seen on IVB. It did fix our local HSW GT1 hang, though.

If that doesn't work, please try this patch:
http://patchwork.freedesktop.org/patch/39363/

Comment 17 Knut Rupprecht 2014-12-22 23:40:42 UTC

Created attachment 111195
Error State

Comment 18 Knut Rupprecht 2014-12-23 06:09:05 UTC

Comment on attachment 111195
Error State

This includes Ben's patch.

Comment 19 Peter Frühberger 2014-12-23 19:06:49 UTC

Created attachment 111227
Playing clannad.mkv with vaapi and tracing batch and dword

As requested on IRC by bwidawks.

Comment 20 Knut Rupprecht 2014-12-23 19:15:25 UTC

Created attachment 111228
bw1 Error no video playback only browsing the menu

Comment 21 Knut Rupprecht 2014-12-23 19:16:43 UTC

Created attachment 111229
no video playback only browsing the menu va-log1

Comment 22 Knut Rupprecht 2014-12-23 19:17:26 UTC

Created attachment 111230
no video playback only browsing the menu va-log2

Comment 23 Ben Widawsky 2014-12-23 19:35:59 UTC

Created attachment 111233
limit max PS threads for gt1

Comment 24 Ben Widawsky 2014-12-23 19:52:43 UTC

Created attachment 111235
The equivalent mesa patch

Comment 25 Knut Rupprecht 2014-12-23 22:48:17 UTC

Created attachment 111244
max_wm_threads_70-error.tar.gz

Comment 26 Knut Rupprecht 2014-12-24 00:47:47 UTC

The limit_max_PS_threads patches seem to improve the situation. So far I experience far fewer hangs, and I haven't actually seen one yet while watching the video, so I assume they are very short.
Before, hangs would at least disturb playback and often freeze the picture for several seconds.

Comment 27 Knut Rupprecht 2014-12-24 00:53:00 UTC

A new issue is that the error state is now empty.
/var/log/Xorg.0.log:
http://paste.ubuntu.com/9607575/
dmesg:
http://paste.ubuntu.com/9607552/
it contains this warning:
[   35.202724] WARNING: CPU: 0 PID: 1175 at drivers/gpu/drm/i915/i915_gem_execbuffer.c:126 eb_lookup_vmas.isra.15+0x373/0x410 [i915]()

Comment 28 Chris Wilson 2014-12-24 07:25:13 UTC

(In reply to Knut Rupprecht from comment #26)
> The limit_max_PS_threads patches seem to improve the situation. So far I
> experience far less hangs and I didn't actually see one yet while watching
> the video. So I assume they are very short.
> Before hangs would at least disturb playback and often freeze the picture
> for several seconds.

Your errorstate still has max threads = 101 in the mesa batch that hung.

Comment 29 Peter Frühberger 2014-12-24 08:11:43 UTC

@Knut:

Please provide the output of dpkg -l | grep 10.1.3 so that I can see which packages were not yet updated after your build.

Comment 30 Knut Rupprecht 2014-12-24 11:04:15 UTC

Created attachment 111271
dmesg including kernel, intel and mesa patch

This time Peter provided the patched mesa packages.

Comment 31 Knut Rupprecht 2014-12-24 11:06:33 UTC

Created attachment 111272
Xorg.0.log including kernel, mesa and intel patches.

This crash was unrelated (I fast-forwarded too quickly):
[ 5506.063941] show_signal_msg: 75 callbacks suppressed
[ 5506.063944] DVDPlayerVideo[2388]: segfault at 7fa9c09921e0 ip 00007fa9c09921e0 sp 00007fa9c6061ad8 error 15

Comment 32 Knut Rupprecht 2014-12-24 11:19:00 UTC

Didn't the patches change max_threads from 102 to 70? I'm wondering how it could be at 101.
Could it be that GT1 isn't correctly identified somewhere, so the wrong max_threads is used, or that max_threads is set somewhere else as well?

Comment 33 Ben Widawsky 2014-12-24 18:19:10 UTC

Comment on attachment 111244
max_wm_threads_70-error.tar.gz

This error state did not have the mesa patch in it. It is leading to confusion.

Comment 34 Ben Widawsky 2014-12-24 18:20:31 UTC

Comment on attachment 111271
dmesg including kernel, intel and mesa patch

We were unable to retrieve the error state from this for some reason:
zeeeh    │ [15:17:44] bwidawks, no flames so far :)
fritsch  │ [15:18:03] let's see if it survives the mesa compile
bwidawks │ [15:56:11] fritsch, zeeeh no news is good news?
zeeeh    │ [15:58:05] bwidawks, No :/ I'm uploading. Fritsch is gone.
zeeeh    │ [16:02:21] bwidawks, I think I have to test again, the error file was empty
bwidawks │ [16:02:51] zeeeh: can you pastebin dmesg?
zeeeh    │ [16:19:30] bwidawks, http://paste.ubuntu.com/9607552/
bwidawks │ [16:20:34] zeeeh: still hanging in mesa... I really want the error state now :-)
bwidawks │ [16:20:39] oh wait
bwidawks │ [16:20:46] zeeeh: are you using SNA?
bwidawks │ [16:21:10] i presume that may have the wrong threadcounts too
zeeeh    │ [16:21:22] uh whats sna?
bwidawks │ [16:21:38] can you pastebin /var/log/Xorg.0.log?
zeeeh    │ [16:22:15] http://paste.ubuntu.com/9607575/
bwidawks │ [16:22:15] zeeeh: hmm, you're actually hitting a kernel assertion here
bwidawks │ [16:23:10] a strange warning too
zeeeh    │ [16:24:19] maybe I made an error installing the new mesa?
bwidawks │ [16:25:06] zeeeh: well, the kernel issue is really strange, I don't want to look at that. If I had the error state, I could tell you
zeeeh    │ [16:27:22] bwidawks, I rebooted before this hang, but the error file again is empty

Comment 35 Ben Widawsky 2014-12-24 18:22:44 UTC

Knut, can you confirm once again the following:
1. Reboot the machine.
2. Using both the mesa and vaapi packages Peter built, reproduce the hang.
3. Try to read error state from sysfs.
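Step 3 can also be scripted. The Python sketch below is a hypothetical helper (the names compress_error_state, save_error_state and the output file i915_error_state.gz are illustrative assumptions, not part of any tool used in this bug) that reads the crash dump from /sys/class/drm/card0/error, the path the driver reports in dmesg, and gzips it for attaching. It refuses to package an empty dump, which is what happened earlier in this thread.

```python
import gzip
from pathlib import Path

# Path printed by i915 in its "GPU crash dump saved to ..." dmesg line.
ERROR_STATE_PATH = Path("/sys/class/drm/card0/error")


def compress_error_state(data: bytes) -> bytes:
    """Gzip a captured error state, refusing to package an empty dump."""
    # An empty (or whitespace-only) dump cannot be analyzed, so raise
    # instead of silently producing a useless attachment.
    if not data.strip():
        raise RuntimeError("error state is empty; reproduce the hang and read it again")
    return gzip.compress(data)


def save_error_state(src: Path = ERROR_STATE_PATH,
                     dest: Path = Path("i915_error_state.gz")) -> int:
    """Read the dump from `src`, write it gzipped to `dest`, return its raw size."""
    # Read the contents directly rather than trusting stat(): sysfs files
    # may not report their real size.
    data = src.read_bytes()
    dest.write_bytes(compress_error_state(data))
    return len(data)
```

After reproducing the hang, calling save_error_state() as root (the error file is normally readable only by root) writes the compressed dump to the current directory; the empty-dump check simply catches a blank file before it gets uploaded.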

Comment 36 Knut Rupprecht 2014-12-24 23:02:44 UTC

Created attachment 111301
dump including kernel, intel and mesa patch

Comment 37 Knut Rupprecht 2014-12-24 23:03:18 UTC

Created attachment 111302
dmesg including kernel, intel and mesa patch

Comment 38 Knut Rupprecht 2014-12-24 23:04:03 UTC

Created attachment 111303
Xorg.log including kernel, intel and mesa patch

Comment 39 Ben Widawsky 2014-12-24 23:54:42 UTC

Created attachment 111305
Same as previous, but for the kernel

The kernel sets up a default context with thread counts. Make sure we obey the PS thread count rules there too.

Comment 40 Ben Widawsky 2014-12-24 23:58:38 UTC

Created attachment 111306
Change PS thread count for null render context (kernel)

Comment 41 Knut Rupprecht 2014-12-25 00:43:43 UTC

Created attachment 111307
This should include Bens latest patch.

Comment 42 Ben Widawsky 2014-12-25 03:49:27 UTC

Just to recap the status:
No patches: 2.5s to BSD hang
intel-vaapi: ??s to mesa hang
intel-vaapi + mesa patch: 3-10m to mesa hang
intel-vaapi + mesa patch + kernel patch: 20m to mesa hang
intel-vaapi + mesa + kernel + blt only: being tested

While it could be a false correlation, the patches so far seem to have improved the situation. The SNA fix seems to be in master as well, but Knut wasn't able to test that in the limited time he had, so we went with blt only instead.

Expecting an update from Knut on the blt only DDX. Chris, will blit only never emit 3d state?

Comment 43 Chris Wilson 2014-12-25 09:23:52 UTC

(In reply to Ben Widawsky from comment #42)
> Expecting an update from Knut on the blt only DDX. Chris, will blit only
> never emit 3d state?

AccelMethod "BLT" will never emit any 3D commands.

Comment 44 Chris Wilson 2014-12-25 09:28:48 UTC

(In reply to Knut Rupprecht from comment #41)
> Created attachment 111307
> This should include Bens latest patch.

It dies before the end of a reasonably long mesa batch, with multiple PS kernels (actually alternating between a pair of kernels) each only using 70 threads. That does imply that it is not the PS kernel itself, but it could still be the PS state hitting a corner condition for the first time.

Comment 45 Knut Rupprecht 2014-12-25 09:31:40 UTC

Created attachment 111316
error dump with all patches and SNA using generic backend

I started testing at 02:00; hangs occurred at 02:09 and 02:10. For the next 8 hours it didn't hang, although there were 2 other errors:

[Thu Dec 25 04:58:18 2014] [drm] HPD interrupt storm detected on connector HDMI-A-2: switching from hotplug detection to polling
[Thu Dec 25 06:53:36 2014] perf interrupt took too long (2521 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

Comment 46 Knut Rupprecht 2014-12-25 09:32:12 UTC

Created attachment 111317
dmesg with all patches and SNA using generic backend

Comment 47 Knut Rupprecht 2014-12-25 09:33:13 UTC

Created attachment 111318
Xorg-log with all patches and SNA using generic backend

Comment 48 Knut Rupprecht 2014-12-25 22:11:13 UTC

Created attachment 111335
Error state with pipecontrol patch

Including the previous patches and this one:
http://pastebin.com/raw.php?i=p0nZv2bd

It took 15 minutes until the crash.

Comment 49 Peter Frühberger 2014-12-25 22:13:11 UTC

The vaapi patch above was written up by Ben. I only made it compile and exported it via git.

Comment 50 Ben Widawsky 2014-12-25 23:17:10 UTC

The error state is once again hanging in mesa. It looks to me like a failure in the SBE, after which the rest of the pipe dies.

I'm not sure mesa doesn't also need an extra pipe control like vaapi. I recommend trying this if possible (again, a long shot):
http://cgit.freedesktop.org/~bwidawsk/mesa/commit/?h=gt1&id=372aab5291dc65789f596cb849925c1f535741d5

Comment 51 Dan Getz 2014-12-30 19:21:55 UTC

The bug is still here. I'm getting:
 [drm] stuck on render ring
 :
 :

It happens deterministically when browsing to www.google.com/chrome with a freshly installed Firefox.

This is on a fresh, clean Linux Mint 17.1 install with a drm-intel-nightly kernel build (drm-intel-nightly: 2014y-12m-30d-13h-01m-34s). This kernel includes the patches mentioned above.

I'm using a ThinkPad X61 with an Intel GM965/GL960 Integrated Graphics Controller.

Comment 52 Ben Widawsky 2014-12-30 19:26:02 UTC

Dan, yours is almost certainly a different bug. See https://bugs.freedesktop.org/show_bug.cgi?id=80568 (among many others). As the subject states, this bug is specific to HSW GT1.

Comment 53 Knut Rupprecht 2014-12-31 16:01:59 UTC

Created attachment 111582
Error State mesa 10.4.0

This is with mesa 10.4.0, drm-intel-nightly, and all the patches from this bug report. It took about a minute to hang.

Comment 54 Knut Rupprecht 2014-12-31 16:02:58 UTC

Created attachment 111583
dmesg mesa 10.4.0

Comment 55 Knut Rupprecht 2014-12-31 16:03:35 UTC

Created attachment 111584
Xorg.log mesa 10.4.0

Comment 56 Ben Widawsky 2014-12-31 19:33:03 UTC

Created attachment 111596
Always initialize streamout buffers

Another long shot.

As the commit message says, there is garbage in the last error state. As an example:
0x001a4228:      0x79180002: 3DSTATE_SO_BUFFER
0x001a422c:      0x44954ffe:    DWord 1:
                                   SO Buffer Index: 2
                                   SO Buffer Object Control State: 2
                                   Surface Pitch: 4094
0x001a4230:      0x169a840f:    DWord 2:
                                   Surface Base Address: 0x169a840c
0x001a4234:      0xcde4100d:    DWord 3:
                                   Surface End Address: 0xcde4100c

This patch should initialize the SOL state regardless of whether we use xfb. It's probably something we want to put in the kernel null ctx setup, but for now we can just test it in mesa.
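To make it easier to check how the raw dwords in the 3DSTATE_SO_BUFFER dump above turn into the decoded fields, the Python sketch below re-derives them. The bit positions are inferred from the decoder output itself (buffer index in bits 30:29, object control state in bits 28:25, pitch in bits 11:0, and dword-aligned addresses with the low two bits masked off); treat them as an assumption for illustration rather than a reference layout, and the names SoBuffer, bits and decode_so_buffer as hypothetical.

```python
from typing import NamedTuple


class SoBuffer(NamedTuple):
    index: int
    object_control_state: int
    pitch: int
    base_address: int
    end_address: int


def bits(value: int, high: int, low: int) -> int:
    """Extract the inclusive bit range [high:low] from a 32-bit dword."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def decode_so_buffer(dw1: int, dw2: int, dw3: int) -> SoBuffer:
    # Field layout inferred from the error-state decode in this comment;
    # an assumption, not taken from the hardware documentation.
    return SoBuffer(
        index=bits(dw1, 30, 29),
        object_control_state=bits(dw1, 28, 25),
        pitch=bits(dw1, 11, 0),
        base_address=dw2 & 0xFFFFFFFC,  # drop the low two bits (dword aligned)
        end_address=dw3 & 0xFFFFFFFC,
    )


so = decode_so_buffer(0x44954FFE, 0x169A840F, 0xCDE4100D)
print(so.index, so.object_control_state, so.pitch)  # 2 2 4094
print(hex(so.base_address), hex(so.end_address))  # 0x169a840c 0xcde4100c
```

The output matches the decoder's reading. Since xfb is presumably not in use here, nothing should have programmed this buffer, so an odd pitch and an end address nearly 3 GB past the base fit the reading that this is leftover garbage rather than real state.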

Comment 57 Knut Rupprecht 2015-01-01 01:33:05 UTC

Created attachment 111599
Error State incl. 'Always initialize streamout buffers' patch

Comment 58 Ben Widawsky 2015-01-01 01:48:25 UTC

Comment on attachment 111599
Error State incl. 'Always initialize streamout buffers' patch

17:45:31        zeeeh │ bwidawks, I made an error, last error doesn't have the new patch.

Comment 59 Knut Rupprecht 2015-01-01 02:23:28 UTC

Created attachment 111600
Error dump for 'Always initialize streamout buffers'

Comment 60 Ben Widawsky 2015-01-01 03:07:10 UTC

Created attachment 111602
Also initialize the streamout declaration list

Again I see garbage in the context state. Let's see if we can clean it up and uncover any real issues.

Comment 61 Knut Rupprecht 2015-01-01 10:48:41 UTC

Created attachment 111604
Error State including 'initialize the streamout declaration list'

Comment 62 Knut Rupprecht 2015-01-01 10:51:51 UTC

Created attachment 111605
dmesg including 'initialize the streamout declaration list'

The video ran for 6 hours overnight without a hang, but then it did hang.

Comment 63 Ben Widawsky 2015-01-02 00:20:50 UTC

Created attachment 111629
Only initialize, don't enable the SOL for null state

Comment 64 Knut Rupprecht 2015-01-02 05:14:54 UTC

Created attachment 111632
Error State including 'vertex fetcher NULL state'

With all patches included, it played for 4 hours straight without interaction, then crashed after I navigated the menus for about 2 minutes.

Comment 65 Ben Widawsky 2015-01-02 05:51:26 UTC

Created attachment 111637
We can't set non-zero streamout for inactive streams

Another screw-up in the so_decl initialization patch.

Comment 66 Knut Rupprecht 2015-01-02 10:23:26 UTC

Created attachment 111650
Error State including 'fix the so_decl_list initialization'

Comment 67 adam 2015-01-03 18:17:36 UTC

Similar problem here running XBMC/KODI on an Intel NUC DN2820FYKH (http://ark.intel.com/de/products/78953/Intel-NUC-Kit-DN2820FYKH). I'm running XBMC 13.2 on Ubuntu 14.04 amd64 on two NUCs. There are two different versions of the DN2820FYKH with different CPUs: the older one has an N2820 Celeron, the newer one an N2830 Celeron (http://ark.intel.com/de/compare/79052,81071).

Everything works fine on the N2820 model, but on the N2830 model I get these GPU hangs. I also tried the current OpenELEC 5.0.0, with the same problem.

[  102.437084] [drm] stuck on render ring
[  102.443487] [drm] GPU HANG: ecode 0:0x87f73c06, in kodi.bin [548], reason: Ring hung, action: reset
[  102.443500] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  102.443504] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  102.443507] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  102.443510] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  102.443513] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  108.441778] [drm] stuck on render ring
[  108.448158] [drm] GPU HANG: ecode 0:0x85fffffa, in kodi.bin [548], reason: Ring hung, action: reset
[  108.481529] DVDPlayerVideo[635]: segfault at 7f4c00000009 ip 00007f4c7435bb90 sp 00007f4c227fb830 error 4 in libc-2.20.so[7f4c742e3000+194000]

or

[   21.360292] [drm] stuck on render ring
[   21.368137] [drm] GPU HANG: ecode 0:0x87f73c1e, in kodi.bin [578], reason: Ring hung, action: reset
[   21.368152] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   21.368162] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   21.368171] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   21.368180] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   21.368190] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   27.381520] [drm] stuck on render ring
[   27.389472] [drm] GPU HANG: ecode 0:0x85fffffc, in kodi.bin [578], reason: Ring hung, action: reset

Comment 68 Ben Widawsky 2015-01-03 20:25:08 UTC

Created attachment 111699
Kernel patch - single port dispatch

All the hangs that Knut has sent me are related to the PSD or SBE interaction. Coincidentally, we use this on IVB GT1.

Comment 69 Ben Widawsky 2015-01-03 20:25:43 UTC

Created attachment 111700
kernel patch - wait for SBE

Comment 70 Ben Widawsky 2015-01-03 20:26:09 UTC

Created attachment 111701
kernel patch - scoreboard even on idle PSD

Comment 71 Ben Widawsky 2015-01-04 21:34:41 UTC

Adam, yours is a different bug. You have a Baytrail. Yours is likely:
https://bugs.freedesktop.org/show_bug.cgi?id=88012

Comment 72 Ben Widawsky 2015-01-05 00:01:45 UTC

Created attachment 111740
enable batch buffer end workaround

Implements a workaround mandated by the spec (for all HSW).

Let's test this with all the thread count patches (mesa, kernel, vaapi):

1. And all the extra mesa patches for the streamout buffer and so_decl stuff (through https://bugs.freedesktop.org/attachment.cgi?id=111629)

2. Only the mesa thread count patch

I'll take error state by email to avoid clutter. If anything is interesting, I will post it.

Comment 73 Ben Widawsky 2015-03-04 23:19:19 UTC

I have a branch now with both workarounds:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=workarounds

Comment 74 Ben Widawsky 2015-10-07 03:19:58 UTC

We never heard back. Please re-open if there is still an issue.
