103228 – GPU hang in compute shader

Status:	CLOSED NOTOURBUG

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2017-10-11 17:34 UTC by Steinar H. Gunderson
Modified:	2018-01-04 18:06 UTC
CC List:	1 user

i915 platform:	HSW
i915 features:	GPU hang

Attachments
GPU hang log (3.49 KB, application/gzip) 2017-10-11 17:34 UTC, Steinar H. Gunderson

Description Steinar H. Gunderson 2017-10-11 17:34:05 UTC

Created attachment 134798
GPU hang log

Hi,

I'm developing a compression system using compute shaders. The first shader now works fine, but after adding a second shader (even one with an empty main()), I've started getting GPU hangs:

[2557115.719316] [drm] GPU HANG: ecode 7:0:0x8fd8ffff, in narabu-encoder [11429], reason: Hang on rcs0, action: reset
[2557115.719318] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[2557115.719319] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[2557115.719320] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[2557115.719321] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[2557115.719321] [drm] GPU crash dump saved to /sys/class/drm/card0/error

This is a Haswell laptop (Lenovo X240), Mesa 17.2.2, kernel 4.13.0. Attached is a gzipped crash dump (error.gz). My current code is available at

  https://storage.sesse.net/haswell-gpu-hang.tar.xz

Make narabu-encoder and run it; it hangs the GPU nearly every time for me, adding 125 ms or so to the measured runtime.

Comment 1 Steinar H. Gunderson 2017-10-11 21:32:20 UTC

Hm, after some research, it seems I cannot rely on compute shader invocations not to stomp on each other, even from the same draw call. If I insert glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) between the two glDispatchCompute() calls, it stops hanging.

Nevertheless, it's odd that adding an empty shader should cause the problem to appear. I'm leaning towards not-a-bug, though.
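
For illustration, the workaround described above can be sketched roughly as follows. This is not the narabu-encoder code: run_two_passes, the program handles and the work group counts are hypothetical, and the three GL entry points are replaced by call-recording stand-ins so the sketch compiles without a GL context. In a real program they would come from the GL headers or loader. Which barrier bits are actually needed depends on how the second shader reads the data; GL_SHADER_STORAGE_BARRIER_BIT is simply the one reported to help here.

```c
#include <assert.h>

/* Value as defined in the OpenGL headers; repeated here only so the sketch
 * builds without them. */
#define GL_SHADER_STORAGE_BARRIER_BIT 0x00002000u

/* Records the order of calls: 'U' = use program, 'D' = dispatch, 'B' = barrier. */
char call_log[16];
int call_count = 0;

static void record_call(char tag)
{
    assert(call_count < (int)sizeof(call_log) - 1);
    call_log[call_count++] = tag;
    call_log[call_count] = '\0';
}

/* Stand-ins for the real GL functions (assumption: same parameter lists,
 * no GL work is actually done). Remove these when building against GL. */
static void glUseProgram(unsigned int program)
{
    (void)program;
    record_call('U');
}

static void glDispatchCompute(unsigned int x, unsigned int y, unsigned int z)
{
    (void)x; (void)y; (void)z;
    record_call('D');
}

static void glMemoryBarrier(unsigned int barriers)
{
    (void)barriers;
    record_call('B');
}

/* Hypothetical helper: run two compute passes where the second pass may
 * touch shader storage buffers written by the first. */
void run_two_passes(unsigned int first_program, unsigned int second_program,
                    unsigned int groups_x, unsigned int groups_y)
{
    glUseProgram(first_program);
    glDispatchCompute(groups_x, groups_y, 1);

    /* Make the first pass's SSBO writes visible to later shader accesses
     * before the second dispatch is issued. */
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    glUseProgram(second_program);
    glDispatchCompute(groups_x, groups_y, 1);
}
```

The point of the sketch is only the ordering: the barrier sits between the two glDispatchCompute() calls, after the first pass has been issued and before the second program is dispatched.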

Comment 2 Jordan Justen 2017-10-25 23:16:23 UTC

There are a lot of potential state changes that could occur
when changing to and dispatching another program, even if it
is empty. Therefore it is at least plausible that the empty
program could somehow cause the application's missing memory
barrier bug to show itself. Anyway, I'll close this for now.
