Bug 103228

Summary: GPU hang in compute shader
Product: DRI
Component: DRM/Intel
Version: unspecified
Hardware: Other
OS: All
Status: CLOSED NOTOURBUG
Severity: normal
Priority: medium
Reporter: Steinar H. Gunderson <sgunderson>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs
Whiteboard:
i915 platform: HSW
i915 features: GPU hang
Attachments: GPU hang log (flags: none)

Description Steinar H. Gunderson 2017-10-11 17:34:05 UTC
Created attachment 134798
GPU hang log

Hi,

I'm developing a compression system through compute shaders. The first shader now works fine, but after adding a second shader (even if it's got an empty main()), I've started getting GPU hangs:

[2557115.719316] [drm] GPU HANG: ecode 7:0:0x8fd8ffff, in narabu-encoder [11429], reason: Hang on rcs0, action: reset
[2557115.719318] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[2557115.719319] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[2557115.719320] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[2557115.719321] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[2557115.719321] [drm] GPU crash dump saved to /sys/class/drm/card0/error

This is a Haswell laptop (Lenovo X240), Mesa 17.2.2, kernel 4.13.0. Attached is a gzipped crash dump (error.gz). My current code is available at

  https://storage.sesse.net/haswell-gpu-hang.tar.xz

Make narabu-encoder and run it; it hangs the GPU nearly every time for me, adding 125 ms or so to the measured runtime.
Comment 1 Steinar H. Gunderson 2017-10-11 21:32:20 UTC
Hm, after some research, it seems I cannot rely on compute shader invocations not to stomp on each other, even from the same draw call. If I insert glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) between the two glDispatchCompute() calls, it stops hanging.

Nevertheless, it's odd that adding an empty shader should cause the problem to appear. I'm leaning towards not-a-bug, though.
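
For reference, the ordering described above can be sketched as follows. This is only an illustration, not the code from the tarball: the gl_api table, run_two_passes() and the logging stubs are hypothetical names, used so the call order can be checked without a GL context. It also assumes the second pass reads shader storage buffer data written by the first. The barrier constant carries the value of GL_SHADER_STORAGE_BARRIER_BIT from the OpenGL 4.3 headers.

```c
#include <assert.h>
#include <string.h>

/* Same value as GL_SHADER_STORAGE_BARRIER_BIT in the OpenGL 4.3 headers;
 * redefined here only so the sketch builds without GL headers. */
#define SHADER_STORAGE_BARRIER_BIT 0x00002000u

/* Hypothetical table of entry points. In a real program these would be
 * glUseProgram, glDispatchCompute and glMemoryBarrier (or thin wrappers). */
struct gl_api {
    void (*use_program)(unsigned program);
    void (*dispatch_compute)(unsigned x, unsigned y, unsigned z);
    void (*memory_barrier)(unsigned barriers);
};

/* Runs two compute passes back to back. The barrier between them orders the
 * first pass's shader storage writes before the second pass's accesses. */
void run_two_passes(const struct gl_api *gl, unsigned first, unsigned second,
                    unsigned groups_x, unsigned groups_y)
{
    assert(gl != NULL);

    gl->use_program(first);
    gl->dispatch_compute(groups_x, groups_y, 1);

    /* Without this call the two dispatches are not ordered with respect
     * to each other's buffer accesses. */
    gl->memory_barrier(SHADER_STORAGE_BARRIER_BIT);

    gl->use_program(second);
    gl->dispatch_compute(groups_x, groups_y, 1);
}

/* Logging stubs that record the call order as a short string. */
static char call_log[64];

static void log_use(unsigned program)
{
    (void)program;
    strcat(call_log, "U");
}

static void log_dispatch(unsigned x, unsigned y, unsigned z)
{
    (void)x; (void)y; (void)z;
    strcat(call_log, "D");
}

static void log_barrier(unsigned barriers)
{
    strcat(call_log, barriers == SHADER_STORAGE_BARRIER_BIT ? "B" : "?");
}

/* Drives run_two_passes() with the stubs and returns the recorded order. */
const char *record_two_passes(void)
{
    static const struct gl_api stub = { log_use, log_dispatch, log_barrier };
    call_log[0] = '\0';
    run_two_passes(&stub, 1, 2, 8, 8);
    return call_log;
}
```

With the stubs, record_two_passes() records use, dispatch, barrier, use, dispatch. In an actual application the table would be filled with the real GL entry points, and the barrier bits should match how the second pass accesses the data (GL_SHADER_STORAGE_BARRIER_BIT for SSBO access, as in the fix above).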
Comment 2 Jordan Justen 2017-10-25 23:16:23 UTC
There are a lot of potential state changes that could occur when changing to and dispatching another program, even if it is empty. Therefore it is at least plausible that the empty program could somehow cause the application's missing-memory-barrier bug to show itself. Anyway, I'll close this for now.
