105770 – GPU hangs if a shader uses a barrier and a single-plane rep of a multiplane image

Status:	RESOLVED MOVED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Vulkan/intel
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel 3D Bugs Mailing List
QA Contact:	Intel 3D Bugs Mailing List

Reported:	2018-03-27 19:22 UTC by atomnuker
Modified:	2019-09-18 19:48 UTC
CC List:	2 users


Description atomnuker 2018-03-27 19:22:07 UTC

Hi,

If a shader uses a barrier after filling in some workgroup-shared memory from a single-plane representation of a multi-plane image (not sure if that's related), the GPU will hang.
> [67542.848596] i915 0000:00:02.0: Resetting rcs0 after gpu hang

The shader looks like:

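#version 460
// NOTE: the local size was not included in the report; 16x16 is assumed here
// (it is the size used in the reduced test case in comment 4).
layout (local_size_x = 16, local_size_y = 16, local_size_z = 1) in;
layout (set = 0, binding = 0) uniform sampler2D input_img;
layout (set = 0, binding = 1, rgba8) uniform writeonly image2D output_img;

#define FILTER_RADIUS (ivec2(4, 4))
#define CACHE_SIZE (ivec2(gl_WorkGroupSize) + FILTER_RADIUS*2)
// NOTE: AREA() was not defined in the report; assumed to be width * height.
#define AREA(v) ((v).x * (v).y)
shared vec4 cache[AREA(CACHE_SIZE)];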

void main()
{
    ivec2 d;
    const ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
    const ivec2 w = ivec2(gl_WorkGroupSize);
    const ivec2 l = ivec2(gl_LocalInvocationID.xy);

    for (d.y = l.y; d.y < CACHE_SIZE.y; d.y += w.y) {
        for (d.x = l.x; d.x < CACHE_SIZE.x; d.x += w.x) {
            const ivec2 np = pos + d - l - FILTER_RADIUS;
            cache[d.y*CACHE_SIZE.x + d.x] = texture(input_img, np);
        }
    }

    barrier();

    vec4 avg = vec4(0.0f);
    ivec2 start = ivec2(0);
    ivec2 end = FILTER_RADIUS*2 + 1;
    for (d.y = start.y; d.y < end.y; d.y++)
        for (d.x = start.x; d.x < end.x; d.x++)
             avg += cache[(l.y + d.y)*CACHE_SIZE.x + l.x + d.x];

    avg /= (end - start).x * (end - start).y;
    imageStore(output_img, pos, avg);
}

Removing the barrier() will make the shader execute fine (with incorrect output of course).
Whether the input is bound as an image2D or sampled as above makes no difference; the GPU still hangs.
As a test case, compile https://github.com/atomnuker/FFmpeg/tree/exp_vulkan with --enable-vulkan and --enable-libshaderc and run "./ffmpeg_g -init_hw_device "vulkan=vk:0" -i <input> -filter_hw_device vk -vf format=yuv420p,hwupload,unsharp_vulkan -f null -".
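
To illustrate why the barrier() matters in the shader above, here is a small CPU-side sketch of its cache-filling loop. It assumes a 16x16 workgroup (the report only states that size for the later reduced test), and the helper names cache_writes and max_read_index are made up for this illustration. It checks that the strided loops write every cell of the 24x24 cache exactly once, and that the averaging loop stays inside the cache.

```python
# CPU-side sketch of the shader's shared-memory cache fill (not driver code).
# ASSUMPTION: 16x16 workgroup; the report does not state the size for this shader.
WORKGROUP = (16, 16)      # gl_WorkGroupSize.xy (assumed)
FILTER_RADIUS = (4, 4)    # FILTER_RADIUS from the shader
CACHE_SIZE = (WORKGROUP[0] + 2 * FILTER_RADIUS[0],
              WORKGROUP[1] + 2 * FILTER_RADIUS[1])


def cache_writes():
    """Count how often each cache cell is written by one whole workgroup."""
    writes = [[0] * CACHE_SIZE[0] for _ in range(CACHE_SIZE[1])]
    for ly in range(WORKGROUP[1]):            # gl_LocalInvocationID.y
        for lx in range(WORKGROUP[0]):        # gl_LocalInvocationID.x
            # Same strided loops as in the shader: start at l, step by w.
            for dy in range(ly, CACHE_SIZE[1], WORKGROUP[1]):
                for dx in range(lx, CACHE_SIZE[0], WORKGROUP[0]):
                    writes[dy][dx] += 1
    return writes


def max_read_index():
    """Largest cache index read by the averaging loop of any invocation."""
    taps = (2 * FILTER_RADIUS[0] + 1, 2 * FILTER_RADIUS[1] + 1)
    ly, lx = WORKGROUP[1] - 1, WORKGROUP[0] - 1   # last invocation
    dy, dx = taps[1] - 1, taps[0] - 1             # last filter tap
    return (ly + dy) * CACHE_SIZE[0] + lx + dx


if __name__ == "__main__":
    writes = cache_writes()
    print("each cell written once:", all(c == 1 for row in writes for c in row))
    print("max read index:", max_read_index(), "cache cells:", CACHE_SIZE[0] * CACHE_SIZE[1])
```

Since each cell is written by a single invocation but read by many, every invocation depends on writes made by its neighbours. This is why dropping the barrier() gives incorrect output.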

Comment 1 atomnuker 2018-03-27 20:23:41 UTC

I should probably also say that this only happens on chroma planes; using the luma plane works fine.

Comment 2 Lionel Landwerlin 2018-03-28 09:28:28 UTC

Could you give the versions of both the kernel and Mesa you're running?

Comment 3 atomnuker 2018-03-28 10:09:57 UTC

(In reply to Lionel Landwerlin from comment #2)
> Could you give the versions of both the kernel and Mesa you're running?

Kernel is 4.15.13-1-ARCH #1 SMP PREEMPT Sun Mar 25 11:27:57 UTC 2018 x86_64 GNU/Linux.
Mesa is at commit d7a015cbc6.

Comment 4 atomnuker 2018-03-29 03:20:49 UTC

I did some more testing and tried the latest Mesa git (025105453a807a76754c6).
I think I know what causes it. Seems like even the simplest of cases can crash the GPU:

#version 460

layout (local_size_x = 16, local_size_y = 16, local_size_z = 1) in;
layout (set = 0, binding = 0, rgba8) uniform readonly image2D input_img;

void main()
{
    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
    if (pos.y < imageSize(input_img).y)
        barrier();
}

Image resolution is 1280x720. input_img contains plane 1, which is 640x360 (the input image has 4:2:0 subsampling). I dispatch 40x23x1 workgroups, each with a size of 16x16x1.
If I use (pos.x < imageSize(input_img).x), the shader runs fine. If I remove the condition and always run the barrier, the shader runs fine.
However, if I dispatch 40x22x1 workgroups (1 less vertically), the shader also runs fine.
So the GPU seems to crash when the barrier() is skipped in a workgroup that has invocations positioned outside the plane's dimensions (where the plane is a single-plane representation of a multi-plane image) while it still runs in all other cases.
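
The numbers above can be checked with a short sketch that finds the workgroups, along one axis, in which only some invocations pass the `pos < size` test and so reach the barrier(). The function name divergent_groups is hypothetical, and the 16-wide local size is taken from the shader above. As far as I understand the GLSL specification, barrier() in a compute shader is expected to be reached in uniform control flow, so this partial participation may be the relevant factor. That is a reading of the spec and is not confirmed anywhere in this report.

```python
def divergent_groups(extent, groups, local=16):
    """Return (group index, passing invocations) for every workgroup along one
    axis in which only part of the invocations satisfy `pos < extent`."""
    out = []
    for g in range(groups):
        first = g * local                          # first global position in this group
        passing = max(0, min(extent - first, local))
        if 0 < passing < local:                    # some pass, some do not
            out.append((g, passing))
    return out


if __name__ == "__main__":
    print(divergent_groups(360, 23))   # y axis, 23 groups (hangs): [(22, 8)]
    print(divergent_groups(360, 22))   # y axis, 22 groups (fine):  []
    print(divergent_groups(640, 40))   # x axis, 40 groups (fine):  []
```

With 23 rows of workgroups, the last row covers y = 352..367, so only 8 of its 16 invocation rows fall inside the 360-pixel-high plane. In the two configurations that run fine, every workgroup passes the test as a whole.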

Comment 5 Jason Ekstrand 2018-03-29 04:17:31 UTC

Given the sizes stated, I suspect we may be looking at a SIMD32 issue.

Comment 6 atomnuker 2018-03-29 16:49:03 UTC

(In reply to Jason Ekstrand from comment #5)
> Given the sizes stated, I suspect we may be looking at a SIMD32 issue.

Certainly seems to be the case. Feeding in a 360x360 image (with a 16x16 workgroup size this gives 22.5 workgroups per axis, which I round up to the next integer, so 23x23 workgroups) and mapping the luma plane as input_img still makes it crash, so it's not limited to chroma planes. Feeding in any size divisible by 16 doesn't make the GPU crash.
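
A minimal sketch of the rounding described above, with hypothetical helper names: group_count rounds the workgroup count up, and overshoot reports how many invocations along one axis end up past the image edge. Sizes divisible by 16 leave no overshoot, which matches the observation that they do not hang.

```python
def group_count(extent, local=16):
    """Workgroups needed to cover `extent` pixels, rounded up."""
    return (extent + local - 1) // local


def overshoot(extent, local=16):
    """Invocations along one axis that land beyond the image edge."""
    return group_count(extent, local) * local - extent


if __name__ == "__main__":
    for size in (352, 360, 368):
        print(size, group_count(size), overshoot(size))
```

For 360 this gives 23 workgroups with 8 invocations past the edge, while 352 and 368 give 22 and 23 workgroups with none.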

Comment 7 GitLab Migration User 2019-09-18 19:48:35 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/834.
