Bug 111241 - Shadertoy shader causing hang
Summary: Shadertoy shader causing hang
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: 19.1
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-28 18:06 UTC by Felix Potthast
Modified: 2019-08-14 12:56 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
tgsi version of the shader (47.00 KB, text/plain)
2019-08-09 16:22 UTC, Pierre-Eric Pelloux-Prayer
nir version (52.81 KB, text/plain)
2019-08-09 16:22 UTC, Pierre-Eric Pelloux-Prayer

Description Felix Potthast 2019-07-28 18:06:39 UTC
When opening https://www.shadertoy.com/view/3lsXDB on my desktop PC
with a Radeon HD 7870 graphics card (Pitcairn), I get a freeze.

It works fine on my laptop with Intel graphics.

Both systems use Mesa 19.1.3.
Comment 1 Pierre-Eric Pelloux-Prayer 2019-08-08 21:34:59 UTC
I could reproduce the issue on a Raven Ridge and a Navi10.

However, with NIR enabled (radeonsi_enable_nir=true environment variable), the shader is perfectly usable.
Comment 2 Dieter Nützel 2019-08-09 00:15:33 UTC
RX 580 / NIR
amd-staging-drm-next
Mesa git
Firefox 68.0.1

[42489.228053] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!                                                                                  
[42494.348053] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[42508.171689] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!

[42513.035801] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

[42556.811021] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!                                                                                  
[42561.418927] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

[42571.658863] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Comment 3 Pierre-Eric Pelloux-Prayer 2019-08-09 16:22:09 UTC
Here's my understanding of the issue.

This shader uses 2 passes:
 - the first pass has BufferA as input and output and does:

if (first frame)
  // init bufferA content
else
  // do something useful

 - the 2nd pass has BufferA as input and does:

N = texelFetch(bufferA)
for(i=0; i < N; i++)
  // do something


The problem here is the "// init bufferA content": it fails to initialize the buffer content properly, leading to an infinite loop in the 2nd pass.

The exact code is:
   if (iFrame==0) { O -= O; return; }

If one replaces this line with:
   if (iFrame==0) { O = vec4(0.0f); return; }

The shader works fine (you can test the modified version here: https://www.shadertoy.com/view/wtSXzw ).
Comment 4 Pierre-Eric Pelloux-Prayer 2019-08-09 16:22:42 UTC
Created attachment 144993 [details]
tgsi version of the shader
Comment 5 Pierre-Eric Pelloux-Prayer 2019-08-09 16:22:59 UTC
Created attachment 144994 [details]
nir version
Comment 6 Pierre-Eric Pelloux-Prayer 2019-08-14 12:56:48 UTC
I've created a different shadertoy showing the problem: https://www.shadertoy.com/view/Wt2SW1 (but this one doesn't hang the GPU).

The shader for "Buffer A" is:

  0: MOV TEMP[0], SV[0]
  1: MAD TEMP[0].y, SV[0], CONST[0][2].xxxx, CONST[0][2].yyyy
  2: MOV OUT[0], IMM[0].xxxx
  3: USEQ TEMP[1].x, CONST[0][1].xxxx, IMM[1].xxxx
  4: UIF TEMP[1].xxxx
  5:   ADD TEMP[2], TEMP[2], -TEMP[2]
  6: ELSE
  [...]
 13: MOV OUT[0], TEMP[2]
 14: END

TEMP[2] is used before being assigned a value, so I suppose that's what allows LLVM to turn line 5 into:

   v_mov_b32_e32 v3, 0x7fc00000
   v_mov_b32_e32 v2, 0x7fc00000
   v_mov_b32_e32 v1, 0x7fc00000
   v_mov_b32_e32 v0, 0x7fc00000

(i.e. the output is NaN)
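
For reference, 0x7fc00000 is the bit pattern of an IEEE-754 single-precision quiet NaN. The following is a minimal CPU-side sketch in plain C, not driver code; the helper names float_from_bits and self_subtract are made up for illustration. It decodes that constant and shows that "x - x" is 0 only for finite x, which is one reason a compiler following strict IEEE rules cannot blindly fold "x - x" to 0. It assumes the code is built without fast-math style flags.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
 * Hypothetical helper, for illustration only. */
static float float_from_bits(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Compute x - x through a volatile so the compiler cannot fold it
 * at build time. Hypothetical helper, for illustration only. */
static float self_subtract(float x)
{
    volatile float v = x;
    return v - v;
}
```

Calling self_subtract on a finite value gives 0, while calling it on INFINITY or on float_from_bits(0x7fc00000u) gives NaN.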

A possible fix is to transform "dst = x - x" operations into "dst = 0", which is what NIR does in its nir_opt_algebraic pass.
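
To make the idea concrete, here is a rough sketch of such a peephole rewrite on a toy instruction list. Everything in it (toy_opcode, toy_src, toy_instr, toy_fold_self_subtract) is a hypothetical structure invented for this example; it is not Mesa's TGSI or NIR representation and not the code from the merge request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, loosely modelled on the TGSI listing above.
 * This is not Mesa's TGSI or NIR data structure. */
enum toy_opcode { TOY_MOV_ZERO, TOY_MOV, TOY_ADD };

struct toy_src {
    int  reg;     /* index of the source temporary */
    bool negate;  /* true for a "-TEMP[n]" operand */
};

struct toy_instr {
    enum toy_opcode op;
    int dst;
    struct toy_src src[2];
};

/* Rewrite "ADD dst, x, -x" (or "ADD dst, -x, x") into "MOV dst, 0".
 * Returns the number of instructions rewritten. */
static size_t toy_fold_self_subtract(struct toy_instr *prog, size_t count)
{
    size_t rewritten = 0;

    assert(prog != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        struct toy_instr *in = &prog[i];

        if (in->op != TOY_ADD)
            continue;
        /* Same register on both sides, exactly one side negated. */
        if (in->src[0].reg == in->src[1].reg &&
            in->src[0].negate != in->src[1].negate) {
            in->op = TOY_MOV_ZERO;
            rewritten++;
        }
    }
    return rewritten;
}
```

Note that this kind of fold deliberately treats "x - x" as 0 even when x is NaN or infinite, so it trades strict IEEE behaviour for a well-defined result.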

I've opened an MR to fix/discuss this issue: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1681

