Bug 105291 - r600 [CEDAR]: GPU stalls when running shadertoy "ladybug"
Status: RESOLVED NOTABUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL: https://www.shadertoy.com/view/4tByz3
 
Reported: 2018-02-28 14:15 UTC by Gert Wollny
Modified: 2018-03-02 18:03 UTC


Attachments
dmesg of Radeon HD 6620G hang (78.99 KB, text/plain)
2018-03-01 23:54 UTC, russianneuromancer
Screenshot on CEDAR with gallium HUD (476.92 KB, image/png)
2018-03-02 08:38 UTC, Gert Wollny

Description Gert Wollny 2018-02-28 14:15:28 UTC
Running the shader toy example 

  https://www.shadertoy.com/view/4tByz3

causes a GPU stall on CEDAR:
radeon 0000:06:00.0: ring 0 stalled for more than 10360msec
radeon 0000:06:00.0: GPU lockup (current fence id 0x00000000000005c1 
                     last fence id 0x0000000000000644 on ring 0)
radeon 0000:06:00.0: Saved 4178 dwords of commands on ring 0.
...

[drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
[drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing 
                  IB on GFX ring (-110).

The same shader toy runs fine on BARTS.

Considering that this shader runs at only 1.6 FPS on BARTS, the timeout could also happen simply because a shader stage takes too long to process: one shader is 32750 DW long (optimized) and has a few long nested loops.
Comment 1 russianneuromancer 2018-03-01 23:54:12 UTC
Created attachment 137739 [details]
dmesg of Radeon HD 6620G hang

Same issue here when running on SUMO iGPU with Linux 4.15.5 and Mesa 18.1 git. Not reproducible on TURKS dGPU. Complete dmesg is attached. If running with any additional debug option is required please let me know.
Comment 2 Roland Scheidegger 2018-03-02 01:07:20 UTC
You could figure out if it simply takes too long by increasing the timeout (albeit that needs a recompile of the kernel module).
If so, I'm not really sure what to do. (Though I think the shader actually does spilling; eliminating some of it might speed things up, though I'm not sure that's possible.)
Comment 3 Gert Wollny 2018-03-02 08:38:03 UTC
Created attachment 137749 [details]
Screenshot on CEDAR with gallium HUD

It is actually not necessary to recompile the module; all that is needed is

  echo "options radeon lockup_timeout=100000" > /etc/modprobe.d/radeon.conf

and then reload the module (or reboot). 
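
The steps above can be sketched as a small script. This is only an illustration: `write_radeon_timeout` is a hypothetical helper name, not part of any tool, and the commented-out commands assume root privileges, that the radeon module can be unloaded (this typically fails while a graphical session is using it, in which case a reboot is the simpler route), and that the parameter is readable under /sys/module/radeon/parameters/.

```shell
#!/bin/sh
# Hypothetical helper: write the radeon lockup timeout option to a
# modprobe configuration file.
#   $1 = path of the config file, $2 = timeout in milliseconds
write_radeon_timeout() {
    printf 'options radeon lockup_timeout=%s\n' "$2" > "$1"
}

# Intended use as root (left commented out so the sketch has no side effects):
#   write_radeon_timeout /etc/modprobe.d/radeon.conf 100000
#   modprobe -r radeon && modprobe radeon    # or reboot instead
#   cat /sys/module/radeon/parameters/lockup_timeout
```

Reading the value back after the reload is a cheap way to confirm that the option was actually picked up before retrying the shader.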

With that the shadertoy runs at 0.086 FPS (see screenshot), which means the shaders are simply too heavy for the hardware to handle within the default timeout of 10s.
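
As a rough cross-check of that conclusion, the measured frame rate can be converted into a time per frame and compared with the 10 s default. The helper name `frame_time` is made up for this sketch, and it assumes that the time per frame is a reasonable proxy for how long the ring goes without signalling a fence, which is a simplification since a frame may consist of several submissions.

```shell
#!/bin/sh
# Hypothetical helper: convert frames per second into seconds per frame.
#   $1 = frame rate in FPS
frame_time() {
    awk -v fps="$1" 'BEGIN { printf "%.3f\n", 1 / fps }'
}

frame_time 0.086   # CEDAR with the raised timeout; prints 11.628
frame_time 1.6     # BARTS
```

At 0.086 FPS a single frame takes more than 11 s, which is above the default 10 s lockup timeout, while the BARTS figure of 1.6 FPS is well below it.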
Comment 4 Gert Wollny 2018-03-02 08:45:21 UTC
The shader doesn't use many registers, even un-optimized the number is below 100, so spilling doesn't occur and beyond that I don't see any option to optimize it more (apart from rewriting the assembly manually and injecting it). Which means the only thing we could suggest to someone who wants to enjoy this shader on older hardware is to change the timeout like given above. Hence I'm closing this as NOTABUG.
Comment 5 Roland Scheidegger 2018-03-02 18:03:03 UTC
(In reply to Gert Wollny from comment #4)
> The shader doesn't use many registers, even un-optimized the number is below
> 100, so spilling doesn't occur and beyond that I don't see any option to
> optimize it more (apart from rewriting the assembly manually and injecting
> it). Which means the only thing we could suggest to someone who wants to
> enjoy this shader on older hardware is to change the timeout like given
> above. Hence I'm closing this as NOTABUG.

Ah yes, I see. I was thinking it was using too many regs because my browser used an older driver version, where translation failed due to too many temp regs (in fact nearly 1000...). But that's no longer the case with a more recent driver stack.
That shader really is from the "holy crap" department indeed, and the driver doesn't seem to do anything obviously wrong with it, so I agree hitting the timeout is expected (as a side note, things get really unresponsive here on my HD 5750...).

