Bug 38547 - r600g fails shader, tries to run with failed shader, freezes.
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
Reported: 2011-06-21 22:04 UTC by David L.
Modified: 2014-04-13 11:48 UTC

Attachments
R600_DUMP_SHADERS + errors for first failing shader (113.89 KB, text/plain), 2011-06-21 22:07 UTC, David L.
debug output (12 bytes, text/plain), 2011-06-24 22:16 UTC, David L.
debug output (77.12 KB, application/x-bzip2), 2011-06-24 22:21 UTC, David L.

Description David L. 2011-06-21 22:04:29 UTC
When trying to create a character in EVE Online, after the first few screens I encounter multiple shaders that cannot be translated from TGSI. The error message printed is:
"r600_pipe_shader_create - translation from TGSI failed !"

I've traced this down to:
^ check_and_set_bank_swizzle (-1) from
^ r600_bc_add_alu_type from
^ r600_shader_from_tgsi from
^ r600_pipe_shader_create (opcode: 0x09 "ADD")


Worse, the application keeps running after that, complains about a "missing shader" and finally, a few seconds later and while still drawing the loading animation, freezes the graphics card:

[ 4996.188064] radeon 0000:03:00.0: GPU lockup CP stall for more than 10000msec
[ 4996.188072] ------------[ cut here ]------------
[ 4996.188147] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:246 radeon_fence_wait+0x21c/0x2bb [radeon]()
[ 4996.188160] GPU lockup (waiting for 0x00011BDD last fence id 0x00011BD9)

After that I have to kill my X server and restart it.
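
The two fence values in the lockup message can be compared directly. This is only a minimal sketch: it assumes the IDs are sequence numbers that increase by one per fence (an assumption about the radeon kernel driver, not something the log states) and it ignores counter wraparound. The helper name `fence_gap` is hypothetical.

```python
import re

# Pattern for the lockup line exactly as it appears in the kernel log above.
LOCKUP = re.compile(
    r"GPU lockup \(waiting for 0x([0-9A-Fa-f]+) last fence id 0x([0-9A-Fa-f]+)\)"
)

LOG = """\
[ 4996.188064] radeon 0000:03:00.0: GPU lockup CP stall for more than 10000msec
[ 4996.188160] GPU lockup (waiting for 0x00011BDD last fence id 0x00011BD9)
"""


def fence_gap(log_text):
    """Return (waited-for ID - last signalled ID), or None if no lockup line is found.

    Assumption: both values are plain sequence numbers; wraparound is not handled.
    """
    match = LOCKUP.search(log_text)
    if match is None:
        return None
    waiting, last = (int(value, 16) for value in match.groups())
    return waiting - last


if __name__ == "__main__":
    print(fence_gap(LOG))  # prints 4
```

Under that assumption, the waited-for fence is four IDs ahead of the last one the GPU signalled. This says nothing about which submitted work caused the stall.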

(So this is basically two bugs: the shader failing to translate and, I assume, the driver trying to use a failed shader. The lockup might also be unrelated, though.)


System information:
mesa 21972c85ea734dbfcf69629c6b0b940efb42d4ba on Linux 2.6.39.1
32-bit chroot on 64-bit host
03:00.0 VGA compatible controller: ATI Technologies Inc RV630 [Radeon HD 2600 Series]


R600_DUMP_SHADERS output for the failed shader follows as an attachment.
Comment 1 David L. 2011-06-21 22:07:02 UTC
Created attachment 48266
R600_DUMP_SHADERS + errors for first failing shader

(Please note that I added more debug statements, so the line numbers in the last 3 error lines are a few lines off.)
Comment 2 David L. 2011-06-21 22:19:08 UTC
I hacked up a counter to tell me which instruction is failing. It seems to be

2517:   ADD TEMP[161].x, -TEMP[158].xxxx, CONST[0].wwww

if I'm not totally wrong. (My counter says 2518, but it is an ADD that fails, so I think I'm off by one...)
Comment 3 Jerome Glisse 2011-06-22 09:23:14 UTC
Do you have the original GLSL shader? (Setting the environment variable MESA_GLSL="dump" should dump it.)

This shader is awfully big. The issue is that it uses more temporaries than the hardware has, and I am guessing we never check for this. A proper compiler might be able to cut down the number of temporaries, as there are a lot of scalar operations (only one component of a vector is used).
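
To put rough numbers on the temporary usage described here, a TGSI dump can be scanned for TEMP registers. The following is only a sketch: the helper name `temp_usage`, the regular expressions, and the assumption that the first operand of each instruction is its destination are illustrative and not part of Mesa. The hardware's actual register limit is not stated in this report, so the script only reports counts. The sample is the four-instruction excerpt quoted in comment 4, not the full shader.

```python
import re
from collections import defaultdict

# Instruction-number prefix as printed in the excerpts, e.g. "2517:" or "2518>>>".
PREFIX = re.compile(r"^\s*\d+\s*(?::|>>>)\s*")
# A TEMP register reference with an optional swizzle or write mask, e.g. TEMP[161].x
TEMP_REF = re.compile(r"TEMP\[(\d+)\](?:\.([xyzw]+))?")

SAMPLE = """\
2517:   ADD TEMP[161].x, -TEMP[158].xxxx, CONST[0].wwww
2518>>> DP3 TEMP[162].x, TEMP[138].xyzz, -TEMP[153].xyzz
2519:   MOV TEMP[20].w, TEMP[162].xxxx
2520:   ADD TEMP[163].xy, -TEMP[147].yyyy, TEMP[20].xwww
"""


def temp_usage(tgsi_text):
    """Return (highest TEMP index seen, sorted TEMPs written in only one component)."""
    highest = -1
    written = defaultdict(set)  # TEMP index -> set of components written
    for line in tgsi_text.splitlines():
        fields = PREFIX.sub("", line).strip().split(None, 1)
        if len(fields) != 2:
            continue  # blank line or an opcode without operands
        operands = fields[1]
        for ref in TEMP_REF.finditer(operands):
            highest = max(highest, int(ref.group(1)))
        # Assumption: the first operand is the destination register.
        dst = TEMP_REF.match(operands.split(",")[0].strip())
        if dst:
            # A destination without a write mask writes all four components.
            written[int(dst.group(1))] |= set(dst.group(2) or "xyzw")
    scalar_only = sorted(i for i, comps in written.items() if len(comps) == 1)
    return highest, scalar_only


if __name__ == "__main__":
    print(temp_usage(SAMPLE))  # prints (163, [20, 161, 162])
```

Run over a full dump, a high top index combined with a long list of single-component temporaries would be consistent with the scalar-heavy pattern mentioned above. How the driver maps TGSI temporaries to hardware registers is outside what this sketch models.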
Comment 4 David L. 2011-06-22 19:56:23 UTC
(In reply to comment #3)
> Do you have the original glsl shader ? (env var MESA_GLSL="dump" should dump
> it)

I can't dump the shader due to #38584.

Also, I'm sorry, I misidentified the failing instruction. TGSI 0x09 isn't ADD but DP3, and my hacked-in instruction counter was not off by one after all, so the instruction that fails is:

2517:   ADD TEMP[161].x, -TEMP[158].xxxx, CONST[0].wwww
2518>>> DP3 TEMP[162].x, TEMP[138].xyzz, -TEMP[153].xyzz
2519:   MOV TEMP[20].w, TEMP[162].xxxx
2520:   ADD TEMP[163].xy, -TEMP[147].yyyy, TEMP[20].xwww

It fails to compile two more shaders before it crashes; those bail out on:

472:   ADD TEMP[150].x, -TEMP[147].xxxx, -CONST[0].wwww
473>>> DP3 TEMP[151].x, TEMP[127].xyzz, -TEMP[142].xyzz
474:   ADD TEMP[152].x, -TEMP[136].yyyy, TEMP[136].xxxx
475:   ADD TEMP[153].x, -TEMP[136].yyyy, TEMP[151].xxxx

and

473:   ADD TEMP[151].x, -TEMP[148].xxxx, -CONST[0].wwww
474>>> DP3 TEMP[152].x, TEMP[128].xyzz, -TEMP[143].xyzz
475:   ADD TEMP[153].x, -TEMP[137].yyyy, TEMP[137].xxxx
476:   ADD TEMP[154].x, -TEMP[137].yyyy, TEMP[152].xxxx

I have, however, also seen it crash on ADDs (opcode 0x08), though I can't reproduce that right now. Is there a register limit at around 150 to 160?
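
For comparing the three failures side by side, the marked instructions can be pulled out of the excerpts mechanically. This is a small sketch, not a Mesa tool: `failing_instructions` is a hypothetical helper, and it assumes the `>>>` marker format used in the listings in this comment.

```python
import re

# A line flagged by the hacked-in counter: "<number>>>> <OPCODE> <operands>"
MARKED = re.compile(r"^\s*(\d+)>>>\s*(\w+)\s+(.*)$")
TEMP_INDEX = re.compile(r"TEMP\[(\d+)\]")

EXCERPT = """\
2517:   ADD TEMP[161].x, -TEMP[158].xxxx, CONST[0].wwww
2518>>> DP3 TEMP[162].x, TEMP[138].xyzz, -TEMP[153].xyzz
472:   ADD TEMP[150].x, -TEMP[147].xxxx, -CONST[0].wwww
473>>> DP3 TEMP[151].x, TEMP[127].xyzz, -TEMP[142].xyzz
473:   ADD TEMP[151].x, -TEMP[148].xxxx, -CONST[0].wwww
474>>> DP3 TEMP[152].x, TEMP[128].xyzz, -TEMP[143].xyzz
"""


def failing_instructions(excerpt):
    """Return a list of (instruction number, opcode, TEMP indices) for each '>>>' line."""
    found = []
    for line in excerpt.splitlines():
        match = MARKED.match(line)
        if match:
            temps = [int(i) for i in TEMP_INDEX.findall(match.group(3))]
            found.append((int(match.group(1)), match.group(2), temps))
    return found


if __name__ == "__main__":
    for number, opcode, temps in failing_instructions(EXCERPT):
        print(number, opcode, temps)
```

On the excerpts above this lists three DP3 instructions whose destination temporaries are 162, 151 and 152, at instruction counts of 2518, 473 and 474. That is only an observation about these three samples and does not by itself confirm or rule out a register limit.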
Comment 5 Jerome Glisse 2011-06-23 06:22:47 UTC
The Mesa shader dump should at least print the shader that you already pasted here. We really need the GLSL one.
Comment 6 David L. 2011-06-24 22:16:49 UTC
Created attachment 48391
debug output

Debug output attached, started on the game screen before the one using the failing shader. I can't quite make out which source belongs to what... Look for "#2518" to find the error.

(No idea if the SEGV at the end is related.)
Comment 7 David L. 2011-06-24 22:21:15 UTC
Created attachment 48392
debug output

Debug output, really. Turns out c++filt doesn't treat its arguments as filenames...
Comment 8 Marek Olšák 2014-04-13 11:48:08 UTC
AFAIK, the driver can translate all shaders just fine, so this shouldn't occur anymore.

