Created attachment 60568 [details]
R600_DUMP_SHADERS

Hello,

I have an application that uses a rather complex shader with some branching (while/if). The application fails with the r600 driver, giving the following error:

EE r600_shader.c:140 r600_pipe_shader_create - translation from TGSI failed !
r600_state_common.c:761:r600_draw_vbo: Assertion `0' failed.

Something seems to go wrong in the branching section, since the shader works if I comment that section out. The same shader works fine with either LIBGL_ALWAYS_SOFTWARE=1 or fglrx. I also remember it working fine with some older revision of the r600 driver; unfortunately, I don't know which one exactly.

I have attached the R600_DUMP_SHADERS output. If needed, I can also provide links to the source code or any other data that may be helpful in debugging.

Some relevant parts of glxinfo:

OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD RV770
OpenGL version string: 2.1 Mesa 8.1-devel (git-1a33c1b precise-oibaf-ppa)

Best regards,
Marko
Probably register limit. Shader uses 5 inputs + 8 outputs + 112 temps = 125 registers. I think it should work if you could make it less than 120.
Any suggestions on how to accomplish that? I tried turning my code around and around, but so far with no success. What should help with the register count?

On the other hand, why is the register limit set so low? I just tested the program with Mesa 7.11, same hardware and r600 drivers - it works. This seems like a regression.

If needed, there are binaries/source available for testing at http://thelarge.org
(In reply to comment #2)
> Any suggestions on how to accomplish that? I tried turning my code around and
> around, but so far with no success. What should help with the register count?
>
> On the other hand, why is the register limit set so low? I just tested the
> program with Mesa 7.11, same hardware and r600 drivers - it works. This seems
> like a regression.
> If needed, there are binaries/source available for testing at
> http://thelarge.org

It's a hardware limit. In theory the compiler should optimize register allocation, but the problem is that r600g still lacks a real register allocator. Some changes since 7.11 have probably also increased register usage in the TGSI IR.

I'll see if I can help with that shader somehow, but in general r600g needs a better shader compiler. There is some work in progress on that, but I don't know when it will be completed.

There is also some experimental code that could probably help, but currently it works only with evergreen GPUs. If you can use a GPU of the evergreen class (IIRC that's all of the 5xxx and some of the 6xxx cards), you might want to try the r600_shader_opt and r600_shader_opt_2 branches from the following repo:
https://github.com/VadimGirlin/mesa
Created attachment 60642 [details]
lorentzTransform function

It seems you could replace the following lines in the lorentzTransform function:

    r.w = g*p.w - v.x*g*p.x - v.y*g*p.y - v.z*g*p.z;
    r.x = -v.x*g*p.w + (1.0 + gm1*v.x*v.x/v2)*p.x + (gm1*v.x*v.y/v2)*p.y + (gm1*v.x*v.z/v2)*p.z;
    r.y = -v.y*g*p.w + (gm1*v.y*v.x/v2)*p.x + (1.0 + gm1*v.y*v.y/v2)*p.y + (gm1*v.y*v.z/v2)*p.z;
    r.z = -v.z*g*p.w + (gm1*v.z*v.x/v2)*p.x + (gm1*v.z*v.y/v2)*p.y + (1.0+gm1*v.z*v.z/v2)*p.z;

with

    vec3 p3 = vec3(p.x, p.y, p.z);
    float t = dot(v, p3);
    float t2 = gm1*t/v2 - g*p.w;
    r = vec4( v*t2 + p3, g * (p.w - t));

The attachment contains the complete text of the modified function, with the separate steps of the transformation in the comments. Please check that all steps are correct. In any case, it shows the direction.

On my system the original shader uses 130 regs and 1262 VLIW ALU instructions. The modified version uses 81 regs and 778 instructions.
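One way to check the rewrite above without a GPU is to compare both forms numerically on the CPU. The following is only a sketch in Python, not part of the attached shader: the function names lorentz_original and lorentz_factored are made up for this illustration, and the sample inputs in the usage note assume that v2 = dot(v, v), g = 1/sqrt(1 - v2) and gm1 = g - 1, which the variable names suggest but the thread does not state. The algebraic identity itself does not depend on that assumption; it only needs v2 to be nonzero.

```python
import math
import random


def lorentz_original(p, v, g, gm1, v2):
    """Component-wise form from the original shader. p = (x, y, z, w), v = (x, y, z)."""
    px, py, pz, pw = p
    vx, vy, vz = v
    rw = g*pw - vx*g*px - vy*g*py - vz*g*pz
    rx = -vx*g*pw + (1.0 + gm1*vx*vx/v2)*px + (gm1*vx*vy/v2)*py + (gm1*vx*vz/v2)*pz
    ry = -vy*g*pw + (gm1*vy*vx/v2)*px + (1.0 + gm1*vy*vy/v2)*py + (gm1*vy*vz/v2)*pz
    rz = -vz*g*pw + (gm1*vz*vx/v2)*px + (gm1*vz*vy/v2)*py + (1.0 + gm1*vz*vz/v2)*pz
    return (rx, ry, rz, rw)


def lorentz_factored(p, v, g, gm1, v2):
    """Factored form: one dot product, then one scale-and-add per component."""
    px, py, pz, pw = p
    vx, vy, vz = v
    t = vx*px + vy*py + vz*pz      # dot(v, p3)
    t2 = gm1*t/v2 - g*pw           # common factor multiplying v
    return (vx*t2 + px, vy*t2 + py, vz*t2 + pz, g*(pw - t))


def max_abs_difference(p, v, g, gm1, v2):
    """Largest per-component difference between the two forms."""
    a = lorentz_original(p, v, g, gm1, v2)
    b = lorentz_factored(p, v, g, gm1, v2)
    return max(abs(x - y) for x, y in zip(a, b))
```

Calling max_abs_difference with a few random p and v (with v2 kept below 1 under the assumption above) should give values at the level of floating-point rounding error. The saving comes from computing dot(v, p3) once instead of expanding it separately in every component.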
> vec3 p3 = vec3(p.x, p.y, p.z);
> float t = dot(v, p3);
> float t2 = gm1*t/v2 - g*p.w;
> r = vec4( v*t2 + p3, g * (p.w - t));
>

Vadim, this is really great and much appreciated. I went through the steps and it all seems fine to me, also tested some examples and everything works great. This also makes the advanced path in the shader work again without my quirky workarounds in the while loop. I never really took much care with these kind of optimisations, somehow blindly hoping that the compiler will automagically optimise everything. I'll try to be more careful in advance.

I'd like to put some comments in the code, like "optimised by Vadim" if that's ok with you or perhaps should I use your real name? Thanks.
(In reply to comment #5)
> > vec3 p3 = vec3(p.x, p.y, p.z);
> > float t = dot(v, p3);
> > float t2 = gm1*t/v2 - g*p.w;
> > r = vec4( v*t2 + p3, g * (p.w - t));
> >
>
> Vadim, this is really great and much appreciated. I went through the steps and
> it all seems fine to me, also tested some examples and everything works great.
> This also makes the advanced path in the shader work again without my quirky
> workarounds in the while loop. I never really took much care with these kind of
> optimisations, somehow blindly hoping that the compiler will automagically
> optimise everything. I'll try to be more careful in advance.
>
> I'd like to put some comments in the code, like "optimised by Vadim" if that's
> ok with you or perhaps should I use your real name? Thanks.

Ah, yes, I forgot to set my full name in the account here. Updated now, so you can use it if you want.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/408.