vec3(v.z*nf1+v.y*nf2+v.x*nf3) could be optimized as v.zyx * mat3(nf1, nf2, nf3)
This gives a significant performance improvement at least on r600.
Sorry, that should be mat3(nf1, nf2, nf3) * v.zyx.
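For anyone who wants to check the corrected form, here is a minimal sketch in plain JavaScript that models the GLSL semantics with arrays. It relies on the fact that GLSL's mat3(c0, c1, c2) takes its arguments as columns, so mat3(c0, c1, c2) * v equals c0*v.x + c1*v.y + c2*v.z. The helper names (scale, add, weightedSum, mat3MulVec, vecMulMat3) and the sample values are made up for illustration and are not part of any shader or library.

```javascript
// Hypothetical helpers modelling GLSL vec3/mat3 arithmetic with plain arrays.
function scale(a, s) { return a.map(function (x) { return x * s; }); }
function add(a, b) { return a.map(function (x, i) { return x + b[i]; }); }

// Original form: v.z*nf1 + v.y*nf2 + v.x*nf3
function weightedSum(v, nf1, nf2, nf3) {
  return add(add(scale(nf1, v[2]), scale(nf2, v[1])), scale(nf3, v[0]));
}

// mat3(c0, c1, c2) * v, with c0..c2 as columns (GLSL is column-major).
function mat3MulVec(c0, c1, c2, v) {
  return [0, 1, 2].map(function (row) {
    return c0[row] * v[0] + c1[row] * v[1] + c2[row] * v[2];
  });
}

// v * mat3(c0, c1, c2): row vector times matrix, i.e. one dot product per column.
function vecMulMat3(v, c0, c1, c2) {
  return [c0, c1, c2].map(function (col) {
    return v[0] * col[0] + v[1] * col[1] + v[2] * col[2];
  });
}

// Arbitrary sample values, chosen as integers so comparisons are exact.
var v = [1, 2, 3];
var nf1 = [1, 0, 2], nf2 = [0, 3, 1], nf3 = [4, 1, 0];
var zyx = [v[2], v[1], v[0]];

var original = weightedSum(v, nf1, nf2, nf3);
var corrected = mat3MulVec(nf1, nf2, nf3, zyx);     // mat3(nf1, nf2, nf3) * v.zyx
var reordered = mat3MulVec(nf3, nf2, nf1, v);       // mat3(nf3, nf2, nf1) * v.xyz
var firstAttempt = vecMulMat3(zyx, nf1, nf2, nf3);  // v.zyx * mat3(nf1, nf2, nf3)

console.log(original, corrected, reordered, firstAttempt);
```

With these sample values the first three results agree, while the v.zyx * mat3(...) form from the first comment gives a different vector, which is why the operand order matters.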
Example change from a real-world shader implementing this optimization:
- fragment+='vec3 _nrm = normalize(vec3(norm2.z*nf1+norm2.y*nf2+norm2.x*nf3)*2.0-1.0);';
+ fragment+='vec3 _nrm = normalize(mat3(nf3, nf2, nf1)*norm2.xyz*2.0-1.0);';
Removing the dependency on i965-perf. The two expressions produce the same code in the i965 backend.