Given that the x86-64 asm contains a _mesa_3dnow_transform_points4_2d (MMX/3DNow! is deprecated in x86-64), it appears that this code was copy-pasted. I wrote a quick patch that changes prefetch[w] to prefetcht1, which is more or less the SSE equivalent. However, I'm not sure those prefetches actually benefit the code: the addresses appear to be monotonic, and the hint is only 16 bytes ahead, while a cache line is almost always at least 32 bytes. That sort of testing may be for another day.
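To put a rough number on the 16-bytes-ahead concern, here is a minimal C sketch, not taken from the Mesa source. The helper names (hints_crossing_line, sum_x), the 16-byte stride (four floats per point), and the 64-byte line size used in the usage note are all my assumptions. The first function counts how many hints land in a different cache line than the element being read; the second shows the same kind of hint written with the GCC/Clang builtin __builtin_prefetch, where locality 2 is typically emitted as prefetcht1 on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for n elements read at a fixed stride, count the
 * prefetch hints (issued `ahead` bytes past the current element) that
 * target a different cache line than the current element does. */
static size_t hints_crossing_line(uintptr_t base, size_t n, size_t stride,
                                  size_t ahead, size_t line_size)
{
    assert(line_size > 0);
    size_t crossing = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t cur = base + i * stride;
        uintptr_t hint = cur + ahead;
        if (cur / line_size != hint / line_size)
            crossing++;
    }
    return crossing;
}

/* Hypothetical loop over 4-float points that sums the x components and
 * hints `ahead_pts` points ahead. Assumes GCC or Clang; the second and
 * third arguments (read access, locality 2) must be constants. */
static float sum_x(const float *pts, size_t n, size_t ahead_pts)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Only hint at addresses that are still inside the array. */
        if (i + ahead_pts < n)
            __builtin_prefetch(&pts[4 * (i + ahead_pts)], 0, 2);
        s += pts[4 * i];
    }
    return s;
}
```

With a line-aligned base, 64 elements, a 16-byte stride and a 16-byte hint distance, hints_crossing_line reports that only 16 of the 64 hints reach a new 64-byte line; the other 48 point into the line that the current load is already touching. Whether the remaining hints help at all, given that the hardware prefetcher usually handles a sequential walk like this on its own, would still need to be measured.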