Wishbone: Then of course there's code that's not bad as such, in fact it's so good it's almost magical, but quite unintuitive and extremely difficult to understand. Example:
The fast inverse square root algorithm.
rtcvb32: Good general explanation.
Error range of 0.17%, using only subtraction and multiplication, with no divides (which are the slowest). It feels like the main advantage would be for software rendering at a time when a lot of people still didn't have FPUs... Quake 1 would hit that requirement, but I'm not sure about Quake 2 and later...
For the code shown, though, I'd probably have used a union instead of pointer casts to reinterpret the type, although optimization might make that moot...
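For reference, the union variant might look roughly like this. It's only a sketch, assuming 32-bit IEEE 754 floats and C99 or later (reading a different union member than the one last written is not valid in C++). The name `fast_rsqrt` is made up for illustration; the constant is the commonly quoted one.

```c
#include <assert.h>
#include <stdint.h>

/* The trick only makes sense if float is a 32-bit IEEE 754 value. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Approximate 1/sqrt(x) for positive, finite x (hypothetical helper name). */
float fast_rsqrt(float x)
{
    union { float f; uint32_t i; } u;

    u.f = x;
    /* Integer arithmetic on the bit pattern gives a rough first guess. */
    u.i = 0x5f3759df - (u.i >> 1);
    /* One Newton-Raphson step: only multiplies and a subtract, no divide. */
    u.f = u.f * (1.5f - 0.5f * x * u.f * u.f);
    return u.f;
}
```

Calling `fast_rsqrt(4.0f)` should give something close to 0.5, within the error range mentioned above.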
Actually, according to http://blog.regehr.org/archives/959, the fastest way would be something like this (where f is a float):
{
int i;
memcpy(&i, &f, sizeof f); /* copy the float's bytes into i; memcpy is declared in <string.h> */
}
(The memcpy() call is optimized into the least assembly code.)
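Wrapped up as a pair of helpers, the same idea might look like the sketch below. The names `float_bits` and `bits_float` are hypothetical, and the code assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Both directions copy exactly sizeof(float) bytes, so the sizes must match. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the raw bit pattern of f without breaking strict aliasing rules. */
uint32_t float_bits(float f)
{
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Inverse of float_bits: build a float from a raw bit pattern. */
float bits_float(uint32_t i)
{
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}
```

With optimization turned on, GCC and Clang will typically reduce each of these to a single register move, but it's worth checking the generated assembly for your own compiler and flags.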
When it comes to optimization, sometimes compilers can surprise you, so be careful. (Of course, there are other surprises to watch out for, like CPUs and other hardware having unexpected performance characteristics.)