# Float square root approximation

Code
``` 01: float squareRoot(float x) 02: { 03:   unsigned int i = *(unsigned int*) &x; 04: 05:   // adjust bias 06:   i  += 127 << 23; 07:   // approximation of square root 08:   i >>= 1; 09: 10:   return *(float*) &i; 11: } ``` bits.stephan-brumme.com

Explanation The standard IEEE-754 defines the binary layout of floating point numbers.
All major CPUs (better to say: FPUs) follow this standard.

The data type "float" is a 32 bit wide floating point number consisting of a 23 bit mantissa,
an 8 bit exponent and a sign bit. Because the exponent can be negative, too, it is implemented using
a bias of 28-1 = 127. That means, 127 is added to the actual exponent.
Binary exponents below −127 or above +127 cannot be represented by "float" (≈ ±1038 decimal).
To save one bit, the most significant bit of the mantissa is omitted because it is always set to one.

The square root can be rewritten with an exponent: √(x) → x½
When x already is expressed with an exponent, say x → ab, then √(x) → √ ab → ab/2
It seems all we have to do is performing a right shift of the exponent of one bit while taking care of the bias, too.

This works pretty well when the exponent is even but fails when it is odd.
A surprisingly good approximation is to shift both mantissa and exponent.
For odd exponents, the second most significant bit of the mantissa will be set because is it shifted out
of the exponent and moves into the mantissa.

In summary, we have to perform three steps:
1. remove bias from exponent
2. shift everything right by one bit

Since the mantissa covers 23 bits, step 1 is equivalent to subtracting 127*223 .
Vice versa, step 3 comes down to adding 127*223 .
This approximation can be found in the "Intel Software Optimization Cookbook" for example.

Step 1 and 3 can be combined into one operation by observing that the sign bit is always zero
(square root of negative numbers is usually not defined) and just replacing step 1 by step 3
and skipping the former step 3. We get a temporary overflow, thus setting the sign bit, but the logical shift
afterwards (step 2) corrects for it by moving the bit into the exponent and clearing the sign bit.

The improved, admittedly non-intuitive, algorithm looks this:
1. add bias to exponent (line 6)
2. shift everything right by one bit (line 8)

We have to switch from IEEE-754 to integer arithmetic (line 3), of course, and back (line 10).

Restrictions • designed for positive 32 bit floats
• approximation, average deviation of ≈ 5%
- for example: √144 = 12, but squareRoot(144) = 12.5, ≈ 4% off

help me to pay
my server bills

Performance • constant time, data independent

+ Intel® Pentium™ D:
squareRoot: ≈ 2 cycles per number

+ Intel® Core™ 2:
squareRoot: ≈ 2 cycles per number

+ Intel® Core™ i7:
squareRoot: ≈ 2 cycles per number CPU cycles
(full optimization,
lower values are better)

Assembler Output
``` squareRoot: 01:   ; i += 127 << 23 02:   add eax, 3F800000h 03:   ; i >>= 1 04:   shr eax, 1 ``` bits.stephan-brumme.com