Float square root approximation

Code
01: float squareRoot(float x)
02: {
03:   unsigned int i = *(unsigned int*) &x;
04:
05:   // adjust bias
06:   i  += 127 << 23;
07:   // approximation of square root
08:   i >>= 1;
09:
10:   return *(float*) &i;
11: }
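
A minimal usage sketch (my own example, not part of the original listing), assuming squareRoot from above is in scope and comparing it against the C library's sqrtf:

#include <math.h>
#include <stdio.h>

int main(void)
{
  float values[] = { 2.0f, 100.0f, 144.0f, 12345.0f };
  for (int k = 0; k < 4; k++)
  {
    float approx = squareRoot(values[k]);   // approximation from the listing above
    float exact  = sqrtf(values[k]);        // C library reference
    printf("squareRoot(%g) = %g   (sqrtf: %g)\n", values[k], approx, exact);
  }
  return 0;
}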

Explanation
The IEEE-754 standard defines the binary layout of floating point numbers.
All major CPUs (or rather: their FPUs) follow this standard.

The data type "float" is a 32 bit wide floating point number consisting of a 23 bit mantissa,
an 8 bit exponent and a sign bit. Because the exponent can be negative, too, it is implemented using
a bias of 28-1 = 127. That means, 127 is added to the actual exponent.
Binary exponents below −127 or above +127 cannot be represented by "float" (≈ ±1038 decimal).
To save one bit, the most significant bit of the mantissa is omitted because it is always set to one.
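
To make the layout tangible, here is a small sketch of my own (printFields is just an illustrative name, not part of the article's code) that extracts the three fields; it uses memcpy for the bit copy to stay portable:

#include <stdio.h>
#include <string.h>

// print sign, unbiased exponent and stored mantissa of a float
void printFields(float x)
{
  unsigned int i;
  memcpy(&i, &x, sizeof(i));                  // copy the raw bits

  unsigned int sign     =  i >> 31;           //  1 bit
  unsigned int exponent = (i >> 23) & 0xFF;   //  8 bits, biased by 127
  unsigned int mantissa =  i & 0x007FFFFF;    // 23 bits, implicit leading one not stored

  printf("%g: sign=%u, exponent=%d, mantissa=0x%06X\n",
         x, sign, (int)exponent - 127, mantissa);
}

Calling printFields(144.0f) shows sign=0, exponent=7 and mantissa=0x100000, i.e. 144 = 1.125 · 2^7.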

The square root can be rewritten with an exponent: √x = x^(1/2).
When x is already expressed with an exponent, say x = a^b, then √x = √(a^b) = a^(b/2).
It seems all we have to do is shift the exponent right by one bit while taking care of the bias, too.
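
To see the exponent halving in action, here is a small sketch of my own (not from the article) that halves only the unbiased exponent of 1024 = 2^10, where the mantissa is exactly 1.0:

#include <stdio.h>
#include <string.h>

int main(void)
{
  float x = 1024.0f;                            // 2^10: even exponent, mantissa 1.0
  unsigned int i;
  memcpy(&i, &x, sizeof(i));

  int exponent = (int)((i >> 23) & 0xFF) - 127; // unbiased exponent: 10
  unsigned int r = (unsigned int)(exponent / 2 + 127) << 23;   // halved exponent, mantissa stays 1.0

  float result;
  memcpy(&result, &r, sizeof(result));
  printf("sqrt(%g) = %g\n", x, result);         // prints 32
  return 0;
}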

This works pretty well when the exponent is even but fails when it is odd.
A surprisingly good approximation is to shift both mantissa and exponent.
For odd exponents, the second most significant bit of the mantissa will be set because it is shifted out
of the exponent and moves into the mantissa.
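
A small trace of my own (values checked by hand) shows both cases; for the odd exponent the bit shifted out of the exponent ends up as the top stored mantissa bit:

#include <stdio.h>
#include <string.h>

// apply the trick and print the bit patterns before and after
static void trace(float x)
{
  unsigned int i;
  memcpy(&i, &x, sizeof(i));
  printf("x = %-4g  bits = 0x%08X", x, i);

  i += 127u << 23;   // adjust bias
  i >>= 1;           // shift exponent (and mantissa)

  float r;
  memcpy(&r, &i, sizeof(r));
  printf("  ->  0x%08X = %g\n", i, r);
}

int main(void)
{
  trace(4.0f);       // even exponent: 0x40800000 -> 0x40000000 = 2.0 (exact)
  trace(2.0f);       // odd exponent:  0x40000000 -> 0x3FC00000 = 1.5 (true value ≈ 1.414)
  return 0;
}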

In summary, we have to perform three steps:
1. remove bias from exponent
2. shift everything right by one bit
3. add bias to exponent

Since the mantissa covers 23 bits, step 1 is equivalent to subtracting 127·2^23 (that is, 127 << 23).
Vice versa, step 3 comes down to adding 127·2^23.
This approximation can be found in the "Intel Software Optimization Cookbook" for example.
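
Written out as code, the unoptimized three-step variant could look like the following sketch (my own, restricted to x ≥ 1 so the intermediate value never drops below zero; squareRootThreeSteps is a made-up name):

#include <string.h>

// unoptimized three-step version; valid for x >= 1.0f, where the biased
// exponent is at least 127 and the intermediate value stays non-negative
float squareRootThreeSteps(float x)
{
  unsigned int i;
  memcpy(&i, &x, sizeof(i));   // reinterpret float as integer

  i -= 127u << 23;             // 1. remove bias from exponent
  i >>= 1;                     // 2. shift everything right by one bit
  i += 127u << 23;             // 3. add bias to exponent

  memcpy(&x, &i, sizeof(x));   // back to float
  return x;
}

For 0 < x < 1 the subtraction would wrap around; the combined version discussed next avoids this by adding the bias before the shift.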

Step 1 and 3 can be combined into one operation by observing that the sign bit is always zero
(the square root of negative numbers is not defined here) and simply performing the addition of step 3
in place of step 1, skipping the final addition. We get a temporary overflow into the sign bit, but the logical shift
afterwards (step 2) corrects for it by moving that bit into the exponent and clearing the sign bit.
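
Tracing x = 144 through the combined version (a worked example of my own) makes the temporary overflow visible:

  144.0f                  bits 0x43100000
  after i += 127 << 23:        0x82900000   (sign bit temporarily set)
  after i >>= 1:               0x41480000   = 12.5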

The improved, admittedly non-intuitive, algorithm looks like this:
1. add bias to exponent (line 6)
2. shift everything right by one bit (line 8)

We have to switch from IEEE-754 to integer arithmetic (line 3), of course, and back (line 10).
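
The pointer casts in lines 3 and 10 technically violate C's strict-aliasing rule; a memcpy-based variant (an alternative sketch, not the article's code) performs the same reinterpretation portably, and current compilers typically optimize the copies away:

#include <string.h>

// same approximation, but using memcpy instead of pointer casts
float squareRootPortable(float x)
{
  unsigned int i;
  memcpy(&i, &x, sizeof(i));   // float bits -> integer

  i += 127u << 23;             // adjust bias
  i >>= 1;                     // approximation of square root

  memcpy(&x, &i, sizeof(x));   // integer bits -> float
  return x;
}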

Restrictions
• designed for positive 32 bit floats
• approximation, average deviation of ≈ 5%
  - for example: √144 = 12, but squareRoot(144) = 12.5, which is ≈ 4% off
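
To check these figures yourself, a small test sketch of my own (not the downloadable test program) that measures the average and maximum relative error against sqrt over a range of inputs:

#include <math.h>
#include <stdio.h>

int main(void)
{
  double sumError = 0, maxError = 0;
  int count = 0;

  for (float x = 1.0f; x < 10000.0f; x += 0.25f)
  {
    double exact    = sqrt((double)x);
    double relative = fabs(squareRoot(x) - exact) / exact;   // squareRoot from the listing above
    sumError += relative;
    if (relative > maxError)
      maxError = relative;
    count++;
  }

  printf("average error: %.2f%%, maximum error: %.2f%%\n",
         100 * sumError / count, 100 * maxError);
  return 0;
}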

Performance
• constant time, data independent

+ Intel® Pentium™ D:
squareRoot: ≈ 2 cycles per number

+ Intel® Core™ 2:
squareRoot: ≈ 2 cycles per number

+ Intel® Core™ i7:
squareRoot: ≈ 2 cycles per number
[Performance chart: CPU cycles, full optimization, lower values are better]
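
The cycle counts above are the author's measurements; a rough throughput sketch of my own (using clock() instead of cycle counters, so it only gives a ballpark figure) could look like this:

#include <stdio.h>
#include <time.h>

int main(void)
{
  const int iterations = 100000000;
  volatile float sink = 0;                     // keep the loop from being optimized away

  clock_t start = clock();
  for (int k = 1; k <= iterations; k++)
    sink += squareRoot((float)k);              // squareRoot from the listing above
  double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

  printf("%d approximated square roots in %.2f seconds\n", iterations, seconds);
  return 0;
}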

Assembler Output
squareRoot:
  ; i += 127 << 23
  add eax, 3F800000h
  ; i >>= 1
  shr eax, 1

Download
The full source code including a test program is available for download.

References
• Wikipedia, "Methods of computing square roots": http://en.wikipedia.org/wiki/Methods_of_computing_square_roots
• Intel, "Intel Software Optimization Cookbook"
