Float square root approximation

Float square root approximation
Code	`01: float squareRoot(float x) 02: { 03: unsigned int i = (unsigned int) &x; 04: 05: // adjust bias 06: i += 127 << 23; 07: // approximation of square root 08: i >>= 1; 09: 10: return (float) &i; 11: }` bits.stephan-brumme.com

Explanation	The standard IEEE-754 defines the binary layout of floating point numbers. All major CPUs (better to say: FPUs) follow this standard. The data type "float" is a 32 bit wide floating point number consisting of a 23 bit mantissa, an 8 bit exponent and a sign bit. Because the exponent can be negative, too, it is implemented using a bias of 2⁸-1 = 127. That means, 127 is added to the actual exponent. Binary exponents below −127 or above +127 cannot be represented by "float" (≈ ±10³⁸ decimal). To save one bit, the most significant bit of the mantissa is omitted because it is always set to one. The square root can be rewritten with an exponent: √(x) → x^½ When x already is expressed with an exponent, say x → a^b, then √(x) → √ a^b → a^b/2 It seems all we have to do is performing a right shift of the exponent of one bit while taking care of the bias, too. This works pretty well when the exponent is even but fails when it is odd. A surprisingly good approximation is to shift both mantissa and exponent. For odd exponents, the second most significant bit of the mantissa will be set because is it shifted out of the exponent and moves into the mantissa. In summary, we have to perform three steps: 1. remove bias from exponent 2. shift everything right by one bit 3. add bias to exponent Since the mantissa covers 23 bits, step 1 is equivalent to subtracting 1272²³ . Vice versa, step 3 comes down to adding 1272²³ . This approximation can be found in the "Intel Software Optimization Cookbook" for example. Step 1 and 3 can be combined into one operation by observing that the sign bit is always zero (square root of negative numbers is usually not defined) and just replacing step 1 by step 3 and skipping the former step 3. We get a temporary overflow, thus setting the sign bit, but the logical shift afterwards (step 2) corrects for it by moving the bit into the exponent and clearing the sign bit. The improved, admittedly non-intuitive, algorithm looks this: 1. add bias to exponent (line 6) 2. shift everything right by one bit (line 8) We have to switch from IEEE-754 to integer arithmetic (line 3), of course, and back (line 10).

Restrictions	• designed for positive 32 bit floats • approximation, average deviation of ≈ 5% - for example: √144 = 12, but squareRoot(144) = 12.5, ≈ 4% off

These ads help me to pay my server bills

Performance	• constant time, data independent + Intel® Pentium™ D: • squareRoot: ≈ 2 cycles per number + Intel® Core™ 2: • squareRoot: ≈ 2 cycles per number + Intel® Core™ i7: • squareRoot: ≈ 2 cycles per number CPU cycles (full optimization, lower values are better)

Assembler Output	`squareRoot: 01: ; i += 127 << 23 02: add eax, 3F800000h 03: ; i >>= 1 04: shr eax, 1` bits.stephan-brumme.com

Download	The full source code including a test program is available for download.

References	• http://en.wikipedia.org/wiki/Methods_of_computing_square_roots • Intel, "Intel Software Optimization Cookbook"

More Twiddled Bits	Absolute value of a float Absolute value of an integer Approximative inverse of a float Bit manipulation basics Bit mask of lowest bit not set Count bits set in parallel a.k.a. Population Count Detects zero bytes inside a 32 bit integer Endianess Extend bit width Float inverse square root approximation Float square root approximation Is power of two Minimum / maximum of integers Parity Position of lowest bit set Round up to the next power of two Sign comparison Sign of a 32 bit integer Swap two values Extra: Javascript bit manipulator ... or go to the index