
Float square root approximation 

Code  
Explanation 
The standard IEEE754 defines the binary layout of floating point numbers. All major CPUs (better to say: FPUs) follow this standard. The data type "float" is a 32 bit wide floating point number consisting of a 23 bit mantissa, an 8 bit exponent and a sign bit. Because the exponent can be negative, too, it is implemented using a bias of 2^{8}1 = 127. That means, 127 is added to the actual exponent. Binary exponents below −127 or above +127 cannot be represented by "float" (≈ ±10^{38} decimal). To save one bit, the most significant bit of the mantissa is omitted because it is always set to one. The square root can be rewritten with an exponent: √(x) → x^{½} When x already is expressed with an exponent, say x → a^{b}, then √(x) → √ a^{b} → a^{b/2} It seems all we have to do is performing a right shift of the exponent of one bit while taking care of the bias, too. This works pretty well when the exponent is even but fails when it is odd. A surprisingly good approximation is to shift both mantissa and exponent. For odd exponents, the second most significant bit of the mantissa will be set because is it shifted out of the exponent and moves into the mantissa. In summary, we have to perform three steps: 1. remove bias from exponent 2. shift everything right by one bit 3. add bias to exponent Since the mantissa covers 23 bits, step 1 is equivalent to subtracting 127*2^{23} . Vice versa, step 3 comes down to adding 127*2^{23} . This approximation can be found in the "Intel Software Optimization Cookbook" for example. Step 1 and 3 can be combined into one operation by observing that the sign bit is always zero (square root of negative numbers is usually not defined) and just replacing step 1 by step 3 and skipping the former step 3. We get a temporary overflow, thus setting the sign bit, but the logical shift afterwards (step 2) corrects for it by moving the bit into the exponent and clearing the sign bit. The improved, admittedly nonintuitive, algorithm looks this: 1. add bias to exponent (line 6) 2. shift everything right by one bit (line 8) We have to switch from IEEE754 to integer arithmetic (line 3), of course, and back (line 10). 
Restrictions 
• designed for positive 32 bit floats • approximation, average deviation of ≈ 5%  for example: √144 = 12, but squareRoot(144) = 12.5, ≈ 4% off 
These ads help me to pay my server bills 

Performance 
• constant time, data independent + Intel® Pentium™ D: • squareRoot: ≈ 2 cycles per number + Intel® Core™ 2: • squareRoot: ≈ 2 cycles per number + Intel® Core™ i7: • squareRoot: ≈ 2 cycles per number CPU cycles (full optimization, lower values are better) 
Assembler Output 
squareRoot:
bits.stephanbrumme.com

Download  The full source code including a test program is available for download. 
References 
• http://en.wikipedia.org/wiki/Methods_{of}_{computing}_{square}_{roots} • Intel, "Intel Software Optimization Cookbook" 
More Twiddled Bits 
... or go to the index 