
Approximative inverse of a float 

Code  
Explanation 
The standard IEEE754 defines the binary layout of floating point numbers. All major CPUs (or better to say: FPUs) follow this standard. The data type "float" is a 32 bit wide floating point number consisting of a 23 bit mantissa, an 8 bit exponent and a sign bit. Because the exponent can be negative, too, it is implemented using a bias of 2^{8}1 = 127. That means, 127 is added to the actual exponent to keep it always positive. Written in a formula: x = sign * mantissa * 2^{exponent} . Note: To save one bit, the most significant bit of the mantissa is omitted because it is by definition always set to one. Note, too: Binary exponents below −127 or above +127 cannot be represented by "float" (≈ ±10^{38} decimal). The inverse can be written as: inverse(a) → a^{1} or more specific: inverse(a^{b)} → a^{b} . If speed is more important than precision, we can approximate x^{1} by computing sign * mantissa * 2^{exponent} . The only difference to the original formula is the minus in front of exponent. Because the mantissa remains unadjusted, the result will slightly deviate from the true inverse value. In the end, a simple subtraction of the exponent bits, often performed in merely one CPU cycle, does the trick (line 7): exponent → 127 − exponent The constant 0x7F000000 represents 127 shifted left by 23 bits. A brute force search revealed that a few constants expose a better overall accuracy and cut down the error to about 4%. So far, 0x7EEEEEEE (line 8) seems to be the best candidate. My brute force search didn't analyze all possible numbers, that means, there might be even better constants. But you should keep in mind that only for 0x7F000000 we get inverse(1) = 1, which might be important for many algorithms. No other constant can produce the same result for inverse(1). We have to switch from IEEE754 to integer arithmetic (line 4), of course, and back (line 10). 
Restrictions 
• designed for 32 bit IEEE floats • approximation, average deviation of ≈ 8% 
These ads help me to pay my server bills 

Performance 
• constant time, data independent + Intel® Pentium™ D: • inverse/CPU: ≈ 1 cycle per number • inverse/FPU: ≈ 42 cycles per number + Intel® Core™ 2: • inverse/CPU: ≈ 0.75 cycle per number • inverse/FPU: ≈ 34 cycles per number + Intel® Core™ i7: • inverse/CPU: ≈ 0.75 cycle per number • inverse/FPU: ≈ 22 cycles per number CPU cycles (full optimization, lower values are better) 
Assembler Output 
inverse:
bits.stephanbrumme.com

Download  The full source code including a test program is available for download. 
References 
unknown 
More Twiddled Bits 
... or go to the index 