Approximative inverse of a float

01: float inverse(float x)
02: {
03:   // re-interpret as a 32 bit integer
04:   unsigned int *i = (unsigned int*)&x;
06:   // adjust exponent
07:   *i = 0x7F000000 - *i;
08:   //*i = 0x7EEEEEEE - *i;
10:   return x;
11: }

Explanation The standard IEEE-754 defines the binary layout of floating point numbers.
All major CPUs (or better to say: FPUs) follow this standard.

The data type "float" is a 32 bit wide floating point number consisting of a 23 bit mantissa,
an 8 bit exponent and a sign bit. Because the exponent can be negative, too, it is implemented using
a bias of 28-1 = 127. That means, 127 is added to the actual exponent to keep it always positive.
Written in a formula: x = sign * mantissa * 2exponent .

Note: To save one bit, the most significant bit of the mantissa is omitted because it is by definition always set to one.

Note, too: Binary exponents below −127 or above +127 cannot be represented by "float" (≈ ±1038 decimal).

The inverse can be written as: inverse(a) → a-1 or more specific: inverse(ab) → a-b .
If speed is more important than precision, we can approximate x-1 by computing sign * mantissa * 2-exponent .
The only difference to the original formula is the minus in front of exponent.
Because the mantissa remains unadjusted, the result will slightly deviate from the true inverse value.

In the end, a simple subtraction of the exponent bits, often performed in merely one CPU cycle,
does the trick (line 7): exponent → 127 − exponent
The constant 0x7F000000 represents 127 shifted left by 23 bits.

A brute force search revealed that a few constants expose a better overall accuracy and cut down the
error to about 4%. So far, 0x7EEEEEEE (line 8) seems to be the best candidate.
My brute force search didn't analyze all possible numbers, that means, there might be even better constants.
But you should keep in mind that only for 0x7F000000 we get inverse(1) = 1, which might be important for
many algorithms. No other constant can produce the same result for inverse(1).

We have to switch from IEEE-754 to integer arithmetic (line 4), of course, and back (line 10).

Restrictions • designed for 32 bit IEEE floats
• approximation, average deviation of ≈ 8%

These ads
help me to pay
my server bills

Performance • constant time, data independent

+ Intel® Pentium™ D:
inverse/CPU: ≈ 1 cycle per number
inverse/FPU: ≈ 42 cycles per number

+ Intel® Core™ 2:
inverse/CPU: ≈ 0.75 cycle per number
inverse/FPU: ≈ 34 cycles per number

+ Intel® Core™ i7:
inverse/CPU: ≈ 0.75 cycle per number
inverse/FPU: ≈ 22 cycles per number
Performance Chart
CPU cycles
(full optimization,
lower values are better)

Assembler Output
compiler: VS 2008
in: edx ← x
out: eax → result
01:   mov eax, 7F000000H
02:   sub eax, edx

compiler: VS 2008
in: ST(0) ← x
out: ST(0) → result
01:   fld1
02:   fdivrp ST(1), ST(0)

Download The full source code including a test program is available for download.

References unknown

More Twiddled Bits
  1. Absolute value of a float
  2. Absolute value of an integer
  3. Approximative inverse of a float
  4. Bit manipulation basics
  5. Bit mask of lowest bit not set
  6. Count bits set in parallel a.k.a. Population Count
  7. Detects zero bytes inside a 32 bit integer
  8. Endianess
  9. Extend bit width
  10. Float inverse square root approximation
  11. Float square root approximation
  12. Is power of two
  13. Minimum / maximum of integers
  14. Parity
  15. Position of lowest bit set
  16. Round up to the next power of two
  17. Sign comparison
  18. Sign of a 32 bit integer
  19. Swap two values
Extra: Javascript bit manipulator

... or go to the index