Minimum / maximum of integers

Code
01: int select(int x, int y, int ifXisSmaller, int ifYisSmaller)
02: {
03:   int dif= x - y;
04:   // sets bit31 to 0xFFFFFFFF if x<y, else 0x00000000
05:   int bit31 = diff >> 31;
06:
07:   // return ifXisSmaller if x is smaller than y, else y
08:   return (bit31 & (ifXisSmaller ^ ifYisSmaller)) ^ ifYisSmaller;
09: }
10:
11: int minimum(int x, int y)
12: {
13:   // if x<y then return x, else y
14:   return select(x,y,x,y);
15: }
16:
17: int maximum(int x, int y)
18: {
19:   // if x<y then return y, else x
20:   return select(x,y,y,x);
21: }
22:
23: int minimumSimple(int x, int y)
24: {
25:   if (x < y)
26:     return x;
27:   else
28:     return y;
29: }
bits.stephan-brumme.com

Explanation A universal helper function select (line 1) sets bit31 (line 5) to either 0x00000000 or 0xFFFFFFFF.
Then a special property of XOR is used: a XOR b XOR b = a;
In plain English, applying XOR twice with the same number eliminates this number.

If bit31 is 0x00000000, line 8 becomes (0x00000000 AND (ifXisSmaller XOR ifYisSmaller)) XOR ifYisSmaller
and since (zero AND something) = zero, only zero XOR ifYisSmaller is left which is ifYisSmaller.

if bit31 is 0xFFFFFFFF, line 8 becomes (0xFFFFFFFF AND (ifXisSmaller XOR ifYisSmaller)) XOR ifYisSmaller
and since (allBitsSet AND something) = something, (ifXisSmaller XOR ifYisSmaller) XOR ifYisSmaller
is left, which is ifXisSmaller as explained earlier.

The select routine can be used to implement a branch-free version of min (line 11) and max (line 17).

Even though more instructions are needed than if written using a simple "if", modern processor
are much faster because the presented code is branch-free and avoids mis-predictions and pipeline stalls.

Restrictions • designed for 32 bits
• behavior of right shift is not clearly defined, may not work on certain platforms / compilers

These ads
help me to pay
my server bills

Performance • constant time, data independent

+ Intel® Pentium™ D:
maximum: ≈ 3 cycles per number
minimum: ≈ 3 cycles per number
minimumSimple: ≈ 3.5 cycles per number

+ Intel® Core™ 2:
maximum: ≈ 2.5 cycles per number
minimum: ≈ 2.5 cycles per number
minimumSimple: ≈ 2.5 cycles per number

+ Intel® Core™ i7:
maximum: ≈ 3 cycles per number
minimum: ≈ 3 cycles per number
minimumSimple: ≈ 2.25 cycles per number
Performance Chart
CPU cycles
(full optimization,
lower values are better)

Assembler Output
maximum:
01:   ; diff = x - y
02:   mov eax, ecx
03:   sub eax, edx
04:   ; bit31 = diff >> 31
05:   sar eax, 31
06:   ; return (bit31 & (y ^ x)) ^ x
07:   xor ecx, edx
08:   and eax, ecx
09:   xor eax, ecx

minimum:
01:   ; diff = x - y
02:   mov eax, ecx
03:   sub eax, edx
04:   ; bit31 = diff >> 31
05:   sar eax, 31
06:   ; return (bit31 & (x ^ y)) ^ y
07:   xor ecx, edx
08:   and eax, ecx
09:   xor eax, edx

minimumSimple:
01:   cmp ecx, eax
02:   jge $done
03:   mov eax, ecx
04:   $done:
bits.stephan-brumme.com

Download The full source code including a test program is available for download.

References unknown

More Twiddled Bits
  1. Absolute value of a float
  2. Absolute value of an integer
  3. Approximative inverse of a float
  4. Bit manipulation basics
  5. Bit mask of lowest bit not set
  6. Count bits set in parallel a.k.a. Population Count
  7. Detects zero bytes inside a 32 bit integer
  8. Endianess
  9. Extend bit width
  10. Float inverse square root approximation
  11. Float square root approximation
  12. Is power of two
  13. Minimum / maximum of integers
  14. Parity
  15. Position of lowest bit set
  16. Round up to the next power of two
  17. Sign comparison
  18. Sign of a 32 bit integer
  19. Swap two values
Extra: Javascript bit manipulator

... or go to the index