|
Minimum / maximum of integers |
|
Code |
01: int select(int x, int y, int ifXisSmaller, int ifYisSmaller)
bits.stephan-brumme.com
|
Explanation |
A universal helper function select (line 1) sets bit31 (line 5) to either 0x00000000 or 0xFFFFFFFF. Then a special property of XOR is used: a XOR b XOR b = a; In plain English, applying XOR twice with the same number eliminates this number. If bit31 is 0x00000000, line 8 becomes (0x00000000 AND (ifXisSmaller XOR ifYisSmaller)) XOR ifYisSmaller and since (zero AND something) = zero, only zero XOR ifYisSmaller is left which is ifYisSmaller. if bit31 is 0xFFFFFFFF, line 8 becomes (0xFFFFFFFF AND (ifXisSmaller XOR ifYisSmaller)) XOR ifYisSmaller and since (allBitsSet AND something) = something, (ifXisSmaller XOR ifYisSmaller) XOR ifYisSmaller is left, which is ifXisSmaller as explained earlier. The select routine can be used to implement a branch-free version of min (line 11) and max (line 17). Even though more instructions are needed than if written using a simple "if", modern processor are much faster because the presented code is branch-free and avoids mis-predictions and pipeline stalls. |
Restrictions |
• designed for 32 bits • behavior of right shift is not clearly defined, may not work on certain platforms / compilers |
These ads help me to pay my server bills |
|
Performance |
• constant time, data independent + Intel® Pentium™ D: • maximum: ≈ 3 cycles per number • minimum: ≈ 3 cycles per number • minimumSimple: ≈ 3.5 cycles per number + Intel® Core™ 2: • maximum: ≈ 2.5 cycles per number • minimum: ≈ 2.5 cycles per number • minimumSimple: ≈ 2.5 cycles per number + Intel® Core™ i7: • maximum: ≈ 3 cycles per number • minimum: ≈ 3 cycles per number • minimumSimple: ≈ 2.25 cycles per number CPU cycles (full optimization, lower values are better) |
Assembler Output |
maximum:
bits.stephan-brumme.com
|
Download | The full source code including a test program is available for download. |
References |
unknown |
More Twiddled Bits |
... or go to the index |