Position of lowest bit set

Code
01: const unsigned int MultiplyDeBruijnBitPosition[32] =
02: {
03:   // precomputed lookup table
04:   0 1, 28 2, 29, 14, 24 3, 30, 22, 20, 15, 25, 17 4 8,
05:   31, 27, 13, 23, 21, 19, 16 7, 26, 12, 18 6, 11 5, 10 9
06: };
07:
08: unsigned int lowestBitSet(unsigned int x)
09: {
10:   // only lowest bit will remain 1, all others become 0
11:   x  &= -int(x);
12:   // DeBruijn constant
13:   x  *= 0x077CB531;
14:   // the upper 5 bits are unique, skip the rest (32 - 5 = 27)
15:   x >>= 27;
16:   // convert to actual position
17:   return MultiplyDeBruijnBitPosition[x];
18: }
19:
20: unsigned int lowestBitSetSimple(unsigned int x)
21: {
22:   // x=0 is not properly handled by while-loop
23:   if (x == 0)
24:     return 0;
25:
26:   unsigned int result = 0;
27:   while ((x & 1) == 0)
28:   {
29:     x >>= 1;
30:     result++;
31:   }
32:
33:   return result;
34: }
bits.stephan-brumme.com

Explanation A negative number is usually represented by the two-complement: −x = not(x)+1

The formula x & −x = x & (not(x) + 1) deletes all but the lowest set bit (line 11).

That means after line 11 we have a power of 2 and there are obviously only 32 possibilities left.
When multiplied by the DeBruijn constant 0x077CB531 (line 13), the upper 5 bits are unique for each power of 2.
A small lookup table resolves the actual position of the lowest bit set (line 1 to 6, line 17).
The overall performance of this algorithm is mainly determined by the lookup table and
the numbers given below are based on full cache hits.

Note: DeBruijn sequences are extensively explained by Wikipedia, too.

The simple version performs surprisingly well for randomly distributed input:
50% of all integers are odd and their result will be 0. Another 25% result will be 1, therefore the loop terminates pretty early.
When numbers with long spans of zeros are processed, timings become significantly worse.
The observed break-even was lowestBitSet(32) on a Pentium D, your mileage may vary.

The fastest way on almost all modern x86/x64 CPU bases on the built-in BSF instruction.

Restrictions • designed for 32 bits

These ads
help me to pay
my server bills

Performance + Intel® Pentium™ D:
lowestBitSet: ≈ 18 cycles per number
• simple(1,2): ≈ 4 cycles per number
• simple(mix): ≈ 39 cycles per number
• BSF: ≈ 3 cycles per number

+ Intel® Core™ 2:
lowestBitSet: ≈ 10.5 cycles per number
• simple(1,2): ≈ 2.5 cycles per number
• simple(mix): ≈ 21 cycles per number
• BSF: ≈ 1 cycle per number

+ Intel® Core™ i7:
lowestBitSet: ≈ 10.5 cycles per number
• simple(1,2): ≈ 2.5 cycles per number
• simple(mix): ≈ 21 cycles per number
• BSF: ≈ 1 cycle per number
Performance Chart
CPU cycles
(full optimization,
lower values are better)

Assembler Output
lowestBitSet:
compiler: Visual C++ 2008
in: ecx ← x
out: eax → result
01:   ; x &= -x
02:   mov ecx, eax
03:   neg ecx
04:   and ecx, eax
05:   ; x *= 0x077CB531
06:   imul ecx, 077CB531h
07:   ; x >>= 27
08:   shr ecx, 27
09:   ; return MultiplyDeBruijnBitPosition[x];
10:   mov eax, MultiplyDeBruijnBitPosition[ecx*4]

lowestBitSetSimple:
compiler: Visual C++ 2008
in: ecx ← x
out: eax → result
01:   ; result = 0
02:   xor eax, eax
03:   ; if (x == 0) return 0
04:   test ecx, ecx
05:   je $finish
06:   ; while ((x & 1) == 0)
07:   test cl, 1
08:   jne $finish
09:   $scanLoop:
10:   ; x >>= 1
11:   shr ecx, 1
12:   ; result++
13:   inc eax
14:   test cl, 1
15:   je $scanLoop
16:   $finish:
bits.stephan-brumme.com

Download The full source code including a test program is available for download.

References http://graphics.stanford.edu/~seander/bithacks.html

More Twiddled Bits
  1. Absolute value of a float
  2. Absolute value of an integer
  3. Approximative inverse of a float
  4. Bit manipulation basics
  5. Bit mask of lowest bit not set
  6. Count bits set in parallel a.k.a. Population Count
  7. Detects zero bytes inside a 32 bit integer
  8. Endianess
  9. Extend bit width
  10. Float inverse square root approximation
  11. Float square root approximation
  12. Is power of two
  13. Minimum / maximum of integers
  14. Parity
  15. Position of lowest bit set
  16. Round up to the next power of two
  17. Sign comparison
  18. Sign of a 32 bit integer
  19. Swap two values
Extra: Javascript bit manipulator

... or go to the index