Position of lowest bit set

Position of lowest bit set
Code	01: const unsigned int MultiplyDeBruijnBitPosition[32] = 02: { 03: // precomputed lookup table 04: 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 05: 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9 06: }; 07: 08: unsigned int lowestBitSet(unsigned int x) 09: { 10: // only lowest bit will remain 1, all others become 0 11: x &= -int(x); 12: // DeBruijn constant 13: x *= 0x077CB531; 14: // the upper 5 bits are unique, skip the rest (32 - 5 = 27) 15: x >>= 27; 16: // convert to actual position 17: return MultiplyDeBruijnBitPosition[x]; 18: } 19: 20: unsigned int lowestBitSetSimple(unsigned int x) 21: { 22: // x=0 is not properly handled by while-loop 23: if (x == 0) 24: return 0; 25: 26: unsigned int result = 0; 27: while ((x & 1) == 0) 28: { 29: x >>= 1; 30: result++; 31: } 32: 33: return result; 34: } bits.stephan-brumme.com

Explanation	A negative number is usually represented by the two-complement: −x = not(x)+1 The formula x & −x = x & (not(x) + 1) deletes all but the lowest set bit (line 11). That means after line 11 we have a power of 2 and there are obviously only 32 possibilities left. When multiplied by the DeBruijn constant 0x077CB531 (line 13), the upper 5 bits are unique for each power of 2. A small lookup table resolves the actual position of the lowest bit set (line 1 to 6, line 17). The overall performance of this algorithm is mainly determined by the lookup table and the numbers given below are based on full cache hits. Note: DeBruijn sequences are extensively explained by Wikipedia, too. The simple version performs surprisingly well for randomly distributed input: 50% of all integers are odd and their result will be 0. Another 25% result will be 1, therefore the loop terminates pretty early. When numbers with long spans of zeros are processed, timings become significantly worse. The observed break-even was lowestBitSet(32) on a Pentium D, your mileage may vary. The fastest way on almost all modern x86/x64 CPU bases on the built-in BSF instruction.

Restrictions	• designed for 32 bits

These ads help me to pay my server bills

Performance	+ Intel® Pentium™ D: • lowestBitSet: ≈ 18 cycles per number • simple(1,2): ≈ 4 cycles per number • simple(mix): ≈ 39 cycles per number • BSF: ≈ 3 cycles per number + Intel® Core™ 2: • lowestBitSet: ≈ 10.5 cycles per number • simple(1,2): ≈ 2.5 cycles per number • simple(mix): ≈ 21 cycles per number • BSF: ≈ 1 cycle per number + Intel® Core™ i7: • lowestBitSet: ≈ 10.5 cycles per number • simple(1,2): ≈ 2.5 cycles per number • simple(mix): ≈ 21 cycles per number • BSF: ≈ 1 cycle per number CPU cycles (full optimization, lower values are better)

Assembler Output	lowestBitSet: 01: ; x &= -x 02: mov ecx, eax 03: neg ecx 04: and ecx, eax 05: ; x = 0x077CB531 06: imul ecx, 077CB531h 07: ; x >>= 27 08: shr ecx, 27 09: ; return MultiplyDeBruijnBitPosition[x]; 10: mov eax, MultiplyDeBruijnBitPosition[ecx4] lowestBitSetSimple: 01: ; result = 0 02: xor eax, eax 03: ; if (x == 0) return 0 04: test ecx, ecx 05: je $finish 06: ; while ((x & 1) == 0) 07: test cl, 1 08: jne $finish 09: $scanLoop: 10: ; x >>= 1 11: shr ecx, 1 12: ; result++ 13: inc eax 14: test cl, 1 15: je $scanLoop 16: $finish: bits.stephan-brumme.com

Download	The full source code including a test program is available for download.

References	http://graphics.stanford.edu/~seander/bithacks.html

More Twiddled Bits	Absolute value of a float Absolute value of an integer Approximative inverse of a float Bit manipulation basics Bit mask of lowest bit not set Count bits set in parallel a.k.a. Population Count Detects zero bytes inside a 32 bit integer Endianess Extend bit width Float inverse square root approximation Float square root approximation Is power of two Minimum / maximum of integers Parity Position of lowest bit set Round up to the next power of two Sign comparison Sign of a 32 bit integer Swap two values Extra: Javascript bit manipulator ... or go to the index