|
Position of lowest bit set |
|
Code |
01: const unsigned int MultiplyDeBruijnBitPosition[32] =
bits.stephan-brumme.com
|
Explanation |
A negative number is usually represented by the two-complement: −x = not(x)+1 The formula x & −x = x & (not(x) + 1) deletes all but the lowest set bit (line 11). That means after line 11 we have a power of 2 and there are obviously only 32 possibilities left. When multiplied by the DeBruijn constant 0x077CB531 (line 13), the upper 5 bits are unique for each power of 2. A small lookup table resolves the actual position of the lowest bit set (line 1 to 6, line 17). The overall performance of this algorithm is mainly determined by the lookup table and the numbers given below are based on full cache hits. Note: DeBruijn sequences are extensively explained by Wikipedia, too. The simple version performs surprisingly well for randomly distributed input: 50% of all integers are odd and their result will be 0. Another 25% result will be 1, therefore the loop terminates pretty early. When numbers with long spans of zeros are processed, timings become significantly worse. The observed break-even was lowestBitSet(32) on a Pentium D, your mileage may vary. The fastest way on almost all modern x86/x64 CPU bases on the built-in BSF instruction. |
Restrictions |
• designed for 32 bits |
These ads help me to pay my server bills |
|
Performance |
+ Intel® Pentium™ D: • lowestBitSet: ≈ 18 cycles per number • simple(1,2): ≈ 4 cycles per number • simple(mix): ≈ 39 cycles per number • BSF: ≈ 3 cycles per number + Intel® Core™ 2: • lowestBitSet: ≈ 10.5 cycles per number • simple(1,2): ≈ 2.5 cycles per number • simple(mix): ≈ 21 cycles per number • BSF: ≈ 1 cycle per number + Intel® Core™ i7: • lowestBitSet: ≈ 10.5 cycles per number • simple(1,2): ≈ 2.5 cycles per number • simple(mix): ≈ 21 cycles per number • BSF: ≈ 1 cycle per number CPU cycles (full optimization, lower values are better) |
Assembler Output |
lowestBitSet:
bits.stephan-brumme.com
|
Download | The full source code including a test program is available for download. |
References |
http://graphics.stanford.edu/~seander/bithacks.html |
More Twiddled Bits |
... or go to the index |