|
Detects zero bytes inside a 32 bit integer |
|
Code |
01: bool hasZeroByte(unsigned int x)
bits.stephan-brumme.com
|
Explanation |
One of the most prominent example of detecting zeroes is strlen. To improve performance many bytes are examined in parallel. The expression x − 0x01010101 sets the highest bit of a byte if a single input byte is within the input set { 0x00, 0x81, 0x82, 0x83, ..., 0xFF } because the result will be { 0xFF, 0x80, 0x81, 0x82, ..., 0xFE } On the other hand, ~x sets the highest bit of a byte if a single input byte is within the input set { 0x00, 0x01, 0x02, 0x03, ..., 0x7F } because the result will be { 0xFF, 0xFE, 0xFD, 0xFC, ..., 0x80 } The only value contained in both input sets is 0x00, exactly what we are looking for. That means, only for x = 0 the highest bit is still set after ANDing: (x − 0x01010101) & ~x Please do not be confused by the fact that a result ≠ 0 actually indicates a zero byte. Then, all bits except for the highest bit of each byte are reset to zero using AND 0x80808080. The resulting value is equal to zero if all (!) of these highest bits are zero. If at least one is still set to 1, caused by a zero byte, the comparison ≠ 0 gives "true". hasZeroByteSimple shows the common approach to this problem. |
Restrictions |
• designed for 32 bits, easily extendable to 64 bits |
These ads help me to pay my server bills |
|
Performance |
• constant time, data independent + Intel® Pentium™ D: • hasZeroByte: ≈ 6 cycles per number • hasZeroByteSimple: ≈ 7 cycles per number + Intel® Core™ 2: • hasZeroByte: ≈ 3 cycles per number • hasZeroByteSimple: ≈ 3 cycles per number + Intel® Core™ i7: • hasZeroByte: ≈ 3.5 cycles per number • hasZeroByteSimple: ≈ 3 cycles per number CPU cycles (full optimization, lower values are better) |
Assembler Output |
hasZeroByte:
bits.stephan-brumme.com
|
Download | The full source code including a test program is available for download. |
References |
newsgroup posting of Alan Mycroft on April 27, 1987 |
More Twiddled Bits |
... or go to the index |