# Parity

Code
``` 01: bool parity(unsigned int x) 02: { 03:   x ^= x >> 16; 04:   x ^= x >>  8; 05:   x ^= x >>  4; 06:   x &= 0x0F; 07:   return ((0x6996 >> x) & 1) != 0; 08: } 09: 10: bool parity2(unsigned int x) 11: { 12:   x ^= x >> 1; 13:   x ^= x >> 2; 14:   x &= 0x011111111; 15:   x *= 0x011111111; 16:   return ((x >> 28) & 1) != 0; 17: } ``` bits.stephan-brumme.com

Explanation Bit parity tells whether a given input contains an odd number of 1s.
When the input is subdivided into several chunks, the total parity can be determined by computing the
parity of the sub parities. Might sound weird, so here is a formula to make it clear:

parity(A and B) = parity(parity(A) and parity(B))

If we break down the 32 bits into chunks of 4 bits then the problem is greatly simplified. XOR is the way to go:

(0 XOR 0) → 0
(0 XOR 1) → 1
(1 XOR 0) → 1
(1 XOR 1) → 0

That basically means that the parity of two bits can be computed with a XOR operation.
Even better, XOR can be applied to the full value to compute the parities of multiple bit pairs in parallel.

first algorithm (parity, line 1 to line 8):

The code first "merges" bits 0 − 15 with bits 16 − 31 using a right shift and XOR (line 3).
Including line 4 and line 5 reduce the input width is reduced from 32 bits to 8 bits by repeatedly applying shifted XORs.
We are only interested in the lower bits. That means, after step 1 the lower 16 bits are of importance,
then the lower 8, then the lowest 4. After having applied XOR three times, we delete all higher bits (line 6).

We are left with sixteen possible values for x. Their parity is:
parity(15) → parity(1111b) → 4 bits set → even
parity(14) → parity(1110b) → 3 bits set → odd
parity(13) → parity(1101b) → 3 bits set → odd
parity(12) → parity(1100b) → 2 bits set → even
parity(11) → parity(1011b) → 3 bits set → odd
parity(10) → parity(1010b) → 2 bits set → even
parity( 9) → parity(1001b) → 2 bits set → even
parity( 8) → parity(1000b) → 1 bits set → odd
parity( 7) → parity(0111b) → 3 bits set → odd
parity( 6) → parity(0110b) → 2 bits set → even
parity( 5) → parity(0101b) → 2 bits set → even
parity( 4) → parity(0100b) → 1 bits set → odd
parity( 3) → parity(0011b) → 2 bits set → even
parity( 2) → parity(0010b) → 1 bits set → odd
parity( 1) → parity(0001b) → 1 bits set → odd
parity( 0) → parity(0000b) → 0 bits set → even

If we encode even by 0 and odd by 1 beginning with parity(15) then we get 0110 1001 0110 1001 = 0x6996,
which is the magic number found in line 7. The shift moves the relevant bit to bit 0. Then everything except
for bit 0 is masked out. In the end, we get 0 for even and 1 for odd, exactly as desird.

second algorithm (parity2, line 10 to line 17):

This time, parity is computed bottom-up instead of top-down.
Using a similar XOR technique, line 12 and line 13 compute the parity of each four-bit-block and put it
into to lowest bit of each block. All other bits become garbage and need to be cleared (line 14).

Multiplying by 0x011111111 has an interesting property if we label the four-bit-block A, B, C, D, E, F, G, H:
A, B, C, D, E, F, G, H → A+B+C+D+E+F+G+H, B+C+D+E+F+G+H, C+D+E+F+G+H, D+E+F+G+H, E+F+G+H, F+G+H, G+H, H
Obviously the highest four-bit-block is what we are looking for. If the sum of multiple parities is odd,
then the overall parity is odd as well (vice versa for even).
The right shift followed by AND (line 16) returns the lowest bit which gives us the parity.

Restrictions • designed for 32 bits
parity: insert "x ^= x >> 32;" before line 3 for 64 bits
parity: remove line 3 for 16 bits
parity: remove line 3 and line 4 for 8 bits
parity2: for 64 bits, replace 0x011111111 by 0x01111111111111111 and shift right by 60 instead of 28

help me to pay
my server bills

Performance • constant time, data independent

+ Intel® Pentium™ D:
parity: ≈ 13 cycles per number
parity2: ≈ 6 cycles per number

+ Intel® Core™ 2:
parity: ≈ 12 cycles per number
parity2: ≈ 5 cycles per number

+ Intel® Core™ i7:
parity: ≈ 11.5 cycles per number
parity2: ≈ 5.25 cycles per number CPU cycles
(full optimization,
lower values are better)

Assembler Output
``` parity: 01:   ; x ^= x >> 16 02:   mov ecx, eax 03:   shr ecx, 16 04:   xor eax, ecx 05:   ; x ^= x >> 8 06:   mov edx, eax 07:   shr edx, 8 08:   xor eax, edx 09:   ; x ^= x >> 4 10:   mov ecx, eax 11:   shr ecx, 4 12:   xor ecx, eax 13:   ; x &= 0x0F 14:   and ecx, 0000000Fh 15:   ; return ((0x6996 >> x) & 1) != 0 16:   mov eax, 00006996h 17:   sar eax, cl 18:   and eax, 1 parity2: 01:   ; x ^= x >> 1 02:   mov eax, ecx 03:   shr eax, 1 04:   xor ecx, eax 05:   ; x ^= x >> 2 06:   mov eax, ecx 07:   shr eax, 2 08:   xor eax, ecx 09:   ; x &= 0x11111111 10:   and eax, 11111111h 11:   ; x *= 0x11111111 12:   imul eax, 11111111h 13:   ; return (x >> 28) & 1 14:   shr eax, 28 15:   and eax, 1 ``` bits.stephan-brumme.com