I am currently exploring ways to further improve the speed of the Hamming distance calculation between `np.int8` arrays.

```
import numpy as np

a = np.array([ -113, 53, 107, 118, -110, -45, -41, -90, -48, 103, 88, -52, 35, -52, -104, 61, -69, -95, 49, 57, 108, -37, 37, 60, 37, 103, -2, 44, -26, 98, 117, -106, -7, -102, -33, -54, -33, 110, 68, 39, 105, 13, -86, 65, 23, -27, -128, 50, 19, 115, -41, 33, 9, 24, -28, -77, 77, -64, 96, 100, -13, 24, 4, 62, 60, -20, -74, 12, 28, 95, -95, 120, 13, -75, 49, -51, -32, 108, 18, 87, 106, 126, 98, 82, -90, -87, 11, 4, 124, -27, -52, -42, 96, 72, 69, -90 ], dtype=np.int8)
b = np.array([ 79, 21, 31, 87, -72, -105, -105, -86, 48, 71, 112, -36, 35, -11, -104, 93, -79, -94, 11, 41, -4, -39, 37, 56, 39, -25, -2, 45, -28, -14, -39, -120, -4, -125, 91, 74, -41, 108, 68, 46, 105, 13, -82, -128, 71, -59, -128, 115, 58, 115, -13, 40, 107, 124, -4, -45, 110, -23, 10, -12, -77, -40, -124, 51, 60, 104, -76, 6, -108, 63, -95, 116, 4, -92, 49, 77, -30, -116, 19, 127, 75, 94, 74, -34, 36, -23, 1, 36, 89, -59, -52, -12, 96, -24, 95, 36 ], dtype=np.int8)
```

I have benchmarked a set of functions on these two arrays using `%timeit -n 100000`.

Function:

```
def hamming_distance_1(a, b):
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))
```

```
16.5 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Function:

```
def hamming_distance_2(a, b):
    return np.sum(a != b)
```

```
7.68 µs ± 189 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Function:

```
def hamming_distance_3(a, b):
    return np.count_nonzero(a != b)
```

```
912 ns ± 9.57 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Function:

```
def hamming_distance_4(a, b):
    return len(np.bitwise_xor(a, b).nonzero()[0])
```

```
776 ns ± 7.88 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Function:

```
import numba

@numba.njit(fastmath=True)
def hamming_distance_5(a, b):
    return np.count_nonzero(a != b)
```

```
The slowest run took 5.05 times longer than the fastest. This could mean that an intermediate result is being cached.
575 ns ± 513 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Function:

```
@numba.njit(fastmath=True)
def hamming_distance_6(a, b):
    return len(np.bitwise_xor(a, b).nonzero()[0])
```

```
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
535 ns ± 457 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
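As a sanity check that all the variants are interchangeable, here is a minimal sketch (using random `int8` arrays rather than the exact sample above) confirming they count the same number of differing positions:

```python
import numpy as np

# Random int8 arrays of the same length as the sample data.
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=96).astype(np.int8)
b = rng.integers(-128, 128, size=96).astype(np.int8)

# The four NumPy-based variants should all agree.
d1 = sum(ch1 != ch2 for ch1, ch2 in zip(a, b))
d2 = int(np.sum(a != b))
d3 = np.count_nonzero(a != b)
d4 = len(np.bitwise_xor(a, b).nonzero()[0])

assert d1 == d2 == d3 == d4
```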

Are there reasonable measures that can be taken, using numba, to further improve the performance of the Hamming distance calculation?

I have tested using `parallel`, `nopython`, and `nogil`, but cannot seem to get a significant further improvement.
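One shape I have not benchmarked above is an explicit counting loop, which I would guess avoids materialising the intermediate boolean array that `a != b` allocates. A hypothetical sketch (shown undecorated so it runs as plain Python; with numba installed it would carry `@numba.njit`):

```python
import numpy as np

# Hypothetical explicit-loop variant; with numba this would be decorated
# with @numba.njit so the loop compiles to native code. It counts the
# number of positions where the two arrays differ, without allocating
# an intermediate boolean array.
def hamming_distance_loop(a, b):
    count = 0
    for i in range(a.shape[0]):
        if a[i] != b[i]:
            count += 1
    return count

a = np.array([1, 2, 3, 4], dtype=np.int8)
b = np.array([1, 0, 3, 0], dtype=np.int8)
print(hamming_distance_loop(a, b))  # positions 1 and 3 differ -> 2
```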

Any input would be greatly appreciated.