Numba with cross product is 10x slower than Numba with loop

Hi, with Numba it's very common for pure-loop code to perform better than code that uses array methods or NumPy functions.
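
To make the comparison concrete, here is a minimal sketch of the two styles for a row-wise cross product. The function names `cross_numpy` and `cross_loop` and the array sizes are my own placeholders, not code from the original question, and I am assuming inputs of shape `(n, 3)`. The `try/except` is only there so the snippet still runs if Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba (no speed-up then).
    def njit(func):
        return func


@njit
def cross_numpy(a, b):
    # Array-style version: relies on Numba's support for np.cross
    # with two (n, 3) inputs.
    return np.cross(a, b)


@njit
def cross_loop(a, b):
    # Loop version: writes each component explicitly, row by row.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i, 0] = a[i, 1] * b[i, 2] - a[i, 2] * b[i, 1]
        out[i, 1] = a[i, 2] * b[i, 0] - a[i, 0] * b[i, 2]
        out[i, 2] = a[i, 0] * b[i, 1] - a[i, 1] * b[i, 0]
    return out


# Hypothetical test data, chosen only for illustration.
rng = np.random.default_rng(0)
a = rng.random((1000, 3))
b = rng.random((1000, 3))

# Both versions should agree up to floating-point rounding.
same = np.allclose(cross_numpy(a, b), cross_loop(a, b))
```

To compare speed, call each function once first so compilation is not included, then time repeated calls with `timeit`. How large the gap is will depend on your array shapes and Numba version.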

Earlier this week we had another example: Using @njit with numpy.tensordot