How to best compile numpy.linalg.slogdet(A)?

I am learning how to use numba more in a numpy- and scipy-based computation. The step numpy.linalg.slogdet(A) seems to be the second most time-consuming line (slightly faster than this), where A is a float64 matrix whose size differs in every run, ranging mostly from 1x1 to 300x300 and occasionally up to 600x600. I ran some %timeit tests, shown in the plot below, and compiling clearly helps, though only for matrix sizes < 100x100.

From what I read so far, I think this is because the internal numpy-vectorization is already efficient. Is this correct? If not, how should I interpret the %timeit-results for sizes > 100x100?

import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import slogdet
from numba import njit

def logdet(A):
    sign, logabsdet = slogdet(A)
    result = sign*logabsdet
    return result

@njit(fastmath=False, parallel=False)
def logdet_numba(A):
    sign, logabsdet = slogdet(A)
    result = sign*logabsdet
    return result

@njit(fastmath=True, parallel=False)
def logdet_numba_fm(A):
    sign, logabsdet = slogdet(A)
    result = sign*logabsdet
    return result

ldbest = []
ldnbest = []
ldnfmbest = []
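sizes = np.array(
    (2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32, 36, 40, 45, 50, 55, 60, 65,
     70, 80, 90, 100, 110, 120, 130, 150, 170, 200, 250, 300, 400, 500, 750, 1000))
for n in sizes:
    a = np.random.randn(n,n)
    A = a.T@a
    # %timeit is an IPython magic: -o returns a TimeitResult, -q suppresses printing
    rld = %timeit -o -q logdet(A)
    rldn = %timeit -o -q logdet_numba(A)
    rldnfm = %timeit -o -q logdet_numba_fm(A)
    # store the best (fastest) time of each variant for the plot
    ldbest.append(rld.best)
    ldnbest.append(rldn.best)
    ldnfmbest.append(rldnfm.best)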

s = 12
plt.scatter(sizes, ldbest, s=s, label='logdet(A)');plt.yscale('log');plt.xscale('log')
plt.scatter(sizes, ldnbest, s=s, label='logdet(A) with @njit(fastmath=False, parallel=False)');plt.yscale('log');plt.xscale('log')
plt.scatter(sizes, ldnfmbest,s=s,label='logdet(A) with @njit(fastmath=True, parallel=False)' );plt.yscale('log');plt.xscale('log')
plt.xlabel('matrix-size n, in nxn');plt.ylabel('seconds');plt.title('');plt.grid();plt.legend();

Hi @ofk123

The reason both functions are almost equally fast is that they do mostly the same work. slogdet is implemented via an LU decomposition, which uses this LAPACK routine, and that factorization accounts for almost all of the runtime for larger arrays. Compiling with numba only removes the Python-level overhead around that call, which is why the gain is visible for small matrices only.
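
To make the point about the LU decomposition concrete, here is a rough pure-numpy sketch of how a sign and log-determinant fall out of Gaussian elimination with partial pivoting. The name slogdet_via_lu is made up for this illustration, and this is not the code numpy or numba actually run (they hand the factorization to LAPACK). It only shows that the O(n^3) elimination is where the work happens, while the final sign and log-sum bookkeeping is cheap.

```python
import numpy as np

def slogdet_via_lu(A):
    """Illustrative only: sign and log|det| from LU with partial pivoting."""
    U = np.array(A, dtype=np.float64)  # work on a copy, leave A untouched
    n = U.shape[0]
    sign = 1.0
    logabsdet = 0.0
    for k in range(n):
        # pick the largest entry in column k (at or below the diagonal) as pivot
        p = k + np.argmax(np.abs(U[k:, k]))
        if U[p, k] == 0.0:
            # singular matrix: same convention as numpy.linalg.slogdet
            return 0.0, -np.inf
        if p != k:
            U[[k, p]] = U[[p, k]]  # a row swap flips the determinant's sign
            sign = -sign
        pivot = U[k, k]
        if pivot < 0:
            sign = -sign
        logabsdet += np.log(abs(pivot))
        # eliminate column k below the pivot (this update is the O(n^3) part)
        U[k+1:, k+1:] -= np.outer(U[k+1:, k] / pivot, U[k, k+1:])
    return sign, logabsdet
```

Calling slogdet_via_lu(A) on a square float64 matrix should agree with numpy.linalg.slogdet(A) up to rounding, just much more slowly, since the loop above runs in Python rather than in an optimized LAPACK build.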

Thanks a lot @sschaer!
That makes sense, from what I understand.
I will use this function then.

Let me know if anyone sees a better way to compile the function, or a better calculation in general.
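
One possible alternative, offered as a sketch under a strong assumption: if the matrices are always symmetric positive definite (as they are in the benchmark above, where A = a.T@a), the log-determinant can be read off a Cholesky factor instead of an LU factorization, since det(A) = det(L)^2 for A = L L^T. The name logdet_spd is hypothetical, and whether this is faster in practice would have to be measured for the sizes in question. It is not valid for general matrices, where numpy.linalg.cholesky raises LinAlgError.

```python
import numpy as np

def logdet_spd(A):
    """log(det(A)) for a symmetric positive definite A via Cholesky.

    Assumes A is SPD; np.linalg.cholesky raises LinAlgError otherwise.
    """
    L = np.linalg.cholesky(A)  # lower-triangular factor with A = L @ L.T
    # det(A) = prod(diag(L))**2, so log(det(A)) = 2 * sum(log(diag(L)))
    return 2.0 * np.sum(np.log(np.diag(L)))
```

For an SPD matrix the sign returned by slogdet is always +1, so logdet_spd(A) should match the logabsdet value from numpy.linalg.slogdet(A) up to rounding.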