Optimizing Code Further, CUDA Jit?

gmarkall · July 23, 2020, 9:29am

NumPy operations that are not supported on the CUDA target need to be rewritten in terms of loops and operations on individual elements. I haven’t had time to translate all of the above, but here are a couple of ideas for starting points (note that the code below is untested, but illustrates the general idea):

numpy.linalg.inv: it looks like your matrix is 3x3, so you could use a function like this:

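import numpy as np
from numba import cuda

# Device function: callable from inside a CUDA kernel, not from host code
@cuda.jit(device=True)
def invert_3x3_matrix(m, res):
    # Closed-form inverse of the 3x3 matrix m, written into res
    # (there is no check for a singular matrix, i.e. det(m) == 0)
    a = m[0, 0]
    b = m[0, 1]
    c = m[0, 2]
    d = m[1, 0]
    e = m[1, 1]
    f = m[1, 2]
    g = m[2, 0]
    h = m[2, 1]
    i = m[2, 2]

    # D = 1 / det(m), with the determinant expanded along the first row
    D = 1.0 / (a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g))

    # Entries of the adjugate (transposed cofactor matrix), scaled by D
    a1 = D * (e*i - f*h)
    b1 = D * (c*h - b*i)
    c1 = D * (b*f - c*e)
    d1 = D * (f*g - d*i)
    e1 = D * (a*i - c*g)
    f1 = D * (c*d - a*f)
    g1 = D * (d*h - e*g)
    h1 = D * (b*g - a*h)
    i1 = D * (a*e - b*d)

    res[0, 0] = a1
    res[0, 1] = b1
    res[0, 2] = c1
    res[1, 0] = d1
    res[1, 1] = e1
    res[1, 2] = f1
    res[2, 0] = g1
    res[2, 1] = h1
    res[2, 2] = i1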
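For reference, the function above implements the standard closed-form (adjugate) inverse of a 3x3 matrix. Writing the input as M with entries a to i in row-major order, the determinant and inverse are as follows; the code's D is 1/det M, and nothing guards against det M = 0:

```latex
M = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}, \qquad
\det M = a(ei - fh) - b(di - fg) + c(dh - eg)

M^{-1} = \frac{1}{\det M}
\begin{pmatrix}
ei - fh & ch - bi & bf - ce \\
fg - di & ai - cg & cd - af \\
dh - eg & bg - ah & ae - bd
\end{pmatrix}
```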

Then, inside your kernel, call it with a per-thread local array for the result, e.g.:

# Inside the kernel: scratch space private to each thread
invertCsensor = cuda.local.array((3, 3), np.float64)
invert_3x3_matrix(C_sensor_n, invertCsensor)

np.dot: for a dot product of two 1-D arrays, you could rewrite it as a loop:

# Accumulate the element-wise products
calcVal = 0.0
for i in range(len(leftSide)):
    calcVal += leftSide[i] * rightSide[i]

I hope this helps give the idea. I think you’ll also need to rewrite conj as a loop.
