NotImplementedError: No definition for lowering static_getitem(List(int64, True), Literal[int](0)) -> int64

I am confused about this error and would really appreciate some help.

This is my code:

import math
import cmath

import numpy as np
from numba import cuda


@cuda.jit
def opti_field_line(point1, point2, z, u_final):
    h = 540
    w = 960
    wavelength = 0.000532
    PS = 0.0108
    PI = 3.14159265358979323846
    a, b = cuda.grid(2)  # 2D grid
    p1_x = point1[0]  # Store the values outside the loop
    p2_x = point2[0]

    if a < h and b < w:
        # t1, t2, and the leading prefactor are identical in every branch,
        # so compute them once instead of repeating them eight times.
        t1 = math.sqrt(PI/wavelength/z) * abs(p1_x-(b-w/2)*PS)
        t2 = math.sqrt(PI/wavelength/z) * abs(p2_x-(b-w/2)*PS)
        pref = 1/4 * (1+1j) * math.sqrt(2*wavelength*z) * 1/z * cmath.exp(1j*2*PI/wavelength*z) * cmath.exp(1j*PI/wavelength/z*(point1[1]-(a-h/2)*PS)**2)
        if b > p1_x/PS+w/2 and b < p2_x/PS+w/2:
            if b < (p1_x+math.sqrt(wavelength*z/PS))/PS+w/2 and b > (p1_x-math.sqrt(wavelength*z/PS))/PS+w/2:
                if b < (p2_x+math.sqrt(wavelength*z/PS))/PS+w/2 and b > (p2_x-math.sqrt(wavelength*z/PS))/PS+w/2:
                    u_final[a, b] += pref * (1/(1j-1) * math.sqrt(2/PI) * ((2*t1-1/5*t1**5+1/108*t1**9) + 1j*(-2/3*t1**3+1/21*t1**7+1/660*t1**11)) + 1/(1j-1) * math.sqrt(2/PI) * ((2*t2-1/5*t2**5+1/108*t2**9) + 1j*(-2/3*t2**3+1/21*t2**7+1/660*t2**11)))
                else:
                    u_final[a, b] += pref * (1/(1j-1) * math.sqrt(2/PI) * ((2*t1-1/5*t1**5+1/108*t1**9) + 1j*(-2/3*t1**3+1/21*t1**7+1/660*t1**11)) + -1*math.copysign(1, t2) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t2**2)/t2)
            else:
                if b < (p2_x+math.sqrt(wavelength*z/PS))/PS+w/2 and b > (p2_x-math.sqrt(wavelength*z/PS))/PS+w/2:
                    u_final[a, b] += pref * (-1*math.copysign(1, t1) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t1**2)/t1 + 1/(1j-1) * math.sqrt(2/PI) * ((2*t2-1/5*t2**5+1/108*t2**9) + 1j*(-2/3*t2**3+1/21*t2**7+1/660*t2**11)))
                else:
                    u_final[a, b] += pref * (-1*math.copysign(1, t1) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t1**2)/t1 + -1*math.copysign(1, t2) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t2**2)/t2)
        else:
            if b < (p1_x+math.sqrt(wavelength*z/PS))/PS+w/2 and b > (p1_x-math.sqrt(wavelength*z/PS))/PS+w/2:
                if b < (p2_x+math.sqrt(wavelength*z/PS))/PS+w/2 and b > (p2_x-math.sqrt(wavelength*z/PS))/PS+w/2:
                    u_final[a, b] += pref * (1/(1j-1) * math.sqrt(2/PI) * ((2*t2-1/5*t2**5+1/108*t2**9) + 1j*(-2/3*t2**3+1/21*t2**7+1/660*t2**11)) - 1/(1j-1) * math.sqrt(2/PI) * ((2*t1-1/5*t1**5+1/108*t1**9) + 1j*(-2/3*t1**3+1/21*t1**7+1/660*t1**11)))
                else:
                    u_final[a, b] += pref * (-1*math.copysign(1, t2) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t2**2)/t2 - 1/(1j-1) * math.sqrt(2/PI) * ((2*t1-1/5*t1**5+1/108*t1**9) + 1j*(-2/3*t1**3+1/21*t1**7+1/660*t1**11)))
            else:
                if b < (p2_x+math.sqrt(wavelength*z/PS))/PS+w/2 and b > (p2_x-math.sqrt(wavelength*z/PS))/PS+w/2:
                    u_final[a, b] += pref * (1/(1j-1) * math.sqrt(2/PI) * ((2*t2-1/5*t2**5+1/108*t2**9) + 1j*(-2/3*t2**3+1/21*t2**7+1/660*t2**11)) + math.copysign(1, t1) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t1**2)/t1)
                else:
                    u_final[a, b] += pref * (-1*math.copysign(1, t2) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t2**2)/t2 + math.copysign(1, t1) + (1-1j)/2 * math.sqrt(2/PI) * cmath.exp(-1j*t1**2)/t1)

h = 540
w = 960
TPB = 16
u_final = np.zeros((h, w), dtype=np.complex128)
threadsperblock = (TPB, TPB)
blockspergrid_x = math.ceil(u_final.shape[0]/threadsperblock[0])
blockspergrid_y = math.ceil(u_final.shape[1]/threadsperblock[1])
blockspergrid = (blockspergrid_x, blockspergrid_y)

u_final_gpu = cuda.to_device(u_final)
opti_field_line[blockspergrid, threadsperblock]([-1, -1], [1, -1], 100, u_final_gpu)

This is the error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/numba/core/lowering.py in lower_expr(self, resty, expr)
   1323                 # raise NotImplementedError if the types aren't supported
-> 1324                 impl = self.context.get_function("static_getitem", signature)
   1325                 return impl(self.builder,

38 frames
NotImplementedError: No definition for lowering static_getitem(List(int64, True), Literal[int](0)) -> int64

During handling of the above exception, another exception occurred:

NumbaRuntimeError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/numba/core/runtime/context.py in _require_nrt(self)
     40     def _require_nrt(self):
     41         if not self._enabled:
---> 42             raise errors.NumbaRuntimeError("NRT required but not enabled")
     43 
     44     def _check_null_result(func):

NumbaRuntimeError: Failed in cuda mode pipeline (step: native lowering)
NRT required but not enabled
During: lowering "p1_x = static_getitem(value=point1, index=0, index_var=$const38.13, fn=<built-in function getitem>)" at <ipython-input-35-c57b7822beb9> (9)

It runs fine in CPU mode, but I really want to move it to the GPU to speed it up. The equations to calculate each pixel's value are really long and cover 8 different situations, and they involve complex numbers. Hope someone can help with this.

Seems like the error is related to passing point1 and point2 as Python lists. You might try passing NumPy arrays or tuples instead. I'm not familiar with everything that does and doesn't work with CUDA, but it would make sense that CUDA doesn't like the List type, since it is resizable. GPUs tend to want predictable, preallocated memory to apply SIMD operations over, so it makes sense if not all of ListType was implemented. Additionally, you'll probably eke out a little extra speed by avoiding lists, since they don't unbox into Numba particularly efficiently at the moment.
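For example, a minimal sketch of the fix (keeping the names from the question's code); the kernel launch line is commented out here since it needs a CUDA-capable GPU:

```python
import numpy as np

# Build the points as fixed-size NumPy arrays (tuples also work)
# instead of Python lists, so Numba sees a typed, non-resizable argument.
point1 = np.array([-1.0, -1.0])
point2 = np.array([1.0, -1.0])

# The launch call itself stays the same:
# opti_field_line[blockspergrid, threadsperblock](point1, point2, 100, u_final_gpu)
```

The kernel body doesn't change at all, since `point1[0]` indexes an array the same way it indexed a list.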


Thank you so much! Following your suggestion, I changed point1 and point2 to np.array, and the code now runs perfectly and fast. Appreciate your help again!