Assigning NumPy Arrays to a typed.List

I need to assign NumPy arrays to each element of a Numba typed.List and create a jagged/ragged list of NumPy arrays. I can successfully append NumPy arrays to the typed.List with:

from numba.typed import List
from numba import njit
import numba as nb
import numpy as np

@njit()
def some_func(x, num):
    for i in range(num):
        x.append(np.random.rand(2 * i))


x = List.empty_list(nb.float64[:])
some_func(x, 5)

However, given that we know the size of x in advance (say, 5), I thought I’d be able to construct the typed List ahead of time and then assign an array to each element by replacing the append inside the for-loop with an indexed assignment:

for i in range(num):
    x[i] = np.random.rand(2 * i)

But it isn’t clear how I should construct x ahead of time. Maybe:

x = List.empty_list(nb.float64[:], 
                    nb.float64[:], 
                    nb.float64[:], 
                    nb.float64[:], 
                    nb.float64[:])

but this obviously doesn’t work. Any suggestions would be greatly appreciated!

How about:

In [7]: from numba.typed import List

In [8]: import numpy as np

In [9]: List([np.zeros(0), np.zeros(0), np.zeros(0), np.zeros(0), np.zeros(0)])
Out[9]: ListType[array(float64, 1d, C)]([[], [], [], [], [], ...])

Thank you, @esc! That worked. Just elaborating in case it may be helpful for others:

from numba.typed import List
from numba import njit
import numpy as np

@njit(fastmath=True)
def some_func(x, num):
    for i in range(num):
        x[i] = np.random.rand(2 * i)

num = 5
x = List([np.empty(0) for _ in range(num)])
some_func(x, num)

@esc I am now trying to pass this numba.typed.List to dask but I am receiving a:

TypeError: ('Could not serialize object of type List.', '[[], [], [], [], [], [], [], [], [], []]')

I’m not sure if this should be resolved in dask or in numba but I’ve posted a question to the dask issue tracker. Here is an example that reproduces the undesirable behaviour:

import numpy as np
from dask.distributed import Client, LocalCluster
from numba.typed import List
from numba import njit

@njit()
def _some_func(T, A):
    # Do something with T and A
    # Maybe fill A with something like
    # for i in range(len(T)):
    #     A[i] = T[ : i]
    return A


if __name__ == "__main__":
    dask_cluster = LocalCluster(n_workers=2, threads_per_worker=2)
    T = np.random.rand(10)
    n = T.shape[0]
    A = List([np.empty(0, dtype=np.float64) for _ in range(n)])

    with Client(dask_cluster) as dask_client:
        T_future = dask_client.scatter(T, broadcast=True, hash=False)
        A_future = dask_client.scatter(A, broadcast=True, hash=False)

        future = dask_client.submit(
            _some_func,
            T_future,
            A_future,
        )

        results = dask_client.gather(future)
        dask_client.cancel(T_future)
        dask_client.cancel(A_future)

Would you happen to have any suggestions to overcome this?

My guess is that the typed List can’t be pickled. I am not sure there is a good workaround.

Yeah, that’s what I understood as well, but I don’t know enough to serialize it myself. At the end of the day, I’m trying to create a jagged list of NumPy arrays and figured that typed.List would work. However, it looks like I’m going to need a slightly different data structure (maybe a simple Python list of NumPy arrays, which seems to be okay in dask).
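
A rough sketch of the plain-list idea, using only NumPy and the standard library. It checks that a jagged Python list of arrays survives a pickle round trip, which is only a proxy for what a distributed scheduler needs and not a guarantee about dask itself. The helper make_jagged is hypothetical and simply mirrors the commented-out A[i] = T[:i] idea from the reproducer.

```python
import pickle

import numpy as np


def make_jagged(T):
    # Row i holds a copy of the first i elements of T, so rows differ in length.
    return [T[:i].copy() for i in range(len(T))]


T = np.random.rand(10)
A = make_jagged(T)

# Serialize and restore the whole jagged list in one go.
restored = pickle.loads(pickle.dumps(A))

same = all(np.array_equal(a, b) for a, b in zip(A, restored))
restored_lengths = [len(a) for a in restored]
print(same, restored_lengths)
```

The trade-off is that a plain Python list passed into an @njit function is handled differently from a typed.List, so the filling step may need to happen outside the jitted code or be restructured.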

Maybe you need Awkward Array (see the Awkward Array documentation).

Thank you @esc. I did come across it in my initial exploration but, alas, we are trying to minimize our dependencies since this will be part of a package that is used by many people.
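
One dependency-free alternative, not proposed anywhere in this thread and offered only as a sketch: store the jagged data as a single flat values array plus an offsets array, so that both pieces are ordinary NumPy arrays. The helper names pack and row are hypothetical.

```python
import numpy as np


def pack(rows):
    # Concatenate all rows into one flat array and record where each row starts.
    lengths = np.array([len(r) for r in rows], dtype=np.int64)
    offsets = np.zeros(len(rows) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    values = np.concatenate(rows) if len(rows) else np.empty(0)
    return values, offsets


def row(values, offsets, i):
    # Row i is the slice between consecutive offsets (a view, not a copy).
    return values[offsets[i]:offsets[i + 1]]


rows = [np.arange(2 * i, dtype=np.float64) for i in range(5)]
values, offsets = pack(rows)
third = row(values, offsets, 3)
print(offsets, third)
```

This layout requires knowing every row length before the values array is allocated, so it fits best when the sizes can be computed up front, as with the 2 * i lengths in the original example.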