I need to assign NumPy arrays to each element of a Numba typed.List and create a jagged/ragged list of NumPy arrays. I can successfully append NumPy arrays to the typed.List with:
from numba.typed import List
from numba import njit
import numba as nb
import numpy as np

@njit()
def some_func(x, num):
    # Grow the list one array at a time; the i-th array has 2 * i elements
    for i in range(num):
        x.append(np.random.rand(2 * i))

x = List.empty_list(nb.float64[:])
some_func(x, 5)
However, given that we know that the size of x will be, say, 5, I thought that I’d be able to construct the empty typed List ahead of time and then assign an array to each element of the List by replacing the inner for-loop of the function with an indexed assignment instead:
for i in range(num):
    x[i] = np.random.rand(2 * i)
But it isn’t clear how I should construct x ahead of time. Maybe:
x = List.empty_list(nb.float64[:],
                    nb.float64[:],
                    nb.float64[:],
                    nb.float64[:],
                    nb.float64[:])
but this obviously doesn’t work. Any suggestions would be greatly appreciated!
In [7]: from numba.typed import List
In [8]: import numpy as np
In [9]: List([np.zeros(0), np.zeros(0), np.zeros(0), np.zeros(0), np.zeros(0)])
Out[9]: ListType[array(float64, 1d, C)]([[], [], [], [], [], ...])
Thank you, @esc! That worked. Just elaborating in case it may be helpful for others:
from numba.typed import List
from numba import njit
import numpy as np

@njit(fastmath=True)
def some_func(x, num):
    # Overwrite each pre-allocated slot with an array of length 2 * i
    for i in range(num):
        x[i] = np.random.rand(2 * i)

num = 5
x = List([np.empty(0) for _ in range(num)])
some_func(x, num)
@esc I am now trying to pass this numba.typed.List to dask but am receiving a:
TypeError: ('Could not serialize object of type List.', '[[], [], [], [], [], [], [], [], [], []]')
I’m not sure if this should be resolved in dask or in numba but I’ve posted a question to the dask issue tracker. Here is an example that reproduces the undesirable behaviour:
import numpy as np
from dask.distributed import Client, LocalCluster
from numba.typed import List
from numba import njit

@njit()
def _some_func(T, A):
    # Do something with T and A
    # Maybe fill A with something like
    # for i in range(len(T)):
    #     A[i] = T[:i]
    return A

if __name__ == "__main__":
    dask_cluster = LocalCluster(n_workers=2, threads_per_worker=2)

    T = np.random.rand(10)
    n = T.shape[0]
    A = List([np.empty(0, dtype=np.float64) for _ in range(n)])

    with Client(dask_cluster) as dask_client:
        T_future = dask_client.scatter(T, broadcast=True, hash=False)
        A_future = dask_client.scatter(A, broadcast=True, hash=False)

        future = dask_client.submit(
            _some_func,
            T_future,
            A_future,
        )
        results = dask_client.gather(future)

        dask_client.cancel(T_future)
        dask_client.cancel(A_future)
Would you happen to have any suggestions to overcome this?
Yeah, that’s what I understood as well, but I don’t know enough to serialize it myself. At the end of the day, I’m trying to create a jagged list of NumPy arrays and figured that typed.List would work. However, it looks like I’m going to need a slightly different data structure (maybe a simple Python list of NumPy arrays, which seems to be okay in dask).
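As a rough check of the plain-list idea, the sketch below builds the same jagged structure as an ordinary Python list of NumPy arrays and round-trips it through the standard-library pickle module. Pickle is only a stand-in for dask's serialization here (an assumption; dask has its own serialization layer), and make_jagged is a hypothetical helper name, not part of any library.

```python
import pickle

import numpy as np


def make_jagged(num):
    """Hypothetical helper: plain Python list whose i-th array has 2 * i elements."""
    return [np.random.rand(2 * i) for i in range(num)]


A = make_jagged(5)

# Round-trip through pickle as a stand-in for shipping the list to a worker.
A_restored = pickle.loads(pickle.dumps(A))

# Compare the ragged lengths and the values before and after the round trip.
lengths = [len(a) for a in A_restored]
same = all(np.array_equal(a, b) for a, b in zip(A, A_restored))
print(lengths, same)
```

If the lengths and values match, the structure is at least picklable, which is a reasonable first hint that it can be scattered or submitted.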
Thank you, @esc. I did come across it in my initial exploration but, alas, we are trying to minimize our dependencies since this will be part of a package that is used by many people.