How to pass a Numpy array of lists in @guvectorize function?

Hi all,

I have written several functions with @vectorize/@guvectorize, and they work very well. The performance is very good.
Now I would like to pass a NumPy array of lists as an argument, and I know this is not supported by Numba because it is an array of Python objects (pyobject).
In my code, I need an array that can hold data of different lengths. I am implementing a physics problem where objects (molecules) can have different numbers of components (e.g. H2O, C4H6O2, …), and I would like all of this to be held in a NumPy array. Each element of the array represents a molecule with its various components. So far, my idea is to use an array declared as follows:
array = np.zeros(size, dtype=object)

This allows me to do something like this:
array[0] = [2,1]
array[1] = [4,6,2]

Each list corresponds to the components of a molecule.
However, I now need to pass this array as an argument to a @guvectorize function, and I know that Numba does not accept this type of data (pyobject).
Any idea? A work-around?
Or any alternative to replace this array of lists by something that could be accepted by Numba?

Many thanks in advance :-).
Christophe

Hi @xtof2020, would you be able to use a particular value to designate "no data"? Then you can pad all the rows to the same length and store them in a single 2D array. 2D NumPy arrays can be passed to @guvectorize functions.
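Building such a padded array from the ragged lists in the question could look like the sketch below. `pad_ragged` is a hypothetical helper name, not a NumPy or Numba function; it assumes the chosen fill value never appears in the real data.

```python
import numpy as np

def pad_ragged(rows, fill_value, dtype=np.float64):
    """Copy a list of variable-length lists into a 2D array, padding with fill_value."""
    width = max((len(r) for r in rows), default=0)
    out = np.full((len(rows), width), fill_value, dtype=dtype)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r  # real values first, padding fills the rest of the row
    return out

molecules = [[2, 1], [4, 6, 2]]
padded = pad_ragged(molecules, fill_value=-1.0)
# padded.shape == (2, 3); the first row ends with one -1.0 padding entry
```

The padding is done once in plain Python/NumPy, so the jitted function only ever sees a regular 2D array.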

For example, below I use -1.0 to designate no data in a simple sum function across a 2D array:

import numpy as np
from numba import guvectorize

NODATA = -1.0  # padding marker; must never occur in real data

@guvectorize(['void(f8[:,:], f8[:])'], '(m,n)->()')
def masked_sum(array2d, result):
    # Sum every element of the 2D array, skipping the NODATA padding.
    m, n = array2d.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            if array2d[i, j] != NODATA:
                total += array2d[i, j]
    result[0] = total

a = np.array((1.0, 2.0, 3.0, NODATA))
b = np.array((1.0, 2.0, NODATA, NODATA))
c = np.array((1.0, NODATA, NODATA, NODATA))
abc = np.vstack((a, b, c))

masked_sum(abc)
10.0

BR,
Ryan

hi @xtof2020, if you are not able to create 2d arrays from your data, as suggested by @ryanchien, I recommend you look into Awkward array (https://github.com/scikit-hep/awkward-1.0). It was designed precisely for dealing with arrays that contain arrays of different lengths. It’s compatible with numba, so you can pass awkward arrays into jitted functions.

Do you really need guvectorize? Numba works well with explicit loops, you don’t need to “vectorize” to get speed. I don’t know if awkward array works with guvectorize.

Luk

Hi @ryanchien and @luk-f-a.
Thanks for your suggestions. Actually, since I sent my post yesterday, I have been working on an approach using 2D arrays, as suggested by @ryanchien. I think it is the simplest way. Since the data (uint) represents the number of atoms of a given type, I guess that using 0 as nodata would work.
Nevertheless, Awkward arrays also look interesting. I will have a look at them because other parts of my code could use them…

BR
Xtof
