How to pass a Numpy array of lists in @guvectorize function?

xtof2020 · October 15, 2020, 12:32pm

Hi all,

I have written several functions with @vectorize/@guvectorize, and they work very well: the performance is very good.
Now I would like to pass a NumPy array of lists as an argument, and I know that Numba does not support this because it is an array of pyobjects.
In my code, I need an array that can hold data of different lengths. I am implementing a physics problem in which objects (molecules) can have different numbers of components (e.g. H2O, C4H6O2, …), and I would like all of this to be contained in a NumPy array. Each element of the array represents a molecule with its various components. So far, my idea is to use an array declared as follows:
array = np.zeros(size, dtype = object)

This allows me to do something like this:
array[0] = [2,1]
array[1] = [4,6,2]

Each list corresponds to the components of a molecule.
However, I now need to pass this array as an argument to a @guvectorize function, and I know that Numba does not accept this type of data (pyobject).
Any idea? A work-around?
Or any alternative to replace this array of lists by something that could be accepted by Numba?

Many thanks in advance :-).
Christophe

ryanchien · October 15, 2020, 5:20pm

Hi @xtof2020, would you be able to use a certain number to designate no data? Then you can normalize all the arrays to the same length. 2D numpy arrays are passable into @guvectorize.

For example, below I use -1.0 to designate no data in a simple sum function across a 2D array:

import numpy as np
from numba import guvectorize

@guvectorize(['f8[:,:], f8[:]'], '(m,n) -> ()')
def sum_valid(array2d, result):
    # Sum every cell of the 2D array, skipping cells equal to the -1.0 no-data marker.
    m, n = array2d.shape
    tmp_result = 0.0
    for i in range(m):
        for j in range(n):
            val = array2d[i, j]
            if val != -1.0:
                tmp_result += val
    result[0] = tmp_result

nodata = -1.0
# Rows of different logical lengths, padded to a common width with nodata.
a = np.array((1.0, 2.0, 3.0, nodata))
b = np.array((1.0, 2.0, nodata, nodata))
c = np.array((1.0, nodata, nodata, nodata))
abc = np.vstack((a, b, c))

sum_valid(abc)
10.0
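
To get from the original list-of-lists layout to a padded 2D array like the one above, something along these lines might work. This is only a sketch: `pad_ragged` is a hypothetical helper name, not part of NumPy or Numba, and the -1.0 marker is the same arbitrary choice as in the example.

```python
import numpy as np

def pad_ragged(rows, nodata=-1.0):
    """Pad lists of different lengths into one 2D float array (hypothetical helper)."""
    width = max(len(row) for row in rows)
    # Start with an array filled entirely with the no-data marker...
    padded = np.full((len(rows), width), nodata, dtype=np.float64)
    # ...then copy each list into the left part of its row.
    for i, row in enumerate(rows):
        padded[i, :len(row)] = row
    return padded

molecules = [[2, 1], [4, 6, 2]]
padded = pad_ragged(molecules)
```

The resulting array has a regular shape and a numeric dtype, so it is the kind of input a @guvectorize function can accept.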

BR,
Ryan

luk-f-a · October 15, 2020, 7:04pm

Hi @xtof2020, if you are not able to create 2D arrays from your data, as suggested by @ryanchien, I recommend you look into Awkward Array (https://github.com/scikit-hep/awkward-1.0). It was designed precisely for dealing with arrays that contain arrays of different lengths. It’s compatible with Numba, so you can pass Awkward arrays into jitted functions.

Do you really need guvectorize? Numba works well with explicit loops, so you don’t need to “vectorize” to get speed. I don’t know if Awkward Array works with guvectorize.
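
As a rough illustration of the explicit-loop route, one possible layout (an assumption, not something proposed in this thread) is to flatten all lists into a single `values` array and keep an `offsets` array marking where each molecule starts and ends. The names `values`, `offsets` and `atoms_per_molecule` are made up for this sketch, and the fallback decorator is only there so the code also runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba.
    def njit(func):
        return func

molecules = [[2, 1], [4, 6, 2]]

# Flatten the ragged data: all counts in one array, plus start/end positions.
values = np.array([n for mol in molecules for n in mol], dtype=np.int64)
offsets = np.zeros(len(molecules) + 1, dtype=np.int64)
offsets[1:] = np.cumsum([len(mol) for mol in molecules])

@njit
def atoms_per_molecule(values, offsets):
    # Molecule i owns values[offsets[i]:offsets[i + 1]].
    n_mol = offsets.shape[0] - 1
    totals = np.zeros(n_mol, dtype=np.int64)
    for i in range(n_mol):
        for k in range(offsets[i], offsets[i + 1]):
            totals[i] += values[k]
    return totals

totals = atoms_per_molecule(values, offsets)
```

Both inputs are plain integer arrays, so no padding value is needed and no object array ever reaches the jitted function.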

Luk

xtof2020 · October 16, 2020, 7:15am

Hi @ryanchien and @luk-f-a.
Thanks for your suggestions. Actually, since I sent my post yesterday, I have been working on an approach using 2D arrays, as suggested by @ryanchien. I think it is the simplest way. Since the data (uint) represent the number of atoms of a given type, I guess that using 0 as nodata would do.
Nevertheless, Awkward arrays also look interesting. I will have a look at them because other parts of my code could use them…

BR
Xtof
