Trouble with Numba's guvectorize: how to define an output dimension not present in input?

I’m new to Numba and am attempting to vectorize a function. I’ve gone through the Generalized Universal Function API documentation but haven’t succeeded yet.

My function processes a 2D data matrix, another 2D matrix of (x, y) coordinates, and a 1D array for a coordinate pair (x0, y0). It creates a submatrix X using the local neighborhood of (x0, y0), decomposes it, and returns the first n_out singular values. My goal is to modify this function to accept an xy array of shape (N x 2).

import numpy as np
import numba

np.random.seed(4)
data = np.random.normal(size=(20, 5))
XY = np.random.normal(size=(20, 2))


@numba.njit(fastmath=True)
def euclidean_distance(XY, xy):
    """Compute the Euclidean distance between two arrays."""
    dist = np.zeros(XY.shape[0])
    for r in range(XY.shape[0]):
        d = 0
        for c in range(XY.shape[1]):
            d += (xy[c] - XY[r, c]) ** 2
        dist[r] = d
    return np.sqrt(dist)


@numba.njit(fastmath=True)
def vectorize_me(data, XY, xy, n_out: int = 3):
    distance = euclidean_distance(XY, xy)
    X = data[distance < 0.5]
    U, s, Vt = np.linalg.svd(X)
    return s[:n_out]


xy_sample = XY[0]
vectorize_me(data, XY, xy_sample, n_out=3)

While a simple loop with @numba.njit(parallel=True) could work for 2D coordinate pairs, I’m aiming to use xarray’s apply_ufunc, which I believe requires a guvectorize approach.

My current attempt looks like this:

@numba.guvectorize(
    [
        (
            numba.float64[:, :],
            numba.float64[:, :],
            numba.float64[:, :],
            numba.int64,
            numba.float64[:, :],
        )
    ],
    "(n,m),(n,o),(o),()->(n_out)",
)
def i_am_vectorized(data, XY, xy, n_out, result):
    distance = euclidean_distance(XY, xy)
    X = A[distance < 0.5]
    s = np.linalg.svd(X)
    for i in range(n_out):
        result[i] = s[i]


i_am_vectorized(A, XY, XY, n_out)

which results in a NameError: undefined output symbols: n_out.

However, citing from the GUF API mentioned above:

Typically, the size of all core dimensions in an output will be determined by the size of a core dimension with the same label in an input array. This is not a requirement, and it is possible to define a signature where a label comes up for the first time in an output, although some precautions must be taken when calling such a function. (emphasis is mine)

So I don’t really understand what I’m doing wrong :thinking:

Just realized that there is an open issue regarding this… :see_no_evil:

I used the workaround suggested in the open issue:

@numba.guvectorize(
    [
        (
            numba.float64[:, :],
            numba.float64[:, :],
            numba.float64[:, :],
            numba.int64,
            numba.float64[:, :],
        )
    ],
    "(n,m),(n,o),(o),(n_out)->(n_out)",
)
def i_am_vectorized(data, XY, xy, n_out, result):
    distance = euclidean_distance(XY, xy)
    X = data[distance < 0.5]
    U, s, Vt = np.linalg.svd(X)
    for i in range(n_out.shape[0]):
        result[i] = s[i]


n_out = 3
dum_n_out = np.ones(n_out)
i_am_vectorized(data, XY, XY, dum_n_out)

However, I now get

TypeError: type and shape signature mismatch for arg #3

What am I doing wrong here?

Hi @nicrie

Can you share the full code so others can run it? As it is known, it is quite hard to guess what you are doing. For example, A is not defined anywhere.

Unfortunately, I don’t know much about guvectorize but here is what I notice:
Are you also aware that np.linalg.svd, as you use it, returns a tuple of arrays? So with result[i] = s[i] you would try to set an array to which a scalar belongs. According to your string signature, result is one-dimensional, right? Also, your signature string does not match the type signature. As the error says, the third argument is defined as numba.float64[:, :], but the string signature says (o). I see similar problems with the fourth and fifth arguments.

Apologies @sschaer, and thank you for reviewing my issue! I posted the last message in haste yesterday and overlooked some inconsistencies with variable names. In my tests, I used compute_uv=False in np.linalg.svd, which is why there was only one return value. I’ve since corrected the response to align with my initial post.

You’re right about numba.float64[:, :]! I mistakenly combined core and loop dimensions, but I’ve now updated it to numba.float64[:]. I’ve also corrected arg4 and arg5 based on your feedback. Now it seems to work! :slight_smile:
Thanks again for your help!

For completeness, here’s the corrected code:

@numba.guvectorize(
    [
        (
            numba.float64[:, :],
            numba.float64[:, :],
            numba.float64[:],
            numba.int64[:],
            numba.float64[:],
        )
    ],
    "(n,m),(n,o),(o),(n_out)->(n_out)",
)
def i_am_vectorized(data, XY, xy, n_out, result):
    distance = euclidean_distance(XY, xy)
    X = data[distance < 2.5]
    U, s, Vt = np.linalg.svd(X)
    for i in range(n_out.shape[0]):
        result[i] = s[i]


n_out = 3
dum_n_out = np.ones(n_out, dtype=np.int64)
i_am_vectorized(data, XY, XY, dum_n_out)
``

Not fully related, but I thought it wouldn’t hurt to cross link another issue reference here. This concerns the use of literal integers in the shape definition: 'Bad token in signature' with @guvectorize · Issue #6690 · numba/numba · GitHub