Using guvectorize inside a jitted function

A beautiful feature of guvectorize in Numba is that it allows you to define a function that operates on arrays element-wise across different dimensions using the axis parameter. This flexibility enables you to write a single function that can handle arrays of varying shapes and sizes efficiently.

However, the limitation of guvectorize is that it cannot be used directly within a jitted function. This is because Numba’s JIT compilation relies on static type inference, while guvectorize requires dynamic dispatch to handle different data types and shapes. Therefore, using guvectorize inside a jitted function results in errors.

There is an open issue “Ensure that jit, vectorize and guvectorize functions can mutually call each other” opened on Sep 3, 2016. The latest status change in this issue was to remove “highpriority” on Nov 8, 2022. The implementation of this feature seems to be difficult and time consuming.

If you want to use a guvectorize function within a jitted function, you would need to refactor and duplicate the code to adapt it to a format that can be used with Numba’s jit decorator.
Is there any other more efficient and maintainable workaround?
Can you i.e. provide type hinting within the outer jit function to make use of a guvectorize as an inner function?

Here is a basic example:


import numpy as np
from numba import guvectorize, jit

@guvectorize(['(float64[:], int64, float64[:])'], '(n),()->(n)')
def guvec(a: np.ndarray, add: int, out: np.ndarray):
    """Generate ufunc."""
    for i in range(a.size):
        out[i] = a[i] + add

@jit('float64[:](float64[:], int64)')
def jit_guvec(a: np.ndarray, add: int) -> np.ndarray:
    """Use ufunc as inner func in jit."""
    return guvec(a, add)

# NumbaWarning:
# Compilation is falling back to object mode WITH looplifting enabled because
# Function "jit_guvec" failed type inference due to: Untyped global name 'guvec':
#     Cannot determine Numba type of <class 'numba.np.ufunc.gufunc.GUFunc'>

Just FYI, I am currently working on pull request #8984 that adds support for calling @guvectorize functions inside @jit

2 Likes

That is great to hear. I appreciate your effort.
May I ask you for some further information regarding the topic of guvectorize?

I love the concept of guvectorize or numpy ufuncs in general because it enables you to write a single function that can handle arrays of varying shapes and sizes efficiently. It is very convenient as a user, it makes the code leaner and easier to maintain.

But there are limitations that restrict the flexibility and ease of use when using guvectorize compared to the jit compilation approach in Numba. guvectorize functions cannot be used within jitted functions, they produce positional arguments instead of keyword arguments, and they do not support default arguments. These limitations can impact the modularity, readability, and reusability of code that relies on guvectorize functions.

Issue 1: guvectorize functions cannot be used within jitted functions
guvectorize functions are defined as ufuncs, which are specialized functions for performing element-wise operations on arrays. These ufuncs have a different calling convention and type signature compared to regular functions, which makes it challenging for Numba to infer their types accurately. Therefore, using a guvectorize function within a jitted function results in a compilation error.
→ soon to be history :wink:

Issue 2: positional arguments instead of keyword arguments
When using guvectorize, it produces positional arguments instead of keyword arguments. This means that the arguments need to be passed in the order defined in the function signature, and you cannot use keyword arguments to specify the values. This limitation can make the code less readable and more error-prone, especially when dealing with functions that have multiple arguments and/or optional arguments.

Issue 3: no support for default arguments
Another limitation of guvectorize is that it does not support default arguments. This limitation can be inconvenient when you want to reuse the same guvectorize function with different default values or when you want to make certain arguments optional.

I would really like to use guvectorize more often because it is a beautiful concept that is very beneficial for a user (maybe not for the developer).
Would you agree with the above mentioned description of limitations or are there already solutions in place which I may have not discovered already.

Below are some basic code snippets showing issues 1-3.

> 
> import numpy as np
> import numba as nb
> 
> 
> @nb.guvectorize(['(float64[:], float64, float64[:])'], '(n),()->(n)')
> def add(a: np.ndarray, plus: float, out: np.ndarray):
>     """Add a floating number to an array element-wise."""
>     for i in range(a.size):
>         out[i] = a[i] + plus
> 
> 
> # *********************************************************************
> # Issue 1: guvectorize function can not be used in jitted function
> # *********************************************************************
> 
> @nb.jit(['float64[:](float64[:], float64)'])
> def jit_guvec(a: np.ndarray, plus: float) -> np.ndarray:
>     """Use ufunc as inner func in jit."""
>     return add(a, plus)
> # NumbaWarning:
> # Compilation is falling back to object mode WITH looplifting enabled because
> # Function "jit_guvec" failed type inference due to: Untyped global name 'add':
> # Cannot determine Numba type of <class 'numba.np.ufunc.gufunc.GUFunc'>
> 
> 
> # *********************************************************************
> # Issue 2: guvectorize produces positional instead of keyword arguments
> # *********************************************************************
> 
> a = np.arange(10.)
> print(a)
> # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
> print(add(a, 10))
> # [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
> print(add(a=a, plus=10))
> # TypeError: add() takes from 2 to 3 positional arguments but 0 were given
> 
> 
> # *********************************************************************
> # Issue 3: No default arguments in guvectorize
> # *********************************************************************
> 
> @nb.njit(['float64[:](float64[:], float64)',
>           'float64[:](float64[:], Omitted(None))',
>           'float64[:](float64[:], none)'])
> def add_default_njit(a: np.ndarray, plus: float = None) -> np.ndarray:
>     """Create an array and add a number."""
>     out = np.empty_like(a)
>     plus = plus or 10.0
>     for i in range(a.size):
>         out[i] = a[i] + plus
>     return out
> 
> a = np.arange(10.)
> print(a)
> # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
> print(add_default_njit(a, plus=100))
> # [100. 101. 102. 103. 104. 105. 106. 107. 108. 109.]
> print(add_default_njit(a, plus=None))
> # [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
> print(add_default_njit(a))
> # [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
> # njit is OK
> 
> @nb.guvectorize([
>     '(float64[:], float64, float64[:])',
>     '(float64[:], Omitted(None), float64[:])',
>     '(float64[:], none, float64[:])'],
>     '(n),()->(n)')
> def add_default_guv(a: np.ndarray, plus: float = None, out: np.ndarray = None):
>     """Create an array and add a number."""
>     plus = plus or 10.0
>     for i in range(a.size):
>         out[i] = a[i] + plus
> # NumbaNotImplementedError: omitted(default=None) cannot be represented as a
> # NumPy dtype

Regarding issue 2 & 3 you mention; I usually still wrap a guvectorize function in a normal Python function to alleviate some of those things. There are additional issues like preserving docstrings, (Python) type hints etc which are useful for things like automatic generation of API documentation or other things requiring the introspection that plain Python provides.

1 Like

You suggest (if required) wrapping the guvectorize function in a regular Python function to overcome some challenges associated with using guvectorize like docstrings and type hints, which are essential for tasks like generating API documentation. You can deal with optional parameters in the Python wrapper function instead of the ufunc, too.

That is a very good point and a practical solution.

Below is an example:

import numpy as np
from numba import guvectorize


@guvectorize(['(float64[:], float64, float64[:])'], '(n),()->(n)', target='cpu')
def square_root(x: np.ndarray, plus: float, result: np.ndarray = None) -> None:
    """Compute the square root of each element in the input array.

    This function is a numpy ufunc generated using numba.guvectorize.

    Args:
        x (numpy.ndarray): Input array to be computed. Shape: (n)
        plus (float): Add a scalar.
        result (numpy.ndarray, optional): Output array to store computed values. Shape: (n)

    Returns:
        If "result" is provided then "result" will be modified inplace.
        The type annotation for the function's output is None. Nevertheless,
        numpy ufuncs return an output array anyway.

    Examples:
        x = np.array([1.0, 4.0, 9.0, 16.0])
        result = np.zeros_like(x)
        square_root(x, 0, result)
        print(result)
        # [1. 2. 3. 4.]
    """
    for i in range(x.size):
        result[i] = np.sqrt(x[i]) + plus


def awesome_square_root(x: np.ndarray, plus: float = 10) -> np.ndarray:
    """Compute the square root of x in an even more awesome way.

    Wrapper for function square_root with more precise type hinting and optional
    parameter.

    Args:
        x (numpy.ndarray): Input array to be computed. Shape: (n)
        plus (float, optional): Add a scalar. Defaults to 10.

    Returns:
        numpy.ndarray: Awesome output array Shape: (n)

    Examples:
        x = np.array([1.0, 4.0, 9.0, 16.0])
        result = awesome_square_root(x)
        print(result)
        # [11. 12. 13. 14.]
    """
    return square_root(x, plus)

That is also a useful method if you have multiple optional parameters within the function signature and
want to avoid evaluating all possible scenarios.
It already gets messy pretty quickly with just 2 optional parameters…

@jit(['float64[:](float64[:], int64, int64)',
      'float64[:](float64[:], int64, Omitted(None))',
      'float64[:](float64[:], int64, none)',
      'float64[:](float64[:], Omitted(None), int64)',
      'float64[:](float64[:], Omitted(None), Omitted(None))',
      'float64[:](float64[:], Omitted(None), none)',
      'float64[:](float64[:], none, int64)',
      'float64[:](float64[:], none, Omitted(None))',
      'float64[:](float64[:], none, none)'])
def some_function(a: np.ndarray, x: int = None, y: int = None) -> np.ndarray:
    x = x or 99
    y = y or -99
    ...

To avoid this the wrapper function uses the default/optional parameters while the jitted function gets all parameters provided by the wrapper. In that case the required number of numba signatures can be reduced, too.
The problem here is that you would have to specify default parameters in multiple locations if you want to reuse ufuncs or jitted funcs in other jitted functions.
That can be challenging for maintaining your code.

Thank you for the proposal @Rutger .

Yes there’s definitely some drawbacks as well, especially regarding maintenance, depending on the size of your code base and frequency of updating etc. I would also pass a **kwargs along so you preserve some of the keywords that you’ll get for free with guvectorize. Personally I especially use the axis=x keyword a lot, it’s only relevant if you want to to be flexible on which axis you apply the function.

IIRC the njit parameter “attaches” the original python function to the py_func attribute of the decorated result. But guvectorized & vectorize don’t, which is why I wrap. Perhaps not the prettiest solution, but it gets things done for me.

1 Like

Passing the axis keyword parameter to ufuncs is like magic. Love it (example 1).
Unfortunately user defines parameters are no keyword parameters.
I was not aware of the pyfunc attribute. Thank you for mentioning (example 2 & 3).

import numpy as np
from numba import guvectorize, vectorize, jit

# ****************************************************************
# Example 1: Passing axis in **kwargs to preserve keywords.
# ****************************************************************

@guvectorize(['void(float64[:], float64[:])'], '(n)->(n)')
def csum(arr, out):
    out[:] = np.cumsum(arr)

def awesome_csum(arr, **kwargs):
    return csum(arr, **kwargs)

arr = np.arange(1., 5.).reshape((2, 2))
out_ax0 = awesome_csum(arr, axis=0)
out_ax1 = awesome_csum(arr, axis=1)
print('input:', '\n', arr, '\n')
print('csum axis 0:', '\n', out_ax0, '\n')
print('csum axis 1:', '\n', out_ax1, '\n')

# input:
#  [[1. 2.]
#  [3. 4.]]

# csum axis 0:
#  [[1. 2.]
#  [4. 6.]]

# csum axis 1:
#  [[1. 3.]
#  [3. 7.]]


# ****************************************************************
# Example 2:py_func no attribute of vectorize => AttributeError
# ****************************************************************

@vectorize(['float64(float64, float64)'])
def add_vectorized(a: float, b: float):
    return a + b

def add_wrapper(a: float, b: float):
    return add_vectorized.py_func(a, b)

arr = np.arange(1., 5.).reshape((2, 2))
add_vectorized(arr, arr)
# array([[2., 4.],
#        [6., 8.]])

add_wrapper(2, 3)
# AttributeError: 'DUFunc' object has no attribute 'py_func'


# ****************************************************************
# Example 3:py_func is attribute of jitted function => SUCCESS
# ****************************************************************

@jit
def add_jit(a: float, b: float):
    return a + b

def add_jit_wrapper(a: float, b: float):
    return add_jit.py_func(a, b)

# Example usage
result_jit = add_jit(2, 3)
result_wrapper = add_jit_wrapper(2, 3)
print(result_jit)
print(result_wrapper)
# 5
# 5
1 Like

Hello @guilherme, is the work on pull request #8984 that adds support for calling @guvectorize functions inside @jit also dealing with the related topic of calling @guvectorize functions inside `@guvectorize?

Hi @Oyibo, yes! The following works in #8984

from numba import njit, guvectorize
import numpy as np


@guvectorize('(n),()->(n)')
def foo(a, b, res):
    for i in range(a.shape[0]):
        res[i] = a[i] + b


@guvectorize('(n),()->(n)')
def bar(a, b, res):
    foo(a, b, res)

@njit
def baz(a, b, res):
    bar(a, b, res)


a = np.arange(5)
b = 3
res = np.zeros_like(a)
baz(a, b, res)
print(res)
2 Likes

Hey @guilherme,

That is amazing. I can’t wait to use it.
Great job!!!