Implementing NumPy Functions using objmode

Hey all,

The “objmode” context manager in Numba allows using Python objects and functionality that are not supported in nopython mode. This includes calling NumPy functions that may be hard to implement efficiently or that have no Numba implementation at all.
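
For illustration, here is a minimal sketch of the pattern (my assumption: the "kind" keyword of np.sort is an example of functionality not available in nopython mode, so the call is routed through objmode; the input is a 1-D float64 array):

import numpy as np
from numba import njit, objmode

@njit
def stable_sort(a):
    # Assumption: np.sort's "kind" keyword is unsupported in nopython mode,
    # so the call happens inside objmode; "out" is declared as float64[:].
    with objmode(out='float64[:]'):
        out = np.sort(a, kind='stable')
    return out

print(stable_sort(np.array([3.0, 1.0, 2.0])))  # [1. 2. 3.]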

Some highly optimized and widely used NumPy functions, such as “np.sort”, could potentially benefit from being wrapped in objmode to improve performance compared to their current implementations in Numba.

There are certainly reasons why this approach is not used.
Why are certain NumPy functions not implemented as direct calls through “objmode”?
What limitations or considerations prevent supporting NumPy functions via “objmode” (e.g. GIL handling)?

Some of the drawbacks of using objmode in any capacity are that switching in and out of objmode has pretty high overhead (objects need to be boxed/unboxed to translate between Numba’s native data structures and Python objects), and last I checked objmode isn’t cacheable, which can be extremely annoying in large projects.
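
A rough way to see that round-trip overhead (a sketch, not a benchmark of any particular function) is to compare an njit function that copies a small array in nopython mode against one that does the same through an objmode block:

import timeit
import numpy as np
from numba import njit, objmode

@njit
def copy_nopython(a):
    return a.copy()

@njit
def copy_objmode(a):
    out = np.empty_like(a)
    with objmode():
        out[:] = a  # boxed to Python objects, assigned, then control returns
    return out

a = np.ones(16)
copy_nopython(a); copy_objmode(a)  # compile

print(timeit.timeit(lambda: copy_nopython(a), number=100_000))
print(timeit.timeit(lambda: copy_objmode(a), number=100_000))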

I feel like the better question is why not inline NumPy’s implementations directly instead of reimplementing them for Numba. I’m not a dev so I don’t have a great answer for that, but I suspect that a big part of it is simply that reimplementing the NumPy functions helps Numba do whole-program optimizations like loop lifting… although I wonder if the same thing could be achieved by compiling NumPy with clang and just injecting its LLVM IR. I imagine for most things some glue code would still be needed to interface between the two.

Long story short, objmode is a convenient escape-hatch feature, but probably not something that should be used in standard implementations. Relying on it to implement standard functionality in Numba would be a band-aid fix for something potentially more elegant and faster. Of course, not all of those elegant, faster things have been implemented yet, so I guess some functions are in an awkward in-between place.


Hey @DannyWeitekamp,

You’ve brought up some really excellent points. Thank you very much.

Your observation about “objmode” serving as a band-aid solution is spot on. I am not sure whether it would be technically possible to use it this way, but I am curious nevertheless.

If you reimplement NumPy’s functions, you are often competing with highly optimized code.
Without sufficient resources, this is a difficult task.
It would be great to have a mechanism that integrates NumPy functions directly without requiring a reimplementation. LLVM-based optimizations, as you suggested, would be elegant and performant.

Regarding “objmode”, the overhead of an implementation using “objmode” doesn’t seem to be too high if you are just calling NumPy functions that work on arrays:
the boxed arrays are views on the underlying memory rather than copies of the Python objects’ data.
I haven’t verified whether caching works with “objmode”. You can set “cache=True”, but it might not work as expected. Using an “objmode” implementation within a parallel loop could be an issue due to the Global Interpreter Lock (GIL), but this might change in future Python versions.
There might be more issues preventing “objmode” from being an option.
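
To illustrate the memory-view point: when an array is boxed for an objmode block, the NumPy object wraps the same buffer, so in-place writes inside the block are visible afterwards. A minimal sketch (my own example, not from the discussion above):

import numpy as np
from numba import njit, objmode

@njit
def negate_inplace(a):
    # The boxed array inside objmode shares memory with the nopython array,
    # so the in-place negation is visible after the block.
    with objmode():
        np.negative(a, out=a)
    return a

print(negate_inplace(np.arange(4.0)))  # [-0. -1. -2. -3.]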

Nevertheless, here is an example comparing NumPy’s sorting function to Numba’s implementation and an “objmode”-based implementation. The overhead appears to be manageable, and performance remains similar for both smaller and larger arrays.

import timeit
import numpy as np
from numba import njit, objmode, types
from numba.extending import overload

# Numpy sort
def sort_np(a):
    return np.sort(a)

# Numba sort implementation
@njit
def sort_nb(a):
    return np.sort(a)

# Stub function to be overloaded with the objmode-based sort
def sort_nb_objmode(a):
    pass

@njit(cache=True)
def sort_in_objmode(a):
    out = np.empty_like(a)
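    # Fill the preallocated buffer in place; the boxed view shares memory,
    # so no output type annotation is needed on the objmode block.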
    with objmode:
        out[:] = sort_np(a)
    return out

# Overload the stub with the objmode-based implementation
@overload(sort_nb_objmode)
def sort_nb_objmode_ovl(a):
    # Reject non-ndarray types
    if not isinstance(a, types.Array):
        raise TypeError("Only accepts NumPy ndarray")
    return lambda a: sort_in_objmode(a)

@njit
def apply_sort_nb_objmode(a):
    return sort_nb_objmode(a)

# warmup
a = np.random.uniform(-100, 100, size=10)
sort_nb(a)
apply_sort_nb_objmode(a)

# Comparison
def measure(size, repeat=10, number=5):
    # Make the test array visible to timeit via globals()
    global a
    a = np.random.uniform(-100, 100, size=size)

    # Measure and print mean time for sort_np
    mean_time_np = sum(timeit.repeat('sort_np(a)', globals=globals(), repeat=repeat, number=number)) / repeat
    print(f"sort_np: {mean_time_np:.6f} seconds (mean of {repeat} runs, size={size})")
    
    # Measure and print mean time for sort_nb
    mean_time_nb = sum(timeit.repeat('sort_nb(a)', globals=globals(), repeat=repeat, number=number)) / repeat
    print(f"sort_nb: {mean_time_nb:.6f} seconds (mean of {repeat} runs, size={size})")
    
    # Measure and print mean time for apply_sort_nb_objmode
    mean_time_objmode = sum(timeit.repeat('apply_sort_nb_objmode(a)', globals=globals(), repeat=repeat, number=number)) / repeat
    print(f"sort_om: {mean_time_objmode:.6f} seconds (mean of {repeat} runs, size={size})")
    print()

# Testing with different sizes
measure(size=    1_000)
measure(size=   10_000)
measure(size=  100_000)
measure(size=1_000_000)

# sort_np: 0.000183 seconds (mean of 10 runs, size=1000)
# sort_nb: 0.000200 seconds (mean of 10 runs, size=1000)
# sort_om: 0.000241 seconds (mean of 10 runs, size=1000)

# sort_np: 0.002837 seconds (mean of 10 runs, size=10000)
# sort_nb: 0.003400 seconds (mean of 10 runs, size=10000)
# sort_om: 0.002847 seconds (mean of 10 runs, size=10000)

# sort_np: 0.036968 seconds (mean of 10 runs, size=100000)
# sort_nb: 0.044801 seconds (mean of 10 runs, size=100000)
# sort_om: 0.036775 seconds (mean of 10 runs, size=100000)

# sort_np: 0.433108 seconds (mean of 10 runs, size=1000000)
# sort_nb: 0.527904 seconds (mean of 10 runs, size=1000000)
# sort_om: 0.464917 seconds (mean of 10 runs, size=1000000)