Hey @DannyWeitekamp,
You’ve brought up some really excellent points. Thank you very much.
Your observation about “objmode” serving as a band-aid solution is spot on. I am not sure if it would be technically possible to use it for this purpose. Nevertheless, I am curious.
If you are reimplementing NumPy’s functions, you’re often competing with highly optimized code.
Without sufficient resources, this will be a difficult task.
It would be great to have a mechanism that integrates NumPy functions directly without requiring a reimplementation. LLVM-based optimizations, as you suggested, would be elegant and performant.
Regarding “objmode”, the overhead of an implementation using “objmode” doesn’t seem to be too high if you are just using NumPy functions working on arrays.
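As far as I understand, the NumPy function then works on the same underlying array memory rather than on converted Python objects, so the data itself is not copied on the way in.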
I haven’t verified if caching works with “objmode”. You can set “cache=True” but it might not work as expected. Using an “objmode” implementation within a parallel loop could be an issue due to the Global Interpreter Lock (GIL). But this might change in future Python versions.
There might be more issues preventing “objmode” from being an option.
Nevertheless, here is an example comparing NumPy’s sorting algorithm to Numba’s implementation and an “objmode” implementation. The overhead appears to be manageable, and performance remains similar for both smaller and larger arrays.
import timeit

import numpy as np
from numba import njit, objmode, types
from numba.extending import overload


# Plain NumPy sort (runs in the interpreter)
def sort_np(a):
    return np.sort(a)


# Numba's own np.sort implementation, compiled in nopython mode
@njit
def sort_nb(a):
    return np.sort(a)


# Stub that receives a Numba implementation through @overload below
def sort_nb_objmode(a):
    pass


# Jitted wrapper that calls NumPy's sort inside an objmode block
@njit(cache=True)
def sort_in_objmode(a):
    out = np.empty_like(a)
    with objmode:
        # Write in place so no output type annotation is needed
        out[:] = sort_np(a)
    return out


# Overload the stub (not np.sort itself) with the objmode-based version
@overload(sort_nb_objmode)
def sort_nb_objmode_ovl(a):
    # Reject non-ndarray types
    if not isinstance(a, types.Array):
        raise TypeError("Only accepts NumPy ndarray")
    return lambda a: sort_in_objmode(a)


@njit
def apply_sort_nb_objmode(a):
    return sort_nb_objmode(a)


# Warm-up: trigger compilation before timing
a = np.random.uniform(-100, 100, size=10)
sort_nb(a)
apply_sort_nb_objmode(a)


# Comparison
def measure(size, repeat=10, number=5):
    """Print the mean time per run; each run consists of `number` calls."""
    # The timed statements look up `a` in the module globals
    global a
    a = np.random.uniform(-100, 100, size=size)

    # Measure and print mean time for sort_np
    mean_time_np = sum(timeit.repeat('sort_np(a)', globals=globals(), repeat=repeat, number=number)) / repeat
    print(f"sort_np: {mean_time_np:.6f} seconds (mean of {repeat} runs, size={size})")

    # Measure and print mean time for sort_nb
    mean_time_nb = sum(timeit.repeat('sort_nb(a)', globals=globals(), repeat=repeat, number=number)) / repeat
    print(f"sort_nb: {mean_time_nb:.6f} seconds (mean of {repeat} runs, size={size})")

    # Measure and print mean time for apply_sort_nb_objmode
    mean_time_objmode = sum(timeit.repeat('apply_sort_nb_objmode(a)', globals=globals(), repeat=repeat, number=number)) / repeat
    print(f"sort_om: {mean_time_objmode:.6f} seconds (mean of {repeat} runs, size={size})")
    print()


# Testing with different sizes
measure(size=    1_000)
measure(size=   10_000)
measure(size=  100_000)
measure(size=1_000_000)

# Output:
# sort_np: 0.000183 seconds (mean of 10 runs, size=1000)
# sort_nb: 0.000200 seconds (mean of 10 runs, size=1000)
# sort_om: 0.000241 seconds (mean of 10 runs, size=1000)
# sort_np: 0.002837 seconds (mean of 10 runs, size=10000)
# sort_nb: 0.003400 seconds (mean of 10 runs, size=10000)
# sort_om: 0.002847 seconds (mean of 10 runs, size=10000)
# sort_np: 0.036968 seconds (mean of 10 runs, size=100000)
# sort_nb: 0.044801 seconds (mean of 10 runs, size=100000)
# sort_om: 0.036775 seconds (mean of 10 runs, size=100000)
# sort_np: 0.433108 seconds (mean of 10 runs, size=1000000)
# sort_nb: 0.527904 seconds (mean of 10 runs, size=1000000)
# sort_om: 0.464917 seconds (mean of 10 runs, size=1000000)
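
One note on reading these numbers: `timeit.repeat` returns one total per repeat, and each total covers `number` calls, so the values above are for 5 calls each. If per-call times are easier to compare, a small helper along these lines could be used. This is only a sketch; `per_call_time` is a hypothetical name and the demo uses the built-in `sorted` so that it runs without NumPy or Numba.

```python
import random
import timeit


def per_call_time(fn, *args, repeat=10, number=5):
    """Return (best, mean) time in seconds for a single call of fn(*args)."""
    # Each entry in `totals` is the total time for `number` calls.
    totals = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return min(per_call), sum(per_call) / len(per_call)


# Demo with the built-in sorted() on a list of random floats
data = [random.uniform(-100, 100) for _ in range(10_000)]
best, mean = per_call_time(sorted, data)
print(f"sorted: best={best:.6f} s, mean={mean:.6f} s per call")
```

With the functions from the benchmark, the call would look like `per_call_time(sort_np, a)` or `per_call_time(apply_sort_nb_objmode, a)`. Reporting the minimum next to the mean helps to see how much of the difference is noise.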