NumbaPendingDeprecationWarning reflected list

steff · August 31, 2021, 5:46am

I am trying to njit the following function but am getting a NumbaPendingDeprecationWarning that I am not sure how best to resolve. Your help would be most appreciated.

import numpy as np
from typing import List, Union

@njit()
def minmax(data: np.array, thresholds: List[Union[float, float]]) -> np.array:
    x = 0
    while x < len(data):
        if data[x] > 0:
            if data[x] > thresholds[0]:
                data[x] = thresholds[0]
            elif data[x] < 1:
                data[x] = 1
        elif data[x] < 0:
            if data[x] < thresholds[1]:
                data[x] = thresholds[1]
            elif data[x] > -1:
                data[x] = -1
        x += 1
    return data

data = np.random.uniform(low=-3, high=3, size=(1000,))
minmax(data, [2., -2.])

gives me the following warning:

/Users/steffen/mambaforge/lib/python3.9/site-packages/numba/core/ir_utils.py:2119: NumbaPendingDeprecationWarning: 
Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument 'thresholds' of function 'minmax'.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "../../../var/folders/xt/b71lsdgd3g1dhycp2fw80fy80000gn/T/ipykernel_92025/2811979115.py", line 3:
<source missing, REPL/exec in use?>

  warnings.warn(NumbaPendingDeprecationWarning(msg, loc=loc))

I also tried with from numba.typed import List, with the same result.
Python 3.9.6
Numba 0.53.1
Numpy 1.21.2

Rutger · August 31, 2021, 6:42am

It has to do with converting a Python list to a Numba List, which happens when your input arguments are passed to Numba. Your current call uses a “plain” Python list as the argument: minmax(data, [2., -2.]).

It should be resolved by calling the function with something like:

import numba

input_list = numba.typed.List([2., -2.])
minmax(data, input_list)

This automatically converts the Python floats to float64, but if the values in the List are already Numba datatypes, Numba will use those. So in order to use float32, for example, you could use:

input_list = numba.typed.List([numba.float32(2.), numba.float32(-2.)])

Off-topic:
Btw, your type annotation won’t do anything regarding this issue, but it’s fine to have it of course. And I don’t think Union[float, float] does anything, since both are the same type. A Python list has a dynamic size (by definition), so List[float] is enough to specify its type. For something immutable like a tuple you can specify the size with Tuple[float, float] for size two, versus Tuple[float, ...] for “any” size, etc.
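A short sketch of those annotations using only the standard typing module (the variable names are made up for illustration):

```python
from typing import List, Tuple, Union

# A Union whose members are all the same type collapses to that type.
same_type = Union[float, float]
union_is_plain_float = same_type is float

# List[float]: a list of floats, any length.
thresholds_list: List[float] = [2.0, -2.0]

# Tuple[float, float]: exactly two floats.
thresholds_pair: Tuple[float, float] = (2.0, -2.0)

# Tuple[float, ...]: floats, any length.
thresholds_any: Tuple[float, ...] = (2.0, -2.0, 0.5)
```

These are hints for readers and static type checkers only; they are not enforced at runtime.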

steff · August 31, 2021, 7:05am

Thanks @Rutger. That works, and the point about the typing makes sense.
It is just not quite as fast as I hoped. Any suggestions for further improvements would be appreciated.

Hannes · August 31, 2021, 9:42am

Hi @steff

first things first: @Rutger already said this, but I think it does not hurt to make this clear once more:

Numba does not care at all about your type annotations. (but they are of course considered good style by many)
numba.typed.List is an actual container, whereas typing.List is a type. They are very different things, but easy to mix up

Beyond that (not meant to be harsh):

I don’t think minmax is an appropriate name for that function (given what it does).
The function seems to have a few logical “gaps” (e.g. data[x] == 0), unless the lack of processing is intentional.
len(data) might not do what you expect once data has more than 1 dimension
It looks like for cleaner code, that while loop could be a for loop?
When judging the speed of your function, keep in mind that the first call will trigger compilation and requires extra time compared to all subsequent calls
Your function is both mutating data in place and returning it, I personally find that a bit odd, but okay
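To illustrate three of these points (len on multi-dimensional data, a for loop instead of a while loop, and returning a copy instead of mutating), here is a small plain NumPy sketch. capped_copy is a made-up helper and not one of the functions in this thread:

```python
import numpy as np

grid = np.zeros((3, 4))
n_rows = len(grid)    # length of the first axis only
n_items = grid.size   # total number of elements


def capped_copy(data, upper):
    # Hypothetical helper: work on a copy so the caller's array stays untouched.
    out = data.copy()
    for i in range(len(out)):  # for loop instead of a manual while counter
        if out[i] > upper:
            out[i] = upper
    return out


original = np.array([0.5, 2.5, 3.0])
result = capped_copy(original, 2.0)
```

With the copy, the function either mutates or returns, not both, which makes the calling code easier to reason about.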

With that said, here are a few ideas of how one could deal with this in alternative ways (timings listed below)

import numpy as np
from numba import njit # Missing import in original MWE
from numba import vectorize, prange
# Union[float, float] = float -> Union does nothing here
# There is no need for thresholds to be a List, let's be more general
from typing import Sequence


@njit
def minmax_orig(data: np.array, thresholds: Sequence[float]) -> np.array:
    x = 0
    while x < len(data):
        if data[x] > 0:     # What happens if data[x] == 0 ??
            if data[x] > thresholds[0]: 
                data[x] = thresholds[0]
            elif data[x] < 1: # What happens if a threshold is in (-1,1)?
                data[x] = 1
        elif data[x] < 0:
            if data[x] < thresholds[1]:
                data[x] = thresholds[1]
            elif data[x] > -1:
                data[x] = -1
        x += 1
    return data

@njit
def minmax_scalar_args(data: np.array, upper_thr: float, lower_thr: float) -> np.array:
    x = 0
    while x < len(data):
        if data[x] > 0:
            if data[x] > upper_thr:  # cap positive values from above
                data[x] = upper_thr
            elif data[x] < 1:
                data[x] = 1
        elif data[x] < 0:
            if data[x] < lower_thr:  # cap negative values from below
                data[x] = lower_thr
            elif data[x] > -1:
                data[x] = -1
        x += 1
    return data

@njit(parallel=True)
def minmax_parfor(data: np.array, upper_thr: float, lower_thr: float) -> np.array:
    for x in prange(len(data)):
        if data[x] > 0:
            if data[x] > upper_thr:  # cap positive values from above
                data[x] = upper_thr
            elif data[x] < 1:
                data[x] = 1
        elif data[x] < 0:
            if data[x] < lower_thr:  # cap negative values from below
                data[x] = lower_thr
            elif data[x] > -1:
                data[x] = -1
    return data

@vectorize
def minmax_vec(data: float, upper_thr: float, lower_thr: float) -> float:
    # Under @vectorize the function receives one scalar element at a time
    if data > 0:
        if data > upper_thr:
            data = upper_thr
        elif data < 1:
            data = 1
    elif data < 0:
        if data < lower_thr:
            data = lower_thr
        elif data > -1:
            data = -1
    return data


data = np.random.uniform(low=-3, high=3, size=(1_000_000,))
small_data = data[:10]

# Trigger compilation for all testcases
minmax_orig(small_data, [2., -2.])
minmax_orig(small_data, (2., -2.))
minmax_scalar_args(small_data, 2., -2.)
minmax_parfor(small_data, 2., -2.)
minmax_vec(small_data, 2., -2.)

from timeit import timeit

for size in [10, 1_000, 1_000_000]:
    print(30*"+")
    print(f"{size} elements:\n")

    for name, expr in [
        ("Original implementation with list arg", lambda: minmax_orig(data[:size], [-2., 2.])),
        ("Original implementation with tuple arg", lambda: minmax_orig(data[:size], (-2., 2.))),
        ("Implementation with scalar threshold arg", lambda: minmax_scalar_args(data[:size], -2., 2.)),
        ("Parfor loop with scalar threshold arg", lambda: minmax_parfor(data[:size], -2., 2.)),
        ("Numba vectorised scalar threshold arg", lambda: minmax_vec(data[:size], -2., 2.))
    ]:
        t = timeit(expr, number=100)
        print(name.ljust(50), f"{t:.4e} s")

OUTPUT (timings are of course specific to my machine):

/home/hapahl/anaconda3/lib/python3.8/site-packages/numba/core/ir_utils.py:2031: NumbaPendingDeprecationWarning:
Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument 'thresholds' of function 'minmax_orig'.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "numba_discourse_880.py", line 10:
@njit
def minmax_orig(data: np.array, thresholds: Sequence[float]) -> np.array:
^

  warnings.warn(NumbaPendingDeprecationWarning(msg, loc=loc))
++++++++++++++++++++++++++++++
10 elements:

Original implementation with list arg              5.9060e-04 s
Original implementation with tuple arg             6.4200e-05 s
Implementation with scalar threshold arg           6.2100e-05 s
Parfor loop with scalar threshold arg              2.2654e-03 s
Numba vectorised scalar threshold arg              3.5370e-04 s
++++++++++++++++++++++++++++++
1000 elements:

Original implementation with list arg              1.6321e-03 s
Original implementation with tuple arg             2.3860e-04 s
Implementation with scalar threshold arg           2.4910e-04 s
Parfor loop with scalar threshold arg              2.1563e-03 s
Numba vectorised scalar threshold arg              6.4270e-04 s
++++++++++++++++++++++++++++++
1000000 elements:

Original implementation with list arg              4.5491e-01 s
Original implementation with tuple arg             4.3003e-01 s
Implementation with scalar threshold arg           4.2444e-01 s
Parfor loop with scalar threshold arg              9.4272e-02 s
Numba vectorised scalar threshold arg              4.6147e-01 s

Hope that helps a little

steff · August 31, 2021, 10:05am

Thank you for your elaborate answer @Hannes. Much appreciated. All good points indeed. The mutating nature I was also contemplating. Suppose it isn’t best practise, hence will change it. data[x] == 0 is intentionally not considered as it doesn’t need changing. prange with for loop was an alternative I also considered but the data is always one-dimensional.

Hannes · August 31, 2021, 10:07am

Glad it helps - one note of caution though: I am scratching my head right now, whether the mutability may affect the timings, since the data is only random on the first pass, after that all the functions are working on a dataset that’s already been processed…
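One way to rule that out would be to time each call on a fresh copy of the input. A rough sketch, using a plain NumPy stand-in for the mutating function (clamp_in_place and time_on_fresh_copies are made-up names, not part of the benchmark above):

```python
import numpy as np
from time import perf_counter


def clamp_in_place(data, upper, lower):
    # Stand-in for any function that mutates its input array.
    np.clip(data, lower, upper, out=data)
    return data


def time_on_fresh_copies(func, source, repeats=5):
    # Time func on a new copy per run so every run sees unprocessed data.
    timings = []
    for _ in range(repeats):
        work = source.copy()  # the copy is made outside the timed region
        start = perf_counter()
        func(work)
        timings.append(perf_counter() - start)
    return min(timings)


rng = np.random.default_rng(0)
source = rng.uniform(low=-3, high=3, size=1_000)
best = time_on_fresh_copies(lambda a: clamp_in_place(a, 2.0, -2.0), source)
```

The source array is never modified, so each implementation under test starts from the same random data.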

steff · September 1, 2021, 2:36am

Sure, I am conscious of this. It’s weird that using prange with parallel=True doesn’t work for me.

UnsupportedRewriteError: Failed in nopython mode pipeline (step: convert to parfors)
Overwrite of parallel loop index

File "../../../var/folders/xt/b71lsdgd3g1dhycp2fw80fy80000gn/T/ipykernel_92025/67808132.py", line 23:
<source missing, REPL/exec in use?>

Using a Mac M1. No biggie though. Speed is sufficient for research and in production the data size is limited anyway.

stuartarchibald · September 1, 2021, 9:36am

@steff which Numba version and Python version are you using to see that UnsupportedRewriteError? Thanks.

steff · September 1, 2021, 9:48am

Hi @stuartarchibald, please see the versions mentioned above (Python 3.9.6, Numba 0.53.1).

stuartarchibald · September 1, 2021, 11:10am

Thanks @steff

Is it the code in post #4 of this thread (by Hannes) that’s triggering it? I can’t seem to reproduce locally.

steff · September 2, 2021, 12:10am

@stuartarchibald that was my bad. It works now. Sorry to have wasted your time.

stuartarchibald · September 2, 2021, 10:56am

@steff no worries, many thanks for re-testing it and reporting back, much appreciated.
