Seeing different behavior with python 3.8.2 vs. python 3.7.5

notto · March 11, 2021, 11:46am

I have this code that works just fine on one of my computers (MacBook Pro with Python 3.7.5, NumPy 1.19.4, and Numba 0.52.0) but produces a different result on my other computer (MacBook Pro with Python 3.8.2, NumPy 1.20.1, and Numba 0.52.0). With the former setup, this code produces a NumPy array with 8 elements; with the latter setup it produces an array with one element. The 8-element version is what I consider correct.
To make the situation even stranger, just including a print statement at the end of the function (commented out below) makes the code produce the result I want on both computers. As far as I understand, that print statement should not override the @jit(nopython=True).

-Any pointers?

For those interested, the code is part of a metric for a type of causal inference (uplift modeling).

import numpy as np
from numba import jit

@jit(nopython=True)
def _qini_points(data_class,
                 data_score,
                 data_group):
    """Auxiliary function for qini_coefficient(). Returns the
    points on the qini-curve.

    Args:
    data_class (numpy.array([bool]))
    data_score (numpy.array([float]))
    data_group (numpy.array([bool])): True indicates that sample
     belongs to the treatment-group.
    """
    # Order data in descending order:
    data_idx = np.argsort(data_score)[::-1]
    data_class = data_class[data_idx]
    data_score = data_score[data_idx]
    data_group = data_group[data_idx]

    # Set initial values for counters etc:
    qini_points = []
    # Normalization factor (N_t / N_c):
    n_factor = np.sum(data_group) / np.sum(~data_group)
    control_goals = 0
    treatment_goals = 0
    score_previous = np.finfo(np.float32).min
    tmp_n_samples = 1  # Set to one to allow division in first iteration
    tmp_treatment_goals = 0
    tmp_control_goals = 0
    for item_class, item_score, item_group in\
            zip(data_class, data_score, data_group):
        if score_previous != item_score:
            # If we have a 'new score', handle the samples
            # currently stored as counts...
            for i in range(1, tmp_n_samples + 1):
                # Turns out adding the zeroeth item is pointless.
                # Oh, well... it does not affect a thing.
                tmp_qini_point = (treatment_goals + i * tmp_treatment_goals /
                                  tmp_n_samples) -\
                    (control_goals + i * tmp_control_goals /
                     tmp_n_samples) * n_factor
                qini_points.append(tmp_qini_point)
            # Add tmp items to vectors before resetting them
            treatment_goals += tmp_treatment_goals
            control_goals += tmp_control_goals
            # Reset counters
            tmp_n_samples = 0
            tmp_treatment_goals = 0
            tmp_control_goals = 0
            score_previous = item_score
        # Add item to counters:
        tmp_n_samples += 1
        tmp_treatment_goals += int(item_group) * item_class
        tmp_control_goals += int(~item_group) * item_class

    # Handle remaining samples:
    for i in range(1, tmp_n_samples + 1):
        tmp_qini_point = (treatment_goals + i * tmp_treatment_goals /
                          tmp_n_samples) -\
            (control_goals + i * tmp_control_goals /
             tmp_n_samples) * n_factor
        qini_points.append(tmp_qini_point)

    # Make list into np.array:
    # Toggling the print function here will make the code work again.
    # print(len(qini_points))
    qini_points = np.array(qini_points)
    return qini_points

# Test function:
data_class = np.array([True, False, False, True, True, False, True])
data_score = np.array([0.1, 0.2, 0.2, 0.2, 0.5, 0.6, 0.7])
data_group = np.array([True, False, True, False, False, True, True])

tmp = _qini_points(data_class, data_score, data_group)
print(tmp)

esc · March 12, 2021, 8:01pm

@notto thank you for submitting this on the Numba discourse. We recently noticed some issues with Numba and NumPy 1.20 (see "NumPy 1.20 numerical regressions", Issue #6812, numba/numba on GitHub). Can you try with Python 3.8.2 and NumPy 1.19.4 to check whether this is related to the NumPy version or the Python version?

notto · March 13, 2021, 7:01am

I created a separate virtualenv with the same versions and packages (Python 3.8.2, Numba 0.52.0), except for NumPy, of which I installed version 1.19.4 as you suggested. I still see the same erroneous behavior. I also tried the above with Numba 0.53.0 and the problem still persists. So it seems the problem is related to the Python version.
-Hope this helps!

esc · March 13, 2021, 10:46am

@notto thank you for following up on this and thank you for also testing 0.53.0. This may very well be a bug. The next step will be to condense the example to what is known as a "minimal reproducer", i.e. an as-short-as-possible snippet, without all the domain-specific computations, that triggers only this behaviour. If you have the time and inclination, please do feel free to attempt this. Otherwise, one of our developers will probably take a closer look next week.

notto · March 15, 2021, 10:26am

All right. I think I got it:

import numpy as np
from numba import jit

@jit(nopython=True)
def test():
    tmp = []
    for i in range(3):
        tmp.append(i)
    for i in range(1):
        tmp.append(i)
    tmp = np.array(tmp)
    return tmp


tmp = test()
print(tmp)

Commenting out and uncommenting the line "@jit(nopython=True)" results in different behavior. It seems that with Numba, the second for-loop discards the existing contents of tmp and creates a list with one item rather than appending the item to the existing list. This is with Python 3.8.2, NumPy 1.20.1, and Numba 0.52.0. To be precise, I am not seeing this with Python 3.7.5.

I am thinking this is a bug.
-If you agree and decide to fix it, how do I get the news of this issue having been resolved?

Thank you for the pointers!

esc · March 15, 2021, 10:56am

@notto excellent! Great work! I can confirm your findings and that the reproducer works.

On Python 3.7 this outputs:

[0 1 2 0]

Whereas on 3.8 it outputs:

[0]

Thank you for finding this and isolating it. The next step will be to transfer this to the Numba issue tracker where it will receive further scrutiny and be labeled as a bug. This is also the ticket which you can then watch and follow to be notified once the issue is resolved. Obviously the workaround is simple in the reproducer case, but I would tend to argue that this is something that should be fixed.

esc · March 15, 2021, 10:59am

Imported to the GitHub issue tracker here: "reflected list behaves differently on Python 3.7 and 3.8", Issue #6825, numba/numba.

notto · March 15, 2021, 12:50pm

Thanks for the link!
This is a structure I would often use to deal with remaining samples in some temporary variables after exiting some loop, so I would hope that this gets fixed. But that is up to you.
-Good luck!

esc · March 15, 2021, 2:54pm

@notto you could try using numba.typed.List as an alternative if you require this specific pattern:

https://numba.readthedocs.io/en/stable/reference/pysupported.html#typed-list
