I have this code that works fine on one of my computers (MacBook Pro with Python 3.7.5, numpy 1.19.4, and numba 0.52.0) but produces a different result on my other computer (MacBook Pro with Python 3.8.2, numpy 1.20.1, and numba 0.52.0). With the former setup, the code produces a numpy array with 8 elements; with the latter setup, it produces an array with one element. The 8-element version is what I consider correct.
To make the situation even stranger, just including a print statement at the end of the function (commented out below) makes the code produce the result I want on both computers. As far as I understand, that print statement should not override the @jit(nopython=True).
Any pointers?
For those interested, the code is part of a metric for a type of causal inference (uplift modeling).
import numpy as np
from numba import jit


@jit(nopython=True)
def _qini_points(data_class,
                 data_score,
                 data_group):
    """Auxiliary function for qini_coefficient(). Returns the
    points on the qini-curve.

    Args:
    data_class (numpy.array([bool]))
    data_score (numpy.array([float]))
    data_group (numpy.array([bool])): True indicates that sample
     belongs to the treatment-group.
    """
    # Order data in descending order:
    data_idx = np.argsort(data_score)[::-1]
    data_class = data_class[data_idx]
    data_score = data_score[data_idx]
    data_group = data_group[data_idx]

    # Set initial values for counters etc:
    qini_points = []
    # Normalization factor (N_t / N_c):
    n_factor = np.sum(data_group) / np.sum(~data_group)
    control_goals = 0
    treatment_goals = 0
    score_previous = np.finfo(np.float32).min
    tmp_n_samples = 1  # Set to one to allow division in first iteration
    tmp_treatment_goals = 0
    tmp_control_goals = 0
    for item_class, item_score, item_group in\
            zip(data_class, data_score, data_group):
        if score_previous != item_score:
            # If we have a 'new score', handle the samples
            # currently stored as counts...
            for i in range(1, tmp_n_samples + 1):
                # Turns out adding the zeroeth item is pointless.
                # Oh, well... it does not affect a thing.
                tmp_qini_point = (treatment_goals + i * tmp_treatment_goals /
                                  tmp_n_samples) -\
                                 (control_goals + i * tmp_control_goals /
                                  tmp_n_samples) * n_factor
                qini_points.append(tmp_qini_point)
            # Add tmp items to vectors before resetting them
            treatment_goals += tmp_treatment_goals
            control_goals += tmp_control_goals
            # Reset counters
            tmp_n_samples = 0
            tmp_treatment_goals = 0
            tmp_control_goals = 0
            score_previous = item_score
        # Add item to counters:
        tmp_n_samples += 1
        tmp_treatment_goals += int(item_group) * item_class
        tmp_control_goals += int(~item_group) * item_class
    # Handle remaining samples:
    for i in range(1, tmp_n_samples + 1):
        tmp_qini_point = (treatment_goals + i * tmp_treatment_goals /
                          tmp_n_samples) -\
                         (control_goals + i * tmp_control_goals /
                          tmp_n_samples) * n_factor
        qini_points.append(tmp_qini_point)
    # Make list into np.array:
    # Toggling the print function here will make the code work again.
    # print(len(qini_points))
    qini_points = np.array(qini_points)
    return qini_points


# Test function:
data_class = np.array([True, False, False, True, True, False, True])
data_score = np.array([0.1, 0.2, 0.2, 0.2, 0.5, 0.6, 0.7])
data_group = np.array([True, False, True, False, False, True, True])
tmp = _qini_points(data_class, data_score, data_group)
print(tmp)
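
To pin down what the "correct" 8-element result should look like independently of numba, the same loop can be restated in plain Python. This is only a sketch for comparison: the names `qini_points_reference` and `_append_block` are hypothetical and not part of the code above, it uses lists instead of numpy arrays, and it assumes both the treatment and the control group are non-empty.

```python
def _append_block(points, treatment_goals, control_goals,
                  tmp_treatment, tmp_control, n_samples, n_factor):
    """Append one interpolated point per sample in a block of equal scores."""
    for i in range(1, n_samples + 1):
        points.append((treatment_goals + i * tmp_treatment / n_samples)
                      - (control_goals + i * tmp_control / n_samples) * n_factor)


def qini_points_reference(data_class, data_score, data_group):
    """Plain-Python restatement of the loop in _qini_points (no numba)."""
    # Indices in descending score order. Ties are handled as one block,
    # so their relative order does not change the result.
    order = sorted(range(len(data_score)),
                   key=lambda k: data_score[k], reverse=True)
    n_treatment = sum(bool(g) for g in data_group)
    n_factor = n_treatment / (len(data_group) - n_treatment)  # N_t / N_c

    points = []
    treatment_goals = 0
    control_goals = 0
    score_previous = None  # sentinel: the first score always counts as new
    n_samples = 1  # one, as in the original, so the first block divides safely
    tmp_treatment = 0
    tmp_control = 0
    for k in order:
        item_class = bool(data_class[k])
        item_group = bool(data_group[k])
        if data_score[k] != score_previous:
            _append_block(points, treatment_goals, control_goals,
                          tmp_treatment, tmp_control, n_samples, n_factor)
            treatment_goals += tmp_treatment
            control_goals += tmp_control
            n_samples = 0
            tmp_treatment = 0
            tmp_control = 0
            score_previous = data_score[k]
        n_samples += 1
        # `not` is used instead of `~`: on a plain Python bool, ~True is -2.
        tmp_treatment += int(item_group and item_class)
        tmp_control += int((not item_group) and item_class)
    # Handle the samples of the last block.
    _append_block(points, treatment_goals, control_goals,
                  tmp_treatment, tmp_control, n_samples, n_factor)
    return points


points = qini_points_reference(
    [True, False, False, True, True, False, True],
    [0.1, 0.2, 0.2, 0.2, 0.5, 0.6, 0.7],
    [True, False, True, False, False, True, True])
print(len(points))  # 8
```

With the test data from above, this sketch returns 8 points, which matches the length I consider correct. Comparing its values against the output of the jitted function on each machine should at least show whether the one-element result comes from the compiled code path.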