Pearson correlation using Numba?

I have a long dict of (str, np.array) pairs like the following

es = {'A': np.array([0, 0, 1, 1, 0, 1]), 'B': np.array([1, 0, 1, 0, 0, 1])}

I want to create a kernel function that takes two arrays of float64 and returns a float64. Here is the code I am using:

from itertools import permutations
import numpy as np
from numba import vectorize

# es is a very long dictionary; all values are arrays of the same length (1778) consisting of zeros and ones.

es = {'A': np.array([0, 0, 1, 1, 0, 1]), 'B': np.array([1, 0, 1, 0, 0, 1])}

@vectorize(['float64(float64, float64)'], target='cuda')
def pearson_cor(e1, e2):
    return np.corrcoef(e1, e2)[0,1]

result = {f"{e1}, {e2}": pearson_cor(es[e1], es[e2]) for e1, e2 in permutations(es.keys(), 2)}

I understand that this is not ideal because the last line calls pearson_cor sequentially, once per pair. However, this is what I get when trying to run this code (in JupyterLab on Windows 10):

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
No definition for lowering <built-in method np_corrcoef_impl of _dynfunc._Closure object at 0x000002876AD18828>(float64, float64, omitted(default=True)) -> array(float64, 2d, C)

File "<ipython-input-67-0513ea800c4c>", line 4:
def pearson_cor(ec1, ec2):
    <source elided>
    return np.corrcoef(ec1, ec2)[0,1]
    ^

During: lowering "$10call_method.4 = call $4load_method.1(ec1, ec2, func=$4load_method.1, args=[Var(ec1, <ipython-input-67-0513ea800c4c>:4), Var(ec2, <ipython-input-67-0513ea800c4c>:4)], kws=(), vararg=None)" at <ipython-input-67-0513ea800c4c> (4)

How can this be fixed and (hopefully) improved?

hi!

I cannot test with a GPU, but the following works on a CPU

import numpy as np
from numba import njit

@njit
def pearson_cor(e1, e2):
    return np.corrcoef(e1, e2)[0,1]

If I had to guess, vectorize is not the right decorator, because correlation is not an operation defined on scalars that the decorator can then broadcast over vectors. It is an operation that is naturally defined on whole vectors.
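Since correlation works on whole vectors, one way to avoid the pair-by-pair loop on the CPU is to stack the arrays as rows of a matrix and call np.corrcoef once. The following is only a sketch under assumptions: the key 'C' and its values are made up for illustration, and the names keys, M and R are not from the original code.

```python
from itertools import permutations
import numpy as np

# 'A' and 'B' come from the question; 'C' is a hypothetical extra entry.
es = {
    "A": np.array([0, 0, 1, 1, 0, 1]),
    "B": np.array([1, 0, 1, 0, 0, 1]),
    "C": np.array([1, 1, 0, 0, 1, 0]),
}

keys = list(es)
# One row per key; np.corrcoef treats each row as a variable.
M = np.stack([es[k] for k in keys]).astype(np.float64)
R = np.corrcoef(M)  # R[i, j] is the Pearson correlation of rows i and j

# Same "e1, e2" key format as in the question, for all ordered pairs.
result = {
    f"{keys[i]}, {keys[j]}": R[i, j]
    for i, j in permutations(range(len(keys)), 2)
}
```

The full matrix has len(es)**2 entries, so for a very long dictionary memory use may become the limiting factor. A constant array (all zeros or all ones) has zero variance and will produce nan in its row and column.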


Hi @7kemZmani,

In general, NumPy is largely unsupported on the GPU targets; details on why, and on which parts are supported for CUDA, are here: http://numba.pydata.org/numba-doc/latest/cuda/cudapysupported.html#numpy-support.

I think @luk-f-a's comment about @vectorize is correct: a replacement for np.corrcoef would be better written using the @guvectorize decorator, as it can operate on more than one element at a time. Docs: http://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator

You may also find that CuPy does what you need: https://docs-cupy.chainer.org/en/stable/reference/generated/cupy.corrcoef.html#cupy.corrcoef

Hope this helps?
