Pearson correlation using Numba?

7kemZmani · July 4, 2020, 8:27pm

I have a long dict of (str, np.array) pairs like the following

es = {'A': np.array([0, 0, 1, 1, 0, 1]), 'B': np.array([1, 0, 1, 0, 0, 1])}

I want to create a kernel function that takes two float64 arrays and returns a float64. Here’s the code I am using:

from itertools import permutations
import numpy as np
from numba import vectorize

# es is a very long dictionary; all array values have the same length (1778) and consist of zeros and ones.

es = {'A': np.array([0, 0, 1, 1, 0, 1]), 'B': np.array([1, 0, 1, 0, 0, 1])}

@vectorize(['float64(float64, float64)'], target='cuda')
def pearson_cor(e1, e2):
    return np.corrcoef(e1, e2)[0,1]

result = {f"{e1}, {e2}": pearson_cor(es[e1], es[e2]) for e1, e2 in permutations(es.keys(), 2)}

I understand that this is not ideal because the last line will call pearson_cor sequentially. However, this is what I get when trying to run this code (in jupyter lab on windows 10):

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
No definition for lowering <built-in method np_corrcoef_impl of _dynfunc._Closure object at 0x000002876AD18828>(float64, float64, omitted(default=True)) -> array(float64, 2d, C)

File "<ipython-input-67-0513ea800c4c>", line 4:
def pearson_cor(ec1, ec2):
    <source elided>
    return np.corrcoef(ec1, ec2)[0,1]
    ^

During: lowering "$10call_method.4 = call $4load_method.1(ec1, ec2, func=$4load_method.1, args=[Var(ec1, <ipython-input-67-0513ea800c4c>:4), Var(ec2, <ipython-input-67-0513ea800c4c>:4)], kws=(), vararg=None)" at <ipython-input-67-0513ea800c4c> (4)

How can this be fixed and (hopefully) improved?

luk-f-a · July 4, 2020, 8:41pm

hi!

I cannot test with a GPU, but the following works on a CPU

import numpy as np
from numba import njit

@njit
def pearson_cor(e1, e2):
    return np.corrcoef(e1, e2)[0,1]

If I had to guess, vectorize is not the right decorator. It takes a function defined on scalars and broadcasts it element-wise over arrays, whereas correlation is not defined on scalars: it is naturally an operation on whole vectors.
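Since correlation is a whole-vector operation, one way to avoid the per-pair Python loop on the CPU is to stack the arrays and call np.corrcoef once. This is only a sketch in plain NumPy (no Numba, no GPU); the third entry 'C' and the names keys, stacked, corr and index are made up for illustration.

```python
from itertools import permutations
import numpy as np

es = {
    'A': np.array([0, 0, 1, 1, 0, 1]),
    'B': np.array([1, 0, 1, 0, 0, 1]),
    'C': np.array([1, 1, 0, 0, 1, 1]),  # hypothetical extra entry for illustration
}

keys = list(es)
# One row per dictionary entry; np.corrcoef treats each row as a variable.
stacked = np.stack([es[k] for k in keys]).astype(np.float64)
# corr[i, j] is the Pearson correlation between rows i and j.
corr = np.corrcoef(stacked)

index = {k: i for i, k in enumerate(keys)}
result = {
    f"{e1}, {e2}": corr[index[e1], index[e2]]
    for e1, e2 in permutations(keys, 2)
}
```

The dictionary comprehension now only looks values up in the precomputed matrix. Note that a row that is constant (all zeros or all ones) has zero variance, so its correlations come out as nan.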

stuartarchibald · July 6, 2020, 9:14am

Hi @7kemZmani,

In general, NumPy is largely unsupported on the GPU targets. Details on why, and on which parts are supported for CUDA, are here: http://numba.pydata.org/numba-doc/latest/cuda/cudapysupported.html#numpy-support.

I think @luk-f-a's comment about @vectorize is correct. A correlation function would be better written using the @guvectorize decorator, as it can operate on more than one element at a time. Docs: http://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator

You may also find that CuPy does what you need: https://docs-cupy.chainer.org/en/stable/reference/generated/cupy.corrcoef.html#cupy.corrcoef

Hope this helps.
